Project Overview #
A production-grade Zero-Trust infrastructure deployment for running OpenClaw — an autonomous AI coding assistant — on Microsoft Azure. The project provisions a secure, cost-optimized, and fully automated cloud environment where the AI agent runs 24/7 with no publicly exposed ports.
The entire infrastructure is defined in Terraform and uses cloud-init for automated VM bootstrapping. Secrets are managed entirely through Azure Key Vault with Managed Identity authentication — no credentials are ever stored in code, environment variables, or on disk.
Architecture #
The infrastructure follows a Zero Trust network model:
- No public inbound access — traditional web ports (80, 443) are completely closed.
- Tailscale mesh VPN provides authenticated, encrypted access to the AI agent’s web UI and SSH.
- Emergency SSH (port 22) is restricted to a single allowed IP for debugging if Tailscale fails.
- Managed Identity grants the VM read-only access to Key Vault — no static credentials anywhere.
Deployment Flow #
After terraform apply, the following orchestration occurs automatically:
- Terraform provisions the Resource Group, VNet, NSG, Key Vault, Managed Identity, and VM.
- Cloud-init runs on first boot — installs Docker, Chrome, and Azure CLI.
- The VM authenticates via Managed Identity (
az login --identity) and fetches secrets from Key Vault. - Tailscale connects the VM to the private mesh network, making the AI agent accessible securely.
- OpenClaw is installed and configured with the retrieved AI provider API keys.
Tech Stack #
| Layer | Technology | Purpose |
|---|---|---|
| Compute | Azure VM (Standard_B2s) | Burstable Ubuntu 24.04 LTS host for the AI agent |
| Networking | VNet + NSG | Private network with deny-all inbound rules |
| Ingress | Tailscale Mesh VPN | Zero-Trust authenticated access (no public ports) |
| Secrets | Azure Key Vault | Stores AI API keys and Tailscale auth keys securely |
| Identity | User-Assigned Managed Identity | Credential-free authentication to Azure services |
| Bootstrapping | Cloud-init | Automated VM setup: Docker, Chrome, Tailscale, Azure CLI |
| IaC | Terraform (azurerm ~> 3.90) | Full infrastructure automation with modular config |
| Cost Control | Azure Consumption Budgets | Alerting at 80% of ~$30/month budget |
Security Highlights #
- Zero Open Ports: Ingress is exclusively via Tailscale — no web ports are exposed to the internet.
- Managed Identity: The VM authenticates to Azure services without any stored credentials. The identity has read-only permissions scoped to the specific Key Vault.
- Secret Separation: Infrastructure (Terraform) is separated from configuration (Key Vault secrets). API keys can be rotated in Key Vault without redeploying the VM.
- Network Isolation: The NSG blocks all inbound traffic except SSH from a single whitelisted IP. The AI agent’s web UI (port 18789) is accessible only through the Tailscale tunnel.
Challenges #
- Managed Identity Boot-Race: The VM’s Managed Identity takes a few seconds to propagate after provisioning. Implemented a retry loop (30 attempts × 5s) in the cloud-init script to handle the eventual consistency of Azure’s identity service.
- Secret Timing: The Key Vault is created empty by Terraform, but the cloud-init script expects secrets on first boot. Designed a self-healing flow where the VM retries or can be restarted after manual secret upload.
- Compute Sizing: Balancing cost vs. capability for an AI agent that needs Docker, Chrome (for web browsing), and the agent runtime.
Standard_B2s(2 vCPUs, 4GB RAM) provides burstable performance at ~$30/month. - VPN vs. Public Access: Evaluated Azure VPN Gateway, Application Gateway, and direct public IP. Chose Tailscale for zero infrastructure cost, simpler setup, and stronger zero-trust guarantees.
Key Learnings #
- Designing a Zero-Trust network architecture for an AI agent — shifting security from perimeter to identity
- Implementing credential-free authentication using Azure Managed Identity and Key Vault
- Using cloud-init for declarative, idempotent VM bootstrapping without shell scripts on disk
- Evaluating compute trade-offs (VMs vs. containers vs. serverless) for stateful, long-running workloads
- Configuring Tailscale as a lightweight, cost-free alternative to Azure VPN Gateway
- Managing infrastructure-secret separation — Terraform defines the “shape,” Key Vault provides the “values”
- Implementing budget alerts for cost governance on personal cloud projects
References #
- OpenClaw — The AI coding assistant
- Tailscale Documentation
- Azure Key Vault Documentation
- Terraform Azure Provider
- Cloud-init Documentation