I Stopped Dreading DNS: Automating VPS and Zero‑Downtime Deploys with Terraform, Cloudflare, and Proxmox/OpenStack

The Late‑Night Deploy That Made Me a Terraform True Believer

There was a Tuesday not long ago when I stared at my terminal, waiting for DNS to catch up with my ambition. You know that feeling, right? The deploy was ready, the VPS was humming, and the only thing between me and a clean handoff was… propagation. I had this uneasy mix of optimism and dread—like watching a train inch toward the platform, hoping you’re standing at the right door.

That night is when something clicked. I realized my process had too many places where “manual” was code for “oops.” Creating the VPS by hand. Tweaking Cloudflare DNS records. Double‑checking health checks by refreshing a page like a superstitious goalie. It worked most of the time, but when it didn’t, I felt it in my teeth. So I did what I usually do after a slightly chaotic deploy: I wrote everything down, turned the steps into code, and let Terraform take the wheel.

In this guide, I want to walk you through how I automate VPS and DNS together using Terraform, with Cloudflare handling the front door and Proxmox or OpenStack running the machines. We’ll talk about the fun parts (blue‑green deploys, canaries, and zero downtime) and the slightly less glamorous parts (state, secrets, and guardrails). I’ll share what’s worked for me, where I tripped, and what finally made those late‑night deploys feel calm instead of tense.

Terraform as Your Conductor: One Baton, Many Instruments

If you’re new to Terraform, think of it like a polite conductor for your infrastructure. You tell it what the orchestra should sound like—“a VPS here, DNS there, a health check keeping time”—and it figures out the dependencies, the ordering, and the tempo. The magic is in the planning: you can look at the score before the music starts. The official Terraform documentation is excellent at explaining the mental model, but here’s how it feels in practice.

Instead of clicking around your panel or retyping the same commands from muscle memory, you describe your target state. One module spins up the VPS. Another manages your DNS in Cloudflare. A third handles the load balancer or the records that make zero‑downtime swaps possible. When you run plan, Terraform diff‑checks the world you want against the world you have and shows you the gaps. When you apply, it reconciles the two, carefully.

In my experience, the payoff isn’t just speed—it’s consistency. The standardization sneaks up on you. Suddenly every VPS boots with the same SSH hardening, the same users, and the same bootstrap actions. Every DNS record follows a naming pattern you can understand months later. And every deploy has a playbook you can rerun, audit, and roll back if needed.

By the way, if you’re curious about what happens after that VPS starts breathing, I’ve written about the base layer I reuse on first boot—users, security, and services—in my post on how I use cloud‑init + Ansible for users, security, and services on first boot. Terraform sets the stage; cloud‑init and Ansible tune the instruments.

Providers You’ll Actually Use: Cloudflare, Proxmox/OpenStack

Terraform talks to “providers”—plugins that know how to create things in a specific platform. For our story, Cloudflare handles the public edge, while Proxmox or OpenStack build the servers. I reach for Proxmox when I want direct control (homelab, colo, or a private cluster), and OpenStack when I’m in a cloud that speaks that language. The idea doesn’t change much: Terraform is still telling each platform what to do using a shared vocabulary.

If you like docs (I do, after a strong coffee), the Cloudflare provider docs are straightforward and cover everything from basic A records to load balancers and monitors. For Proxmox, the Proxmox provider makes VM creation feel simple and repeatable: define the template, the storage, the network, and press play. Even if you stick with OpenStack, the workflow is similar: define an image, a flavor, a network, a security group, and a floating IP—Terraform does the rest.
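
If you like seeing the wiring, the provider setup in my root module is short. Treat the sources and version pins below as the ones I’d expect, not gospel, and check the registry for current numbers:

# pseudo-HCL: pinning the providers this post leans on
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0"
    }
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 2.9"
    }
    # or, for the OpenStack path:
    # openstack = {
    #   source  = "terraform-provider-openstack/openstack"
    #   version = "~> 1.54"
    # }
  }
}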

Here’s the thing: once your infrastructure lives in code, it grows up a little. It becomes something you can review, test, version, and share. It’s no longer “that server I set up last summer” but “the module we use for web app nodes.” And when a teammate asks, “Can you replicate that staging box for a demo?” you can do it with a branch and a workspace instead of a week of improvisation.

A Reproducible VPS Module: The Minimum That Feels Like Luxury

When I started writing Terraform for servers, I made two small promises to myself. First, every VM must be created the same way—no sneaky manual tweaks. Second, every VM gets a bootstrap script that brings it up to production baseline. That’s it. No heroics. If I keep those two promises, everything else becomes easier: monitoring, patching, migrations, even cost management.

On Proxmox, that means I keep a gold image ready (cloud‑init capable), and my Terraform module references it by name. I give it CPU and RAM that match the workload, attach networking, and feed cloud‑init a user‑data file that installs the basics and calls Ansible for the rest. On OpenStack, it’s the same idea: image, flavor, network, key, user data. The output is a living server that looks like every other server I care about.

The shape of a simple module

Don’t worry, we’re not going full textbook. Here’s what the backbone of a VPS module tends to include in my world: a provider reference (Proxmox or OpenStack), a VM resource with image and network, a cloud‑init user‑data template, and outputs like the IP address that other modules can depend on. That last part matters more than it seems—outputs become the glue that ties VPS creation to DNS, monitoring, and deployment orchestration.

# pseudo‑HCL for a VPS module using Proxmox
provider "proxmox" {
  # token or username/password; prefer tokens with least privilege
}

resource "proxmox_vm_qemu" "app" {
  name        = var.name
  target_node = var.node
  clone       = var.template
  cores       = var.cores
  memory      = var.memory

  agent = 1 # requires qemu-guest-agent in the template so Terraform can read the IP

  network {
    model  = "virtio"
    bridge = var.bridge
  }

  disk {
    size    = var.disk_size
    storage = var.storage
  }

  # cloud‑init basics
  ipconfig0  = "ip=dhcp"
  sshkeys    = file(var.ssh_public_key)
  cicustom   = "user=local:snippets/${var.user_data_file}"
}

output "ipv4" { value = proxmox_vm_qemu.app.default_ipv4_address }

Nothing fancy, but it’s enough to consistently birth a server that knows who it is and what to do first. And from here, everything else in Terraform can look up that IP and act with confidence.
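
For completeness, here’s the OpenStack flavor of the same idea. The resource comes from the OpenStack provider; the variable names are just my conventions:

# pseudo-HCL for the same VPS module on OpenStack
resource "openstack_compute_instance_v2" "app" {
  name        = var.name
  image_name  = var.image   # pinned, cloud-init-capable image
  flavor_name = var.flavor  # CPU/RAM sizing
  key_pair    = var.keypair

  network {
    name = var.network
  }

  # the same cloud-init user data the Proxmox path uses
  user_data = file(var.user_data_file)
}

output "ipv4" {
  value = openstack_compute_instance_v2.app.access_ip_v4
}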

Cloudflare DNS Without the Guesswork

DNS only feels simple when you don’t need it to be precise. The minute you want fast cutovers, good caching behavior, clean SSL, and minimal surprises, the details start to matter. Cloudflare gives you a powerful front door, but you still have to tell it what to do—and you want Terraform to be the one whispering in its ear.

For most apps, I create A or AAAA records that point to my VPS, mark them as proxied (orange cloud), and keep sensible TTLs on anything that stays unproxied (proxied records get Cloudflare’s automatic TTL). I also add CAA records to keep certificate issuance under control and TXT records for ownership or ACME. If you’ve ever chased a weird redirect because of a stray record, you know how nice it is to keep everything declared in code where you can see it.

# pseudo‑HCL for Cloudflare DNS
provider "cloudflare" {
  api_token = var.cloudflare_api_token
}

data "cloudflare_zone" "site" { name = var.zone }

resource "cloudflare_record" "app" {
  zone_id = data.cloudflare_zone.site.id
  name    = var.hostname
  type    = "A"
  value   = var.ipv4
  proxied = true
  ttl     = 1 # automatic; Cloudflare manages TTL for proxied records
}

resource "cloudflare_record" "caa_letsencrypt" {
  zone_id = data.cloudflare_zone.site.id
  name    = var.zone
  type    = "CAA"
  data {
    flags = 0
    tag   = "issue"
    value = "letsencrypt.org"
  }
}

Here’s where experience kicks in: changing a proxied record at Cloudflare can be instantaneous for visitors because Cloudflare’s edge does the heavy lifting. But if you toggle proxy off and on during a deploy, you can create unexpected behavior or expose your origin. I try to keep my intentions consistent. If the app is meant to sit behind Cloudflare, it stays proxied. If it needs a direct route (rare), I make that explicit and stick with it.

Another tip: use meaningful names and subdomains for canaries and staging. “api‑canary” and “api‑green” tell stories at a glance in your Terraform code. Those names are gifts to your future self, or the teammate trying to debug traffic at 2 a.m. Your codebase becomes a map, not a maze.
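
In Terraform, those names cost almost nothing. Here’s a sketch (the hostnames and variables are illustrative):

# pseudo-HCL: self-describing records for blue, green, and the canary
locals {
  edge_hosts = {
    "api-blue"   = var.blue_ip
    "api-green"  = var.green_ip
    "api-canary" = var.green_ip # the canary rides on the new version
  }
}

resource "cloudflare_record" "named" {
  for_each = local.edge_hosts

  zone_id = data.cloudflare_zone.site.id
  name    = each.key
  type    = "A"
  value   = each.value
  proxied = true
  ttl     = 1 # automatic; required for proxied records
}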

Zero‑Downtime Deploys You Can Sleep Through

Okay, the fun part. Let’s talk about flips without flops—zero‑downtime deploys with a little help from Cloudflare. I’ve tried a bunch of approaches over the years, but the pattern I keep coming back to is a clean blue‑green flow. You keep the current version (blue) serving traffic, prepare the new version (green) on fresh infrastructure, verify everything quietly, then steer traffic from blue to green smoothly. No dramatic cutovers, no prayers to the DNS gods.

Blue‑green with Cloudflare Load Balancer

Cloudflare’s Load Balancer gives you the controls you wish DNS had: pools, health checks, weighted steering, and per‑origin drain. In Terraform, it’s a few resources that connect those dots. You define two pools (blue and green), assign their origins (your VPS IPs), attach a monitor (HTTP/HTTPS health check), and create a load balancer that sits in front of your hostname. During a deploy, you shift weight from blue to green. If something smells off, you shift it back—instantly.

# pseudo‑HCL for Cloudflare LB blue‑green
resource "cloudflare_load_balancer_monitor" "http" {
  zone_id          = data.cloudflare_zone.site.id
  type             = "http"
  expected_body    = "ok"
  expected_codes   = "200"
  method           = "GET"
  path             = "/health"
  interval         = 30
  timeout          = 5
  retries          = 2
}

resource "cloudflare_load_balancer_pool" "blue" {
  name     = "app-blue"
  monitor  = cloudflare_load_balancer_monitor.http.id
  origins { name = "blue-1" address = var.blue_ip }
}

resource "cloudflare_load_balancer_pool" "green" {
  name     = "app-green"
  monitor  = cloudflare_load_balancer_monitor.http.id
  origins { name = "green-1" address = var.green_ip }
}

resource "cloudflare_load_balancer" "app" {
  zone_id     = data.cloudflare_zone.site.id
  name        = var.hostname
  default_pools = [cloudflare_load_balancer_pool.blue.id, cloudflare_load_balancer_pool.green.id]
  fallback_pool = cloudflare_load_balancer_pool.blue.id
  session_affinity = "cookie"
  steering_policy  = "dynamic_latency"
}

With this setup, “switching” becomes “adjusting weights.” It’s the difference between flipping a light switch and using a dimmer. And if your app supports it, a small canary (say, 5–10% of traffic) lets you feel the new version under real traffic before you commit. The nice thing about Terraform is that your deploy history becomes code history—you can see exactly how traffic changed over time.
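
If you want the weights themselves to live in code, the load balancer resource supports random steering with per‑pool weights. Roughly, and worth double‑checking against the current provider docs:

# pseudo-HCL: the traffic split as code (these lines go inside the load balancer above)
  steering_policy = "random" # pool weights only apply under random steering

  random_steering {
    default_weight = 1
    pool_weights = {
      (cloudflare_load_balancer_pool.blue.id)  = var.blue_weight  # e.g. 0.95
      (cloudflare_load_balancer_pool.green.id) = var.green_weight # e.g. 0.05
    }
  }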

When I helped a client migrate a busy API from a single VPS to a pair of nodes, we staged the green pool throughout the week, automated the health checks, and tied deploys to Terraform plans in CI. On the day of the switch, we nudged traffic over slowly. Five minutes later, users were on green, and the blue servers went quiet like a train station at midnight. No alerts, no panicked rollbacks. Just a dimmer moving left to right.

If you want to dig into edge behaviors—timeouts, keep‑alives, and the stuff that makes live connections feel graceful—I’ve written a calm, practical walkthrough in the guide to Nginx timeouts, keep‑alive, and zero‑downtime behind Cloudflare. It pairs beautifully with a Terraform‑driven rollout.

What if you don’t use Cloudflare Load Balancer?

You can approximate blue‑green with plain DNS by publishing “blue” and “green” hostnames and shifting your app’s origin at the proxy layer. It works, but you lose the fine‑grained control, health checking, and per‑origin drain that makes the experience buttery smooth. If budget or simplicity steers you that way, be explicit in your Terraform about record names, TTLs, and the cutover process. Write that choreography into code—and stick to it.
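
One way to keep that choreography honest is a single variable that decides which color the public record points at. A sketch, reusing the record from earlier:

# pseudo-HCL: DNS-only cutover driven by one variable
variable "active_color" {
  description = "Which side currently serves production traffic"
  type        = string
  default     = "blue"

  validation {
    condition     = contains(["blue", "green"], var.active_color)
    error_message = "The active_color value must be \"blue\" or \"green\"."
  }
}

resource "cloudflare_record" "app" {
  zone_id = data.cloudflare_zone.site.id
  name    = var.hostname
  type    = "A"
  value   = var.active_color == "green" ? var.green_ip : var.blue_ip
  proxied = true
  ttl     = 1 # the proxy hides origin swaps from clients
}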

State, Secrets, and Guardrails That Save Future You

The first time I used Terraform for a multi‑environment rollout, I learned a lesson the easy way: treat state like a crown jewel. Store it remotely with locking, back it up, and don’t let it lie around on laptops. When state is safe, so are your deploys. It’s not glamorous, but it’s the difference between confident changes and white‑knuckle guessing.

I like remote backends with locking because they prevent two applies from stomping on each other. Workspaces help, too: dev, stage, prod, and the occasional ad‑hoc test. Variables for secrets (Cloudflare tokens, provider credentials) should come from a vault or your CI’s secret store, never from a file synced to Git. You’ll sleep better.
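
For what it’s worth, the backend I reach for most often is S3‑compatible storage with a lock table. A sketch (bucket and table names are placeholders):

# pseudo-HCL: remote state with locking (S3-style backend as one option)
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"  # placeholder
    key            = "web-app/terraform.tfstate"
    region         = "eu-central-1"
    dynamodb_table = "terraform-locks"          # gives you state locking
    encrypt        = true
  }
}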

Guardrails in code matter. I use create_before_destroy in lifecycle blocks when I’m replacing a node on purpose, so Terraform creates the new server first and only destroys the old one after the new one is healthy. I also use explicit dependencies where it’s easy to be clever and wrong—like ensuring that DNS only points to a server after its health endpoint returns the good kind of noise. If your app takes a minute to warm up, wait for it. Terraform isn’t a race; it’s choreography.
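
In HCL, that guardrail is only a few lines inside the resource you plan to replace:

# pseudo-HCL: build the replacement before touching the original
resource "proxmox_vm_qemu" "app" {
  # ...VM definition as before...

  lifecycle {
    create_before_destroy = true # new node first, old node removed only afterwards
  }
}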

One more habit that’s paid for itself many times: treat load balancers and floating IPs as long‑lived objects. I try not to destroy the stable “front door” during a rollout. New versions should arrive behind it, take over gracefully, and let the old version step away without a fuss. Blue, then green, then back to blue next time—it’s a rhythm, not a one‑off event.

Security: Keys, Firewalls, and Quiet Confidence

Security in this setup is mostly about doing simple things every time. First, least‑privilege API tokens. Your Cloudflare token should be able to manage only the zones and resources it needs, not the entire account. Same idea for Proxmox or OpenStack access—narrow it to the cluster or project that matters.

Second, predictable boot hardening. I disable password SSH logins, use authorized keys, and keep a repeatable firewall baseline. If you want a practical pattern that won’t fight you later, I’ve written a friendly walkthrough in the nftables firewall cookbook for VPS: rate limiting, port knocking, and IPv6 rules. It pairs nicely with cloud‑init and Terraform outputs.

Third, protect the places where you tinker. Admin panels and hypervisors make me nervous in the best possible way, so I put them behind client certificate authentication. It sounds fancy, but the day‑to‑day is simple: your browser has a cert, the server trusts it, and anonymous internet traffic bounces away politely. If you want a step‑by‑step, here’s how I stopped worrying about admin logins: protecting panels with mTLS on Nginx.

Last, don’t forget your bootstrap. If your VPS comes online already knowing how to configure users, rotate logs, and register its health endpoint, everything else becomes calmer. That’s exactly why I lean so hard on cloud‑init with a tiny handoff to Ansible. Terraform gets the house built; the rest of the toolchain makes it a home with running water and working lights.
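
The Terraform side of that handoff stays tiny: render one user‑data template and hand it to the VM. A sketch (the template path, variables, and the Ansible repo are my assumptions):

# pseudo-HCL: rendering cloud-init user data from one template
locals {
  user_data = templatefile("${path.module}/templates/user_data.yaml.tftpl", {
    ssh_public_key = file(var.ssh_public_key)
    environment    = var.environment
    ansible_repo   = var.ansible_repo # cloned and run on first boot
  })
}

# feed local.user_data to the VM: a cicustom snippet on Proxmox,
# or the user_data argument on OpenStack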

Gotchas I’ve Collected (So You Don’t Have To)

I wish I could say my Terraform journey has been perfectly smooth, but bumps make the story better. Here are a few I see often.

First, the “why is this record still pointing to the old IP?” mystery. If you’re using Cloudflare’s proxy, you may be staring at a cached edge route while your Terraform state says “all good.” Nine times out of ten, the fix is to ensure your cutover happens at the load balancer layer, not with rapid proxy toggles. Keep the orange cloud on. Swap origins behind it.

Second, cloud‑init timing. On some images, network availability lags just enough to make your first health check fail, which causes a false red in the load balancer. I usually solve this in two ways: let the application be noisy but honest about readiness (a /health endpoint that waits for dependencies), and give the monitor a few retries and a gentle interval. You don’t get points for being twitchy.

Third, disk and image drift. If your Proxmox template gets rebuilt quietly or your OpenStack image changes its kernel in a minor update, your terraform apply might suddenly create nodes with behavior that doesn’t match your expectations. Pin image versions or use a specific template name that you control. And when you do update, update it on purpose with a clear commit message.

Fourth, the “floating IP vanish” trick. Don’t let Terraform destroy and recreate your floating IPs or load balancer hostnames during a rollout. Mark them as keepers with lifecycle blocks or move them into a stable module that rarely changes. Your users won’t notice an app replacement if the front door never blinks.
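
The lifecycle block for that is short. A sketch, applied to the load balancer from earlier:

# pseudo-HCL: keep the front door out of harm's way
resource "cloudflare_load_balancer" "app" {
  # ...as defined earlier...

  lifecycle {
    prevent_destroy = true # any plan that wants to remove this fails loudly
  }
}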

Finally, CI impatience. Pushing Terraform into a pipeline is glorious until your health checks or bootstraps need an extra minute. Add timeouts that match reality, not optimism. Tests should prove the system is ready, not that it can sprint when it’s waking up.

A Calm Blueprint for Putting It All Together

When someone asks me “What’s the simplest way to start?” I suggest a small, honest blueprint. One repository, one Terraform root with modules for vps, dns, and lb. One workspace per environment. A make target for plan and apply. A user‑data template with cloud‑init that installs a tiny web app and exposes /health. A Cloudflare DNS record that points to a load balancer with a single pool. When that’s working, you add the second pool. Then you add a canary. Then a CI pipeline that applies on tagged commits.
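
In code, the blueprint is a small root module wiring three children together (module paths and variable names are illustrative):

# pseudo-HCL: the root of the starter blueprint
module "vps" {
  source   = "./modules/vps"
  name     = "app-blue-01"
  template = var.template
  node     = var.node
}

module "dns" {
  source   = "./modules/dns"
  zone     = var.zone
  hostname = var.hostname
  ipv4     = module.vps.ipv4 # the outputs-as-glue idea from earlier
}

module "lb" {
  source   = "./modules/lb"
  zone     = var.zone
  hostname = var.hostname
  blue_ip  = module.vps.ipv4
  # a green pool joins later, then a canary
}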

Even on day one, that blueprint gives you a gift: a reliable path to recreate your environment. If a node goes bad, you don’t troubleshoot yourself down a rabbit hole—you replace it. If something feels off in production, you cut traffic to green, find your breath, and debug at your pace. Recovery becomes a lever, not a panic button.

It’s also a natural fit for a broader reliability mindset. If this kind of careful, repeatable flow resonates with you, you might like my write‑up on writing a no‑drama disaster recovery plan with runbooks that actually work. Terraform is a piece of that puzzle, but the habit of calm repetition is the full picture.

A Few Practical Riffs and Patterns

Over time, certain little patterns keep paying rent. I like to attach a small set of metadata labels to every resource: the environment name and the short git commit SHA. When I’m staring at metrics or digging through logs, those labels are like signposts. I also like to generate meaningful hostnames—app‑blue‑01, app‑green‑01—so I always know who’s who in a topology map.
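
Here’s the shape of that habit. How the labels actually get attached differs per provider, so treat this as the pattern rather than gospel:

# pseudo-HCL: one place for the labels every resource carries
variable "git_sha" {
  description = "Short commit SHA, passed in from CI"
  type        = string
}

locals {
  labels = {
    environment = var.environment
    commit      = var.git_sha
    managed_by  = "terraform"
  }

  # meaningful hostnames beat mystery hostnames
  blue_name  = "app-blue-01"
  green_name = "app-green-01"
}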

On the DNS side, I’ll sometimes create dedicated records for maintenance pages. It’s nothing more than a separate pool that serves a static “We’re rolling out something nice, be right back” page, but because it lives in Terraform alongside the app, it’s fast to activate without touching the app nodes. If you’ve ever needed to pause the world for ten minutes, you’ll appreciate that little lever.
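
In Terraform, that lever is just one more pool pointing at a static origin. A sketch (the maintenance host is an assumption):

# pseudo-HCL: a maintenance pool you can swap in for ten minutes
resource "cloudflare_load_balancer_pool" "maintenance" {
  account_id = var.cloudflare_account_id
  name       = "app-maintenance"
  monitor    = cloudflare_load_balancer_monitor.http.id # or a simpler check for the static page

  origins {
    name    = "maintenance-1"
    address = var.maintenance_ip # serves the static "be right back" page
  }
}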

And when I’m dealing with APIs or long‑lived connections—WebSockets, gRPC—I test with real clients during the canary. You can do everything “right” and still discover that a proxy timeout is one notch too low or a keep‑alive setting is a touch too eager. That’s why I wrote the Cloudflare + Nginx post I mentioned earlier; the network layer has feelings too, and Terraform helps you treat them kindly.

Putting Confidence on Autopilot

When your infrastructure is code, confidence becomes muscle memory. I don’t mean arrogance. I mean the quiet kind of confidence where your deploy ritual is so consistent that surprises feel rare and containable. Plan. Apply. Watch the health checks go green. Shift 10% of traffic. Wait. Shift the rest. Retire the old pool and move on with your day.

If you want to keep reading beyond Terraform and cut down stress in adjacent places, there’s a practical, step‑by‑step piece I wrote on running WordPress on Docker Compose without the drama. Different tools, same vibe: small, repeatable steps that never feel brittle.

Wrap‑Up: The Calm Path from Clicks to Code

So here’s the thread I hope you’ll pull: when you automate VPS and DNS with Terraform, the scary parts shrink. Proxmox or OpenStack become predictable. Cloudflare stops feeling like a black box. Zero‑downtime deploys aren’t “that one heroic night” but just the way you ship. You’re not at the mercy of DNS propagation or human memory anymore; you’ve got a script, a plan, and a graceful escape hatch if something twitches.

If you’re starting from scratch, pick a tiny target. One service, one subdomain, one VPS. Write a module that creates the server, another that publishes the record, and a simple health check. Then add the load balancer and practice a blue‑green cutover while traffic is small. The day you do it in production, it’ll feel familiar because you already danced that dance in staging.

And if you’ve been doing this for years, my gentle nudge is to push one more piece into code: maybe it’s the health monitor, maybe it’s the canary record, maybe it’s the runbook that your future self will thank you for. Little bits of consistency add up. They turn late‑night deploys into early nights. Hope this was helpful. See you in the next post, and may your plans always be green.

Frequently Asked Questions

How should I structure Terraform modules for a VPS, Cloudflare DNS, and a load balancer?

Great question! I like a simple three‑module shape: vps, dns, and lb. The vps module creates a VM from a cloud‑init‑ready image (Proxmox or OpenStack), sets CPU/RAM/network, and outputs the IP. The dns module uses that IP to publish Cloudflare records. The lb module is optional at first, but it becomes your zero‑downtime lever—define pools, a health monitor, and a load balancer for your hostname. Keep secrets out of the repo, use a remote backend with locking, and standardize cloud‑init so every server boots the same way.

Can I get zero‑downtime, blue‑green deploys without Cloudflare’s Load Balancer?

You can get close with careful DNS and an app‑level proxy, but it’s harder to make it smooth. The load balancer adds health checks, per‑origin draining, and traffic weighting. If you skip it, create separate blue/green hostnames, keep TTLs predictable, and do the swap at your reverse proxy. Test with a canary before a full cutover. It works, but you’ll miss the instant, controllable switchover that makes blue‑green feel like a light dimmer instead of a switch.

How do I keep Terraform state and secrets safe across environments?

Use a remote backend with locking, and treat state like production data. Don’t keep it on laptops. For secrets (Cloudflare tokens, Proxmox/OpenStack credentials), inject them via your CI’s secret store or a vault, not a .tfvars committed to Git. Workspaces help isolate dev/stage/prod. Add lifecycle create_before_destroy where replacing nodes is normal, and avoid destroying long‑lived front doors like load balancers or floating IPs. Those small guardrails make rollouts calm and predictable.