Technology

From Blank VPS to Ready‑to‑Serve: How I Use cloud‑init + Ansible for Users, Security, and Services on First Boot

So I’m in the office, coffee warming my hands, and a message pings me: “Can you spin up another VPS like that staging one from last month?” I glance at my notes, then at the server list, then back at my notes. Ever had that moment when you’re not sure if the last server had Fail2ban tuned or if you only did that on the one before it? That’s the little anxiety I used to live with—tiny inconsistencies that snowball into outages, weird bugs, and late-night patchwork.

Here’s the thing: a VPS should be a recipe, not a memory test. That’s where cloud‑init plus Ansible became my calm combo. The idea is simple: every fresh server boots, creates the right users, locks down SSH, sets a firewall, installs updates, and brings your services online, all without me pasting a single command. And yes, it does it the same way every time. In this guide, I’ll walk you through how I wire this up, the little “gotchas” I learned the hard way, and a pattern you can reuse for everything from a tiny microservice to your next WordPress stack.

Why a Reproducible VPS Matters More Than You Think

I used to think configuration drift was something that happened to big teams with complicated change processes. Then I compared two of my “identical” VPS instances and discovered one had auto‑updates disabled (don’t ask), while the other was running a different SSH config. Multiply that by a few services, then add a surprise kernel update, and you’ve got a recipe for “works on one, breaks on the other.”

Reproducibility isn’t just a neat trick—it’s your insurance policy. When your setup lives as code, you can destroy and recreate servers without worrying if you forgot a sysctl tweak or missed a package. Backups get simpler. Incident recovery gets calmer. And when a teammate asks, “How is this server configured?” you can point at the exact YAML instead of shrugging at your shell history.

That confidence also changes how you work. Instead of “don’t touch the fragile thing,” you start thinking in terms of safe changes you can roll forward or back. If you’ve ever written a disaster recovery plan and felt it was more wish than runbook, you’ll get why this matters. If you want a bigger picture on resilience, I’ve shared my way of turning mayhem into a plan in how I write a no‑drama DR plan.

The Mental Model: cloud‑init Starts the Party, Ansible Sets the Table

Think of cloud‑init as your server’s first‑boot robot. It shows up the moment your VPS wakes up, sets up users, SSH keys, and a few essentials, and then hands over to your configuration management tool of choice. The hand‑off I like is Ansible—it’s agentless, it’s readable, and the idempotency model keeps your playbooks friendly over time.

On a new instance, cloud‑init reads a “user‑data” file you provide. That’s where you declare the admin user, disable root SSH, maybe tweak a timezone, and run a small bootstrap that installs Git and Ansible. From there, I use ansible‑pull to grab my playbooks from a repository and run them locally on the server. No central control machine needed for that first run.

In my experience, this split keeps things clean. cloud‑init handles first‑boot truths—users, keys, basic packages—and Ansible handles everything that can evolve: services, configs, and security hardening that might get tuned later. If a change fails, Ansible gives you a clear diff and a path to retry. If the bootstrapping fails, you can spot it right in the cloud‑init logs and fix the minimal starting point before the rest piles on.

Designing First Boot: Users, SSH, and a Calm Starting Point

Let’s start with the basics: you want a non‑root user with sudo, SSH key authentication only, and root login disabled. You also want a timezone, locale, and maybe a few packages ready. Here’s a compact cloud‑init user‑data I’ve used as a starting point on Ubuntu/Debian‑style images. Adapt paths and groups to your distro if needed.

#cloud-config
hostname: app-1
manage_etc_hosts: true
timezone: UTC
package_update: true
package_upgrade: true
ssh_pwauth: false
disable_root: true
users:
  - name: deploy
    gecos: Deploy User
    shell: /bin/bash
    sudo: 'ALL=(ALL) NOPASSWD:ALL'
    groups: [sudo]
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAI...your-public-key... comment
packages:
  - git
  - python3
  - python3-pip
  - python3-venv
  - ufw
  - curl
  - ca-certificates
write_files:
  - path: /etc/motd
    permissions: '0644'
    content: |
      Welcome to a reproducible VPS. Changes belong in code.
runcmd:
  - ufw allow OpenSSH
  - ufw --force enable
  - apt-get install -y ansible
  - sudo -u deploy bash -lc 'mkdir -p ~/infra && cd ~/infra && git clone --depth=1 https://example.com/your/ansible-repo.git . || true'
  - sudo -u deploy bash -lc 'cd ~/infra && ansible-pull -U https://example.com/your/ansible-repo.git -i hosts.yml site.yml'

A few notes from the trenches. First, ssh_pwauth: false and disable_root: true are your quick wins for safe SSH defaults. Second, pick a single admin user name like “deploy” and stick to it across projects; your playbooks will be happier. Third, I install Ansible directly for simplicity, but on bigger environments I pin a version or use a virtualenv to lock it down. Fourth, if your provider image doesn’t have Python or Git, install them here; Ansible needs Python on the target.
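
If you go the pinning route on a bigger environment, here’s a minimal sketch of what that runcmd variant might look like in place of the plain apt-get install above. The version pin, the /opt/ansible path, and the symlink are my assumptions for illustration; python3-venv is already in the package list, so the venv step just works.

runcmd:
  - python3 -m venv /opt/ansible                       # isolated interpreter just for Ansible
  - /opt/ansible/bin/pip install --upgrade pip
  - /opt/ansible/bin/pip install 'ansible==9.*'        # pin whatever release you actually tested
  - ln -sf /opt/ansible/bin/ansible-pull /usr/local/bin/ansible-pull
  - sudo -u deploy bash -lc 'ansible-pull -U https://example.com/your/ansible-repo.git -i hosts.yml site.yml'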

If you haven’t fully nailed SSH hardening or you want to level up to hardware keys or an SSH CA, I wrote a step‑by‑step walk‑through that pairs perfectly with this setup: VPS SSH Hardening Without the Drama. It complements cloud‑init by tightening what the first‑boot user can do and how keys are rotated over time.

Security Baseline on First Boot: Firewalls, Updates, and a Few Quiet Guards

I treat first‑boot security like laying down floorboards. It doesn’t have to be the final look, but it must be solid enough to stand on. My go‑to baseline is simple: a host firewall, SSH‑only exposure at first, automatic security updates, and Fail2ban watching the front door. Then Ansible takes over and brings in the customized policies.

On firewalls, I like to start with UFW for quick “allow SSH and enable” and then switch to nftables via Ansible for a readable, reproducible ruleset. It gives me rate limiting and IPv6 coverage with a single rules file. If you’re curious about a practical, copy‑and‑tweak style set of rules, I broke down my approach in the nftables firewall cookbook for VPS. You can wrap those rules into a template and render them with Ansible on first boot.
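
To make that concrete, here’s roughly the ruleset I expect to land in /etc/nftables.conf, written here as an inline copy task so the sketch stands alone (my real role renders it from nftables.conf.j2, and it notifies the same Reload nftables handler you’ll see in the playbook below). The open ports and the rate limit are assumptions for a plain web host; the linked cookbook goes much further.

- name: Deploy a minimal nftables ruleset (sketch)
  copy:
    dest: /etc/nftables.conf
    mode: '0644'
    content: |
      #!/usr/sbin/nft -f
      flush ruleset
      table inet filter {
        chain input {
          type filter hook input priority 0; policy drop;
          ct state established,related accept
          iif "lo" accept
          ip protocol icmp accept
          meta l4proto ipv6-icmp accept
          tcp dport 22 ct state new limit rate 10/minute accept
          tcp dport { 80, 443 } accept
        }
        chain forward { type filter hook forward priority 0; policy drop; }
        chain output  { type filter hook output priority 0; policy accept; }
      }
  notify: Reload nftables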

Next: automatic security updates. On Debian and Ubuntu, unattended‑upgrades is my friend. I scope it to security patches and let Ansible manage the config so I can pin reboots to maintenance windows or at least get notified. The “set and forget” feeling is tempting, but do yourself a favor and test reboots on a staging VPS from time to time. One of my clients learned the hard way that an old kernel module they depended on didn’t load after a minor update. We caught it in staging the next time because the server was reproducible and throwaway‑rebuildable.
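
For reference, here’s roughly what my unattended-upgrades template boils down to once rendered, shown as inline content so the sketch is self-contained. The origins follow Debian/Ubuntu conventions, so double-check them against your release, and only flip Automatic-Reboot on if you’ve pinned a maintenance window.

- name: Render unattended-upgrades policy (sketch)
  copy:
    dest: /etc/apt/apt.conf.d/50unattended-upgrades
    content: |
      Unattended-Upgrade::Allowed-Origins {
          "${distro_id}:${distro_codename}-security";
      };
      Unattended-Upgrade::Automatic-Reboot "false";
      Unattended-Upgrade::Remove-Unused-Dependencies "true";
  notify: Restart unattended-upgrades

- name: Enable periodic unattended runs
  copy:
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";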

As for SSH, Fail2ban is still surprisingly effective against noisy scans. I keep the default jail for SSH and tune the ban time a bit longer on internet‑facing hosts. It’s not the final wall—you want good keys, maybe even the SSH CA path—but it stops the constant thud‑thud of log entries and buys you peace of mind.
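
The tuning itself is only a few lines in a drop-in jail file. Here’s a sketch of the task I’d add to the baseline role; the ban and retry numbers are personal defaults rather than anything official, and you’d pair it with a matching Restart fail2ban handler next to the others.

# 21600 seconds = 6 hours; tune to taste for internet-facing hosts.
- name: Tune the sshd jail
  copy:
    dest: /etc/fail2ban/jail.d/sshd.local
    content: |
      [sshd]
      enabled = true
      maxretry = 5
      findtime = 600
      bantime = 21600
  notify: Restart fail2ban   # hypothetical handler: a systemd restart of fail2ban

- name: Ensure fail2ban is running
  systemd:
    name: fail2ban
    enabled: true
    state: started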

If you run admin panels, CI dashboards, or anything that shouldn’t be publicly reachable, consider going one notch higher: mTLS for admin. I shared a guide on locking down panels with client certificates right at the reverse proxy layer: protecting panels with mTLS on Nginx. The trick is to wire that policy into your Ansible role so new servers inherit it with no surprises.

Let Ansible Drive: Services, Idempotency, and a Calm Deploy Flow

Once cloud‑init hands off, Ansible becomes the conductor. The playbook is where you describe exactly what “ready” means: Docker installed and pinned, a system user for your app, Nginx with a clean config, your service unit in systemd, and any persistent volumes or backups configured.

I like to keep the first run simple, then layer complexity with tags and roles. Start with a site.yml that does your baseline and app role. Here’s a sketch of tasks that bring a small web app online.

---
- hosts: localhost
  connection: local
  become: true
  vars:
    app_user: 'app'
    app_dir: '/opt/app'
    domain_name: 'example.com'
  roles:
    - role: baseline
    - role: docker
    - role: app

# roles/baseline/tasks/main.yml
- name: Ensure packages for baseline
  apt:
    name:
      - unattended-upgrades
      - fail2ban
      - nftables
    state: present
    update_cache: true

- name: Configure unattended-upgrades
  template:
    src: unattended-upgrades.j2
    dest: /etc/apt/apt.conf.d/50unattended-upgrades
  notify: Restart unattended-upgrades

- name: Deploy nftables rules
  template:
    src: nftables.conf.j2
    dest: /etc/nftables.conf
  notify: Reload nftables

- name: Ensure nftables enabled
  systemd:
    name: nftables
    enabled: true
    state: started

# roles/docker/tasks/main.yml
- name: Install Docker from the distro repositories
  apt:
    name: ['docker.io']
    state: present

- name: Ensure docker service
  systemd:
    name: docker
    enabled: true
    state: started

# roles/app/tasks/main.yml
- name: Create app user
  user:
    name: "{{ app_user }}"
    system: true
    create_home: false

- name: Create app directory
  file:
    path: "{{ app_dir }}"
    state: directory
    owner: "{{ app_user }}"
    group: "{{ app_user }}"
    mode: '0755'

- name: Deploy the app compose file
  copy:
    src: files/docker-compose.yml
    dest: "{{ app_dir }}/docker-compose.yml"
    owner: "{{ app_user }}"
    group: "{{ app_user }}"

- name: Create systemd unit for compose
  template:
    src: app-compose.service.j2
    dest: /etc/systemd/system/app-compose.service
  notify: Restart app

- name: Ensure app running
  systemd:
    name: app-compose.service
    daemon_reload: true
    enabled: true
    state: started

# handlers (each lives in its role's handlers/main.yml)
- name: Reload nftables
  command: nft -f /etc/nftables.conf

- name: Restart unattended-upgrades
  systemd:
    name: unattended-upgrades
    state: restarted

- name: Restart app
  systemd:
    name: app-compose.service
    state: restarted

Nothing here is exotic. It’s the simplicity that makes it robust. Idempotency means you can run this on day one, day fifty, or day five hundred, and you’ll end up in the same state. If a configuration change triggers a handler, Ansible restarts the right service and leaves the rest alone.

Secrets deserve special care. On first boot, you want your app to get its credentials without exposing them in the cloud‑init file or in shell history. I reach for Ansible Vault for static secrets and use provider‑side secret stores or SOPS if the team already leans that way. The trick is to keep secret distribution out of cloud‑init and let the repo pull in what it needs securely during the Ansible run. Commit templates, not secrets.
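
In practice that looks something like this: a vaulted vars file committed to the repo and a template that renders an environment file on the host during the run. The variable names and paths are placeholders, and with ansible-pull you still need to hand the host a vault password once, for example via --vault-password-file pointing at a root-only file, which is far better than secrets sitting in user-data.

# group_vars/all/vault.yml -- encrypted with `ansible-vault encrypt`, safe to commit
# vault_app_db_password: "the-secret-lives-here"

# roles/app/templates/app.env.j2 -- the template carries references, not values
# DB_PASSWORD={{ vault_app_db_password }}

# roles/app/tasks/main.yml (additional task)
- name: Render the app environment file from vaulted vars
  template:
    src: app.env.j2
    dest: "{{ app_dir }}/app.env"
    owner: "{{ app_user }}"
    group: "{{ app_user }}"
    mode: '0600'        # readable by the app user only
  notify: Restart app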

If your target is a CMS like WordPress or a classic LEMP stack, you can also approach services with orchestration. I shared a calm, repeatable Docker Compose flow that turns updates into a non‑event: WordPress on Docker Compose, without the drama. The patterns map nicely to first‑boot Ansible: compose files, volumes, and backup containers all land in the same place every time.

ansible‑pull or Control Node? Picking Your First‑Boot Channel

On small teams and single‑purpose VPSes, I love ansible‑pull. It clones a repository and runs locally, which means you don’t need SSH reachability from a control node on boot. Your cloud‑init script can fire it once, and a systemd timer can keep it pulling updates on a schedule. It’s minimal, auditable, and easy to reason about.
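
Here’s a sketch of how I wire that timer up from within the playbook itself. The unit names, schedule, and repository URL are placeholders, and the ansible-pull path assumes the apt install from earlier; adjust it if you went the virtualenv route.

- name: Install ansible-pull service
  copy:
    dest: /etc/systemd/system/ansible-pull.service
    content: |
      [Unit]
      Description=Apply configuration with ansible-pull
      Wants=network-online.target
      After=network-online.target

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/ansible-pull -U https://example.com/your/ansible-repo.git -i hosts.yml site.yml

- name: Install ansible-pull timer
  copy:
    dest: /etc/systemd/system/ansible-pull.timer
    content: |
      [Unit]
      Description=Nightly ansible-pull run

      [Timer]
      OnCalendar=daily
      RandomizedDelaySec=45m
      Persistent=true

      [Install]
      WantedBy=timers.target

- name: Enable the timer
  systemd:
    name: ansible-pull.timer
    daemon_reload: true
    enabled: true
    state: started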

When I’m managing lots of servers or need shared state (like inventory generated from a source of truth), a central control node or CI runner is handy. In that case, cloud‑init still sets the baseline, but the first full configuration run comes from the controller after the host registers. Both work. The important bit is to pick one and codify it so your future self knows what’s supposed to happen.

For the curious, the official docs are approachable and worth bookmarking: cloud‑init documentation and ansible‑pull command guide. They’re also great for cross‑checking little platform quirks that show up between providers.

First‑Boot Troubleshooting: Logs, Reruns, and Surprising Little Gotchas

My first days with cloud‑init came with their share of “why didn’t it run?” puzzles. The good news is you can get very far by reading two logs and knowing one reset command. The logs live in /var/log/cloud-init.log and /var/log/cloud-init-output.log. They tell you exactly which stage ran and what failed. If you want to rerun from scratch, cloud-init clean resets state and the magic happens again on next reboot.

Network timing can be a surprise. Some providers bring the NIC up a fraction later, which can spook package installs. The fix is usually retry logic: let Ansible handle package jobs with install retries and a small delay. If you install Ansible itself via cloud‑init, that’s one spot where a quick one‑liner loop around apt can save the day. Another gotcha is Python. Some minimal images ship without it—no Python, no Ansible. Install it in cloud‑init’s package list and you’re fine.
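
Retry logic in Ansible is a couple of task keywords. Something like this is what I mean, with five attempts and a ten-second pause as my habit rather than a magic number.

- name: Install baseline packages, retrying while the network settles
  apt:
    name:
      - git
      - python3
    state: present
    update_cache: true
  register: apt_result
  retries: 5
  delay: 10
  until: apt_result is succeeded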

One more: disk and swap. Not every image sizes disks the same way on first boot. If you rely on a specific partition, filesystem, or swap file, codify it. Ansible’s community.general and ansible.posix collections have modules for filesystems and mounts that turn a fragile manual step into something reliable. I’ve had VPSes where a forgotten swap file led to mysterious OOM kills under load; no fun. Putting those bits in your baseline role pays off forever.
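
Here’s a hedged sketch of the swap slice of my baseline role; the 1G size is arbitrary, and the collection modules ship with the full Ansible package installed earlier.

- name: Allocate a swap file
  command: fallocate -l 1G /swapfile
  args:
    creates: /swapfile        # skip once the file exists

- name: Restrict swap file permissions
  file:
    path: /swapfile
    owner: root
    group: root
    mode: '0600'

- name: Format the swap file
  community.general.filesystem:
    dev: /swapfile
    fstype: swap

- name: Register the swap file in fstab
  ansible.posix.mount:
    src: /swapfile
    path: none
    fstype: swap
    opts: sw
    state: present

- name: Activate swap if none is active yet
  command: swapon /swapfile
  when: ansible_swaptotal_mb | int == 0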

Provider Differences and Making It Truly Portable

Even with cloud‑init standardization, each provider has personality. Device names, default users, and preinstalled packages vary. My way through this is to keep a tiny vars file per provider or distribution. Name the differences, don’t fight them. If one provider uses ens3 and another uses eth0, teach your role about both and move on.
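
Concretely, I keep a tiny vars file per provider and load it at the top of the play. The provider names and values below are illustrative; the point is that the difference gets a name instead of living in someone’s head.

# vars/providers/provider-a.yml (illustrative)
# primary_iface: eth0

# vars/providers/provider-b.yml
# primary_iface: ens3

# site.yml: pick the file with a variable set in inventory or user-data
- hosts: localhost
  connection: local
  become: true
  vars_files:
    - "vars/providers/{{ provider | default('provider-a') }}.yml"
  roles:
    - role: baseline

Templates then reference {{ primary_iface }} instead of hard-coding a device name, and the role works the same everywhere.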

For keys and images, I like to rely on provider user‑data uploads or automation through their APIs, but the playbooks don’t assume anything beyond “I can SSH as the admin user.” If you want a confidence boost before real deployments, you can practice locally with Multipass or cloud‑init‑aware images in your virtual lab. The feedback loop is tight: boot, read the logs, tweak, boot again, smile.

And if your services sit behind a reverse proxy or CDN, bake those expectations into your roles too. Timeouts, keep‑alives, and zero‑downtime reloads live happily as code. I shared a calm approach to keeping long‑lived connections happy, which fits naturally into an Nginx role: keeping WebSockets and gRPC happy behind Cloudflare. Your first boot shouldn’t just start services; it should start them with the right expectations.

A Story About Drift, and the Calm After

One of my clients ran three VPSes for a simple SaaS. They were “the same” until one started logging mysterious 502s at random. We diffed configs and found nothing obvious. After an afternoon of digging, we realized only one server had a small sysctl tweak applied manually months earlier. It helped under load but wasn’t reproducible or documented. The fix was blunt: we wrote the baseline as Ansible, let cloud‑init handle the first‑boot bits, and replaced all three servers with clean builds. The bug vanished. More importantly, so did the anxiety about hidden differences.

That experience shaped how I see servers. I don’t want pets, I don’t really want cattle either—I want a recipe. My runbooks are basically “destroy and rebuild” now. If I need a one‑off tweak, it goes in the role. If it’s a secret, it goes in Vault. If it’s an exception, it’s written down and explicitly tagged. The servers stopped being a suspicion and started being a certainty.

Putting It All Together: A Smooth First‑Boot Flow

Here’s the flow I hand to teams when we’re getting started:

Step one: prepare your cloud‑init user‑data with a single admin user, SSH key‑only auth, root SSH disabled, packages for Git and Python, and a tiny runcmd that installs Ansible and runs ansible‑pull. Keep it minimal and safe.

Step two: write an Ansible baseline role that sets your host firewall, unattended security updates, Fail2ban, and any sysctl you need. Make sure handlers restart only what they must. Add a role for your app, with systemd units, directories, and any volumes you need. This is your definition of “ready.”

Step three: move secrets out of cloud‑init and into Ansible Vault or your chosen secret store. Reference them in templates or environment files that land on the server during the playbook run. You’ll sleep better.

Step four: test like you mean it. Boot a fresh VPS, read the cloud‑init logs, verify the playbook ran, and check your services. Reboot once to be sure. Destroy it. Do it again. The moment you can do this calmly twice in a row, you’re ready for production.
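
If you want the “check your services” part to be more than eyeballing, a tiny smoke-test play run from your workstation helps. The address and the /healthz path below are placeholders for whatever your app actually exposes.

- hosts: localhost
  connection: local
  gather_facts: false
  vars:
    new_host: 203.0.113.10        # placeholder: the fresh VPS address
  tasks:
    - name: Wait for SSH to answer after the reboot
      wait_for:
        host: "{{ new_host }}"
        port: 22
        timeout: 300

    - name: Confirm the app responds over HTTP
      uri:
        url: "http://{{ new_host }}/healthz"
        status_code: 200
      register: health
      retries: 10
      delay: 6
      until: health is succeeded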

Step five: add little quality‑of‑life improvements over time. A systemd timer to re‑run ansible‑pull nightly. A healthcheck that barks if a service fails. Maybe a microcache in Nginx if your app benefits from it. If you’re curious about squeezing free speed safely, I wrote about the approach in the 1–5 second Nginx microcaching trick; it’s the sort of optimization that belongs in code, not a one‑off shell tweak.

Bonus Tips: Logs, Keys, and Safer Admin by Default

Two finishing touches have saved me headaches. First: logs. Send your key system logs somewhere outside the box, even if it’s just a remote syslog target or a tiny log shipper. If a server has a bad day, you’ll want its last words. Second: admin access. If you have to expose an admin panel, don’t leave it to passwords alone. Put it behind IP allowlists, or better yet, ship client‑cert auth with your reverse proxy role. The difference in noise is night and day, and it’s reproducible from the first boot onward.
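
Shipping logs off the box can start as small as one rsyslog drop-in managed by the baseline role; the target address is a placeholder, and you’d add a matching restart handler alongside the others.

- name: Forward syslog to a remote target
  copy:
    dest: /etc/rsyslog.d/90-remote.conf
    content: |
      # @@host:port forwards over TCP; a single @ would be UDP
      *.* @@logs.example.com:514
  notify: Restart rsyslog   # hypothetical handler: a systemd restart of rsyslog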

And yes, SSH keys deserve rotation. If you’re feeling adventurous and want to future‑proof that part of your stack, keep exploring the SSH CA route from earlier. Central trust for keys, short lifetimes, and no chasing old authorized_keys files around. It’s one of those upgrades that pays dividends the next time someone leaves the team or loses a laptop.

Wrap‑up: Calm, Consistent, and Kind to Your Future Self

If I had to summarize this whole approach in a sentence, it would be: teach your VPS how to become itself. Let cloud‑init give it a safe birth—users, SSH, basic packages—and let Ansible raise it into the server you actually need. The beauty is not just speed; it’s the lack of surprises. When something breaks, you’ll know exactly where to look. When you need a new instance, you won’t be copy‑pasting from a wiki page—or worse, from memory.

Start small. One user. One firewall rule. One service. But write it as code and let it run on first boot. The next time someone asks you to spin up “the same as last time,” you’ll smile and say, “Sure,” because you’ve got the recipe. Hope this was helpful! If you want to keep going down the calm‑ops path, take a peek at how I write DR plans, harden SSH with FIDO2 and an SSH CA, keep your host firewall clean with nftables, and make your reverse proxy play nicely with long‑lived connections. See you in the next post!

Frequently Asked Questions

Do I need a central Ansible control node, or can the VPS configure itself?

Great question! You don’t need a control server to get started. I often use ansible‑pull from cloud‑init so the VPS installs Ansible, clones your repo, and applies the playbook locally. For larger fleets or shared inventory, a central control node or CI pipeline is nice, but for a single VPS or a small group, ansible‑pull is simple and reliable.

How do I handle secrets during first boot without leaking them into user‑data?

Here’s the deal: keep secrets out of user‑data. Use Ansible Vault for static secrets and decrypt them during the playbook run, or wire in a provider/remote secret store. Cloud‑init should create the admin user and install prerequisites, then Ansible pulls secrets securely when configuring services. Templates carry variable references, not raw secrets.

What should I check when first boot doesn’t go as planned?

Start with the logs: /var/log/cloud-init.log and /var/log/cloud-init-output.log show what ran and why it failed. If you need a clean rerun, use cloud-init clean and reboot. On the Ansible side, run the playbook with increased verbosity, and add retry logic for package installs. Most first‑boot hiccups are timing/network issues or missing Python/Git—fix those in your cloud‑init packages.