So there I was on a sleepy Tuesday evening, staring at a dashboard that showed a flood of tenants connecting their domains right before an upcoming marketing launch. You know that feeling when your system is fine in the lab, but the moment real customers show up, the edges start to fray? One by one, vanity domains were rolling in: brands with their own identity, their own DNS, their own rules. And every single one needed HTTPS, instantly. No manual steps. No weird “please click here to verify” emails. Just: they plug in their domain, we serve it securely. That’s when DNS‑01 ACME stopped being a theoretical best practice and turned into the quiet hero of the night.
If you’ve ever built a multi‑tenant SaaS where customers bring their own domain, you’ve lived this. HTTP‑01 is tempting, but you don’t always control the domain’s HTTP path yet. DNS‑01, on the other hand, lets you prove control with a TXT record and issue certificates even before traffic shifts. In this post, I’ll walk you through how I’ve made custom domains and Auto‑SSL feel boring—in a good way. We’ll talk about the mental model, the architecture that scales, the little traps that snag you at 2 a.m., and the calm strategies that keep it all humming.
Table of Contents
- 1 Why Custom Domains in SaaS Feel Tricky (Until They Don’t)
- 2 The DNS‑01 ACME Mental Model (Explained Like We’re Sketching on a Napkin)
- 3 The Architecture That Scales: Orchestrators, Adapters, and a Calm Event Loop
- 4 Rate Limits, Spiky Traffic, and Keeping Calm When Everyone Onboards at Once
- 5 Security Boundaries You Can Sleep On
- 6 Edge Routing, Zero‑Downtime Issuance, and a Nice Cup of Hot Reload
- 7 Real‑World Gotchas I Still Watch For
- 8 Stories from the Field: The Weekend We Issued for 1,200 Domains
- 9 Putting It Together: A Calm Implementation Checklist (No Panic Required)
- 10 Wrap‑Up: Make SSL the Least Interesting Part of Your SaaS
Why Custom Domains in SaaS Feel Tricky (Until They Don’t)
What “bring your own domain” actually means
In a multi‑tenant world, customers want their brand front and center. Instead of tenant.example.com/acme, they want app.acme.com or even the apex acme.com. You can route that anywhere—Edge, load balancer, origin cluster—but the twist is that HTTPS must be ready the moment their DNS starts pointing to you. No trust, no clicks, no scary warnings. Just green locks all around.
Here’s the thing: when you rely on HTTP‑01 challenges, you need the domain’s HTTP path to reach your validation endpoint. In multi‑tenant setups, the timing doesn’t always line up. The customer might not have updated their A or CNAME yet, or maybe your edge hasn’t been configured to serve the challenge for a domain you don’t “own” yet. It becomes a little chicken‑and‑egg problem.
DNS‑01 flips that script. Instead of proving control over HTTP, you publish a TXT record at _acme-challenge.customer-domain.com whose value is derived from the challenge token the certificate authority hands you. If the CA looks up that record and finds the value it expects, you’re good. That means you can complete validation before traffic ever touches your servers. When the cutover happens, the cert is already there, warm and waiting.
Wildcards and the sweet spot
Another quiet win with DNS‑01: wildcards. If your tenants use patterns like *.brand.yourapp.com, a wildcard certificate covers all subdomains in one go. In my experience, wildcards reduce churn in certificate issuance, smooth out spikes, and save you from rate‑limit headaches during rush periods. They also make pre‑provisioning a breeze when you onboard a batch of tenants that share a subdomain convention.
But even when tenants bring a fully custom domain, DNS‑01 still shines because you can validate without steering their traffic first. I’ve watched launch days where teams ran ads and influencers hard at a specific hour, while DNS‑01 quietly handled certificate issuance in the background. No adrenaline spikes. No emergency huddles. Just smooth.
The DNS‑01 ACME Mental Model (Explained Like We’re Sketching on a Napkin)
What actually happens during validation
Here’s the simple flow: your system starts an ACME order with a certificate authority, gets a challenge token, and then writes a TXT record at _acme-challenge.example.com with a value derived from that token. The CA queries DNS, sees the value it expects, and says “yep, that’s yours.” You finalize the order, get the cert, and your edge starts using it. That’s it, at least on paper.
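One detail the napkin version glosses over: per RFC 8555 (section 8.4), the TXT value is not the raw token. It is the base64url‑encoded SHA‑256 digest of the "key authorization", which is the token joined to your ACME account key's thumbprint with a dot. Here is a minimal sketch of that computation; the function name and the sample inputs are mine, and a real ACME client library normally does this for you.

```python
import base64
import hashlib


def dns01_record(domain: str, token: str, account_thumbprint: str):
    """Return (record_name, txt_value) for a DNS-01 challenge (RFC 8555, section 8.4)."""
    # Wildcard orders are validated at the base name, so drop a leading "*.".
    base = domain[2:] if domain.startswith("*.") else domain
    # Key authorization = challenge token + "." + thumbprint of the account key.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # ACME uses base64url without the trailing "=" padding.
    txt_value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{base}", txt_value
```

Note that a wildcard order for *.acme.com and a plain order for acme.com land on the same record name, so the two TXT values may need to coexist for a moment.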
In production, the secret sauce is automation. You want a tiny, reliable bot that can create and clean up TXT records for many DNS providers, wait sensibly for propagation, and handle retries like a grown‑up. I like to think of it as a patient librarian: it files the right card in the right drawer, checks back politely, and doesn’t make a fuss when the shelves are busy.
APIs, propagation, and time
Most popular DNS providers offer APIs these days, so the automation piece is entirely doable. Cloudflare, Route 53, Gandi, and others let you create TXT records with scoped credentials that don’t expose the whole account. If you’re new to this, start by studying the DNS‑01 challenge flow in the CA’s docs. The Let’s Encrypt primer on how the DNS‑01 challenge works under the hood is a friendly place to get the gist.
Two practical notes from the trenches: first, give yourself realistic timeouts. DNS can be fast, but it’s not guaranteed. Even when APIs return “success,” resolvers might need a minute to catch up. Second, be tidy. Clean up TXT records after issuance to avoid confusing future validations or clutter that masks real problems.
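To make "realistic timeouts" concrete, here is a minimal sketch of a patient wait loop with exponential backoff. The `lookup` callable is an assumption: plug in whatever DNS client you use, as long as it returns the TXT values for a name. The default timings are placeholders, not recommendations from any CA.

```python
import time
from typing import Callable, Iterable


def wait_for_txt(
    lookup: Callable[[str], Iterable[str]],  # hypothetical: returns TXT values for a name
    name: str,
    expected: str,
    timeout_s: float = 300.0,
    first_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `expected` shows up among the TXT values for `name`, or time out."""
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        if expected in set(lookup(name)):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to a ceiling.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

Passing `sleep` and `clock` in as parameters keeps the loop easy to test without actually waiting. For the tidiness half of the advice, wrap issuance in a try/finally that deletes the TXT record whether or not the order succeeds.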
DNSSEC, CAA, and sharp edges
Security‑minded teams love DNSSEC and CAA, and for good reason. But they introduce guardrails that you must respect during automation. If a tenant enables DNSSEC with broken signatures or stale DS records, a CA that validates DNSSEC will treat the zone as bogus, and the TXT lookup can fail in ways that look like random timeouts. CAA records can block issuance if the certificate authority you’re using isn’t explicitly allowed. If that sounds scary, don’t worry: once you know it, you’ll plan for it. And if you want a calm, step‑by‑step mindset around DNS keys, I’ve written about graceful rollover in Zero‑Downtime DNSSEC: The Friendly Guide to KSK/ZSK Rollover and DS Updates Without Breaking the Internet. The same low‑drama approach applies here: small changes, measured checks, nothing frantic.
The Architecture That Scales: Orchestrators, Adapters, and a Calm Event Loop
One brain, many hands
When Auto‑SSL needs to cover thousands of tenant domains, I like a pattern that looks like this: an orchestrator schedules certificate orders, a queue gives you backpressure control, and a group of stateless workers performs the ACME steps. Each worker talks to a DNS adapter—the tiny piece that knows how to create TXT records for a specific provider—then calls the ACME endpoint, waits, finalizes, and stores the result. Think of the orchestrator as the conductor and the DNS adapters as instruments. The music sounds good when they’re in sync.
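As a rough sketch of that shape, here is a bounded queue, a provider‑agnostic adapter interface, and a stateless worker. Everything named here (`DnsAdapter`, `InMemoryAdapter`, `run_worker`, the job fields, the `issue` callable standing in for the ACME client) is hypothetical and only meant to show how the pieces hang together.

```python
import queue
from typing import Callable, Dict, Protocol, Set


class DnsAdapter(Protocol):
    """What every provider adapter must offer (hypothetical interface)."""

    def create_txt(self, name: str, value: str) -> None: ...
    def delete_txt(self, name: str, value: str) -> None: ...


class InMemoryAdapter:
    """Stand-in for a real provider adapter; it only records what it was asked to do."""

    def __init__(self) -> None:
        self.records: Dict[str, Set[str]] = {}

    def create_txt(self, name: str, value: str) -> None:
        self.records.setdefault(name, set()).add(value)

    def delete_txt(self, name: str, value: str) -> None:
        self.records.get(name, set()).discard(value)


def run_worker(jobs: queue.Queue, adapters: Dict[str, DnsAdapter],
               issue: Callable[[str], str], store: Dict[str, str]) -> int:
    """Drain the queue: publish TXT, run the ACME steps, store the result, clean up."""
    done = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return done
        adapter = adapters[job["provider"]]
        name = "_acme-challenge." + job["domain"]
        adapter.create_txt(name, job["txt_value"])
        try:
            # `issue` is a placeholder for "validate, finalize, download" in your ACME client.
            store[job["domain"]] = issue(job["domain"])
            done += 1
        finally:
            # Clean up the challenge record whether or not issuance succeeded.
            adapter.delete_txt(name, job["txt_value"])
            jobs.task_done()
```

Creating the queue with `queue.Queue(maxsize=...)` is what gives you backpressure: when workers fall behind, the orchestrator blocks on `put` instead of piling up orders.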
Persist your certificate and key material in a secure store that’s easy to distribute. Some teams like a database with encryption at rest; others prefer a secrets manager and a lightweight CDN‑like layer for the public certs. The edge—Nginx, Envoy, HAProxy, Traefik—should be able to reload certificates without dropping connections. A quick, graceful reload beats long‑lived hot‑patching in most stacks.
ECDSA and RSA together
One practical detail that earns you smiles: serve both ECDSA and RSA certificates when your edge supports it. You get modern performance and broad compatibility at the same time. If you’re curious how to wire that up in popular servers, here’s a friendly walkthrough: Serving Dual ECDSA + RSA Certificates on Nginx and Apache. It feels like a tiny superpower—you don’t have to choose between speed and reach.
Apex records, CNAMEs, and the real world
Customers love pointing the apex of their domain to you—no “www,” no extra letters. But apex records can’t be plain CNAMEs. This is where ALIAS/ANAME records or CNAME flattening at the DNS provider save the day. If you haven’t run into that yet, it’s worth getting ahead of the curve with this gentle guide: CNAME at the Apex? The Friendly Guide to ALIAS/ANAME and Cloudflare CNAME Flattening. I like to surface this early in the onboarding UI so customers don’t get stuck on a purely DNS limitation that looks like a bug in your product.
Rate Limits, Spiky Traffic, and Keeping Calm When Everyone Onboards at Once
Why bulk issuance causes surprise headaches
One of my clients launched a partner program and onboarded hundreds of new domains in a weekend. The product did everything right—DNS‑01 tokens, clean propagation checks, safe storage—but they ran into rate limits at the certificate authority because so many domains hit the system in a tight window. It wasn’t catastrophic, but it was noisy and stressful.
There’s a calmer way. Spread out issuance with a queue that respects per‑minute and per‑hour velocity. Pre‑issue where you can: if you know a campaign date, ask tenants to connect domains a day earlier and quietly provision overnight. Wildcards help when your naming is predictable—one certificate covers many subdomains. And if you’re thinking “what about SAN certificates,” or you just want a playbook that reads like advice from a friend, this piece pairs nicely with today’s topic: Dodging the Wall: How I Avoid Let’s Encrypt Rate Limits with SANs, Wildcards, and Calm ACME Automation.
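One simple way to pace a queue like that is a token bucket in front of the "start a new order" step. This is a minimal sketch; the rates are placeholders, so check your CA's published limits before picking real numbers.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `burst` orders at once, refilling at `rate_per_s` orders per second."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_s
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; callers that get False should re-queue and wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

To respect both a per‑minute and a per‑hour velocity, keep two buckets and only start an order when both hand out a token.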
Delegation tricks that make life easier
If a tenant’s DNS provider has no API you can use directly, or if they just don’t want to hand you tokens, there’s a neat trick: delegate the challenge to a subdomain you control. For example, you ask them to create a CNAME from _acme-challenge.customer-domain.com to something like _acme-challenge.customer-domain.com.acme.yourapp.net. You own acme.yourapp.net and can update TXT records there at will. The CA follows the CNAME when it looks up the challenge record, so the TXT value you publish in your own zone satisfies the validation. Clean, safe, and it avoids juggling yet another provider’s credentials.
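The naming convention for that delegation is easy to get subtly wrong, so it is worth generating rather than typing. A small sketch, using the acme.yourapp.net zone from the example above as the default; the function and key names are hypothetical.

```python
def delegation_records(customer_domain: str, delegation_zone: str = "acme.yourapp.net") -> dict:
    """Work out the CNAME the tenant creates and the TXT name you write to."""
    base = customer_domain.lower().rstrip(".")
    # Wildcard names are validated at the base domain.
    if base.startswith("*."):
        base = base[2:]
    source = f"_acme-challenge.{base}"
    target = f"{source}.{delegation_zone}"
    return {
        "cname_name": source,    # the tenant creates this once, in their DNS
        "cname_target": target,  # it points into the zone you control
        "txt_name": target,      # your automation writes challenge values here
    }
```

Showing `cname_name` and `cname_target` verbatim in the onboarding UI, with a copy button, removes most of the typos that otherwise end up in support tickets.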
If you’re on Kubernetes and prefer to build with battle‑tested blocks, cert‑manager’s documentation has a ton of practical examples for DNS‑01 with various providers. Even if you’re not running K8s, it’s a helpful way to see the moving pieces without getting lost in theory.
Security Boundaries You Can Sleep On
Token hygiene and tenant isolation
When automation touches DNS, treat tokens like you would production database credentials. Scope them narrowly. Ideally, the token can only write TXT records under _acme-challenge for specific zones. Don’t give the adapter “delete anything” power unless you absolutely must. Segment tokens per tenant or per provider account so a leaked token has a small blast radius. And log every write and cleanup: future‑you will thank present‑you when debugging a weird edge case.
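Not every provider can scope a token that tightly, so I like a second guard inside the adapter layer itself. This is a minimal sketch of such a check; the function name and the allow‑list shape are assumptions, and it complements provider‑side scoping rather than replacing it.

```python
def is_allowed_write(record_type: str, name: str, allowed_zones: set) -> bool:
    """Permit only TXT writes at _acme-challenge names inside explicitly listed zones."""
    if record_type.upper() != "TXT":
        return False
    name = name.lower().rstrip(".")
    prefix = "_acme-challenge."
    if not name.startswith(prefix):
        return False
    host = name[len(prefix):]
    # Match the zone itself or a true subdomain; "evilacme.com" must not match "acme.com".
    return any(host == zone or host.endswith("." + zone) for zone in allowed_zones)
```

Call it before every adapter write and log both the decision and the record name, so the audit trail covers refused attempts as well as successful ones.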
For SaaS with bring‑your‑own‑DNS, I try to avoid collecting a tenant’s registrar or DNS credentials at all. The CNAME‑based delegation pattern means they keep full control, while you retain the agility to automate issuance. If you do accept tokens, make expiration your default and rotation a routine. It’s not a ceremony—more like a calendar reminder you don’t skip.
CAA and explicit trust
CAA records define which certificate authorities can issue for a domain. Many enterprises set them, and some set them tightly. Your onboarding flow should detect blocking CAA policies and explain next steps without alarm. A concise message like “We couldn’t issue your certificate because your domain restricts certificate authorities. Please allow [Your CA] in CAA, or contact your DNS admin” turns a potential panic into a quick fix. Once teams see that you understand their controls, trust goes up.
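A readiness check for this can be a few lines once you have the relevant CAA record set in hand. The sketch below assumes you have already fetched the records that apply to the name (CAA lookups climb toward the parent zone until they find a record set) and reduced them to (tag, value) pairs. It covers the `issue` and `issuewild` properties from RFC 8659 and deliberately ignores critical flags and other property types. The CA identifiers are placeholders.

```python
from typing import Iterable, Tuple


def caa_allows(records: Iterable[Tuple[str, str]], ca_domain: str, wildcard: bool) -> bool:
    """Return True if this CAA record set lets `ca_domain` issue the certificate."""
    pairs = [(tag.lower(), value) for tag, value in records]
    issue = [value for tag, value in pairs if tag == "issue"]
    issuewild = [value for tag, value in pairs if tag == "issuewild"]
    # For wildcard names, issuewild takes precedence when present; otherwise issue applies.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # nothing restricts issuance
    # The issuer domain is the part before any ";" parameters; a bare ";" allows nobody.
    allowed = {value.split(";", 1)[0].strip().lower() for value in relevant}
    return ca_domain.lower() in allowed
```

When this returns False, that is the moment to show the friendly message rather than a raw ACME error.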
Edge Routing, Zero‑Downtime Issuance, and a Nice Cup of Hot Reload
How to move fast without dropping connections
Once a certificate is issued, you want your edge to pick it up fast. Some stacks watch a directory and reload automatically; others need a gentle signal. Either way, favor approaches that are graceful. A fast reload that drains old workers and warms new ones keeps latency steady. If you bundle certs into a configuration template, resist the urge to re‑emit the entire config on every tiny change. Small deltas lead to fewer surprises.
In my own setups, I’ve had good luck separating the act of issuance from the act of serving. Issuers write into a store; edges subscribe and cache. This small separation buys you resilience when the CA is having a slow day, or when a provider API is hiccuping. The edge keeps serving cleanly while the back office sorts itself out.
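Here is a toy version of that separation, assuming an in‑process store and callback‑style subscriptions. In a real system the store would be a database or secrets manager and the subscription a watch or message bus; the class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class CertStore:
    """Issuers publish here; every publish becomes a new version for that domain."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._subscribers: List[Callable[[str, int, str], None]] = []

    def subscribe(self, callback: Callable[[str, int, str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, domain: str, cert_pem: str) -> int:
        versions = self._versions.setdefault(domain, [])
        versions.append(cert_pem)
        version = len(versions)
        for notify in self._subscribers:
            notify(domain, version, cert_pem)
        return version


class EdgeCache:
    """The edge keeps serving its cached copy even if the issuer side is down."""

    def __init__(self) -> None:
        self.active: Dict[str, Tuple[int, str]] = {}

    def on_update(self, domain: str, version: int, cert_pem: str) -> None:
        current = self.active.get(domain)
        # Ignore stale or duplicate notifications; only move forward.
        if current is None or version > current[0]:
            self.active[domain] = (version, cert_pem)
```

Wiring it up is `store.subscribe(edge.on_update)`. The version check is the important part: out‑of‑order notifications can never roll an edge back to an older certificate.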
When HTTP‑01 still has a place
Even when you lean on DNS‑01, there are cases where HTTP‑01 is handy for internal domains you fully control. I treat it like a local wrench: great for first‑party subdomains where routing is already yours, less great for random external tenant domains. If you want a deeper dive into the protocol itself, the formal spec is a good reference point: the ACME protocol (RFC 8555). Keep it nearby, but don’t feel like you have to quote it to build something robust.
Real‑World Gotchas I Still Watch For
Propagation illusions
When a DNS API returns “created,” it’s easy to assume the world sees the record. But resolvers cache, and authoritative nameservers sometimes lag. I build in a short retry loop with gentle backoff, and I verify with multiple vantage points. The goal isn’t to hammer; it’s to be patient in a predictable way. And yes, TTL still matters. Short TTLs during onboarding help, as long as you don’t keep everything short forever and pay a penalty you don’t need.
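"Multiple vantage points" can be as simple as asking several resolvers and requiring a quorum before you tell the CA to validate. A minimal sketch, where each resolver is a callable you supply (for example one per authoritative nameserver plus a public recursive resolver); the names and the quorum idea are my own framing, not something a CA prescribes.

```python
from typing import Callable, Dict, Iterable


def seen_by_quorum(
    resolvers: Dict[str, Callable[[str], Iterable[str]]],  # label -> lookup function
    name: str,
    expected: str,
    quorum: int,
) -> bool:
    """Return True once at least `quorum` vantage points serve the expected TXT value."""
    hits = 0
    for label, lookup in resolvers.items():
        try:
            if expected in set(lookup(name)):
                hits += 1
        except Exception:
            # A vantage point that errors out counts as "not seen yet", not as a crash.
            continue
    return hits >= quorum
```

Setting the quorum to the number of authoritative nameservers is the strict option, and a sensible default for providers whose API confirms a write before every nameserver serves it.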
CAA and wildcard surprises
Wildcard certificates add their own tiny quirks. CAA has a separate issuewild property for wildcard names, and when it is present it governs wildcard issuance instead of issue. Some organizations set it with wildcards in mind; others forget it, or leave your CA out of it, and end up permitting only non‑wildcard issuance. You’ll catch this quickly if you check CAA as part of your readiness tests. Make that check friendlier than a red stop sign: make it a guidepost with clear steps and language non‑experts can follow.
DNSSEC misconfigurations that look random
I once lost an afternoon to a single stale DS record after a registrar change. Everything looked right until DNSSEC validation stepped in and failed at the worst possible place. A quick sanity check on DS and key state up front saves a ridiculous amount of time. If you haven’t rolled keys in a while, refresh your muscle memory with the zero‑drama path I mentioned earlier. Those habits carry over directly.
Customer apex and the “CNAME at root” myth
I still see onboarding forms that say “Please point a CNAME at your root domain.” That instruction will confuse people, because the DNS standard doesn’t allow it: a CNAME cannot coexist with other records at the same name, and every zone apex must carry SOA and NS records. Instead, point them to ALIAS/ANAME or a provider with CNAME flattening, and have your UI present the right recommendation based on their provider. Even a simple note like “If your provider supports ALIAS/ANAME, use that. Otherwise, here’s how flattening works” defuses most of the friction.
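In the UI, that recommendation boils down to one branch: is the hostname the zone apex, and if so, what does the tenant's provider offer there? A small sketch, assuming you already know the zone apex and have your own lookup of provider capabilities. The function name, the `apex_feature` parameter, and the edge.yourapp.net target are all hypothetical.

```python
from typing import Optional


def dns_instruction(hostname: str, zone_apex: str, target: str,
                    apex_feature: Optional[str] = None) -> str:
    """Pick the onboarding instruction for a hostname, given what the provider supports."""
    host = hostname.lower().rstrip(".")
    apex = zone_apex.lower().rstrip(".")
    if host != apex:
        # Ordinary subdomain: a plain CNAME is fine.
        return f"Create a CNAME record for {host} pointing to {target}."
    if apex_feature:
        # e.g. "ALIAS", "ANAME", or "CNAME flattening", taken from your provider table
        return f"At the apex {host}, use your provider's {apex_feature} feature to point to {target}."
    return (f"A plain CNAME is not allowed at the apex {host}. Use ALIAS/ANAME if your "
            f"provider offers it, or a provider with CNAME flattening.")
```

Keeping the provider capability as data rather than code means support can update the table when a provider adds a feature, without a deploy.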
Stories from the Field: The Weekend We Issued for 1,200 Domains
What went right (and what I’d do differently)
A few years back, a client asked me to help prepare for a weekend launch. They had hundreds of franchise sites, each with a custom domain, all going live within a 48‑hour window. We built a DNS‑01 adapter for the two most common providers, shipped a fallback flow using CNAME delegation, and added a queue that paced issuance below CA thresholds. The results were almost boring. Most certificates issued within a few minutes of DNS updates. The edge reloaded calmly. The team actually left for dinner.
But not everything was perfect. We ran into a cluster of failures with CAA policies no one knew existed. A handful of DS records were stale after a registrar migration six months earlier. And we discovered that one provider’s API confirmed TXT creation before the nameservers actually served it. None of these were disasters. They were clarifying. We added an up‑front readiness check for CAA, a DS sanity scan, and a propagation verify with a slower backoff for that specific provider. Monday morning came with fewer sticky notes and much happier sleep.
Putting It Together: A Calm Implementation Checklist (No Panic Required)
How I’d build it again tomorrow
If I were starting fresh, I’d sketch the system like this. First, an issuance orchestrator with a queue and a clear state machine: requested, challenge‑created, awaiting‑propagation, validated, finalized, distributed. Second, a set of DNS provider adapters behind a clean interface. Third, a tiny library that waits for TXT records to show up from multiple resolvers before calling validation. Fourth, a storage layer that versions key and certificate pairs. Finally, an edge that reloads gracefully and serves dual ECDSA and RSA certificates so you get modern speed without leaving older clients behind.
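The state machine is the piece I would write first, because it makes every stuck order explainable. Here is a minimal sketch using the six states listed above; the extra FAILED state and the rule that any in‑flight order may fail are my assumptions, not part of that list.

```python
from enum import Enum


class OrderState(Enum):
    REQUESTED = "requested"
    CHALLENGE_CREATED = "challenge-created"
    AWAITING_PROPAGATION = "awaiting-propagation"
    VALIDATED = "validated"
    FINALIZED = "finalized"
    DISTRIBUTED = "distributed"
    FAILED = "failed"  # assumption: added so errors have somewhere to go


# Enum iteration follows definition order, which is the happy path.
_HAPPY_PATH = [s for s in OrderState if s is not OrderState.FAILED]
_TERMINAL = {OrderState.DISTRIBUTED, OrderState.FAILED}


def can_transition(current: OrderState, target: OrderState) -> bool:
    """Orders move one step forward at a time, or fail from any non-terminal state."""
    if current in _TERMINAL:
        return False
    if target is OrderState.FAILED:
        return True
    return _HAPPY_PATH.index(target) == _HAPPY_PATH.index(current) + 1


def advance(current: OrderState, target: OrderState) -> OrderState:
    """Apply a transition or raise, so illegal jumps show up loudly in logs."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Persisting the state alongside a timestamp per transition gives you the "how long do orders sit in awaiting‑propagation" metric almost for free.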
For the developer experience, I’d add a self‑service UI that offers two paths: “One‑click if your DNS is here” and “Guided CNAME delegation if you’re somewhere else.” I’d also add a little link to a plain‑English explainer on apex routing—something like the piece I shared earlier—and I’d keep a human in the loop via a small inbox where tenants can say “stuck at step 2” in their own words. That little human touch saves your team hours.
If you’re a Kubernetes shop, it’s absolutely fine to lean on existing building blocks. cert‑manager is mature, and its docs are helpful in figuring out the interplay between ACME, DNS providers, and your ingress. Start with one or two providers and grow adapters as your tenant base demands it. On the security side, give tokens the least power needed, rotate them on a schedule, and log all writes. If your provider supports fine‑grained API permissions, this page is worth bookmarking: creating scoped API tokens on Cloudflare.
Wrap‑Up: Make SSL the Least Interesting Part of Your SaaS
If there’s one mindset shift I’ve learned, it’s this: certificates should be a background process, not a recurring event. With DNS‑01 ACME, you can validate ownership before traffic moves, cut out awkward manual steps, and handle custom domains at scale without raising anyone’s blood pressure. A small orchestrator, a few clean adapters, careful token hygiene, and an edge that reloads gracefully—those pieces add up to a system that just works.
Start small. Pick one provider and automate the TXT dance end‑to‑end. Add propagation checks that wait patiently, not anxiously. Keep an eye on CAA and DNSSEC before they become late‑night mysteries. And pace your issuance so rate limits are a non‑issue. If you do that, launch days stop feeling like cliff dives and start feeling like quiet walks. Hope this was helpful! If you want to keep the calm vibe going, take a look at that apex guidance and the ECDSA+RSA setup I linked above. See you in the next post—may your TXT records propagate quickly and your certs always be fresh.
