{"id":1899,"date":"2025-11-15T23:33:47","date_gmt":"2025-11-15T20:33:47","guid":{"rendered":"https:\/\/www.dchost.com\/blog\/i-built-a-3%e2%80%91vps-ha-k3s-cluster-with-traefik-cert%e2%80%91manager-and-longhorn-heres-the-playbook\/"},"modified":"2025-11-15T23:33:47","modified_gmt":"2025-11-15T20:33:47","slug":"i-built-a-3%e2%80%91vps-ha-k3s-cluster-with-traefik-cert%e2%80%91manager-and-longhorn-heres-the-playbook","status":"publish","type":"post","link":"https:\/\/www.dchost.com\/blog\/en\/i-built-a-3%e2%80%91vps-ha-k3s-cluster-with-traefik-cert%e2%80%91manager-and-longhorn-heres-the-playbook\/","title":{"rendered":"I Built a 3\u2011VPS HA K3s Cluster With Traefik, cert\u2011manager, and Longhorn \u2014 Here\u2019s the Playbook"},"content":{"rendered":"<div class=\"dchost-blog-content-wrapper\"><\/p>\n<div id=\"toc_container\" class=\"toc_transparent no_bullets\"><p class=\"toc_title\">\u0130&ccedil;indekiler<\/p><ul class=\"toc_list\"><li><a href=\"#The_moment_I_knew_it_was_time_to_grow_up_my_cluster\"><span class=\"toc_number toc_depth_1\">1<\/span> The moment I knew it was time to grow up my cluster<\/a><\/li><li><a href=\"#Why_three_VPS_nodes_make_everything_feel_calmer\"><span class=\"toc_number toc_depth_1\">2<\/span> Why three VPS nodes make everything feel calmer<\/a><\/li><li><a href=\"#The_plan_clean_base_simple_network_tight_doors\"><span class=\"toc_number toc_depth_1\">3<\/span> The plan: clean base, simple network, tight doors<\/a><\/li><li><a href=\"#K3s_in_HA_mode_with_embedded_etcd_the_easy-on-the-brain_setup\"><span class=\"toc_number toc_depth_1\">4<\/span> K3s in HA mode with embedded etcd (the easy-on-the-brain setup)<\/a><\/li><li><a href=\"#Traefik_as_your_front_door_without_overthinking_it\"><span class=\"toc_number toc_depth_1\">5<\/span> Traefik as your front door (without overthinking it)<\/a><\/li><li><a href=\"#Certificates_that_renew_themselves_certmanager_is_worth_the_small_learning_curve\"><span class=\"toc_number toc_depth_1\">6<\/span> Certificates that renew themselves (cert\u2011manager is worth the small learning curve)<\/a><\/li><li><a href=\"#State_that_survives_reboots_Longhorn_the_surprisingly_friendly_storage_layer\"><span class=\"toc_number toc_depth_1\">7<\/span> State that survives reboots (Longhorn, the surprisingly friendly storage layer)<\/a><\/li><li><a href=\"#Exposing_the_cluster_cleanly_DNS_IPv6_and_a_steady_entrypoint\"><span class=\"toc_number toc_depth_1\">8<\/span> Exposing the cluster cleanly: DNS, IPv6, and a steady entrypoint<\/a><\/li><li><a href=\"#Day_2_reality_health_upgrades_and_the_quiet_guardrails_that_save_you\"><span class=\"toc_number toc_depth_1\">9<\/span> Day 2 reality: health, upgrades, and the quiet guardrails that save you<\/a><\/li><li><a href=\"#Networking_notes_youll_thank_yourself_for_later\"><span class=\"toc_number toc_depth_1\">10<\/span> Networking notes you\u2019ll thank yourself for later<\/a><\/li><li><a href=\"#Common_gotchas_and_how_I_learned_to_dodge_them\"><span class=\"toc_number toc_depth_1\">11<\/span> Common gotchas (and how I learned to dodge them)<\/a><\/li><li><a href=\"#Your_step-by-step_no-drama_install_checklist\"><span class=\"toc_number toc_depth_1\">12<\/span> Your step-by-step, no-drama install checklist<\/a><\/li><li><a href=\"#Tuning_the_last_mile_readiness_topology_and_small_luxuries\"><span class=\"toc_number toc_depth_1\">13<\/span> Tuning the last mile: readiness, topology, and small luxuries<\/a><\/li><li><a href=\"#What_about_growth_When_three_nodes_arent_enough_anymore\"><span 
class=\"toc_number toc_depth_1\">14<\/span> What about growth? When three nodes aren\u2019t enough anymore<\/a><\/li><li><a href=\"#A_few_words_on_confidence_and_calm_operations\"><span class=\"toc_number toc_depth_1\">15<\/span> A few words on confidence and calm operations<\/a><\/li><li><a href=\"#Wrap-up_your_3VPS_HA_K3s_cluster_quietly_dependable\"><span class=\"toc_number toc_depth_1\">16<\/span> Wrap-up: your 3\u2011VPS HA K3s cluster, quietly dependable<\/a><\/li><\/ul><\/div>\n<h2 id=\"section-1\"><span id=\"The_moment_I_knew_it_was_time_to_grow_up_my_cluster\">The moment I knew it was time to grow up my cluster<\/span><\/h2>\n<p>It started with a very quiet Tuesday. One of those days when everything feels calm\u2026 until it doesn\u2019t. I was sipping coffee, poking at some logs, when a simple kernel update rebooted my single <a href=\"https:\/\/www.dchost.com\/vps\">VPS<\/a> and my tiny Kubernetes playground vanished for eight long minutes. No alarms, no pagers, just the slow-motion realization that my \u201cgood enough\u201d setup wasn\u2019t actually good enough. Clients noticed. I noticed. And that was the day I promised myself I\u2019d stop treating production like a side project.<\/p>\n<p>If you\u2019re here, chances are you\u2019ve had that moment too. Maybe your app is getting traction. Maybe you\u2019ve got a couple of microservices, a database that shouldn\u2019t disappear, and users who expect your domain to behave like a grown-up. So let\u2019s build a grown-up platform\u2014without the drama. I\u2019m going to walk you through a production-ready, three-VPS, high-availability K3s cluster with Traefik as ingress, cert\u2011manager for automated TLS, and Longhorn for persistent storage. We\u2019ll talk architecture, installation, real-world gotchas, and the calm way to run the thing day to day.<\/p>\n<p>By the end, you\u2019ll have a clear mental model, a practical plan, and the confidence to ship on a cluster that doesn\u2019t blink just because a single VM sneezed.<\/p>\n<h2 id=\"section-2\"><span id=\"Why_three_VPS_nodes_make_everything_feel_calmer\">Why three VPS nodes make everything feel calmer<\/span><\/h2>\n<p>Think of a three-VPS cluster like a three-legged stool. Two legs can wobble. Four legs are nice, but sometimes you don\u2019t have room. Three legs? You can sit down and exhale. That\u2019s quorum\u2014two nodes can disagree, but the third breaks ties and keeps the cluster consistent. In K3s land, that \u201cbrain\u201d is etcd. When we run K3s in HA mode with embedded etcd, each node carries a piece of the truth. Lose one node? You can still write to the cluster, deploy workloads, renew certificates, the whole show.<\/p>\n<p>Here\u2019s the mental picture that clicked for me: one public domain pointing to a stable entrypoint (we\u2019ll talk about how to make that truly stable), three small-but-capable VPS instances (2\u20134 vCPU and 4\u20138 GB RAM each is a nice starting point), and a cluster that knows how to keep going if any single box goes dark. K3s gives you the lightweight control plane. Traefik takes incoming traffic and routes it politely. cert\u2011manager keeps the locks on the doors with auto-renewing certificates. Longhorn spreads your persistent volumes across nodes, so a single outage doesn\u2019t take your data with it.<\/p>\n<p>It\u2019s also worth mentioning that this stack doesn\u2019t demand a hyperscaler. I\u2019ve run it on modest providers and it just hums. 
The trick is to be intentional about networking, storage prerequisites, and small-but-important guardrails like PodDisruptionBudgets and node taints. We\u2019ll cover those as we go.<\/p>\n<h2 id=\"section-3\"><span id=\"The_plan_clean_base_simple_network_tight_doors\">The plan: clean base, simple network, tight doors<\/span><\/h2>\n<p>Before we type a single install command, a few groundwork pieces make life much easier. In my experience, there are three that matter most: a clean OS baseline, a simple private network between nodes, and a firewall stance that defaults to \u201cnope\u201d from the internet and \u201ceverything you need\u201d on your private mesh.<\/p>\n<p>On the OS front, I like to start from a minimal image and harden it gently. Nothing fancy, just the basics: patching, SSH keys only, a non-root user with sudo, and a handful of sensible defaults. If you want a calm, practical walkthrough that pairs nicely with what we\u2019re building, I wrote about this in <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-sunucu-guvenligi-nasil-saglanir-kapiyi-acik-birakmadan-yasamanin-sirri\/\">The Calm, No\u2011Drama Guide: How to Secure a VPS Server<\/a>. Grab that mindset and keep it handy.<\/p>\n<p>For networking, the easiest path is giving your three nodes a private way to talk\u2014either the provider\u2019s internal network or a tiny WireGuard mesh you control. K3s uses an internal CNI (Flannel by default) to route pod traffic, but you still want node-to-node transport that\u2019s reliable. I tend to allow \u201cany-to-any\u201d on the private interface and lock down the public interface to just what Traefik, SSH, and the K3s API need (usually 80\/443 for HTTP\/HTTPS, 22 for SSH, and 6443 if you\u2019ll be managing the cluster remotely). Longhorn replicates blocks between nodes, so that private path puts the heavy lifting off your public NICs and out of sight.<\/p>\n<p>One more thing about the entrypoint. On three VPS instances, you might not have a cloud Load Balancer. That\u2019s okay. I\u2019ve had good results with a small floating IP managed by keepalived, or simply pointing DNS at a single \u201cactive\u201d node with health-checked failover that flips quickly if it goes down. If you want to sleep particularly well during migrations, I shared how I run resilient DNS in <a href=\"https:\/\/www.dchost.com\/blog\/en\/coklu-saglayici-dns-nasil-kurulur-octodns-ile-zero%e2%80%91downtime-gecis-ve-dayaniklilik-rehberi\/\">How I Run Multi\u2011Provider DNS with octoDNS<\/a>. That approach plays beautifully with a K3s ingress endpoint.<\/p>\n<h2 id=\"section-4\"><span id=\"K3s_in_HA_mode_with_embedded_etcd_the_easy-on-the-brain_setup\">K3s in HA mode with embedded etcd (the easy-on-the-brain setup)<\/span><\/h2>\n<p>Now for the fun part: making the cluster. The K3s team has an excellent guide for this setup\u2014if you like reading the source, check out <a href=\"https:\/\/docs.k3s.io\/installation\/ha-embedded-etcd\" rel=\"nofollow noopener\" target=\"_blank\">the official K3s HA with embedded etcd guide<\/a>. The flow is surprisingly simple. 
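<\/p>\n<p>Before the install commands, one piece of groundwork made concrete: the default-deny firewall stance described above, sketched with ufw. Treat it as a starting point; the private interface name (eth1 here) and the admin IP are assumptions to adapt:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># Public side: deny by default, open only what the cluster needs\nsudo ufw default deny incoming\nsudo ufw default allow outgoing\nsudo ufw allow in on eth1      # private interface: node-to-node, CNI, Longhorn replication\nsudo ufw allow 22\/tcp          # SSH\nsudo ufw allow 80\/tcp          # HTTP (Traefik)\nsudo ufw allow 443\/tcp         # HTTPS (Traefik)\nsudo ufw allow from &lt;YOUR-ADMIN-IP&gt; to any port 6443 proto tcp   # K3s API, admin only\nsudo ufw enable\n<\/code><\/pre>\n<p>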
You\u2019ll install the first node as a server with embedded etcd, grab a token, and bring up the other two nodes as peers.<\/p>\n<p>On the first node, something like this gets you going:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">curl -sfL https:\/\/get.k3s.io | sh -s - \\\n  server \\\n  --cluster-init \\\n  --write-kubeconfig-mode 644 \\\n  --disable servicelb \\\n  --disable local-storage\n<\/code><\/pre>\n<p>Why disable those two? K3s ships a tiny ServiceLB and a local-path storage provisioner. For a production-ish cluster, I\u2019d rather let Longhorn handle storage and manage the entrypoint myself (a floating IP or DNS failover, as we\u2019ll see), so I turn the bundled bits off. After the first node stabilizes, get your cluster join token:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">sudo cat \/var\/lib\/rancher\/k3s\/server\/node-token<\/code><\/pre>\n<p>On the second and third nodes, join as servers (peers) so you have three control-plane nodes sharing etcd:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">curl -sfL https:\/\/get.k3s.io | K3S_URL=https:\/\/&lt;FIRST-NODE-IP&gt;:6443 \\\n  K3S_TOKEN=&lt;THE-TOKEN-YOU-JUST-COPIED&gt; \\\n  sh -s - server \\\n  --write-kubeconfig-mode 644 \\\n  --disable servicelb \\\n  --disable local-storage\n<\/code><\/pre>\n<p>Give it a minute or two and check from any node:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">sudo k3s kubectl get nodes -o wide<\/code><\/pre>\n<p>You should see three \u201cReady\u201d nodes, each marked as control-plane. Unlike kubeadm-style clusters, K3s leaves its control-plane nodes schedulable by default (no NoSchedule taint unless you add one). That flexibility is handy, but for a tidier production feel, I like to keep system components and light workloads there and push heavier apps onto explicit worker nodes when I have them. With three VPS nodes only, you can absolutely run your apps on these nodes\u2014just set resource requests and PodDisruptionBudgets thoughtfully so upgrades don\u2019t juggle everything at once.<\/p>\n<p>One last thing: snapshots. K3s can periodically snapshot etcd for you (see the --etcd-snapshot-schedule-cron and --etcd-snapshot-retention flags, plus the --etcd-s3 options for off-box copies). If you keep snapshots local and copy them to object storage, you\u2019ve suddenly got a realistic recovery path. It\u2019s not glamorous, but it\u2019s the difference between an \u201coops\u201d and a rebuild.<\/p>\n<h2 id=\"section-5\"><span id=\"Traefik_as_your_front_door_without_overthinking_it\">Traefik as your front door (without overthinking it)<\/span><\/h2>\n<p>K3s often includes Traefik by default, but depending on your version and flags, you might be installing it yourself. Either way, I like Traefik because it\u2019s easy to reason about. It handles HTTP\/HTTPS, it plays nicely with standard Kubernetes Ingress objects, and it respects annotations for most things you\u2019d want\u2014timeouts, headers, middlewares\u2014without a lot of YAML yoga.<\/p>\n<p>The philosophy here is to keep Ingress definitions boring and predictable. 
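<\/p>\n<p>First, a ten-second sanity check that Traefik is actually running and an IngressClass exists (assuming the bundled deployment in the kube-system namespace):<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">kubectl -n kube-system get deploy traefik   # should show the deployment ready\nkubectl get ingressclass                    # expect a class named traefik\n<\/code><\/pre>\n<p>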
An Ingress like this, for example, is clean:<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">apiVersion: networking.k8s.io\/v1\nkind: Ingress\nmetadata:\n  name: hello\n  namespace: default\nspec:\n  ingressClassName: traefik\n  rules:\n  - host: hello.example.com\n    http:\n      paths:\n      - path: \/\n        pathType: Prefix\n        backend:\n          service:\n            name: hello-svc\n            port:\n              number: 80\n  tls:\n  - hosts:\n    - hello.example.com\n    secretName: hello-tls\n<\/code><\/pre>\n<p>We\u2019ll let cert\u2011manager manage the TLS secret, and Traefik will serve it. For environments where you need to route multiple hosts, apply HSTS, or sneak in basic auth for an internal tool, Traefik\u2019s middleware stack is straightforward. Start simple, then iterate.<\/p>\n<p>What about the external IP? On three VPS instances without a managed LB, I\u2019ve used a few strategies: a floating IP with keepalived that follows a healthy node; DNS failover that points to whichever node is currently \u201cactive\u201d; or, if your provider allows it, a small, single-node MetalLB IP pool. The first two options are usually plenty, and they keep your design compact.<\/p>\n<h2 id=\"section-6\"><span id=\"Certificates_that_renew_themselves_certmanager_is_worth_the_small_learning_curve\">Certificates that renew themselves (cert\u2011manager is worth the small learning curve)<\/span><\/h2>\n<p>If you\u2019ve ever renewed a certificate at 2 a.m., cert\u2011manager will feel like magic. It watches your Ingress hosts and renews secrets automatically using ACME. The most flexible approach is DNS\u201101, which lets you issue wildcards like *.example.com without worrying about HTTP challenges. The official docs are clear and approachable\u2014bookmark the <a href=\"https:\/\/cert-manager.io\/docs\/\" rel=\"nofollow noopener\" target=\"_blank\">cert\u2011manager installation guide<\/a>.<\/p>\n<p>The flow I use is: install cert\u2011manager via Helm or YAML, create a ClusterIssuer with DNS credentials to your DNS provider, and annotate Ingress resources to request certificates. If you\u2019re running a SaaS or multi-tenant system and want to scale auto-SSL across customer domains, I wrote a friendly deep dive in <a href=\"https:\/\/www.dchost.com\/blog\/en\/saaste-ozel-alan-adlari-ve-otomatik-ssl-dns%e2%80%9101-ile-cok-kiracili-mimarini-nasil-tatli-tatli-olceklersin\/\">Bring Your Own Domain, Get Auto\u2011SSL: DNS\u201101 ACME<\/a>. The same principles apply beautifully in Kubernetes.<\/p>\n<p>A minimal ClusterIssuer for DNS\u201101 might look like this (Cloudflare as an example):<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">apiVersion: cert-manager.io\/v1\nkind: ClusterIssuer\nmetadata:\n  name: letsencrypt-dns\nspec:\n  acme:\n    email: you@example.com\n    server: https:\/\/acme-v02.api.letsencrypt.org\/directory\n    privateKeySecretRef:\n      name: acme-account-key\n    solvers:\n    - dns01:\n        cloudflare:\n          apiTokenSecretRef:\n            name: cloudflare-api-token\n            key: token\n<\/code><\/pre>\n<p>Then you annotate your Ingress or specify tls.secretName and a Certificate resource, and cert\u2011manager takes it from there (there\u2019s a wildcard sketch just below). If you want to go deeper on operational resilience\u2014fallback CAs, rate-limit strategies, and how CAA records interact with ACME automation\u2014pair this with understanding multi-CA approaches and DNS discipline. 
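<\/p>\n<p>As promised, the wildcard sketch: a minimal Certificate resource wired to the ClusterIssuer above. The names and domain are placeholders:<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">apiVersion: cert-manager.io\/v1\nkind: Certificate\nmetadata:\n  name: wildcard-example-com\n  namespace: default\nspec:\n  secretName: wildcard-example-com-tls\n  issuerRef:\n    name: letsencrypt-dns\n    kind: ClusterIssuer\n  dnsNames:\n  - &quot;example.com&quot;\n  - &quot;*.example.com&quot;\n<\/code><\/pre>\n<p>Point tls.secretName at wildcard-example-com-tls in any Ingress under that zone and Traefik serves it.<\/p>\n<p>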
My notes on running robust, real\u2011world DNS and ACME setups in that earlier article will save you hours when traffic grows.<\/p>\n<h2 id=\"section-7\"><span id=\"State_that_survives_reboots_Longhorn_the_surprisingly_friendly_storage_layer\">State that survives reboots (Longhorn, the surprisingly friendly storage layer)<\/span><\/h2>\n<p>When I first adopted Longhorn, I expected grief. Distributed block storage always sounds like a headache. But here\u2019s the thing: for a three-node K3s cluster, Longhorn is the right kind of boring. It handles replica scheduling, rebuilds after failures, gives you a simple UI for visibility, and integrates with PersistentVolumeClaims like any good CSI driver should. Start with the <a href=\"https:\/\/longhorn.io\/docs\/\" rel=\"nofollow noopener\" target=\"_blank\">Longhorn documentation<\/a> for install steps and prerequisites.<\/p>\n<p>There are a few must-dos that make Longhorn hum. First, install open-iscsi on each node and make sure it starts at boot. Longhorn uses iSCSI for attaching volumes to pods. Second, give the nodes enough local disk\u2014SSD if you can swing it\u2014because replica writes still hit local storage before being replicated. Third, set a replica count of two for most workloads in a three-node cluster. It\u2019s the sweet spot between resilience and resource usage. Losing a node still leaves you with one healthy replica to keep serving data and a rebuild path when the node returns.<\/p>\n<p>Here\u2019s a simple StorageClass you can use as a default once Longhorn is installed:<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">apiVersion: storage.k8s.io\/v1\nkind: StorageClass\nmetadata:\n  name: longhorn-default\n  annotations:\n    storageclass.kubernetes.io\/is-default-class: &quot;true&quot;\nprovisioner: driver.longhorn.io\nparameters:\n  numberOfReplicas: &quot;2&quot;\n  staleReplicaTimeout: &quot;30&quot;\nreclaimPolicy: Delete\nallowVolumeExpansion: true\nvolumeBindingMode: Immediate\n<\/code><\/pre>\n<p>With that in place, you can declare PersistentVolumeClaims in your apps and not think about it too hard. Longhorn will place replicas on different nodes and figure out attachment automatically. There\u2019s a dashboard too. I don\u2019t live there, but when something smells off (like a node with a flaky disk), it\u2019s nice to have a human-friendly view.<\/p>\n<p>One more pro-tip: snapshots and backups. Longhorn can snapshot locally and back up to S3-compatible storage. If your database matters even a little, set that up early. I\u2019ve had a few \u201cwhew\u201d moments thanks to those backups when a schema migration went sideways during a late-night deploy.<\/p>\n<h2 id=\"section-8\"><span id=\"Exposing_the_cluster_cleanly_DNS_IPv6_and_a_steady_entrypoint\">Exposing the cluster cleanly: DNS, IPv6, and a steady entrypoint<\/span><\/h2>\n<p>Let\u2019s talk addresses. You\u2019ll likely have a single hostname fronting your cluster, with Traefik presenting TLS there and routing traffic internally. Make that hostname a first-class citizen in your DNS. If your provider offers a floating IP, use keepalived to move it between nodes when you need to. If not, lean on DNS health checks to flip quickly when a node goes silent.<\/p>\n<p>Also, don\u2019t sleep on IPv6. Many providers now give each VPS a v6 address for free, and users on modern networks will reach you that way. 
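<\/p>\n<p>A quick sanity check that your entrypoint answers on both families (hostname is a placeholder):<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">dig +short A hello.example.com      # IPv4 of the active node\ndig +short AAAA hello.example.com   # IPv6 of the active node\ncurl -4 -sI https:\/\/hello.example.com | head -n 1   # force IPv4\ncurl -6 -sI https:\/\/hello.example.com | head -n 1   # force IPv6\n<\/code><\/pre>\n<p>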
I wrote the story of how I made v6 adoption painless in <a href=\"https:\/\/www.dchost.com\/blog\/en\/ipv6-benimseme-hizlanmasi-neden-simdi-nasil-tatli-tatli-olur\/\">The Calm Sprint to IPv6<\/a>. The gist: enable it where you can, terminate TLS properly, and let dual-stack work for you rather than against you.<\/p>\n<p>For a cluster entrypoint pattern I like: A and AAAA records pointing at your active node\u2019s IPv4 and IPv6 addresses, DNS health checks that switch to a backup node if needed, and Traefik listening on 80\/443 with redirects to HTTPS. It\u2019s not fancy. It\u2019s not fragile. It just works. And when combined with <a href=\"https:\/\/www.dchost.com\/blog\/en\/coklu-saglayici-dns-nasil-kurulur-octodns-ile-zero%e2%80%91downtime-gecis-ve-dayaniklilik-rehberi\/\">multi\u2011provider DNS that you can migrate without breaking a sweat<\/a>, you get resilience that scales with you.<\/p>\n<h2 id=\"section-9\"><span id=\"Day_2_reality_health_upgrades_and_the_quiet_guardrails_that_save_you\">Day 2 reality: health, upgrades, and the quiet guardrails that save you<\/span><\/h2>\n<p>This is the part most guides skip, but it\u2019s where clusters either feel gentle or chaotic. Start by giving Kubernetes the hints it needs to treat your apps kindly. Resource requests keep the scheduler honest. Liveness and readiness probes tell it when to stop sending traffic. PodDisruptionBudgets ensure rollouts and node drains don\u2019t take all replicas down at once. I usually start with a PDB that allows one replica to be down and set deployment replicas to at least two. That alone prevents a whole category of self-inflicted outages.<\/p>\n<p>Upgrades are straightforward once you respect the rhythm. For K3s, drain one node at a time, upgrade, uncordon, watch it rejoin, then move on. The embedded etcd spreads the risk; you never lose quorum by upgrading a single node. Traefik rolls cleanly. cert\u2011manager barely notices. Longhorn will detach and reattach volumes as needed, though on very write-heavy workloads I like to pause briefly during the switchover.<\/p>\n<p>Monitoring and logs don\u2019t have to be a project either. Even a basic Prometheus + Grafana stack and a simple log pipeline give you eyes. Watch node pressure (CPU, memory, disk), watch Traefik 5xxs, and keep an eye on Longhorn\u2019s replica health. That\u2019s 90% of the \u201cis it happy?\u201d question answered.<\/p>\n<h2 id=\"section-10\"><span id=\"Networking_notes_youll_thank_yourself_for_later\">Networking notes you\u2019ll thank yourself for later<\/span><\/h2>\n<p>Flannel\u2019s default VXLAN works fine for most three-node clusters. If you crave advanced policy or eBPF toys, you can explore other CNIs, but don\u2019t feel pressured. What matters is making sure node-to-node traffic is unhindered on your private network. The ports change depending on the CNI; the simplest approach is allowing all on that private interface and guarding the public side tightly. If you\u2019ll connect kubectl from your laptop, expose the API at 6443 on your entrypoint and restrict it with security groups or your firewall to your IP ranges.<\/p>\n<p>One more gentle nudge: tune your basics. A deeper TCP backlog, sensible time_wait and reuse settings, and a comfortable file descriptor limit prevent head-scratching under load. 
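<\/p>\n<p>A hedged sketch of those basics as a sysctl drop-in; treat the values as starting points to measure against, not gospel:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># \/etc\/sysctl.d\/99-cluster-tuning.conf\nnet.core.somaxconn = 4096            # deeper accept backlog for busy listeners\nnet.ipv4.tcp_max_syn_backlog = 4096  # more room for half-open connections\nnet.ipv4.tcp_tw_reuse = 1            # reuse TIME_WAIT sockets for outbound connections\nfs.file-max = 1048576                # comfortable global file descriptor ceiling\n\n# apply without rebooting:\n#   sudo sysctl --system\n<\/code><\/pre>\n<p>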
If you\u2019re curious how to keep that tuning pragmatic and safe, you might enjoy my notes in <a href=\"https:\/\/www.dchost.com\/blog\/en\/yuksek-trafikli-wordpress-laravelde-linux-tcp-tuning-sysctl-ayarlari-udp-bufferlari-ve-syn-flooda-karsi-sakin-kalmak\/\">The Calm Guide to Linux TCP Tuning for High\u2011Traffic Apps<\/a>. Those little tweaks often matter more than exotic Kubernetes flags.<\/p>\n<h2 id=\"section-11\"><span id=\"Common_gotchas_and_how_I_learned_to_dodge_them\">Common gotchas (and how I learned to dodge them)<\/span><\/h2>\n<p>I\u2019ll never forget the first time Longhorn refused to attach a volume because open-iscsi wasn\u2019t running on one node. It felt mysterious until I remembered the prerequisite. Double-check that service is enabled and healthy on every VPS. Another classic: cert\u2011manager stuck waiting on a DNS challenge because I mis-scoped an API token. The fix was simply giving the token permission to edit TXT records in the right zone. When in doubt, watch the cert\u2011manager and challenge logs; they\u2019re chatty in a helpful way.<\/p>\n<p>Traefik timeouts can also bite. If a service sits behind a slow upstream (say, a database query that sometimes spikes), it\u2019s okay to bump your timeout annotations a bit. Just don\u2019t hide real performance issues behind huge timeouts. Keep an eye on your upstream services and let autoscaling or better queries do the heavy lifting.<\/p>\n<p>And the one we all step on once: draining a node without a PodDisruptionBudget on your single-replica stateful app. Kubernetes will do exactly what you asked\u2014evict the only pod\u2014and your users will watch an hourglass. Make your future self proud and add a PDB early, even if it\u2019s conservative. It pays you back the first time you patch a kernel without holding your breath.<\/p>\n<h2 id=\"section-12\"><span id=\"Your_step-by-step_no-drama_install_checklist\">Your step-by-step, no-drama install checklist<\/span><\/h2>\n<p>Here\u2019s how I tee this up in practice, keeping it simple and repeatable:<\/p>\n<p>First, prep the VPSs: patch, set SSH keys, create a non-root user, and lock down inbound firewall rules. Give each node a private interface or a WireGuard mesh so they can talk freely out of the public eye. If this part makes you nervous, lean on the mindset in <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-sunucu-guvenligi-nasil-saglanir-kapiyi-acik-birakmadan-yasamanin-sirri\/\">my calm VPS security guide<\/a> and you\u2019ll be fine.<\/p>\n<p>Second, install K3s in HA mode as we covered, one server with cluster-init and two more joining as servers using the token. Confirm the three nodes are Ready. While you\u2019re here, set up etcd snapshots to a safe location; even a daily copy to object storage replaces fear with confidence.<\/p>\n<p>Third, install Traefik if you\u2019re not using the bundled one, and make sure your DNS or floating IP points at your active node. Test a simple Ingress for a hello-world service over HTTP first. Then bring in cert\u2011manager, create your ClusterIssuer for DNS\u201101, and flip the Ingress to TLS. Watch cert\u2011manager work; that first automatic certificate is a small victory every time.<\/p>\n<p>Fourth, install Longhorn and its prerequisites, set the StorageClass default, and deploy a small StatefulSet that writes a bit of data. Move pods around by draining a node and verify Longhorn reattaches volumes where you expect. 
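<\/p>\n<p>Concretely, that drill looks something like this (the node name is a placeholder):<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">kubectl drain vps-2 --ignore-daemonsets --delete-emptydir-data   # evict pods politely\nkubectl get pods -o wide                             # workloads should land on the other nodes\nkubectl -n longhorn-system get volumes.longhorn.io   # watch volume state and robustness\nkubectl uncordon vps-2                               # let the node take work again\n<\/code><\/pre>\n<p>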
That hands-on test removes a lot of stress when you do it later under pressure.<\/p>\n<p>If you like to keep the official references close, bookmark the <a href=\"https:\/\/docs.k3s.io\/installation\/ha-embedded-etcd\" rel=\"nofollow noopener\" target=\"_blank\">K3s HA install guide<\/a>, the <a href=\"https:\/\/cert-manager.io\/docs\/\" rel=\"nofollow noopener\" target=\"_blank\">cert\u2011manager docs<\/a>, and the <a href=\"https:\/\/longhorn.io\/docs\/\" rel=\"nofollow noopener\" target=\"_blank\">Longhorn docs<\/a>. They\u2019re short, friendly, and honest about edge cases.<\/p>\n<h2 id=\"section-13\"><span id=\"Tuning_the_last_mile_readiness_topology_and_small_luxuries\">Tuning the last mile: readiness, topology, and small luxuries<\/span><\/h2>\n<p>There are a few finishing touches that make your cluster feel smooth. Use readiness probes that reflect actual readiness\u2014if an app needs a warm cache, test the endpoint that proves it. Add topology spread constraints so replicas don\u2019t pile onto a single node after a drain. And give yourself a couple of luxuries: a small maintenance page in Traefik you can toggle during major changes, and a canary flavor of your app for safe testing behind an alternate host.<\/p>\n<p>On small clusters, it\u2019s also worth setting gentle resource requests for system components so application pods don\u2019t crowd them out under load. K3s is light, but kube-proxy, CoreDNS, Traefik, cert\u2011manager, and Longhorn all deserve predictable CPU and RAM. That predictability shows up as stability when Friday traffic rolls in.<\/p>\n<h2 id=\"section-14\"><span id=\"What_about_growth_When_three_nodes_arent_enough_anymore\">What about growth? When three nodes aren\u2019t enough anymore<\/span><\/h2>\n<p>Here\u2019s the funny part: a well-tuned three-node K3s cluster can carry more than you\u2019d expect. When it\u2019s time to grow, you\u2019ve got choices. You can add worker nodes to offload heavy workloads while keeping your three-node control plane steady. You can scale storage capacity by adding nodes with bigger disks and letting Longhorn rebalance. Or you can split concerns\u2014run databases on managed platforms and use the cluster for stateless services. The point is, this foundation doesn\u2019t paint you into a corner. It gives you optionality without forcing a migration on a deadline.<\/p>\n<p>If you adopt more hostnames and customer-facing domains, the same ACME + DNS\u201101 ideas scale cleanly. That playbook is the core of what I shared in <a href=\"https:\/\/www.dchost.com\/blog\/en\/saaste-ozel-alan-adlari-ve-otomatik-ssl-dns%e2%80%9101-ile-cok-kiracili-mimarini-nasil-tatli-tatli-olceklersin\/\">Bring Your Own Domain, Get Auto\u2011SSL: DNS\u201101 ACME<\/a>, and it\u2019s exactly how I keep certificate automation boring even as domains multiply.<\/p>\n<h2 id=\"section-15\"><span id=\"A_few_words_on_confidence_and_calm_operations\">A few words on confidence and calm operations<\/span><\/h2>\n<p>One of my clients once told me, \u201cI don\u2019t want the fanciest cluster. I want the cluster I forget about.\u201d That stuck with me. The stack we\u2019ve walked through\u2014K3s HA, Traefik, cert\u2011manager, Longhorn\u2014aims at that feeling. It\u2019s minimal on moving parts, friendly to debug, and forgiving when a single VPS has a bad day. 
You don\u2019t get everything you\u2019d get from a hyperscale platform, but you get something arguably more valuable: a setup you can hold in your head and run with a small team, on a sensible budget, without drama.<\/p>\n<p>Over time, you\u2019ll add your own touches. Maybe you\u2019ll toss in a GitOps flow for manifests. Maybe you\u2019ll teach Traefik a few more tricks. Maybe you\u2019ll refine your DNS approach with health checks and, if you\u2019re curious about the deeper DNS rabbit hole, strategies like multi-provider authority and stable migration techniques from <a href=\"https:\/\/www.dchost.com\/blog\/en\/coklu-saglayici-dns-nasil-kurulur-octodns-ile-zero%e2%80%91downtime-gecis-ve-dayaniklilik-rehberi\/\">my octoDNS guide<\/a>. None of that is required on day one. The beauty is you can grow into it, one calm improvement at a time.<\/p>\n<h2 id=\"section-16\"><span id=\"Wrap-up_your_3VPS_HA_K3s_cluster_quietly_dependable\">Wrap-up: your 3\u2011VPS HA K3s cluster, quietly dependable<\/span><\/h2>\n<p>So there you have it: a three-VPS K3s cluster that doesn\u2019t flinch when a box reboots, accepts traffic gracefully through Traefik, learns certificates automatically with cert\u2011manager, and keeps your data safe with Longhorn. The pieces play well together. They don\u2019t ask for heroics. And when something does go wrong, the failure modes are understandable\u2014fixable in minutes, not hours.<\/p>\n<p>If you take one thing with you, let it be this: keep the design simple and consistent. Use DNS and a steady entrypoint instead of wrestling with exotic LBs. Let Kubernetes guard the health of your apps with probes and PDBs. Give Longhorn enough room to breathe and back up what matters. And don\u2019t forget the basics\u2014clean OS, private node-to-node network, a firewall stance you can explain to a friend. If you want a refresher on safe, grounded VPS habits, I keep pointing folks to <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-sunucu-guvenligi-nasil-saglanir-kapiyi-acik-birakmadan-yasamanin-sirri\/\">this calm VPS security guide<\/a> because it removes the anxiety at the edges.<\/p>\n<p>Hope this helped you sketch your own \u201cgrown-up\u201d cluster. If you spin one up and run into a weird edge case, I\u2019d love to hear the story. We all get better by sharing the small wins and the odd surprises. 
Until next time\u2014ship calmly, sleep better, and let your cluster be the boring, dependable engine under the hood.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Table of Contents1 The moment I knew it was time to grow up my cluster2 Why three VPS nodes make everything feel calmer3 The plan: clean base, simple network, tight doors4 K3s in HA mode with embedded etcd (the easy-on-the-brain setup)5 Traefik as your front door (without overthinking it)6 Certificates that renew themselves (cert\u2011manager is worth the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1900,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[],"class_list":["post-1899","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-teknoloji"],"_links":{"self":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/1899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/comments?post=1899"}],"version-history":[{"count":0,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/1899\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media\/1900"}],"wp:attachment":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media?parent=1899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/categories?post=1899"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/tags?post=1899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}