{"id":1986,"date":"2025-11-17T20:52:15","date_gmt":"2025-11-17T17:52:15","guid":{"rendered":"https:\/\/www.dchost.com\/blog\/canary-deploys-on-a-vps-the-friendly-guide-to-nginx-weighted-routing-health-checks-and-safe-rollbacks\/"},"modified":"2025-11-17T20:52:15","modified_gmt":"2025-11-17T17:52:15","slug":"canary-deploys-on-a-vps-the-friendly-guide-to-nginx-weighted-routing-health-checks-and-safe-rollbacks","status":"publish","type":"post","link":"https:\/\/www.dchost.com\/blog\/en\/canary-deploys-on-a-vps-the-friendly-guide-to-nginx-weighted-routing-health-checks-and-safe-rollbacks\/","title":{"rendered":"Canary Deploys on a VPS: The Friendly Guide to Nginx Weighted Routing, Health Checks, and Safe Rollbacks"},"content":{"rendered":"<div class=\"dchost-blog-content-wrapper\"><div id=\"toc_container\" class=\"toc_transparent no_bullets\"><p class=\"toc_title\">Contents<\/p><ul class=\"toc_list\"><li><a href=\"#The_Coffee_Break_That_Turned_Into_a_Safer_Deploy\"><span class=\"toc_number toc_depth_1\">1<\/span> The Coffee Break That Turned Into a Safer Deploy<\/a><\/li><li><a href=\"#What_a_Canary_Deploy_Looks_Like_on_a_Single_VPS\"><span class=\"toc_number toc_depth_1\">2<\/span> What a Canary Deploy Looks Like on a Single VPS<\/a><\/li><li><a href=\"#Your_Building_Blocks_Two_App_Versions_One_Nginx_A_Few_Smart_Files\"><span class=\"toc_number toc_depth_1\">3<\/span> Your Building Blocks: Two App Versions, One Nginx, A Few Smart Files<\/a><ul><li><a href=\"#Two_app_processes_two_ports\"><span class=\"toc_number toc_depth_2\">3.1<\/span> Two app processes, two ports<\/a><\/li><li><a href=\"#Nginx_as_your_traffic_switchboard\"><span class=\"toc_number toc_depth_2\">3.2<\/span> Nginx as your traffic switchboard<\/a><\/li><li><a href=\"#Health_checks\"><span class=\"toc_number toc_depth_2\">3.3<\/span> Health checks<\/a><\/li><li><a href=\"#Instant_rollbacks\"><span class=\"toc_number toc_depth_2\">3.4<\/span> Instant rollbacks<\/a><\/li><\/ul><\/li><li><a 
href=\"#Nginx_Weighted_Routing_The_Heart_of_the_Canary\"><span class=\"toc_number toc_depth_1\">4<\/span> Nginx Weighted Routing: The Heart of the Canary<\/a><\/li><li><a href=\"#Safer_Logs_Seeing_the_Canary_in_Your_Access_Log\"><span class=\"toc_number toc_depth_1\">5<\/span> Safer Logs: Seeing the Canary in Your Access Log<\/a><\/li><li><a href=\"#Passive_Health_Checks_That_Actually_Help\"><span class=\"toc_number toc_depth_1\">6<\/span> Passive Health Checks That Actually Help<\/a><\/li><li><a href=\"#A_Tiny_Active_Health_Checker_Without_Fancy_Licenses\"><span class=\"toc_number toc_depth_1\">7<\/span> A Tiny Active Health Checker (Without Fancy Licenses)<\/a><\/li><li><a href=\"#The_Rollback_Lever_Editing_One_Line_Reloading_Safely\"><span class=\"toc_number toc_depth_1\">8<\/span> The Rollback Lever: Editing One Line, Reloading Safely<\/a><\/li><li><a href=\"#Observability_Without_Overcomplicating_It\"><span class=\"toc_number toc_depth_1\">9<\/span> Observability Without Overcomplicating It<\/a><\/li><li><a href=\"#Release_Rhythm_A_Calm_Repeatable_Canary_Playbook\"><span class=\"toc_number toc_depth_1\">10<\/span> Release Rhythm: A Calm, Repeatable Canary Playbook<\/a><\/li><li><a href=\"#Avoiding_the_Classics_Sessions_Caches_and_Migrations\"><span class=\"toc_number toc_depth_1\">11<\/span> Avoiding the Classics: Sessions, Caches, and Migrations<\/a><\/li><li><a href=\"#Optional_Percentile-Style_Splits_With_split_clients\"><span class=\"toc_number toc_depth_1\">12<\/span> Optional: Percentile-Style Splits With split_clients<\/a><\/li><li><a href=\"#TLS_Zero-Downtime_Reloads_and_Peace_of_Mind\"><span class=\"toc_number toc_depth_1\">13<\/span> TLS, Zero-Downtime Reloads, and Peace of Mind<\/a><\/li><li><a href=\"#When_You_Need_to_Go_Faster_Retry_and_Fallback_Tweaks\"><span class=\"toc_number toc_depth_1\">14<\/span> When You Need to Go Faster: Retry and Fallback Tweaks<\/a><\/li><li><a href=\"#A_Quick_Word_on_Security\"><span class=\"toc_number 
toc_depth_1\">15<\/span> A Quick Word on Security<\/a><\/li><li><a href=\"#Troubleshooting_The_Gotchas_I_See_Most\"><span class=\"toc_number toc_depth_1\">16<\/span> Troubleshooting: The Gotchas I See Most<\/a><\/li><li><a href=\"#A_Full_Mini-Playbook_You_Can_Copy\"><span class=\"toc_number toc_depth_1\">17<\/span> A Full Mini-Playbook You Can Copy<\/a><\/li><li><a href=\"#One_More_Thing_Canary_Isnt_Just_for_Code\"><span class=\"toc_number toc_depth_1\">18<\/span> One More Thing: Canary Isn\u2019t Just for Code<\/a><\/li><li><a href=\"#Wrap-Up_A_Calm_Way_to_Ship_Without_Drama\"><span class=\"toc_number toc_depth_1\">19<\/span> Wrap-Up: A Calm Way to Ship Without Drama<\/a><\/li><li><a href=\"#Further_Reading_and_Handy_Docs\"><span class=\"toc_number toc_depth_1\">20<\/span> Further Reading and Handy Docs<\/a><\/li><\/ul><\/div>\n<h2 id=\"section-1\"><span id=\"The_Coffee_Break_That_Turned_Into_a_Safer_Deploy\">The Coffee Break That Turned Into a Safer Deploy<\/span><\/h2>\n<p>So there I was, nursing a lukewarm coffee and staring at a tiny shipping button like it might explode. You know that feeling, right? You\u2019ve finished a feature, tests are green, staging looks fine, and yet production is a different beast. I\u2019ve had those \u201cship it all at once\u201d days, and I\u2019ve also had the \u201coh no, roll it back, roll it all back!\u201d days. The little trick that finally gave me peace? Canary deploys on a single <a href=\"https:\/\/www.dchost.com\/vps\">VPS<\/a> with Nginx doing the gentle traffic juggling. Not a cluster. Not a full-blown service mesh. 
Just one machine, your app in two flavors, and a reverse proxy doing exactly what you tell it to.<\/p>\n<p>In this guide, I want to show you the practical setup I actually use: two app versions running side-by-side, <strong>Nginx weighted routing<\/strong> to send a small percentage of users to the canary, <strong>health checks<\/strong> that catch trouble before your customers do, and <strong>instant rollbacks<\/strong> when things get weird. We\u2019ll keep it conversational, but we\u2019ll go deep enough that you can copy-paste your way into a safer deploy strategy. If you\u2019ve ever wished for a gentler way to release without the drama, this is for you.<\/p>\n<h2 id=\"section-2\"><span id=\"What_a_Canary_Deploy_Looks_Like_on_a_Single_VPS\">What a Canary Deploy Looks Like on a Single VPS<\/span><\/h2>\n<p>Picture your VPS like a little two-lane road. On one lane, you\u2019ve got your stable app version (let\u2019s call it v1). On the other lane, the shiny new version (v2) is waiting for its first drivers. Instead of opening all lanes to v2 and hoping for the best, you let a few cars in. If those cars arrive safely and no tires fall off, you open the lane a bit more. That\u2019s canary in a nutshell: <strong>gradually ramp up<\/strong>, watch carefully, and have a big red button to put everything back to v1 if needed.<\/p>\n<p>On a single VPS, this is surprisingly doable. You run v1 and v2 on different ports or sockets. Nginx sits in front, listening on your public port 80\/443. It forwards most traffic to v1 and a small amount to v2. If v2 stumbles, Nginx falls back to v1. If v2 is happy, you turn up the dial. It\u2019s low-ceremony and it works.<\/p>\n<p>I remember a client who swore canary was only for big teams with Kubernetes. We tried this on their single VPS for a Friday release (risky, I know). They pushed 5% to v2, saw a small spike in 5xx errors, discovered a subtle cache key issue, fixed it in an hour, and then continued the rollout. 
No late-night war room. No heartburn. Just a calm canary.<\/p>\n<h2 id=\"section-3\"><span id=\"Your_Building_Blocks_Two_App_Versions_One_Nginx_A_Few_Smart_Files\">Your Building Blocks: Two App Versions, One Nginx, A Few Smart Files<\/span><\/h2>\n<h3><span id=\"Two_app_processes_two_ports\">Two app processes, two ports<\/span><\/h3>\n<p>The simplest pattern: run v1 on 127.0.0.1:5001 and v2 on 127.0.0.1:5002. Whether you use systemd services, Docker containers, or bare binaries doesn\u2019t matter as long as each version exposes an HTTP endpoint and a health check (like <code>\/healthz<\/code>) that returns a clear 200 OK.<\/p>\n<h3><span id=\"Nginx_as_your_traffic_switchboard\">Nginx as your traffic switchboard<\/span><\/h3>\n<p>Nginx will sit in front and route traffic with weighted round-robin. You can fine-tune weights to approximate percentages. It\u2019s not a perfect statistical distribution\u2014keepalives and request patterns can skew it\u2014but for small, controlled rollouts, it\u2019s more than enough.<\/p>\n<h3><span id=\"Health_checks\">Health checks<\/span><\/h3>\n<p>Open-source Nginx gives you <strong>passive<\/strong> health checks out of the box: when an upstream fails, Nginx marks it as bad for a bit. We\u2019ll pair that with a tiny active checker script (curl + systemd timer) to yank a sick canary out of the pool faster than a startled cat. If you need built-in active checks, they\u2019re part of NGINX Plus, but you can get far without it.<\/p>\n<h3><span id=\"Instant_rollbacks\">Instant rollbacks<\/span><\/h3>\n<p>When you touch production, your rollback needs to be as easy as flipping a switch. We\u2019ll keep our config change minimal (one include file or a tiny templated upstream) and teach a script to change weights and reload Nginx safely. 
That\u2019s your \u201cwhoops, not today\u201d lever.<\/p>\n<h2 id=\"section-4\"><span id=\"Nginx_Weighted_Routing_The_Heart_of_the_Canary\">Nginx Weighted Routing: The Heart of the Canary<\/span><\/h2>\n<p>Here\u2019s a simple, battle-tested Nginx upstream that sends most requests to v1 and a small slice to v2:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">http {\n    log_format canary_main '$remote_addr - $remote_user [$time_local] '\n                          '&quot;$request&quot; $status $body_bytes_sent '\n                          '&quot;$http_referer&quot; &quot;$http_user_agent&quot; '\n                          'upstream=$upstream_addr '\n                          'rt=$request_time urt=$upstream_response_time '\n                          'ust=$upstream_status';\n\n    access_log \/var\/log\/nginx\/access.log canary_main;\n\n    upstream app_pool {\n        zone app_pool 64k;\n        keepalive 64;\n\n        # Stable\n        server 127.0.0.1:5001 weight=19 max_fails=3 fail_timeout=10s;\n\n        # Canary\n        server 127.0.0.1:5002 weight=1  max_fails=3 fail_timeout=10s;\n    }\n\n    server {\n        listen 80;\n        server_name example.com;\n\n        location \/ {\n            proxy_set_header Host $host;\n            proxy_set_header X-Real-IP $remote_addr;\n            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n            proxy_set_header X-Forwarded-Proto $scheme;\n\n            proxy_http_version 1.1;\n            proxy_set_header Connection &quot;&quot;;\n\n            proxy_pass http:\/\/app_pool;\n            proxy_connect_timeout 2s;\n            proxy_send_timeout 30s;\n            proxy_read_timeout 30s;\n        }\n\n        location = \/healthz {\n            access_log off;\n            return 200 'ok';\n        }\n    }\n}\n<\/code><\/pre>\n<p>With weights 19:1, you\u2019re roughly sending 5% to the canary. 
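<\/p>\n<p>To make the math concrete: the canary\u2019s share is its weight divided by the total weight of the pool. Here\u2019s the same pool at a ~1% trickle (a sketch; only the weights differ from the block above):<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">upstream app_pool {\n    zone app_pool 64k;\n    keepalive 64;\n\n    # canary share = 1 \/ (99 + 1) = 1% of requests\n    server 127.0.0.1:5001 weight=99 max_fails=3 fail_timeout=10s;  # stable\n    server 127.0.0.1:5002 weight=1  max_fails=3 fail_timeout=10s;  # canary\n}\n<\/code><\/pre>\n<p>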
If you want a gentler trickle, bump v1\u2019s weight higher and keep v2 at 1. If you want to ramp to 50\/50, make the weights equal. It\u2019s like a dimmer switch for traffic.<\/p>\n<p>If you prefer a strict percentage split rather than weighted round-robin, Nginx\u2019s <code>split_clients<\/code> can assign requests to a bucket based on a hash of a stable key (say, a cookie or IP). That keeps users sticky to the same bucket during the canary window. The pattern is something like \u201c90% get <code>@v1<\/code>, 10% get <code>@v2<\/code>, then use named locations.\u201d It\u2019s a few more lines, but you\u2019ll get consistent user assignment at the application layer. See the <a href=\"http:\/\/nginx.org\/en\/docs\/http\/ngx_http_split_clients_module.html\" rel=\"nofollow noopener\" target=\"_blank\">split_clients directive<\/a> for the idea.<\/p>\n<p>One note on stickiness: if your app relies on sessions that aren\u2019t shared (like in-memory sessions), you might want <strong>ip_hash<\/strong> to keep a client pinned to one upstream. Open-source Nginx doesn\u2019t have cookie-based stickiness built-in, but ip-based hashing is often enough. Better yet, externalize sessions to Redis or your database so either version can serve a user without surprises.<\/p>\n<h2 id=\"section-5\"><span id=\"Safer_Logs_Seeing_the_Canary_in_Your_Access_Log\">Safer Logs: Seeing the Canary in Your Access Log<\/span><\/h2>\n<p>Your logs tell you what\u2019s really happening. That <code>log_format<\/code> above adds upstream address, response times, and upstream status to every line. This is gold during a canary. 
You can tail the logs and quickly see if the canary upstream is misbehaving.<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># Example log lines (wrapped for clarity)\n203.0.113.10 - - [17\/Nov\/2025:14:02:13 +0000] &quot;GET \/api\/orders HTTP\/1.1&quot; 200 512 \n&quot;-&quot; &quot;Mozilla\/5.0&quot; upstream=127.0.0.1:5002 rt=0.120 urt=0.115 ust=200\n\n203.0.113.11 - - [17\/Nov\/2025:14:02:14 +0000] &quot;GET \/api\/orders HTTP\/1.1&quot; 502 0 \n&quot;-&quot; &quot;curl\/7.68.0&quot; upstream=127.0.0.1:5002 rt=0.050 urt=0.050 ust=502\n<\/code><\/pre>\n<p>If you start seeing 5xx coming from the canary address, that\u2019s your signal to hold or roll back. I usually keep a quick one-liner handy:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">grep 5002 \/var\/log\/nginx\/access.log | awk '{print $9}' | sort | uniq -c<\/code><\/pre>\n<p>That glance tells you whether canary requests are trending toward 200s or something spicier. You don\u2019t need a full observability stack to get useful signals during a canary; your access log is a surprisingly honest friend.<\/p>\n<h2 id=\"section-6\"><span id=\"Passive_Health_Checks_That_Actually_Help\">Passive Health Checks That Actually Help<\/span><\/h2>\n<p>Out of the box, Nginx gives you passive health checks via <code>max_fails<\/code> and <code>fail_timeout<\/code>. If the canary starts throwing errors or not responding, Nginx marks it as failed and stops sending traffic for a little while. 
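<\/p>\n<p>Spelled out, those two knobs on the canary line mean this (same values as in the upstream above):<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\"># 3 failed attempts within a 10s window mark this server unavailable,\n# and Nginx then skips it for the next 10s before letting a request retry it.\nserver 127.0.0.1:5002 weight=1 max_fails=3 fail_timeout=10s;\n<\/code><\/pre>\n<p>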
Combined with <code>proxy_next_upstream<\/code>, you can make failing requests fall back gracefully to stable without the user noticing:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">location \/ {\n    proxy_next_upstream error timeout http_502 http_503 http_504;\n    proxy_next_upstream_tries 2;\n    proxy_pass http:\/\/app_pool;\n}\n<\/code><\/pre>\n<p>That tells Nginx, \u201cIf canary is throwing 502\/503\/504, just retry on a different upstream (which will likely be v1).\u201d You\u2019ll still want to keep an eye on the logs, but this transforms a hard error into a soft retry, which is often good enough to buy time.<\/p>\n<p>If you want to dig into the knobs, the upstream module docs explain these directives well: <a href=\"http:\/\/nginx.org\/en\/docs\/http\/ngx_http_upstream_module.html\" rel=\"nofollow noopener\" target=\"_blank\">ngx_http_upstream_module<\/a>. My advice: start simple. Max 2 retries, a fail_timeout of 10\u201330 seconds, and a canary weight small enough to minimize blast radius.<\/p>\n<h2 id=\"section-7\"><span id=\"A_Tiny_Active_Health_Checker_Without_Fancy_Licenses\">A Tiny Active Health Checker (Without Fancy Licenses)<\/span><\/h2>\n<p>Passive checks are great, but sometimes you want an aggressive little sentinel that actively probes the canary and yanks it out if it stumbles. 
A tiny script plus a systemd timer is more than enough.<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">#!\/usr\/bin\/env bash\n# \/usr\/local\/bin\/canary-healthcheck.sh\nset -euo pipefail\nCANARY_URL=&quot;http:\/\/127.0.0.1:5002\/healthz&quot;\nHOST_HEADER=&quot;example.com&quot;\nFAILS=0\nMAX_FAILS=3\n\nfor _ in {1..3}; do\n  if curl -fsS -H &quot;Host: ${HOST_HEADER}&quot; --max-time 2 &quot;$CANARY_URL&quot; &gt;\/dev\/null; then\n    exit 0  # healthy\n  else\n    FAILS=$((FAILS+1))\n  fi\n  sleep 1\ndone\n\nif [[ $FAILS -ge $MAX_FAILS ]]; then\n  \/usr\/local\/bin\/canary-weight.sh disable\nfi\n<\/code><\/pre>\n<p>And the timer\/service:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># \/etc\/systemd\/system\/canary-healthcheck.service\n[Unit]\nDescription=Canary health check\n\n[Service]\nType=oneshot\nExecStart=\/usr\/local\/bin\/canary-healthcheck.sh\n\n# \/etc\/systemd\/system\/canary-healthcheck.timer\n[Unit]\nDescription=Run canary health check every 10s\n\n[Timer]\nOnBootSec=30s\nOnUnitActiveSec=10s\nAccuracySec=1s\n\n[Install]\nWantedBy=timers.target\n<\/code><\/pre>\n<p>Enable it with <code>systemctl enable --now canary-healthcheck.timer<\/code> and forget about it. If the canary fails three quick checks, the script will flip the switch and reload Nginx. You can get fancier (cooldown windows, chatter reduction), but this little watchdog catches most real-world hiccups.<\/p>\n<h2 id=\"section-8\"><span id=\"The_Rollback_Lever_Editing_One_Line_Reloading_Safely\">The Rollback Lever: Editing One Line, Reloading Safely<\/span><\/h2>\n<p>When things go sideways, you want fewer keystrokes, not more. I like a small include file inside the upstream that I can swap or edit without touching the rest of the config. 
For example:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># \/etc\/nginx\/conf.d\/upstream-app.conf\nupstream app_pool {\n    zone app_pool 64k;\n    keepalive 64;\n\n    server 127.0.0.1:5001 weight=100 max_fails=3 fail_timeout=10s;  # stable\n    include \/etc\/nginx\/conf.d\/canary-server.include;                # canary\n}\n<\/code><\/pre>\n<p>Then the include file is the only thing you change:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># \/etc\/nginx\/conf.d\/canary-server.include\nserver 127.0.0.1:5002 weight=1 max_fails=3 fail_timeout=10s;\n<\/code><\/pre>\n<p>Want to disable the canary instantly? Swap it to \u201cdown\u201d and reload:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># \/etc\/nginx\/conf.d\/canary-server.include\nserver 127.0.0.1:5002 down;\n<\/code><\/pre>\n<p>To avoid typos while your heart rate is up, wrap it in a helper:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">#!\/usr\/bin\/env bash\n# \/usr\/local\/bin\/canary-weight.sh\nset -euo pipefail\nINC=&quot;\/etc\/nginx\/conf.d\/canary-server.include&quot;\nACTION=&quot;${1:-}&quot;\nVALUE=&quot;${2:-}&quot;\n\ncase &quot;$ACTION&quot; in\n  set)\n    # VALUE must be an integer weight like 1, 5, 10, 50, 100\n    [[ &quot;$VALUE&quot; =~ ^[0-9]+$ ]] || { echo &quot;weight must be an integer&quot; &gt;&amp;2; exit 1; }\n    echo &quot;server 127.0.0.1:5002 weight=${VALUE} max_fails=3 fail_timeout=10s;&quot; \\\n      &gt; &quot;$INC&quot; ;;\n\n  disable)\n    echo &quot;server 127.0.0.1:5002 down;&quot; &gt; &quot;$INC&quot; ;;\n\n  enable)\n    echo &quot;server 127.0.0.1:5002 weight=1 max_fails=3 fail_timeout=10s;&quot; \\\n      &gt; &quot;$INC&quot; ;;\n\n  *)\n    echo &quot;Usage: canary-weight.sh {set &lt;N&gt;|enable|disable}&quot; &gt;&amp;2;\n    exit 1 ;;\n\nesac\n\nnginx -t &amp;&amp; systemctl reload nginx\n<\/code><\/pre>\n<p>With that, your rollout becomes muscle memory: <code>canary-weight.sh enable<\/code> for ~1% (if stable is weight 100), <code>canary-weight.sh set 
5<\/code> for a little bolder, and <code>canary-weight.sh disable<\/code> if anything looks off. The important thing is the reload safety check: <code>nginx -t<\/code> before <code>systemctl reload nginx<\/code> so a syntax error never becomes an outage.<\/p>\n<h2 id=\"section-9\"><span id=\"Observability_Without_Overcomplicating_It\">Observability Without Overcomplicating It<\/span><\/h2>\n<p>During a canary, you mostly care about a few signals: error rates, slow responses, and whether users are getting bounced between versions. You can learn a lot from the Nginx access log, which is why we enriched it earlier. A few practical checks I do in the first minutes of a canary:<\/p>\n<p>First, are there 5xx in the canary upstream?<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">awk '$0 ~ \/127.0.0.1:5002\/ {print $9}' \/var\/log\/nginx\/access.log | sort | uniq -c<\/code><\/pre>\n<p>Second, are response times worse on canary?<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">awk '$0 ~ \/127.0.0.1:5002\/ {print $0}' \/var\/log\/nginx\/access.log | \n  awk -F&quot;rt=&quot; '{print $2}' | awk '{print $1}' | sort -n | tail<\/code><\/pre>\n<p>Third, are retries happening?<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">grep -E 'ust=(502|503|504)' \/var\/log\/nginx\/access.log | grep 5002 | wc -l<\/code><\/pre>\n<p>These are blunt instruments, but they\u2019re fast and don\u2019t require spinning up a metric store. If you\u2019re already running something like Prometheus or a log aggregation tool, great\u2014send <code>$upstream_addr<\/code> and <code>$upstream_status<\/code> as labels and you\u2019ll have an even cleaner picture.<\/p>\n<h2 id=\"section-10\"><span id=\"Release_Rhythm_A_Calm_Repeatable_Canary_Playbook\">Release Rhythm: A Calm, Repeatable Canary Playbook<\/span><\/h2>\n<p>This is the cadence that\u2019s served me well. 
It\u2019s simple enough to remember, even on a hectic day:<\/p>\n<p>Before you start, have v2 deployed but not receiving traffic, health endpoint ready, logs rolling, and your rollback script tested. A quick smoke test with <code>curl<\/code> directly against 127.0.0.1:5002 should pass with expected headers and outputs.<\/p>\n<p>Step 1: enable canary at the lowest weight. Let it sit for a few minutes. Browse around as a real user. Hit the critical flows. Watch the logs. If any errors pop, pause and fix.<\/p>\n<p>Step 2: raise the weight modestly. Maybe from 1 to 5. Give it another few minutes. Check again. Remind yourself to breathe. The whole point is to be boring.<\/p>\n<p>Step 3: nudge to 10\u201320 if everything is still clean. If you run a store, place a test order. If you run a dashboard, check pagination, filters, everything that fans out to the back end. Keep an eye on database connections and queue depths if you have them.<\/p>\n<p>Step 4: go to 50 (against a stable weight of 100, that\u2019s roughly a third of traffic). Leave it a bit longer. Note any subtle differences in latency or CPU. At this point, your users are basically telling you if v2 is good. Listen to them.<\/p>\n<p>Step 5: all-in. Mark v1 as <code>down<\/code> in the upstream so the canary takes everything, then either shut down v1 or leave it as a warm standby for a day. If traffic is small, you can skip directly from 20 to all-in\u2014but I like the rhythm. It keeps surprises rare.<\/p>\n<h2 id=\"section-11\"><span id=\"Avoiding_the_Classics_Sessions_Caches_and_Migrations\">Avoiding the Classics: Sessions, Caches, and Migrations<\/span><\/h2>\n<p>I\u2019ve seen canaries wobble because of the unglamorous stuff. If your sessions live in process memory on v1, users routed to v2 will feel like they\u2019ve been logged out. That\u2019s not a code bug; it\u2019s a session store mismatch. The fix is to centralize sessions in Redis or your database so either version can handle a user seamlessly.<\/p>\n<p>Caches can bite too. 
If v2 changes cache keys or the shape of cached content, you might get weirdness where a response from v2 isn\u2019t valid for v1 and vice versa. A safe approach is to make cache keys backward compatible during the canary window, then clean things up once v1 is retired.<\/p>\n<p>Database migrations deserve their own paragraph. When you run a canary, you want <strong>backward-compatible changes<\/strong>. That usually means additive schema updates\u2014adding columns or tables without removing or renaming existing ones\u2014so v1 and v2 can coexist. When it\u2019s time to cut over fully, you remove the old paths. If in doubt, a pre-release snapshot of your data gives you the confidence to hit the brakes. If you haven\u2019t set that up yet, here\u2019s how I take <a href=\"https:\/\/www.dchost.com\/blog\/en\/uygulama%e2%80%91tutarli-yedekler-nasil-alinir-lvm-snapshot-ve-fsfreeze-ile-mysql-postgresqli-usutmeden-dondurmak\/\">application\u2011consistent hot backups with LVM snapshots<\/a> before risky changes.<\/p>\n<h2 id=\"section-12\"><span id=\"Optional_Percentile-Style_Splits_With_split_clients\">Optional: Percentile-Style Splits With split_clients<\/span><\/h2>\n<p>Weighted round-robin is fine to start, but sometimes you want a precise percentage and consistent user assignment. <code>split_clients<\/code> can do that using any stable key (IP, cookie, user ID). 
For example:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">split_clients &quot;$remote_addr$http_user_agent&quot; $bucket {\n    5%     &quot;canary&quot;;\n    *      &quot;stable&quot;;\n}\n\nmap $bucket $route_to_canary {\n    default 0;\n    canary  1;\n}\n\nserver {\n    listen 80;\n\n    location \/ {\n        error_page 418 = @canary;\n        if ($route_to_canary) { return 418; }\n        proxy_pass http:\/\/v1;\n    }\n\n    location @canary {\n        proxy_pass http:\/\/v2;\n    }\n}\n\nupstream v1 { server 127.0.0.1:5001; }\nupstream v2 { server 127.0.0.1:5002; }\n<\/code><\/pre>\n<p>That trick uses a harmless internal redirect via a named location to choose v1 or v2. It\u2019s a touch more configuration, but it solves the \u201ckeep this user on the same version\u201d problem without special modules. If you\u2019re curious about all the options in that directive, the <a href=\"http:\/\/nginx.org\/en\/docs\/http\/ngx_http_split_clients_module.html\" rel=\"nofollow noopener\" target=\"_blank\">official docs<\/a> are a quick read.<\/p>\n<h2 id=\"section-13\"><span id=\"TLS_Zero-Downtime_Reloads_and_Peace_of_Mind\">TLS, Zero-Downtime Reloads, and Peace of Mind<\/span><\/h2>\n<p>All of this sits better on HTTPS, of course. ACME automation keeps certificates fresh and reloads inexpensive. Nginx reloads are zero-downtime, which is a gift: you can change weights, swap canary off, and keep the connection pool warm the whole time. The sequence you want burned into your fingertips is: edit include, <code>nginx -t<\/code>, <code>systemctl reload nginx<\/code>.<\/p>\n<p>If you serve apps behind a CDN or a private tunnel, the canary pattern still applies. Just make sure your health checks and active probing go to the origin where v1 and v2 live. 
Exotic networks are cool, but the canary plan is the same: small, watchful, and reversible.<\/p>\n<h2 id=\"section-14\"><span id=\"When_You_Need_to_Go_Faster_Retry_and_Fallback_Tweaks\">When You Need to Go Faster: Retry and Fallback Tweaks<\/span><\/h2>\n<p>Sometimes a canary fails in strange ways. Maybe a new dependency is flaky, or an external API returns timeouts only at certain hours. A couple of Nginx tweaks can make these bumps survivable without turning your logs into a wall of red:<\/p>\n<p>First, turn on limited upstream retries for transient errors (we used 502\/503\/504 earlier). Second, keep your timeouts tight so failing requests don\u2019t clog the pipe. Third, decide how aggressive your health checker should be\u2014do you want it to disable canary after three misses, or should it wait longer? If your app uses a circuit breaker pattern internally, these layers can complement each other.<\/p>\n<p>You don\u2019t have to overengineer it. Start modest and tighten over time as your traffic and risk change. The <a href=\"http:\/\/nginx.org\/en\/docs\/http\/ngx_http_upstream_module.html\" rel=\"nofollow noopener\" target=\"_blank\">upstream docs<\/a> are a good place to confirm what a directive really does before you ship it.<\/p>\n<h2 id=\"section-15\"><span id=\"A_Quick_Word_on_Security\">A Quick Word on Security<\/span><\/h2>\n<p>Even during a canary, security basics still apply. Keep your admin endpoints locked down, don\u2019t expose internal ports, and ensure your health checks don\u2019t leak sensitive data. If you\u2019re behind a firewall or a zero-trust tunnel, make sure your health checker can still reach the canary locally. 
And when in doubt, keep the canary\u2019s logs short-lived and sanitized\u2014debug output is helpful, but not at the cost of secrets.<\/p>\n<h2 id=\"section-16\"><span id=\"Troubleshooting_The_Gotchas_I_See_Most\">Troubleshooting: The Gotchas I See Most<\/span><\/h2>\n<p>First, \u201cweight math\u201d that doesn\u2019t do what you think. If you set v1 to weight 1 and v2 to weight 1, expect half of your traffic to land on the canary. I\u2019ve watched someone accidentally 50\/50 their canary five minutes after midnight and wonder why alerts fired. A simple habit: choose a stable baseline like 100 for v1 and small numbers for v2, then scale from there.<\/p>\n<p>Second, forgetting keepalives. Without <code>proxy_http_version 1.1<\/code> and the <code>Connection<\/code> header cleared, Nginx can close connections more often, and your app might see connection churn that looks like a canary bug. Keepalives are boring, predictable friends.<\/p>\n<p>Third, session stickiness assumptions. If a user logs in on v1 and is sent to v2 for an API call, lack of shared sessions might look like an auth bug. It\u2019s not. It\u2019s a routing artifact. Either make sessions shared or use stickiness tactics during the canary window.<\/p>\n<p>Fourth, database migrations that remove fields too early. If v2\u2019s migration drops or renames a column that v1 still reads, and both versions are live, you\u2019ll get errors that look like ghosts. Make changes additive first, then subtract once v1 is gone.<\/p>\n<h2 id=\"section-17\"><span id=\"A_Full_Mini-Playbook_You_Can_Copy\">A Full Mini-Playbook You Can Copy<\/span><\/h2>\n<p>Here\u2019s a tidy checklist I keep around:<\/p>\n<p>Prepare: two app versions on different ports with <code>\/healthz<\/code>. Nginx upstream with canary include. Logging with upstream info. Health checker timer enabled. A tested rollback script.<\/p>\n<p>Ship: enable canary at low weight; test key flows; watch logs; raise weight; repeat. 
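<\/p>\n<p>In weight terms (with stable pinned at 100), that ramp is just successive one-line edits of the canary include from earlier, one stage at a time:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\"># \/etc\/nginx\/conf.d\/canary-server.include, one line at each stage\nserver 127.0.0.1:5002 weight=1   max_fails=3 fail_timeout=10s;  # ~1%\nserver 127.0.0.1:5002 weight=5   max_fails=3 fail_timeout=10s;  # ~5%\nserver 127.0.0.1:5002 weight=20  max_fails=3 fail_timeout=10s;  # ~17%\nserver 127.0.0.1:5002 weight=100 max_fails=3 fail_timeout=10s;  # 50\/50\n<\/code><\/pre>\n<p>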
Don\u2019t jump by more than 2\u20133 steps without observing.<\/p>\n<p>Hold: if you see 5xx or rising retries, <code>canary-weight.sh disable<\/code>, reload, and examine canary logs in isolation. Fix, redeploy v2, and resume at a low weight.<\/p>\n<p>Finish: when all-in, leave v1 running but idle for a bit. If no errors after a comfortable window, shut v1 down and archive logs. Clean up the canary include so it\u2019s ready for next time.<\/p>\n<h2 id=\"section-18\"><span id=\"One_More_Thing_Canary_Isnt_Just_for_Code\">One More Thing: Canary Isn\u2019t Just for Code<\/span><\/h2>\n<p>Feature flags pair beautifully with canaries. You can ship the new binary but keep the risky path turned off for most users, then enable it for the canary cohort. Config changes can be canaried too\u2014say, a new cache TTL or API endpoint\u2014by only letting the canary version read the new config until you\u2019re confident.<\/p>\n<p>And yes, even infrastructure can be canaried. Maybe a new TLS setting or a different compression level. Start small, watch your error rates and latencies, then expand. The pattern stays the same: small, visible, reversible.<\/p>\n<h2 id=\"section-19\"><span id=\"Wrap-Up_A_Calm_Way_to_Ship_Without_Drama\">Wrap-Up: A Calm Way to Ship Without Drama<\/span><\/h2>\n<p>If you\u2019ve ever wanted to ship with more confidence but less ceremony, canary deploys on a single VPS are a sweet spot. Two versions, one Nginx, a few lines of config, and a tiny script or two. You guide a small slice of traffic to the canary, watch what happens in your logs, and keep a rollback lever within arm\u2019s reach. It\u2019s the kind of setup you can explain to a teammate in five minutes and rely on for years.<\/p>\n<p>Start with the basics: a health endpoint, weighted upstream, and a reload-safe include file. Add the active checker when you want extra safety. Mix in shared sessions and additive database changes so your versions play nicely together. 
If you need strict percentage splits or sticky canary cohorts, sprinkle in <code>split_clients<\/code>. It\u2019s not magic; it\u2019s a simple rhythm that helps you ship without the pit in your stomach.<\/p>\n<p>Hope this was helpful! If you\u2019ve got stories from your own canary adventures, I\u2019d love to hear them. Until then, may your deploys be boring, your logs readable, and your rollbacks instant.<\/p>\n<h2 id=\"section-20\"><span id=\"Further_Reading_and_Handy_Docs\">Further Reading and Handy Docs<\/span><\/h2>\n<p>If you want to dive deeper into the Nginx knobs we used, the official docs for <a href=\"http:\/\/nginx.org\/en\/docs\/http\/ngx_http_upstream_module.html\" rel=\"nofollow noopener\" target=\"_blank\">upstream configuration and health-related parameters<\/a> and the <a href=\"http:\/\/nginx.org\/en\/docs\/http\/ngx_http_split_clients_module.html\" rel=\"nofollow noopener\" target=\"_blank\">split_clients directive<\/a> are short and clear. For passive retry behavior, the proxying docs include <code>proxy_next_upstream<\/code> and friends, which are worth a skim before your next rollout.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Contents1 The Coffee Break That Turned Into a Safer Deploy2 What a Canary Deploy Looks Like on a Single VPS3 Your Building Blocks: Two App Versions, One Nginx, A Few Smart Files3.1 Two app processes, two ports3.2 Nginx as your traffic switchboard3.3 Health checks3.4 Instant rollbacks4 Nginx Weighted Routing: The Heart of the Canary5 Safer 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1987,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[],"class_list":["post-1986","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-teknoloji"],"_links":{"self":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/1986","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/comments?post=1986"}],"version-history":[{"count":0,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/1986\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media\/1987"}],"wp:attachment":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media?parent=1986"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/categories?post=1986"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/tags?post=1986"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}