Never Lose Your Cache Again: High‑Availability Redis for WordPress with Sentinel, AOF/RDB, and Real Failover

So, a quick story. A few summers ago, I was babysitting a WooCommerce sale that was going way too well. The site was humming, Redis was happy, and the object cache was doing its thing—until the primary Redis process quietly vanished in the middle of a promo push. Suddenly, what felt instant started dragging. Queries that used to come from memory were heading to the database, and the admin bar felt like it had weights strapped to it. Nothing exploded, but you know that moment when you can feel the crowd gathering and the stack turning red? That. We survived, but that night I promised myself I’d never run a single Redis node for WordPress again. Not without a plan.

In this guide, I want to share that plan. Let’s talk about high‑availability Redis for WordPress object caching—the kind with Sentinel watching your back, AOF/RDB persistence keeping your data safe, and automatic failover that actually works. I’ll walk you through what to expect, the bits that really matter in production, and the tiny choices that save you during weird 2 a.m. incidents. Think of it like a conversation with a friend who’s already fallen into the ditch you’re trying to avoid.

What “High Availability Redis” Really Means for WordPress

When we say high availability, we’re not chasing some academic ideal. We’re solving one practical problem: if the Redis primary dies, your site should keep serving cached content within seconds—without your team sprinting to the terminal. That’s it.

WordPress uses Redis as an object cache: query results, transients, and bits of computed state. It’s like a super‑fast notebook your site keeps nearby so it doesn’t need to ask the database every little thing. On a quiet day, you might barely notice it. On a busy day, it’s the difference between a checkout that feels instant and one that feels like a spinning wheel.

But here’s the thing—Redis is typically just one process. If that one process gets lonely and leaves, your cache disappears. Your site falls back to the database. That’s not the end of the world if traffic is light. During a sale, though? That’s the sound of your database sweating.

High availability for Redis boils down to three ideas: replication (so there’s always another node ready), Sentinel (to watch, decide, and promote a new primary), and persistence (AOF/RDB) so that restarts don’t wipe your in‑memory world. Wrapped together, this is how you get automatic failover that feels almost boring—exactly the kind of boring you want.

Redis Sentinel: The Quiet Hall Monitor That Saves Your Evening

I like to think of Sentinel as the hall monitor who never sleeps. It watches your Redis primaries and replicas, coordinates who’s alive, and, when the time comes, it promotes a replica to primary. The magic is that Sentinel runs as a separate process and forms a small fleet—usually three nodes is a sweet spot—so decisions aren’t left to one opinionated machine.

In practice, here’s what you do: you set up one Redis primary and at least one replica. Then you run Sentinel on a few nodes (they can sit alongside Redis or on separate tiny instances). Sentinel tracks the primary using a name like mymaster. When the primary goes offline for long enough, Sentinel holds a quick election and tells a replica to take over. It also tells your clients—like WordPress—where the new primary lives. That last part is critical: WordPress shouldn’t be hard‑coded to a single host; it should ask Sentinel for the current primary.

Sentinel isn’t fancy to configure. Here’s a tiny taste of what a sentinel.conf might look like for the basics:

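port 26379
# Watch the primary at 10.0.0.10:6379; 2 Sentinels must agree it is down
sentinel monitor mymaster 10.0.0.10 6379 2
# Needed whenever the Redis nodes set requirepass
sentinel auth-pass mymaster a-strong-password
# Treat the primary as down after 5 seconds without a valid reply
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
# After a failover, resync replicas to the new primary one at a time
sentinel parallel-syncs mymaster 1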

That quorum value at the end of the monitor line (here it’s 2) is the minimum number of Sentinels that must agree the primary is down before a failover can start. If you run three Sentinels, a quorum of two is typical. On a flaky network, that number keeps one Sentinel with a bad view of the world from declaring the primary dead on its own, and the failover itself still has to be authorized by a majority of Sentinels. And no, you don’t need to overcomplicate this: just run three Sentinels, keep their clocks sane, and don’t run them all on the same host.
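If the quorum arithmetic feels abstract, here is a small Python sketch of it. It leans on documented Sentinel behavior: quorum Sentinels must agree the primary is down, and the Sentinel leading the failover needs votes from a majority of all Sentinels (or from quorum of them, if quorum is the larger number). The function name and dictionary keys are made up for illustration.

```python
def sentinel_math(total_sentinels: int, quorum: int) -> dict:
    """Summarize what a Sentinel fleet of this size can tolerate."""
    if not 1 <= quorum <= total_sentinels:
        raise ValueError("quorum must be between 1 and the number of Sentinels")
    # A failover must be authorized by a majority of all Sentinels...
    majority = total_sentinels // 2 + 1
    # ...and by at least `quorum` of them when quorum is the larger number.
    votes_to_failover = max(quorum, majority)
    return {
        "majority": majority,
        "votes_to_failover": votes_to_failover,
        "sentinels_you_can_lose": total_sentinels - votes_to_failover,
    }
```

With three Sentinels and a quorum of two you can lose one Sentinel and still fail over. With only two Sentinels you can lose none, which is why three is the usual starting point.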

If you want to go deeper into the official fine print, the Redis team has a helpful breakdown of how Sentinel sees the world at the Sentinel documentation. But you don’t need a PhD to get value out of it. In my experience, a basic setup with good timeouts and a clean network gets you 95% of the way there.

Automatic Failover for WordPress: The Plugin Needs to Know Where the Primary Is

Redis will do the failover dance, but WordPress needs to follow its lead. If your object cache plugin connects to a single host and never asks questions, it will keep trying the dead primary. That’s why Sentinel support in your WordPress Redis plugin is a must.

Most folks I work with use the excellent Redis Object Cache plugin. It supports Sentinel, and once you set it, it just feels… calm. The idea is simple: you tell the plugin the Sentinel service name (that mymaster string) and give it the addresses of your Sentinels. From then on, the plugin asks Sentinel for the current primary and connects accordingly. If failover happens mid-traffic, it reconnects to the new primary and keeps going.

If you’re curious about the plugin details or want to peek at the code, the GitHub repo is here: Redis Object Cache plugin on GitHub. I like seeing how tools behave under the hood during failover, and this plugin makes sensible choices.

For a typical wp-config.php, your Sentinel setup might look like this:

// Sentinel connections in this plugin go through the Predis client
// (check the plugin docs for your version)
define('WP_REDIS_CLIENT', 'predis');

// Sentinel master name
define('WP_REDIS_SENTINEL', 'mymaster');

// List of Sentinel endpoints
define('WP_REDIS_SERVERS', [
    'tcp://10.0.0.21:26379',
    'tcp://10.0.0.22:26379',
    'tcp://10.0.0.23:26379',
]);

// Optional niceties
define('WP_REDIS_PASSWORD', 'a-strong-password');
define('WP_REDIS_PREFIX', 'wp:prod:');
define('WP_REDIS_DATABASE', 0);

That’s it. Under the hood, the plugin queries Sentinel for the current primary of mymaster, connects there, and reconnects if the primary changes. That single detail—connecting through Sentinel—turns chaos into a shrug during a primary outage.
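To show what "asking Sentinel" means on the wire, here is a rough Python sketch of the lookup. This is not the plugin's code (the plugin is PHP); it only illustrates the SENTINEL get-master-addr-by-name exchange. All function names are hypothetical, and I'm assuming the Sentinels themselves have no password set.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    chunks = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode()
        chunks.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(chunks)


def parse_addr_reply(reply: bytes) -> Optional[Address]:
    """Parse the two-element [host, port] reply; return None for anything else."""
    lines = reply.split(b"\r\n")
    if len(lines) < 5 or lines[0] != b"*2":
        return None
    return lines[2].decode(), int(lines[4])


def ask_sentinel(sentinel: Address, service: str, timeout: float = 1.0) -> Optional[Address]:
    """Ask one Sentinel for the current primary (assumes Sentinel has no password)."""
    with socket.create_connection(sentinel, timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", service))
        return parse_addr_reply(conn.recv(4096))


def discover_primary(
    sentinels: Iterable[Address],
    service: str,
    ask: Callable[[Address, str], Optional[Address]] = ask_sentinel,
) -> Optional[Address]:
    """Try each Sentinel in turn and return the first primary address reported."""
    for sentinel in sentinels:
        try:
            address = ask(sentinel, service)
        except OSError:
            continue  # this Sentinel is unreachable; try the next one
        if address is not None:
            return address
    return None
```

Calling discover_primary([("10.0.0.21", 26379), ("10.0.0.22", 26379), ("10.0.0.23", 26379)], "mymaster") against the example fleet would return the current primary's host and port. A real client repeats that lookup whenever its connection drops, which is the whole trick.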

If you’re running WordPress in containers and want to see how Redis fits in alongside Nginx, MariaDB, and Let’s Encrypt, I’ve documented an end‑to‑end workflow in my no‑drama playbook for hosting WordPress on a VPS with Docker, Nginx, MariaDB, Redis, and Let’s Encrypt. The Sentinel bit slots right in.

AOF and RDB: Persistence That Makes Restarts Boring

Redis is an in‑memory store, but you don’t want your cache to evaporate on every restart. That’s where persistence comes in. Two mechanisms live here: RDB snapshots (periodic point‑in‑time dumps) and the AOF append‑only log (a write log that replays changes). You can run either or both. For WordPress object caching, my go‑to is AOF enabled with everysec fsync, plus occasional RDB snapshots for fast restarts and safety.

Think of RDB as a photo you take every once in a while. If Redis restarts, it loads that photo. AOF is like a journal entry for every change. With aof-use-rdb-preamble turned on, a rewritten AOF starts with that photo and then carries only the journal entries written since, so on restart Redis loads the snapshot portion and replays a short tail to catch up. You get shorter recovery times and fewer cold-cache minutes after a reboot.

Here’s a starter redis.conf slice that’s treated me well:

maxmemory 2gb
maxmemory-policy allkeys-lru

appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite yes

save 900 1
save 300 10
save 60 10000
aof-use-rdb-preamble yes

# Replica settings (on the replica nodes)
# replicaof 10.0.0.10 6379

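# Security basics
protected-mode yes
# Listen on loopback and this node's private address only, never every interface
# (10.0.0.10 is the example primary; use each node's own private IP)
bind 127.0.0.1 10.0.0.10
requirepass a-strong-password
masterauth a-strong-password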

A couple of notes from the field. That everysec setting is the sweet spot for most sites: at most about a second of writes at risk, with reasonable latency. If you set fsync to always, your disk becomes part of every write path, and you’ll feel it. no-appendfsync-on-rewrite yes helps you avoid IO storms during AOF rewrites; the trade-off is that Redis skips fsync while a rewrite is running, which is a fine bargain for a cache. And if you’re in containers, make sure your AOF/RDB directory is on a persistent volume. I’ve seen folks lose their cache after a restart because their Redis wrote to an ephemeral layer. Not fun.
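The three save lines in that redis.conf slice each read as "snapshot once this many seconds have passed and at least this many keys have changed." Here is a tiny Python sketch of that rule, mostly so you can sanity-check your own thresholds. The function names are hypothetical, and the comparison is an approximation of what Redis does internally.

```python
from typing import Iterable, List, Tuple

SaveRule = Tuple[int, int]  # (seconds, minimum number of changes)


def parse_save_rules(conf_text: str) -> List[SaveRule]:
    """Collect `save <seconds> <changes>` directives, ignoring comments and other lines."""
    rules = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "save":
            rules.append((int(fields[1]), int(fields[2])))
    return rules


def snapshot_due(rules: Iterable[SaveRule], seconds_since_save: int, changes_since_save: int) -> bool:
    """True if any rule has both its time window and its change count satisfied."""
    return any(
        seconds_since_save >= seconds and changes_since_save >= changes
        for seconds, changes in rules
    )
```

With the rules above, a busy minute with 10,000 changes triggers a snapshot, while five changes in two minutes does not: that quiet trickle waits for the 900-second rule.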

One more detail: pick an appropriate maxmemory and a policy that won’t surprise you. For WordPress, allkeys-lru tends to be a sensible default because almost everything in the object cache is fair game to evict. Just make sure you provide TTLs through your code and caching stack where it makes sense, so the cache naturally churns.
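Here is a toy Python model of why the policy matters. It is an exact LRU that counts entries, whereas Redis samples keys and counts bytes, so treat it as an illustration of the idea and not of Redis internals. The TinyCache class is invented for this post.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache: exact LRU over a fixed number of entries."""

    def __init__(self, max_entries: int, policy: str = "allkeys-lru"):
        self.max_entries = max_entries
        self.policy = policy  # "allkeys-lru" or "volatile-lru"
        self.data = OrderedDict()  # key -> (value, has_ttl), least recently used first

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key][0]

    def set(self, key, value, has_ttl: bool = False) -> bool:
        if key in self.data:
            self.data[key] = (value, has_ttl)
            self.data.move_to_end(key)
            return True
        while len(self.data) >= self.max_entries:
            victim = self._pick_victim()
            if victim is None:
                return False  # nothing evictable; Redis would reject the write
            del self.data[victim]
        self.data[key] = (value, has_ttl)
        return True

    def _pick_victim(self):
        # allkeys-lru may evict anything; volatile-lru only keys that carry a TTL.
        for key, (_, has_ttl) in self.data.items():
            if self.policy == "allkeys-lru" or has_ttl:
                return key
        return None
```

Under allkeys-lru, a full cache quietly drops the coldest key. Under volatile-lru, a cache full of keys without TTLs has nothing to evict and new writes start failing, which is exactly the surprise you want to avoid on a WordPress object cache.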

If you want to read the vendor perspective on durability choices, the official docs are tidy: Redis persistence overview.

Replication and Topology: Keep It Simple (and Quorum-Friendly)

Here’s the map I usually start with: one primary Redis node, one or two replicas, and three Sentinels. The Sentinels can live beside the Redis nodes, but don’t stack everything on one host. I’ve had the best luck when the Sentinels run on different machines (tiny ones are fine), so a single box reboot doesn’t cause panic. The replicas should be close to the primary—latency matters for replication.

You’ll see people talk about Redis Cluster, and it’s fantastic for key-space sharding and scaling throughput across many nodes. But WordPress object caching isn’t usually a Cluster problem; it’s a "stand-alone Redis with fast failover" problem. Sentinel with replication is the right tool for that. It’s simpler, friendlier to multi-key operations (Cluster restricts those to keys that live in the same hash slot), and your object cache plugin expects exactly this style of topology.

That said, replicas aren’t purely decorative. They give you a ready‑to‑promote node, they absorb load during a partial incident, and they’re a clean place to back up persistence files without touching the primary. Just remember that for object caching, writes matter more than fancy read distribution. We’re not building a read‑heavy analytics cluster here—we’re building a resilience cushion.

The WordPress Layer: Behavior, Timeouts, and Edge Cases

Once Redis is replicated and watched by Sentinel, WordPress itself becomes the last mile. There are a few small knobs that make big differences. First, timeouts. Don’t let your PHP processes hang forever waiting for a dead socket. Short, sensible timeouts mean the plugin gives up quickly and asks Sentinel for the new primary. If you have the option to set connect and read timeouts in your Redis client, nudge them to practical values (think a second or two, not minutes). In the Redis Object Cache plugin, those are the WP_REDIS_TIMEOUT and WP_REDIS_READ_TIMEOUT constants, both in seconds.

Next, prefixes and databases. It’s neat to isolate caches for different environments with a prefix like wp:prod: and a consistent database index. This avoids awkward collisions if you have staging and production sharing a Redis fleet—don’t laugh, I’ve seen it.

Also, consider the content of your cache. Are you caching huge transient blobs that never expire? I worked with one team who had a long‑lived session object parked in Redis without a TTL. It quietly ballooned their memory usage, and eviction got weird under pressure. Add TTLs where it makes sense and let Redis do its LRU magic without surprises.

Curious about frontend caching too? Object caching is only one piece. When I paired a healthy Redis object cache with careful browser and CDN policy, everything felt snappy even on slow networks. If you want a friendly spin on those headers, I wrote about it in Stop Fighting Your Cache: the guide to Cache-Control immutable, ETag vs Last‑Modified, and asset fingerprinting. It pairs nicely with Redis underneath.

Testing Failover Without Breaking a Sweat

Okay, this is my favorite part, because nothing builds confidence like a controlled failover. After you’ve got your Replication + Sentinel setup running, run a test during a calm hour. Use redis-cli to ask Sentinel to failover the primary, then watch WordPress keep its cool.

From a machine that can talk to your Sentinels:

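# See what Sentinel thinks
redis-cli -p 26379 SENTINEL masters
# "replicas" is the current name for the older "slaves" subcommand (Redis 5+)
redis-cli -p 26379 SENTINEL replicas mymaster
# The same question your WordPress client asks
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster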

# Trigger a controlled failover
redis-cli -p 26379 SENTINEL failover mymaster

Now watch the logs. The Sentinel logs will show the election and a +switch-master event, and your Redis nodes will show the replica being promoted and the old primary becoming a replica. On the PHP side, you might see a short hiccup if a request lands during the switchover, but it should recover. I typically run a tiny watch script that hits the site every second during the failover and prints status and timing. If you’re wired into logging properly, you’ll catch the whole dance.
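For the watch script, something along these lines does the job. It is a minimal Python sketch that uses only the standard library; the function names are mine, and the URL in the usage note is a placeholder for your own site.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a 2xx
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def probe(url: str, fetch=fetch_status, clock=time.monotonic):
    """Fetch once and return (status, elapsed milliseconds)."""
    start = clock()
    status = fetch(url)
    return status, (clock() - start) * 1000.0


def watch(url: str, count: int, interval: float = 1.0, fetch=fetch_status, sleep=time.sleep):
    """Probe the URL `count` times, printing one line per probe."""
    results = []
    for _ in range(count):
        status, elapsed_ms = probe(url, fetch)
        print(f"{time.strftime('%H:%M:%S')} status={status} time={elapsed_ms:.0f}ms")
        results.append((status, elapsed_ms))
        sleep(interval)
    return results
```

Start it with watch("https://example.com/", count=120) in one terminal, trigger the failover in another, and look for the one or two slow lines around the promotion.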

And about logs—this is one of those moments where centralized logging pays off. If you haven’t yet built a simple ops trail, my practical write‑up on centralized logging with Grafana Loki and Promtail shows how to keep an eye on Redis, PHP‑FPM, and Nginx without drowning in noise. It’s the easiest way to know if failovers are clean or chatty.

Operational Habits: The Stuff That Prevents 2 a.m. Surprises

Let’s talk maintenance. Redis loves RAM. Give it room to breathe, and it will love you back. Keep your maxmemory honest—don’t pretend a 2 GB cache fits into 1 GB. Monitor memory fragmentation over time; it’s normal to fluctuate, but if it climbs endlessly, check your key sizes and patterns. Avoid big keys—that one giant serialized array someone tucked away a year ago can cause real latency spikes when saved or evicted.

Disk matters too, especially with AOF. Fast disks reduce the latency of those every‑second fsyncs. If you’re on cloud block storage, pick a tier that doesn’t stall under bursty writes. When an AOF rewrite runs, you’ll thank yourself. And if you’re in Docker or Kubernetes, put the Redis data directory on a persistent, fast volume—no exceptions.

Security is another quiet hero. Don’t leave Redis wide open. Bind it to private interfaces, use requirepass/masterauth or ACLs, and if your threat model requires it, enable TLS. Redis ships with sensible guardrails, and protected-mode is there for a reason. Don’t flip it off without understanding the blast radius. The official docs have a straightforward section on security, but the short version is simple: only the things that must talk to Redis should talk to Redis.

Finally, write down your failover plan. I know, it sounds quaint. But I’ve watched experienced teams get flustered at 2 a.m. because no one remembered the exact Sentinel command. A one‑page runbook with “how to trigger failover,” “how to check state,” and “what healthy looks like” is the difference between a calm five minutes and an anxious hour.

Docker Compose Example: A Minimal HA Playground

If you want something you can poke at on a lab server, here’s a simplified example. It’s not meant for production copy‑paste, but it shows the moving parts: a primary, a replica, and three Sentinels. Wire in volumes, credentials, and networking to your taste.

version: '3.9'
services:
  redis-primary:
    image: redis:7
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./primary/redis.conf:/usr/local/etc/redis/redis.conf:ro
      - primary-data:/data
    networks: [net]

  redis-replica:
    image: redis:7
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    volumes:
      - ./replica/redis.conf:/usr/local/etc/redis/redis.conf:ro
      - replica-data:/data
    networks: [net]

  # Sentinel rewrites sentinel.conf at runtime, so each instance mounts its own
  # directory without :ro; it must be writable by the container's redis user.
  sentinel-1:
    image: redis:7
    command: ["redis-sentinel", "/usr/local/etc/redis/sentinel.conf"]
    volumes:
      - ./sentinel-1:/usr/local/etc/redis
    networks: [net]

  sentinel-2:
    image: redis:7
    command: ["redis-sentinel", "/usr/local/etc/redis/sentinel.conf"]
    volumes:
      - ./sentinel-2:/usr/local/etc/redis
    networks: [net]

  sentinel-3:
    image: redis:7
    command: ["redis-sentinel", "/usr/local/etc/redis/sentinel.conf"]
    volumes:
      - ./sentinel-3:/usr/local/etc/redis
    networks: [net]

volumes:
  primary-data:
  replica-data:

networks:
  net:

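In the primary’s redis.conf, you’d set your appendonly yes and maxmemory bits. In the replica’s, add a replicaof pointing to the primary’s service name. And in sentinel.conf you’d reference the primary host and port via Docker DNS (or static IPs in your environment). If you go the Docker DNS route, Sentinel only accepts hostnames when sentinel resolve-hostnames yes is set (Redis 6.2 and later). Give each Sentinel its own writable copy of sentinel.conf too, because Sentinel rewrites that file as it learns about replicas and failovers. Then run a controlled failover and watch the logs. It’s oddly satisfying.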
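If you would rather generate the Sentinel config than hand-edit copies, here is a small Python sketch that renders one. The directives are real Sentinel settings and the timing values are the ones from the earlier sentinel.conf example. The function name and its defaults are my own, and redis-primary is the service name from the Compose file.

```python
import ipaddress
from typing import Optional


def is_ip(host: str) -> bool:
    """True if `host` is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def render_sentinel_conf(
    primary_host: str,
    primary_port: int = 6379,
    service: str = "mymaster",
    quorum: int = 2,
    password: Optional[str] = None,
) -> str:
    """Render a basic sentinel.conf as text."""
    lines = ["port 26379"]
    if not is_ip(primary_host):
        # Hostnames (for example Docker service names) need this, Redis 6.2+.
        lines.append("sentinel resolve-hostnames yes")
    lines.append(f"sentinel monitor {service} {primary_host} {primary_port} {quorum}")
    if password:
        # Required when the Redis nodes use requirepass.
        lines.append(f"sentinel auth-pass {service} {password}")
    lines += [
        f"sentinel down-after-milliseconds {service} 5000",
        f"sentinel failover-timeout {service} 60000",
        f"sentinel parallel-syncs {service} 1",
    ]
    return "\n".join(lines) + "\n"
```

Write the output of render_sentinel_conf("redis-primary", password="a-strong-password") to a separate file for each Sentinel before the first start. After that, leave the files alone, because Sentinel owns them from then on.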

Application Behavior During Failover: What Users Feel

Let’s talk about the human experience. During a failover, a handful of requests might miss the cache and head to the database. On a healthy stack, that’s barely a blip. Modern PHP stacks handle reconnect logic quickly, and the cache warms itself almost immediately for hot keys. On heavier sites, I’ll sometimes stage a warmup script after a restart to pre‑pull costly keys, but most WordPress sites don’t need that level of ceremony.

What’s more important is that your database isn’t a single point of failure while Redis is rebalancing. If you’re running a store or membership site, make sure your database has its own high‑availability story. I wrote down my practical notes from the trenches in the MariaDB high availability guide for WooCommerce. Redis smooths performance, but the database is the spine—give it the same love.

Common Gotchas I Keep Seeing (and How to Dodge Them)

First, the missing persistence directory. People deploy Redis in a container, forget to mount a persistent volume for /data, and then wonder why the cache is gone after every restart. The cache should survive a normal reboot with AOF and RDB; if it doesn’t, double‑check where Redis is actually writing its files.

Second, slow disks during AOF rewrite. You don’t notice it for days, and then a rewrite collides with a busy hour and latency creeps in. Keep an eye on INFO persistence metrics and test your disk behavior. If you’re on a tiny VPS with networked storage, be kind to it with no-appendfsync-on-rewrite yes.

Third, Sentinels all on one box. It works… until that box reboots. Spread them out. If you have only two small nodes, consider running one Sentinel on each worker machine and a third on a tiny management instance. The goal is quorum even during a small storm.

Fourth, overly sticky objects. If your cache holds onto a few massive keys with no TTL, eviction gets weird under pressure. Get in the habit of storing smaller pieces or attaching practical TTLs. If you’re serializing everything and stuffing it in one key, you’re carrying a couch through a narrow hallway every time.

Security and Access: Keep Redis Behind a Friendly Fence

I’ll keep this short and kind. Don’t put Redis directly on the public internet. Bind it to your private network interfaces, keep a firewall wrapped around the fleet, and use authentication. If your stack supports TLS and you’re crossing untrusted networks, enable it. Also, use separate credentials for applications and internals—your replicas and Sentinels don’t need the same scope as your WordPress app. Security fatigue is real, but a tight perimeter here pays off.

If you’re curious about designing the whole environment with some discipline—deploy flow, zero‑downtime updates, and a less stressful life—I’ve laid out a battle‑tested, scriptable approach in my zero‑downtime CI/CD playbook for a VPS. Redis fits neatly into that rhythm.

Reality Check: When to Scale, When to Rethink

There’s a moment in every project where Redis starts to feel full. You’ll notice the eviction counter tick up and hit rates jitter. Before you assume you need a monster node, check what you’re storing, because object caches have a way of collecting things they never needed. Remove overly large or low-value keys, add TTLs, and watch the hit rate again. Nine times out of ten, you’ll regain headroom without more hardware.
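Both numbers come straight out of redis-cli INFO stats. Here is a short Python sketch that turns that output into a hit rate and an eviction count. The field names (keyspace_hits, keyspace_misses, evicted_keys) are real INFO fields; the function names are just for illustration.

```python
def parse_info(text: str) -> dict:
    """Turn `INFO` output (key:value lines, # section headers) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def cache_pressure(info_text: str) -> dict:
    """Summarize hit rate and evictions from `INFO stats` output."""
    info = parse_info(info_text)
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    lookups = hits + misses
    return {
        "hit_rate": hits / lookups if lookups else None,  # None until there is traffic
        "evicted_keys": int(info.get("evicted_keys", 0)),
    }
```

These counters are cumulative since the server started, so compare two readings taken a few minutes apart instead of staring at one absolute number.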

If load really keeps climbing, vertical scaling is usually the first step: more RAM, faster disk, better CPU. Horizontal scaling with Redis Cluster is a bigger architectural decision that WordPress object caching rarely needs. When you get there, it’s often not about the cache anymore—it’s about how your app creates and uses data. At that point, the conversation changes.

A Step‑by‑Step “First Deployment” You Can Try This Week

Here’s a simple path that’s worked for teams I’ve helped:

First, set up a single Redis with AOF everysec and a sensible maxmemory policy. Hook it to WordPress and confirm you’re getting good hit rates. Then add one replica and confirm replication is healthy. Next, deploy three Sentinels, point WordPress at them, and run a controlled failover during a quiet window. Watch your app logs and timings while you do it and take notes. Finally, add basic alerts: disk IO spikes, memory high‑water marks, and Sentinel failover notifications.
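For the "confirm replication is healthy" step, I look at redis-cli INFO replication on each node. Here is a rough Python sketch of the check I do by eye. The fields it reads (role, slaveN entries with state=online, master_link_status) are real INFO fields; the function and its wording are hypothetical.

```python
import re
from typing import List


def replication_problems(info_text: str, expected_replicas: int = 1) -> List[str]:
    """Return a list of human-readable problems found in `INFO replication` output."""
    info = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()

    problems = []
    role = info.get("role")
    if role == "master":
        # Replica entries look like: slave0:ip=...,port=...,state=online,offset=...,lag=...
        online = sum(
            1
            for key, value in info.items()
            if re.fullmatch(r"slave\d+", key) and "state=online" in value
        )
        if online < expected_replicas:
            problems.append(f"only {online} of {expected_replicas} replicas online")
    elif role == "slave":
        if info.get("master_link_status") != "up":
            problems.append("replica has lost its link to the primary")
    else:
        problems.append("could not determine this node's role")
    return problems
```

An empty list from both the primary and the replica is what healthy looks like. Anything else is worth reading before you add Sentinel on top.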

By the end of this week‑long cycle, your object cache will be more survivable than most. And you’ll have muscle memory you can rely on when the odd gremlin appears.

Useful Reading If You Want to Go Deeper

If you’re the curious type, two documents are worth skimming:

One is the official Sentinel overview—helpful for seeing how quorum and failover timing interact.

The other is the persistence guide: the small flags around AOF rewrite and RDB scheduling make a big difference.

And if you prefer a more narrative, hands-on tour of a full WordPress stack with Redis in the middle of it, I’ve mapped out a no-drama path here: hosting WordPress with Docker, Nginx, MariaDB, Redis, and Let’s Encrypt. Pair that with a smarter browser cache strategy and clean, centralized logs, and you’ll sleep better.

Wrap‑Up: Make Redis Failover the Most Boring Thing You Do

If I had to compress everything into one sentence, it would be this: use Sentinel with a primary‑replica pair, enable AOF everysec (with RDB snapshots), and point WordPress at Sentinel—not a single host. That combination turns a crash into a shrug.

Build the habit of small tests. Trigger a failover when traffic is calm. Verify your plugin reconnects fast. Check your persistence files are landing on a persistent disk. Watch for large keys and memory creep. And give your database the same high‑availability care so a momentary cache miss doesn’t snowball into a bigger issue. If you want a friendly chat about database HA trade‑offs, I wrote about them in my MariaDB HA story for WooCommerce—it pairs with this setup like coffee and a good cookie.

That’s it. Nothing exotic. Just a few proven pieces, wired together carefully, that keep your WordPress site smooth when life gets noisy. Hope this was helpful! If you try the failover test this week and want to share how it went, I’d love to hear about it. See you in the next post.

Frequently Asked Questions

Do I need Redis Cluster for WordPress object caching?

Great question! Most WordPress sites don’t. Object caching usually fits perfectly with a single Redis primary plus replicas, watched by Sentinel for automatic failover. Cluster is awesome for sharding huge keyspaces, but for WordPress it often adds complexity you don’t need.

Which persistence settings should I use, AOF or RDB?

I like AOF enabled with everysec fsync, plus periodic RDB snapshots. AOF gives you durability across restarts, and the RDB preamble speeds recovery. Keep both on persistent, fast storage, and you’ll avoid cold-cache pain after a reboot.

How do I test failover safely?

Do it during a quiet window. Point WordPress at Sentinel, then run “SENTINEL failover mymaster” from redis-cli. Watch Redis logs, PHP-FPM timings, and a simple curl loop hitting your homepage. You’ll see a tiny hiccup during promotion, then everything should settle quickly.