
ZFS on Linux for Servers: The Calm, No‑Drama Guide to ARC, ZIL/SLOG, Snapshots, and send/receive

So I was sipping an unreasonably strong coffee the other morning, staring at a dashboard that looked like a heart monitor. One of those moments where latency spikes and your pulse kind of sync up. You’ve probably had it too: a storage stack behaving one way in staging, then doing something totally surprising in production. That’s when I found myself quietly grateful for ZFS on Linux. It’s not magic, and it’s not perfect, but when I tune ARC, get ZIL/SLOG right, and keep snapshot/replication habits tight, the graph calms down. More importantly, I do too.

If you’ve ever wondered how to dial in ARC so your apps aren’t starved, or whether you need a SLOG device at all, or how to make snapshots and send/receive feel like a safety net instead of homework, pull up a chair. In this guide, I’ll walk you through how I approach ZFS on Linux for real servers, with the little stories and gotchas that only show up after you’ve fixed a few late‑night incidents. We’ll talk ARC tuning (with guardrails), ZIL/SLOG choices that actually matter, snapshot strategies that don’t rot, and send/receive backups that survive bad networks and human mistakes.

Why ZFS On Linux Keeps Earning Its Spot

Let me start with a quick story. A few years back, a client’s application began to stall under a strange pattern of synchronous writes. The app team swore nothing had changed; the graph said otherwise. We were on ZFS. What saved the day wasn’t a shiny new array or a heroic rewrite. It was the ability to calmly analyze pool health, peek into ARC behavior, flip a dataset property or two, and add a proper SLOG device that actually fit the workload. Five minutes after switching traffic back, the app breathed again. That’s what I love about ZFS: consistent knobs that map to real‑world outcomes.

Here’s the thing: ZFS isn’t just a filesystem; it’s a storage platform. On Linux, that means you get powerful primitives—checksums everywhere, snapshots, clones, copy‑on‑write, ARC, L2ARC, ZIL, and the send/receive pipeline—that add up to a toolkit. You don’t have to use every feature, just the ones that improve your story. And the best part? You can iterate: start safe, observe, then nudge. Small nudges are usually all you need.

Before we dive deep, let me offer two rails to keep you on track. First, compression is your friend. I stick with lz4 by default on Linux; it’s fast and saves more space than you’d expect. Second, resist the urge to enable deduplication unless you absolutely know why you need it and have the RAM to back it up. It can be brilliant in the right corner, but it’s not a general‑purpose speed button.

ARC Tuning Without Drama: Give Your Apps Room to Breathe

The simple mental model

Think of ARC as ZFS’s giant brain. It lives in RAM and caches hot data and metadata. The larger and smarter your ARC, the fewer trips you take to disk. But that same RAM is where your applications want to live too. So the game is balance. In dedicated storage nodes, I let ARC be roomy. On app servers that share storage and compute, I put ARC on a shorter leash so the app can stretch out.

Where I start (and why)

On Linux, ARC sizing is controlled via kernel module parameters. You can set them at runtime or persistently. Runtime is great for experimentation, but persistence wins long‑term.

# Runtime experimentation (bytes)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max    # 8G
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_min    # 2G

# Persistent: /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592
options zfs zfs_arc_min=2147483648

I like to start with a modest max, observe for a day or two, and nudge upward. Watch the kernel’s memory pressure, swap activity, and app latency. If I see the app gasping, I rein ARC in a bit. If the disks look too busy and the app is waiting on reads, I let ARC stretch. The sweet spot is where the app hums and disks don’t scream during peaks.

Per‑dataset choices that matter

Global ARC sizing is half the story. Per‑dataset properties help you steer the cache into the right lanes. If I’ve got a database dataset that already does its own caching (hello, PostgreSQL, MySQL), I often set:

zfs set primarycache=metadata pool/db

This keeps ARC focused on metadata, not bulk table pages the database will likely cache itself. For write‑heavy logs, I might keep default caching but adjust recordsize to match the workload. Databases love smaller records (like 16K), while media archives prefer larger records (like 1M). For generic filesystems, I usually leave defaults and let ZFS adapt.

# Example: database dataset tuned for smaller blocks
zfs set recordsize=16K pool/db
zfs set atime=off pool/db
zfs set xattr=sa pool/db
zfs set compression=lz4 pool/db

Those couple of properties reduce random write noise, avoid wasting cycles updating access times, and make extended attributes more efficient. Little changes, big difference over months of real traffic.
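
At the other end of the spectrum, a large‑file dataset gets the opposite treatment. The pool/media name below is just a placeholder for whatever archive or media dataset you run:

# Example: large-file dataset tuned for big sequential reads and writes
zfs set recordsize=1M pool/media
zfs set compression=lz4 pool/media
zfs set atime=off pool/media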

Observe, don’t guess

When ARC is mis‑sized, you’ll feel it. Look at swap activity, the kernel’s Out‑Of‑Memory killer history, and ARC hit ratios over time. Tools like arcstat and arc_summary are fantastic. If you want to go deeper later, the OpenZFS performance and tuning guide is thorough without being overwhelming.
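
If you'd rather pull raw numbers straight from the kernel, the ARC counters are always exposed under /proc even when the helper tools aren't installed. A quick sketch of what I sample:

# Sample ARC hit/miss behavior every 5 seconds
arcstat 5

# Current ARC size, ceiling, and hit counters straight from the kernel
awk '$1 ~ /^(size|c_max|hits|misses)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats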

ZIL and SLOG: When a Tiny SSD Makes a Big Difference

First, what’s the ZIL?

The ZFS Intent Log (ZIL) is the safety diary for synchronous writes. When an application says “write this now and tell me immediately that it’s safe,” ZFS writes that intent to the log so it can survive a power cut. By default, the ZIL lives on your pool. If the workload leans on sync writes—databases, NFS shares for VMs, certain logging patterns—latency can add up.

Enter the SLOG device

A Separate LOG (SLOG) device is an SSD, ideally with power‑loss protection and low latency, that handles those sync log records. If you add a SLOG, you’re not caching everything, you’re just accelerating the small, sync‑critical part of writes. The TL;DR: if your workload does lots of fsync or O_DSYNC writes, a good SLOG can change the mood of the entire system.

# Add a mirrored SLOG to avoid a single point of failure
zpool add pool log mirror nvme0n1 nvme1n1

Mirroring your SLOG is worth the extra SSD. If you lose a non‑mirrored SLOG at the wrong time, the pool can usually still be imported (you may need zpool import -m for a missing log device), but you risk losing the most recently acknowledged sync writes. I don’t like rolling those dice in production.

How big should the SLOG be?

This is one of those questions where the answer is “small but sober.” The log only needs to absorb a few seconds of your synchronous write burst, not hours of data. In my experience, 8–32 GB is plenty for many servers. Bigger doesn’t make it faster; faster makes it faster. Choose an SSD with real power‑loss protection and low write latency. Consumer SSDs without PLP can acknowledge data that never actually makes it to non‑volatile storage during a power cut, which defeats the purpose.
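
To sanity‑check that range, here’s the back‑of‑envelope math I use, assuming the default transaction group interval of roughly five seconds and a 10 GbE ingest path; swap in your own numbers:

# Rough SLOG sizing sketch (assumes ~5 s txg interval and 10 GbE of sync ingest)
# peak sync ingest    ~ 10 Gb/s / 8       ~ 1.25 GB/s
# one txg in flight   ~ 1.25 GB/s * 5 s   ~ 6.25 GB
# two txgs plus slop                      ~ 16 GB is already generous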

A couple of properties that move the needle

ZFS gives you some per‑dataset toggles that complement SLOG decisions:

1) logbias=throughput tells ZFS to prefer raw throughput over low latency for large streaming writes. That keeps big sequential writes from flooding the SLOG unnecessarily.

zfs set logbias=throughput pool/media

2) sync controls how ZFS treats sync operations. The default is standard, which respects the application’s request. always forces everything to be sync (useful for certain NFS or VM guests), and disabled treats sync as async. That last one is tempting for benchmarks, but risky in real life. You might get speed, but after a power loss, you can lose acknowledged writes. I save sync=disabled for lab tests and sleep better in production.

zfs set sync=standard pool/db
# zfs set sync=disabled pool/sandbox   # For test environments only

If you want the deeper under‑the‑hood story, the OpenZFS documentation has excellent sections on how the ZIL behaves under different workloads.

Snapshots: Safe Points That Don’t Get in Your Way

Make snapshots boring and automatic

Snapshots are copy‑on‑write bookmarks of your dataset at a point in time. They’re instantaneous and space‑efficient—until changes accumulate. The trick is routine. I like predictable names and a retention plan that matches the change rate of the data. For fast‑moving app data, I keep more frequent and shorter retention. For archives, fewer snapshots with longer tails.

# A simple naming pattern and schedule
zfs snapshot -r pool/app@daily-$(date +%Y%m%d)

# List snapshots by creation time
zfs list -t snapshot -o name,creation -s creation

Cleanup matters. The easiest way to regret snapshots is to never prune them. Decide how many dailies, weeklies, and monthlies you need. I’ve used cron and systemd timers, shell scripts, and later graduated to tools that manage policies for me. If you prefer friendly automation, take a look at the sanoid/syncoid toolkit—it does snapshotting and replication with sanity baked in.
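
For the curious, here’s what my shell‑script era looked like before I handed the job to a policy tool. It’s a minimal sketch: the dataset name and the 14‑snapshot tail are placeholders, and it assumes GNU head:

# Minimal prune sketch: keep only the 14 newest daily-* snapshots of pool/app
zfs list -H -t snapshot -o name -s creation \
  | grep '^pool/app@daily-' \
  | head -n -14 \
  | xargs -r -n1 zfs destroy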

Clones for safe experiments

One of my favorite ZFS party tricks is cloning a snapshot for testing. When a client is nervous about a migration, I clone the last good snapshot into a scratch dataset, mount it somewhere private, and rehearse. No drama, no risk to the live system. When I’m done, I destroy the clone and the snapshot remains intact.

# Create and mount a clone for testing
zfs snapshot pool/app@pre-migration
zfs clone pool/app@pre-migration pool/app-scratch
# ... test here ...
zfs destroy pool/app-scratch

One more tip: if you’re sending a snapshot off‑box, consider placing a hold on it so automation can’t accidentally delete it before replication completes.

zfs hold backupkeep pool/app@daily-20250101
# Later, when safely replicated
zfs release backupkeep pool/app@daily-20250101

send/receive Backups You Actually Trust

The flow in one breath

Take a snapshot, send it, receive it, keep it, and send incrementals forever after. That’s the rhythm. ZFS makes it fast and safe, and modern OpenZFS lets you resume interrupted sends—huge win for flaky links or long distances.

# Full replication the first time
zfs snapshot -r pool/app@base
zfs send -R pool/app@base | ssh backup.example \
  "zfs receive -uF backup/app"

# Incremental replication later
zfs snapshot -r pool/app@daily-20250115
zfs send -R -I @base pool/app@daily-20250115 | ssh backup.example \
  "zfs receive -uF backup/app"

Two flags that make life better: -R replicates descendant datasets and properties, and -I sends an incremental stream including intermediate snapshots. The -u on receive prevents auto‑mounting the backup datasets, and -F forces a rollback of the target if it diverged.

Smooth the network with a buffer

Replication often stalls not because of disks, but because the network hiccups. I’ve had great results inserting an in‑memory buffer so the sender and receiver can work at their natural pace.

zfs send -R pool/app@daily | mbuffer -s 128k -m 1G | \
  ssh backup.example "mbuffer -s 128k -m 1G | zfs receive -uF backup/app"

If an incremental stream gets interrupted, modern OpenZFS provides a resume token, as long as the receiving side ran zfs receive with -s so partial state is kept. You can pick up right where you left off instead of re‑sending the world.

# Find a resume token on the receiving side
zfs get receive_resume_token backup/app

# Resume from the sender
zfs send -t <TOKEN> | ssh backup.example "zfs receive -uF backup/app"

Encrypted datasets and raw streams

When using native ZFS encryption, you can send raw encrypted data without decrypting on the sender. That means the receiving side never sees plaintext. It looks like this:

# Create an encrypted dataset (example prompts for a key)
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt pool/secure

# Snap and send raw encrypted blocks
zfs snapshot pool/secure@daily
zfs send -w pool/secure@daily | ssh backup.example "zfs receive -uF backup/secure"

The receiving side can store the encrypted dataset without access to the key. When it’s time to use, load the key on the receiver and mount. It’s a clean model for off‑site backups that must remain dark until disaster day.
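
On disaster day, the receiver side is just two commands; backup/secure matches the example above, and the passphrase prompt comes from the keylocation we set at creation:

# On the receiver, only when the data is actually needed
zfs load-key backup/secure        # prompts for the passphrase
zfs mount backup/secure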

Bookmarks, holds, and the paper trail

Bookmarks are like zero‑space, always‑on pointers to snapshots. They’re handy for keeping an incremental base even after you prune old snapshots locally. I make bookmarks before pruning so my replication chain doesn’t break.

# Create a bookmark you can increment from later
zfs bookmark pool/app@daily-20250115 pool/app#base-20250115
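
Once the bookmark exists, you can prune the old snapshot locally and still send incrementals from it later. Note that bookmarks work with -i but not -I; the newer snapshot name here is hypothetical:

# Prune the old local snapshot, then increment from the bookmark instead
zfs destroy pool/app@daily-20250115
zfs snapshot pool/app@daily-20250201
zfs send -i pool/app#base-20250115 pool/app@daily-20250201 | \
  ssh backup.example "zfs receive -uF backup/app"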

Between bookmarks and holds, you get a lifecycle that’s predictable, documented, and resilient to accidents. I’ve had one case where a junior admin nervously deleted what they thought was just a local snapshot; the hold saved us from a broken replication chain.

A Playbook I Keep Coming Back To

The setup

Picture a modest VM host running a few Linux guests and a PostgreSQL database. ZFS sits on a pool of mirrored SSDs, with an optional mirrored SLOG of small, power‑loss‑protected NVMes. Nothing fancy. But everything is thoughtful.

First move after install: set sane defaults.

# Pool‑wide goodness
zfs set compression=lz4 pool
zfs set atime=off pool
zfs set xattr=sa pool

# Dataset structure
zfs create pool/vms
zfs create pool/db
zfs create pool/backups

# Database‑appropriate tweaks
zfs set recordsize=16K pool/db
zfs set primarycache=metadata pool/db
zfs set logbias=latency pool/db

On the VM dataset, I lean on defaults and keep a closer eye on latency. If my VMs are NFS clients, I’ll ensure sync writes are honored end‑to‑end and consider sync=always if the guest behavior demands it. If the hypervisor directly uses ZFS volumes (zvols), I make sure the block size aligns with the guest file system expectations.
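
For zvol‑backed guests, the analogue of recordsize is volblocksize, and it has to be chosen at creation time. The name and sizes below are placeholders; pick a block size that matches the guest filesystem:

# Hypothetical zvol for a guest; volblocksize is fixed once the volume exists
zfs create -V 100G -o volblocksize=16K -o compression=lz4 pool/vms/guest1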

ARC sizing strategy

On this kind of host, I start conservative: set ARC max to something that leaves comfortable RAM for the guests and for PostgreSQL. After a few days, I review arcstat samples and ARC hit rates, then consider bumping it up. If the database is your star, let it keep its memory crown and keep ZFS focused on metadata and writes. You can go months without touching ARC again if you respect the balance early.
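
As a purely hypothetical illustration: on a 64 GB host where the guests get about 40 GB and PostgreSQL plus the OS take roughly 10 GB, I’d start the ARC ceiling around 12 GB and adjust from there:

# ~12 GiB ARC ceiling on a hypothetical 64 GB host (guests ~40 GB, DB + OS ~10 GB)
echo 12884901888 > /sys/module/zfs/parameters/zfs_arc_max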

ZIL/SLOG decisions

If the database is issuing lots of fsyncs, a solid SLOG is worth its weight in calm nights. I prefer a mirrored pair of small, low‑latency NVMes designed for sustained write and with power‑loss protection. I set logbias=throughput on datasets that handle large sequential writes (like backup archives) so they don’t crowd the SLOG. Everything else keeps logbias=latency to keep those important sync writes snappy.

Snapshots and replication routine

I keep it boring: hourly snapshots with a 48‑hour tail for the database, daily for VMs with a two‑week tail, monthly for archives. Replicate to a secondary node or an off‑site location nightly. If you’re pairing ZFS with object storage for longer retention, you can complement your setup with something like a production‑ready MinIO setup on a VPS and push application‑level backups there too. ZFS snapshots keep server state tight; object storage keeps app exports durable and cheap.
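
Spelled out as a crontab sketch (the replication script name is hypothetical, and remember cron wants % escaped):

# /etc/cron.d/zfs-snapshots -- a hypothetical schedule matching the routine above
0 * * * *   root  zfs snapshot pool/db@hourly-$(date +\%Y\%m\%d\%H)
15 0 * * *  root  zfs snapshot -r pool/vms@daily-$(date +\%Y\%m\%d)
0 2 * * *   root  /usr/local/sbin/replicate-to-backup.sh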

Recovery rehearsals

Every quarter, I run a restore rehearsal. Clone a snapshot, bring up a VM against the clone, or receive a snapshot into a scratch dataset and run the app. The step you don’t rehearse is the one that surprises you under pressure. On one client, we shaved two hours off our RTO just by scripting the dataset imports and key loading for encrypted backups. The next outage? It was a non‑event.
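
A rehearsal doesn’t need to be elaborate. This sketch pulls a backup snapshot (hypothetical name) into a scratch dataset, pokes at it, and cleans up:

# Receive the latest backup into a scratch dataset and smoke-test against it
ssh backup.example "zfs send backup/app@daily-20250115" | zfs receive -u pool/restore-test
zfs mount pool/restore-test
# ... run the app's read-only smoke tests against the mounted copy ...
zfs destroy -r pool/restore-test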

Observability: The Little Checks That Prevent Big Surprises

Keep an eye on the pool and ARC

I’ve learned to spot trouble early with a few routine commands. None of them are dramatic; all of them are useful.

# Pool health and errors
zpool status -xv

# Realtime I/O insight
zpool iostat -v 1

# Snapshot inventory
zfs list -t snapshot -o name,used,creation -s creation

# ARC health (if arcstat installed)
arcstat 5

When something smells off, I sample these and compare to a known‑good baseline. That’s how I catch a mis‑sized ARC, a VM that went rogue with sync writes, or a slowly failing SSD that started to rack up latency outliers before SMART finally tattled.

And when I want to go deeper or double‑check a tuning hunch, I revisit the thoughtful bits in the OpenZFS performance and tuning guide. It’s like a trusted colleague who doesn’t mind repeating themselves when I need the refresher.

Common Gotchas I Still See (And How I Dodge Them)

Wrong block sizes for the workload

Big blocks feel efficient until a small‑IO workload shows up and grinds. Set recordsize thoughtfully on known workloads like databases, and keep the default elsewhere. Don’t force it if you don’t need to.

Consumer SLOGs with no power‑loss protection

This one bites because it often looks fine—until that one day it’s not. For SLOG, I pay for the boring, enterprise‑minded SSD. It’s not about speed first; it’s about correctness under stress.

Letting snapshots pile up without a plan

Snap judiciously, prune religiously. Late‑night storage runs are no fun, and nobody wants to be the person who deletes half the snapshot tree in a panic. Bookmarks and holds exist to make retention safe. Use them.

Forgetting to test restores

I used to assume sends were fine because the commands ran clean. Then I had a case where the target had diverged subtly, and receive wasn’t actually applying. A quick restore test would have caught it. Lesson learned; quarterly rehearsals ever since.

Wrap‑Up: The Calm Confidence of a Well‑Tuned ZFS

If you’ve read this far, I suspect you’re the kind of person who enjoys a quiet dashboard and a little extra sleep. ZFS on Linux rewards that mindset. You don’t need tricks—just a few steady practices. Size ARC so your apps have room to thrive. Give sync‑heavy workloads a proper SLOG and set logbias where it makes sense. Snapshot on a rhythm that matches the data’s tempo, and prune with care. Replicate with send/receive, use buffers for reliability, and lean on resume tokens when the network throws a tantrum.

I’ve been saved more than once by habits that felt boring at the time. A well‑named snapshot, a small SLOG that quietly did its job, a tuned ARC that didn’t crowd the database—these things add up. You’ll feel it in your latency charts and in those moments when someone asks, “Can we restore last Tuesday’s data for an hour?” and you answer, “Sure, give me ten minutes.”

Hope this was helpful! If you’ve got a weird ZFS story or a tuning question, send it my way. I’ll happily trade notes over coffee. Until then, may your pools stay healthy, your snapshots tidy, and your restores boring—in the best possible way.

Frequently Asked Questions

Do I actually need a SLOG device?

Great question! If your workload does a lot of synchronous writes (databases, NFS for VMs), a proper, power‑loss‑protected SLOG can reduce latency a lot. If your writes are mostly asynchronous or big streaming jobs, a SLOG may not help much. I mirror SLOGs in production for safety and keep them small but fast.

How should I size ARC on a host that also runs a database?

Here’s the deal: let the app keep the memory crown and size ARC to avoid pressure. Start with a conservative zfs_arc_max, watch swap and app latency for a couple days, then nudge up or down. For databases, set primarycache=metadata so ZFS doesn’t fight the DB’s own cache.

What does a sane snapshot and replication routine look like?

Snap routinely with a clear naming scheme, replicate with zfs send -R and increment with -I, and insert mbuffer on both ends for smoother transfers. Use holds so snapshots don’t vanish mid‑replication, and lean on resume tokens to recover from network hiccups. Test restores quarterly so you know the drill before you need it.