For my personal homelab/home server, I prefer ZFS zpools each with a single mirror vdev. Currently I use one zpool with one 3-way mirror vdev, but when my media (mainly movies and TV shows) grows beyond 500GB, I will likely make a separate media zpool with a 2-way mirror vdev. This approach probably doesn't scale once a single zpool or vdev exceeds 10-20TB, but no normal human being really needs that much data – anyone who claims they do is likely a data hoarder and is bad at organizing. It also probably doesn't work as nicely for multi-tenant storage servers.
My ZFS Configuration
My single zpool currently looks like the following, with most options omitted for brevity.
sudo zpool create my-zpool mirror \
/dev/disk/by-id/aaa /dev/disk/by-id/bbb /dev/disk/by-id/ccc
sudo zfs create -o quota=25G my-zpool/Pictures
sudo zfs create -o quota=25G my-zpool/Music
sudo zfs create -o quota=10G my-zpool/Documents
sudo zfs create -o quota=10G my-zpool/Books
sudo zfs create -o quota=10G my-zpool/gitrepos
sudo zfs create -o quota=10G my-zpool/artifacts
sudo zfs create -o quota=50G -o mountpoint=/var/lib/postgresql my-zpool/pg-data
sudo zfs create -o quota=50G my-zpool/datalake
sudo zfs create -o quota=200G my-zpool/videos
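After creating the pool and datasets, it's worth sanity-checking the layout. These are standard ZFS commands; the columns passed to `zfs list` are just one reasonable selection:

```shell
# Confirm the pool is ONLINE and the mirror vdev contains all three disks
sudo zpool status my-zpool

# Confirm each dataset's quota and mountpoint landed where expected
sudo zfs list -o name,used,quota,mountpoint -r my-zpool
```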
Most of my ZFS datasets will always remain rather small. The datasets that might outgrow the 500GB size of each drive are datalake, pg-data (PostgreSQL), and videos (movies and TV shows). I have tried to be rather disciplined about keeping the datasets organized and deliberate, partly because I want to be prepared to move each of those to a separate zpool (each with one mirror vdev). I like the ability to control the redundancy for each of the datasets; I like the idea of having more redundancy for Documents, gitrepos, and maybe pg-data than for videos, because those are more important to me and harder to replace, whereas I can download my movies again from…sources if needed. I can also accept less redundancy for my datalake because it is easy to back up in AWS and I can just analyze the files in an S3-compatible distributed filesystem. FYI, I am OK with keeping my datalake unencrypted in AWS because I mainly have public datasets, albeit transformed to suit my analytical needs.
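Moving one of the larger datasets to its own zpool later can be done with snapshots and `zfs send`/`zfs receive`. A minimal sketch, assuming a hypothetical new pool named `media-zpool` built from two new disks (the disk IDs are placeholders):

```shell
# Create the new 2-way mirror pool (disk ids are placeholders)
sudo zpool create media-zpool mirror \
  /dev/disk/by-id/ddd /dev/disk/by-id/eee

# Snapshot the dataset, then replicate it (with its properties) to the new pool
sudo zfs snapshot my-zpool/videos@migrate
sudo zfs send -p my-zpool/videos@migrate | sudo zfs receive media-zpool/videos

# Once the copy is verified, drop the old dataset
sudo zfs destroy -r my-zpool/videos
```

For a dataset that keeps changing during the copy, an incremental follow-up send (`zfs send -i`) can close the gap before the final cutover.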
The main drawback of this is “losing” 50%-67% of my raw drive capacity to redundancy (50% for a 2-way mirror, 67% for a 3-way mirror), as opposed to maybe 10% of raw capacity “lost” to redundancy in a scenario of wide RAIDZ-n vdevs. But for normal human beings who do not have crazy capacity requirements, the remaining 33%-50% of raw storage capacity is plenty.
Even if I go a bit crazy with movies, I would rather have a zpool for each of various genres than 1 more complicated zpool because I value the simplicity of “simple mirroring” and ease of management over capacity.
I really want to stick to an all-(SATA-)SSD file server for as long as I can because I value the minimal power draw, near silence, and high reliability. For the past year, I haven't had any integrity issues, at least after I switched to using names like /dev/disk/by-id/_____ rather than /dev/sdn to avoid issues identifying drives on reboot. So far, my single ZFS mirror has had excellent scrub times, read and write performance, and reliability. Let's not ruin it by getting greedy about capacity. Part of the way I have kept my capacity requirements low is being disciplined about cleaning out unused data roughly every quarter.
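Alongside the quarterly cleanup, regular scrubs help catch silent corruption early. A sketch of a monthly scrub via cron — the schedule, file path, and pool name here are my own choices, and some distros ship a systemd timer for this instead:

```shell
# /etc/cron.d/zfs-scrub (hypothetical file): scrub on the 1st of each month at 03:00
0 3 1 * * root /usr/sbin/zpool scrub my-zpool

# Check progress and results of the most recent scrub interactively
sudo zpool status my-zpool
```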
Conclusion
I had to think quite a lot about my ZFS topology when I was initially setting it up, but this blog post by JRS Systems helped me decide between a single mirror vdev and a single RAIDZ-2 vdev.
Too many words, mister sysadmin. What’s all this boil down to?
- don’t be greedy. 50% storage efficiency is plenty.
- for a given number of disks, a pool of mirrors will significantly outperform a RAIDZ stripe.
- a degraded pool of mirrors will severely outperform a degraded RAIDZ stripe.
- a degraded pool of mirrors will rebuild tremendously faster than a degraded RAIDZ stripe.
- a pool of mirrors is easier to manage, maintain, live with, and upgrade than a RAIDZ stripe.
- BACK. UP. YOUR POOL. REGULARLY. TAKE THIS SERIOUSLY.
TL;DR to the TL;DR – unless you are really freaking sure you know what you’re doing… use mirrors. (And if you are really, really sure what you’re doing, you’ll probably change your mind after a few years and wish you’d done it this way to begin with.)
This is consistent with my limited experience with ZFS.