ZFS
Install zfs
Install on OpenSUSE Tumbleweed
OpenZFS is available for install from a separate repository which needs to be added to OpenSUSE.
zypper addrepo https://download.opensuse.org/repositories/filesystems/openSUSE_Tumbleweed/filesystems.repo
zypper refresh
zypper install zfs
For more details see the OpenSUSE ZFS package.
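Once installed, it is worth checking that the kernel module loads and that the userland tools can see it. A quick sanity check (assuming an OpenZFS 2.x release, which provides the version subcommand):
sudo modprobe zfs
zfs version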
Tumbleweed won’t boot after a kernel upgrade
I have found, on two machines, that after installing zfs and then upgrading the kernel with zypper dup the system fails to boot. This seems to be caused by the initramfs build failing because the zfs kernel modules could not be included.
I don’t need zfs at boot time on my Tumbleweed systems, so I found the best way to fix the issue is to omit the zfs kernel modules from the initramfs.
Doing this will omit the zfs modules from the initramfs:
echo 'omit_dracutmodules+=" zfsexpandknowledge zfs "' | sudo tee /etc/dracut.conf.d/skip-zfs.conf
And this will build the initramfs for the currently running kernel:
sudo /usr/bin/dracut -f /boot/initrd-$(uname -r) $(uname -r)
or for a specific kernel:
sudo /usr/bin/dracut -f /boot/initrd-6.11.7-1-default 6.11.7-1-default
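To confirm the zfs modules really were left out, dracut's lsinitrd tool can list the contents of the rebuilt image; if the grep prints nothing, the modules are not included (a quick check, assuming lsinitrd is installed as part of dracut):
lsinitrd /boot/initrd-$(uname -r) | grep -i zfs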
Install on MacOS
Download the appropriate OpenZFS for OSX package for your MacOS version, or the installer containing all of the packages, from the OpenZFS on OS X downloads page, and install the appropriate .pkg file for your operating system. A reboot is required after the install has completed.
Pools and filesystems
As will be demonstrated in a moment, one or more devices (or files) can be brought together to create a zfs pool, into which one or more datasets can be stored. A dataset can be a POSIX style filesystem, a block device style volume, or a snapshot of a filesystem or volume.
When a zfs pool is created, a root filesystem dataset is created with it. If you create a zfs pool called tank then a zfs filesystem called tank will be created in that pool. Further datasets can be created within a dataset.
Within a dataset's metadata there is information about where the dataset will be mounted onto the filesystem. A special mount point of none exists to stop a dataset being mounted. In many circumstances the root filesystem dataset is not mounted.
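As a small sketch of the hierarchy (the pool name tank, the dataset names, and the disk /dev/sdb are only examples, and the default mount points are used here rather than none):
$ sudo zpool create tank /dev/sdb
$ sudo zfs create tank/media
$ sudo zfs create tank/media/photos
$ zfs list -r -o name,mountpoint tank
NAME               MOUNTPOINT
tank               /tank
tank/media         /tank/media
tank/media/photos  /tank/media/photos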
Building pools
Building up to a RAID10 pool
It is possible to start with a single disk pool and later include other disks to make the pool fault tolerant and extend the pool size.
Let's do this for real - only we'll use files instead of disks.
- make temporary files to be used in the exercise
- make a pool using one file only
- turn the pool into a mirrored pool (RAID1)
- extend the pool size (RAID10)
- reduce the pool size
- destroy the pool
Carefully note the difference between add and attach - they do very different things to the pool. Using the wrong verb can be disastrous for your data.
1. make the temporary files
Make four empty files which we can use as though they were disks.
$ mkdir ~/tmp_zfs
$ cd ~/tmp_zfs/
$ for i in {0..3} ; do truncate -s 10G $i.raw ; done
$ ls -l
-rw-r--r-- 1 user users 10737418240 Oct 29 08:50 0.raw
-rw-r--r-- 1 user users 10737418240 Oct 29 08:50 1.raw
-rw-r--r-- 1 user users 10737418240 Oct 29 08:50 2.raw
-rw-r--r-- 1 user users 10737418240 Oct 29 08:50 3.raw
2. make a pool
Using one of the files, create a simple pool with one vdev.
$ sudo zpool create -m none -o ashift=12 tmp_zfs "$(pwd)/0.raw"
$ sudo zpool status tmp_zfs
pool: tmp_zfs
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tmp_zfs ONLINE 0 0 0
/home/user/tmp_zfs/0.raw ONLINE 0 0 0
errors: No known data errors
The command line switches explained:
- zpool create - create a zfs pool
- -m none - do not mount the root dataset anywhere on the filesystem
- -o ashift=12 - use 4096 byte sectors (might be detected automatically anyway)
- tmp_zfs - the name of the pool and the name of the root dataset
- "$(pwd)/0.raw" - the name of the file (or files, disk, or disks) to create the pool from
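If you want to check which ashift the pool actually ended up with, it can be read back as a pool property (assuming a reasonably recent OpenZFS where ashift is exposed this way):
$ zpool get ashift tmp_zfs
NAME     PROPERTY  VALUE   SOURCE
tmp_zfs  ashift    12      local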
3. mirror the pool (RAID1)
Include a second file in the pool to create a mirrored vdev.
$ sudo zpool attach tmp_zfs "$(pwd)/0.raw" "$(pwd)/1.raw"
$ sudo zpool status tmp_zfs
pool: tmp_zfs
state: ONLINE
scan: resilvered 696K in 00:00:00 with 0 errors on Tue Oct 29 08:54:17 2024
config:
NAME STATE READ WRITE CKSUM
tmp_zfs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/tmp_zfs/0.raw ONLINE 0 0 0
/home/user/tmp_zfs/1.raw ONLINE 0 0 0
errors: No known data errors
Note how the vdev mirror-0 has appeared.
4. extend the pool (RAID10)
Two options: add one drive at a time, or add a mirrored pair of drives in one go.
4.1.1 add single drive
$ sudo zpool add -f tmp_zfs "$(pwd)/2.raw"
$ sudo zpool status tmp_zfs
pool: tmp_zfs
state: ONLINE
scan: resilvered 696K in 00:00:00 with 0 errors on Tue Oct 29 08:54:17 2024
config:
NAME STATE READ WRITE CKSUM
tmp_zfs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/tmp_zfs/0.raw ONLINE 0 0 0
/home/user/tmp_zfs/1.raw ONLINE 0 0 0
/home/user/tmp_zfs/2.raw ONLINE 0 0 0
errors: No known data errors
Note that -f is required, otherwise you get an error:
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses mirror and new vdev is file
4.1.2 then attach the mirror drive
$ sudo zpool attach tmp_zfs "$(pwd)/2.raw" "$(pwd)/3.raw"
$ sudo zpool status tmp_zfs
pool: tmp_zfs
state: ONLINE
scan: resilvered 96K in 00:00:00 with 0 errors on Tue Oct 29 08:55:45 2024
config:
NAME STATE READ WRITE CKSUM
tmp_zfs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/tmp_zfs/0.raw ONLINE 0 0 0
/home/user/tmp_zfs/1.raw ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
/home/user/tmp_zfs/2.raw ONLINE 0 0 0
/home/user/tmp_zfs/3.raw ONLINE 0 0 0
errors: No known data errors
Note how the vdev mirror-1 has appeared.
4.1.3 detach the mirror drive
It’s also possible to remove the redundancy from the mirror.
$ sudo zpool detach tmp_zfs "$(pwd)/3.raw"
$ sudo zpool status tmp_zfs
pool: tmp_zfs
state: ONLINE
scan: resilvered 96K in 00:00:00 with 0 errors on Tue Oct 29 08:55:45 2024
config:
NAME STATE READ WRITE CKSUM
tmp_zfs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/tmp_zfs/0.raw ONLINE 0 0 0
/home/user/tmp_zfs/1.raw ONLINE 0 0 0
/home/user/tmp_zfs/2.raw ONLINE 0 0 0
errors: No known data errors
4.1.4 remove the extension
And in some circumstances it's also possible to make the pool smaller again - assuming all the data can fit onto the remaining space, among other restrictions.
$ sudo zpool remove tmp_zfs "$(pwd)/2.raw"
$ sudo zpool status tmp_zfs
pool: tmp_zfs
state: ONLINE
scan: resilvered 96K in 00:00:00 with 0 errors on Tue Oct 29 08:55:45 2024
remove: Removal of vdev 1 copied 108K in 0h0m, completed on Tue Oct 29 11:46:06 2024
96 memory used for removed device mappings
config:
NAME STATE READ WRITE CKSUM
tmp_zfs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/tmp_zfs/0.raw ONLINE 0 0 0
/home/user/tmp_zfs/1.raw ONLINE 0 0 0
errors: No known data errors
It might take some time to copy the data from the removed vdev to the remaining vdevs. Monitor the progress with the zpool status command. Don't just yank the removed drive.
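For a larger removal, rather than re-running zpool status by hand you can watch it, or on OpenZFS 2.0 or later ask zpool wait to block until the removal has finished (a sketch; zpool wait and its remove activity are assumptions about your OpenZFS version):
$ watch -n 10 sudo zpool status tmp_zfs
$ sudo zpool wait -t remove tmp_zfs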
4.2 add a mirror pair of drives in one go
$ sudo zpool add tmp_zfs mirror "$(pwd)/2.raw" "$(pwd)/3.raw"
$ sudo zpool status tmp_zfs
pool: tmp_zfs
state: ONLINE
scan: resilvered 96K in 00:00:00 with 0 errors on Tue Oct 29 08:55:45 2024
config:
NAME STATE READ WRITE CKSUM
tmp_zfs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/tmp_zfs/0.raw ONLINE 0 0 0
/home/user/tmp_zfs/1.raw ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
/home/user/tmp_zfs/2.raw ONLINE 0 0 0
/home/user/tmp_zfs/3.raw ONLINE 0 0 0
errors: No known data errors
Note how the vdev mirror-2 has appeared.
5. remove the pool
$ sudo zpool destroy tmp_zfs
$ sudo zpool status tmp_zfs
cannot open 'tmp_zfs': no such pool
Snapshots and backups
Taking a snapshot of a dataset is effectively instant; naming snapshots with a timestamp keeps them sorted chronologically.
$ sudo zfs snapshot zpoolmedia/media@20241029T163330
$ zfs list -t snapshot zpoolmedia/media
NAME USED AVAIL REFER MOUNTPOINT
zpoolmedia/media@20241029T163330 0B - 7.19T -
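A snapshot on its own still lives in the same pool as the data it protects. To turn snapshots into backups, a common pattern is to stream them to another pool with zfs send and zfs receive. A sketch, assuming a second pool called backuppool exists and that a later snapshot 20241101T090000 has been taken for the incremental example:
# full copy of the first snapshot into the backup pool
$ sudo zfs send zpoolmedia/media@20241029T163330 | sudo zfs receive backuppool/media
# later, send only the changes between two snapshots
$ sudo zfs send -i @20241029T163330 zpoolmedia/media@20241101T090000 | sudo zfs receive backuppool/media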
Create a pool on a removable disk with some datasets
Note that creating pools on USB attached disks is not recommended, due to the possibility of USB disks becoming detached unexpectedly. But sometimes needs must, and it might be an okay plan for a backup disk for home.
Find the disk you want to use in the list of disks attached to the host. Best practice is to use the disk ID, because it is consistent across reboots and re-plugs.
ls -l /dev/disk/by-id/
Creating a pool automatically creates a root dataset on that pool. Best practice is to not use the root dataset, therefore set the mount point to the magic value of none so that it will not be mounted.
zpool create -m none -f usbwd14t usb-WD_Elements_25A3_394D48423242364A-0:0
Now make a non-root dataset and mount it onto a directory on the filesystem.
zfs create usbwd14t/old_disks
mkdir /export
zfs set mountpoint=/export/old_disks usbwd14t/old_disks
Or alternatively
mkdir /export
zfs create -o mountpoint=/export/old_disks usbwd14t/old_disks
Other datasets can be created at the root level or within the dataset just created. For instance, I'll create a dataset within the old_disks dataset and rsync an old disk's contents into it.
zfs create usbwd14t/old_disks/segate_500G
rsync -var /mnt/sdc1 /export/old_disks/segate_500G/
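A quick way to see what has been created so far, and where each dataset is mounted:
zfs list -r -o name,used,mountpoint usbwd14t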
Transfer the pool to another host
To properly unmount a zfs disk you need to disconnect the pool from the current host. This is done with the zpool export command.
zpool export usbwd14t
This zfs disk can now be connected to another host and the pool can be imported on the new host. This is done with the zpool import command.
zpool import usbwd14t
Note that the directories that the datasets are set to mount on must either already exist on the new host, or be creatable by the ZFS drivers. For the directories to be creatable, the filesystem cannot be read-only. See the section below if you need to create a root level directory on MacOS.
Creating root level directories on MacOS
Since around 2020 the root partition on MacOS has been read-only.
Since I set the mount points for the datasets in the pool to be within the /export directory on Linux, I'm going to need to create that on MacOS. Alternative solutions might be to use the -R option to set an alternate root for the pool, or the -N option to not automatically mount the datasets in the pool.
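For example (the /Volumes/usbwd14t path is only an illustration), either of these would avoid having to create /export on the MacOS side:
# import without mounting any datasets
zpool import -N usbwd14t
# or import with an alternate root, so mount points are grafted under it
zpool import -R /Volumes/usbwd14t usbwd14t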
If you need to create a new directory on the root partition then this is the procedure.
- make a new directory in a writeable area of the filesystem. /System/Volumes/Data/ is a good place:
mkdir /System/Volumes/Data/export
- create a new entry in /etc/synthetic.conf (if you have to create the file it should be owned by root, group wheel, and permissions 0644)
touch /etc/synthetic.conf
chown root:wheel /etc/synthetic.conf
chmod 0644 /etc/synthetic.conf
printf "export\tSystem/Volumes/Data/export\n" >> /etc/synthetic.conf
- get the system to read /etc/synthetic.conf and create the folder as a symbolic link to the actual folder created in step 1. The APFS utility tool apfs.util should be able to do that for you.
apfs.util <option>
-t : stitches and creates synthetic objects on root volume group.
So we can use that option.
/System/Library/Filesystems/apfs.fs/Contents/Resources/apfs.util -t
Or reboot the MacOS machine.