In the previous entry, I showed some performance numbers that I was able to achieve from my ZFS backed iSCSI targets. The read performance that I was able to achieve were great, unfortunately I was only able to touch max around 1MB/s on writes.
After much discussions, I was inclining towards the theory that the culprit is the high latency and low IO performance of my 5 x SATA-II RAID-Z, which is shared between the data, metadata and the ZFS Intent Log. ZFS Intent Log is the logging system for ZFS file system, responsible for maintaining the file system consistent in the event of crashes etc. ZIL is also used by the file system for providing synchronous writes, for POSIX compliancy. For a good performing system it is recommended that the device that hold’s the log is very fast (in terms of latency for handling many small random writes). Neelakanth’s blog entry has some very good info on ZIL.
Playing with ZFS Intent Log (ZIL)
So as next step, I will try to identify and more importantly quantify the kind of IO operations that are taking place during writes.
Typically Solid State Disks (SSD) with high transactional performance are recommended to be used for the ZFS Intent Logs. The only problem is that the cost of buying an expensive SSD disk would be prohibitively expensive esp. for my home brew NAS.
I still want to not only see how much a SSD device can really help (which is an easy part), but also to optimize the performance of my iSCSI targets for write performance.
In order to have fast Apple Time Machine backups I would need faster iSCSI devices. This would get better once Apple moves to ZFS as their native file system and the backup would truly mean sending the incremental snapshots using “zfs send”.
So I started by monitoring the activity on the logging subsystem using “dtrace”. Dtrace is one of the most exposing tool for realtime analysis, that I have come across.
So I planned a small experiment, where I would write (and re-write) a 128MB file with 256k block size using IOzone on the iSCSI device. And during this time would monitor all the activities on the ZIL. This requires setting two probes on the Solaris workstation before starting the IOzone test.
Setting Probes on OpenSolaris
Probe 1: To collect information on all the ZIL activity.
# dtrace -n zil\*:entry'{@[probefunc]=count();}' -n zio_free_blk:entry'{@[probefunc]=count();}'
dtrace: description 'zil*:entry' matched 44 probes
dtrace: description 'zio_free_blk:entry' matched 1 probe
^C
zil_itx_clean 4
zil_clean 16
zil_header_in_syncing_context 82
zil_sync 82
zil_vdev_compare 1152
zil_flush_vdevs 31617
zil_commit_writer 32457
zil_lwb_write_done 32764
zil_lwb_write_start 32764
zio_free_blk 32764
zil_add_block 32769
zil_commit 32933
zil_lwb_commit 33106
zil_itx_assign 33585
zil_itx_create 33585
zil_lwb_write_init 34253
Probe 2: To monitor block size of the log writes.
# dtrace -n ::zil_lwb_commit:entry'{@[1] = quantize((lwb_t*)args[2]->lwb_sz);}'
dtrace: description '::zil_lwb_commit:entry' matched 1 probe
^C
1
value ------------- Distribution ------------- count
2048 | 0
4096 | 58
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 31202
16384 |@@ 1643
32768 | 203
65536 | 0
Note: Notice the ^C character right below the probe; Once the IOzone run is finished the probe must be interrupted for getting the results.

Executing the IOzone test on the client: The following test would basically write 256MB of data to the iSCSI device.
$ ~/Downloads/iozone -r256k -s128m -i0 -w
Iozone: Performance Test of File I/O
Version $Revision: 3.321 $
Compiled for 32 bit mode.
Build: macosx
.
.
.
Record Size 256 KB
File size set to 131072 KB
Setting no_unlink
Command line used: /Users/abisen/Downloads/iozone -r256k -s128m -i0 -w
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
KB reclen write rewrite
131072 256 872 869
iozone test complete.
I still need to completely understand the output from the two ZFS probes, but it is clearly visible that for writing 256MB of data it requires a lot of activity on the logging subsystem. Now that we have quantified the number of commits and block size of commits on the ZIL. It would be very helpful to somehow see how our storage subsystem is performing from IOPS and latency perspective.
I performed a quick test again, using the same IOzone benchmarking tool and iostat. But this time I executed the benchmark locally on the Solaris workstation. Alternatively I also created a small ZFS file system on an old SanDisk Cruzr (512MB) USB Flash Disk. And ran the same test on this file system as well. Just to get an idea on how a USB Flash would fare against my 5 x SATA-II RAID-Z volume.
The video below shows what I noticed, Not only the service time (latency) on the USB Flash Disk was 4-5 times better than 5 x SATA HDD’s, even the IOPS that the old little USB flash could sustain were far greater than what 5 SATA disks could achieve.
So a USB Flash Disk might have some potential for helping in improving the write performance of a home brew NAS. I am hesitant on trying this out right away because, it’s easy to add a new device to a ZFS file system, but I have not been able to find any information on deleting/removing a device from a Zpool.
Next steps:
- Testing the performance by disabling the ZIL completely. This would give an idea on how fast can these iSCSI device get if ZIL is taken out of the picture. A ZFS file system should never be used with ZIL disabled, but it’s a neat feature for testing if separating ZIL would give any performance benefits.
- Testing the performance of the ZFS file system by using a USB based Flash Disk.