»
S
I
D
E
B
A
R
«
Analyzing iSCSI Performance (ZFS Intnet Log) - Part II
Apr 5th, 2009 by admin

Disabling ZIL (Intent Log)

Before actually investing in some kind of Solid State Device, it would be nice to actually quantify how much improvement in write performance is possible. That is if I separate the ZIL to a dedicated, fast, transactional device.

A quick and clean test would be to run a similar IOzone benchmark on the iSCSI device and turning off the Intent Log completely from the ZFS file system. This should give me the extreme best case as it would take away the entire overhead of ZIL.

Note: Keeping the ZIL disabled is not at all recommended in any production system as it could severely damage the file system. And the file system is prone to corruption in the event of any crash.

In order to turn off the ZIL dynamically the following command can be executed. I personally feel it’s a nice idea to restart the machine after making such change, especially if I am benchmarking. But I am too lazy so I followed the following steps:

Step I: Disable the ZFS Intent Log

# echo zil_disable/W0t1 | mdb -kw

Step II: Unmount the ZFS file system in the concerned pool

# zfs unmount /tank

Step III: Export the zpool

# zpool export tank

Step IV: Import the zpool back, which would also mount all the file systems

# zpool import tank

Step V: Execute the benchmarking applications

# iozone -ac -s8G -i0 -i1 -y4k -q1M


Results

The results were very interesting, as thought there was a big improvement in writes. But I didn’t expect the performance to increase by 26x on average. The following graph shows the numbers that I got after turning off the ZIL.

04052009-iscsi_performance_read_and_write_with_zil_disabled

Also notice that the Read performance also increased big time, ~110MB/s which is an improvement of around 12%. But it’s hard to say that the bottleneck is the back-end (solaris/zfs/iscsi) or the network itself since the performance is near-wire speed. The results from the perevious tests are mentioned in this entry

Summary

Summarizing all the pervious tests the following graphs shows the improvement I achieved with the two changes:

04052009-comparing_iscsi_write_performance_8k_to_128k_fs_block
04052009-comparing_iscsi_read_performance_8k_to_128k_fs_block

  1. Moving the file system block size from 8k to 128k
  2. Running the tests with ZFS intent log disabled (case for SSD)

There is still a lot to work on getting a perfect working system. The next step would be to insert a USB Flash Disk and see where does the performance lands (esp. on writes)

Analyzing iSCSI on ZFS (Playing With ZIL)
Apr 4th, 2009 by admin

In the previous entry, I showed some performance numbers that I was able to achieve from my ZFS backed iSCSI targets. The read performance that I was able to achieve were great, unfortunately I was only able to touch max around 1MB/s on writes.

After much discussions, I was inclining towards the theory that the culprit is the high latency and low IO performance of my 5 x SATA-II RAID-Z, which is shared between the data, metadata and the ZFS Intent Log. ZFS Intent Log is the logging system for ZFS file system, responsible for maintaining the file system consistent in the event of crashes etc. ZIL is also used by the file system for providing synchronous writes, for POSIX compliancy. For a good performing system it is recommended that the device that hold’s the log is very fast (in terms of latency for handling many small random writes). Neelakanth’s blog entry has some very good info on ZIL.

Playing with ZFS Intent Log (ZIL)

So as next step, I will try to identify and more importantly quantify the kind of IO operations that are taking place during writes.

Typically Solid State Disks (SSD) with high transactional performance are recommended to be used for the ZFS Intent Logs. The only problem is that the cost of buying an expensive SSD disk would be prohibitively expensive esp. for my home brew NAS.

I still want to not only see how much a SSD device can really help (which is an easy part), but also to optimize the performance of my iSCSI targets for write performance.

In order to have fast Apple Time Machine backups I would need faster iSCSI devices. This would get better once Apple moves to ZFS as their native file system and the backup would truly mean sending the incremental snapshots using “zfs send”.

So I started by monitoring the activity on the logging subsystem using “dtrace”. Dtrace is one of the most exposing tool for realtime analysis, that I have come across.

So I planned a small experiment, where I would write (and re-write) a 128MB file with 256k block size using IOzone on the iSCSI device. And during this time would monitor all the activities on the ZIL. This requires setting two probes on the Solaris workstation before starting the IOzone test.

Setting Probes on OpenSolaris

Probe 1: To collect information on all the ZIL activity.

# dtrace -n zil\*:entry'{@[probefunc]=count();}' -n zio_free_blk:entry'{@[probefunc]=count();}' 
dtrace: description 'zil*:entry' matched 44 probes
dtrace: description 'zio_free_blk:entry' matched 1 probe
^C

  zil_itx_clean                                                     4
  zil_clean                                                        16
  zil_header_in_syncing_context                                    82
  zil_sync                                                         82
  zil_vdev_compare                                               1152
  zil_flush_vdevs                                               31617
  zil_commit_writer                                             32457
  zil_lwb_write_done                                            32764
  zil_lwb_write_start                                           32764
  zio_free_blk                                                  32764
  zil_add_block                                                 32769
  zil_commit                                                    32933
  zil_lwb_commit                                                33106
  zil_itx_assign                                                33585
  zil_itx_create                                                33585
  zil_lwb_write_init                                            34253

Probe 2: To monitor block size of the log writes.

# dtrace -n ::zil_lwb_commit:entry'{@[1] = quantize((lwb_t*)args[2]->lwb_sz);}'
dtrace: description '::zil_lwb_commit:entry' matched 1 probe
^C

        1
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |                                         58
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   31202
           16384 |@@                                       1643
           32768 |                                         203
           65536 |                                         0        

Note: Notice the ^C character right below the probe; Once the IOzone run is finished the probe must be interrupted for getting the results.

04042009-iscsi-blog-2

Executing the IOzone test on the client: The following test would basically write 256MB of data to the iSCSI device.

$ ~/Downloads/iozone -r256k -s128m -i0 -w 
     Iozone: Performance Test of File I/O
                Version $Revision: 3.321 $
                Compiled for 32 bit mode.
                Build: macosx
         .
         .
         .
        Record Size 256 KB
        File size set to 131072 KB
        Setting no_unlink
        Command line used: /Users/abisen/Downloads/iozone -r256k -s128m -i0 -w
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.

              KB  reclen   write rewrite
          131072     256     872     869   

iozone test complete.

I still need to completely understand the output from the two ZFS probes, but it is clearly visible that for writing 256MB of data it requires a lot of activity on the logging subsystem. Now that we have quantified the number of commits and block size of commits on the ZIL. It would be very helpful to somehow see how our storage subsystem is performing from IOPS and latency perspective.

I performed a quick test again, using the same IOzone benchmarking tool and iostat. But this time I executed the benchmark locally on the Solaris workstation. Alternatively I also created a small ZFS file system on an old SanDisk Cruzr (512MB) USB Flash Disk. And ran the same test on this file system as well. Just to get an idea on how a USB Flash would fare against my 5 x SATA-II RAID-Z volume.

The video below shows what I noticed, Not only the service time (latency) on the USB Flash Disk was 4-5 times better than 5 x SATA HDD’s, even the IOPS that the old little USB flash could sustain were far greater than what 5 SATA disks could achieve.

So a USB Flash Disk might have some potential for helping in improving the write performance of a home brew NAS. I am hesitant on trying this out right away because, it’s easy to add a new device to a ZFS file system, but I have not been able to find any information on deleting/removing a device from a Zpool.

Next steps:

  • Testing the performance by disabling the ZIL completely. This would give an idea on how fast can these iSCSI device get if ZIL is taken out of the picture. A ZFS file system should never be used with ZIL disabled, but it’s a neat feature for testing if separating ZIL would give any performance benefits.
  • Testing the performance of the ZFS file system by using a USB based Flash Disk.
»  Substance: WordPress   »  Style: Ahren Ahimsa