tom callaway (spot) wrote,

Tuning Fedora for SSDs

A while back, I found myself reading a generic PC magazine in a waiting room, and they had an article on Tuning SSDs for Windows 7. It made me wonder what the Fedora equivalent would be. So, I asked my friend and Red Hat coworker Jeff Moyer about it. Jeff knows more about SSDs in Linux than anyone else I know; he does most (if not all) of the testing of SSDs for Red Hat's storage team. He agreed to write a little article for the Fedora Community, so here it is! Feel free to leave comments here, and I will be sure to pass them along to Jeff.

*****

Tuning Fedora for SSDs
by Jeff Moyer

This article provides some basic background information on Solid State Disk devices (SSDs), and gives guidance on operating system tuning for
them.

SSDs are typically associated with good random I/O performance (especially reads) for small block sizes. Unfortunately, not all SSDs are created equal. The performance spectrum for SSDs is as wide as the performance difference between a 5200rpm disk drive and a high-end RAID array. In addition, SSDs do not all implement the same feature set. Most important to this discussion are the Native Command Queuing (NCQ) and TRIM features, explained below.

A device implementing NCQ can accept multiple commands at a time. This means that the operating system can issue multiple I/O requests to the device, up to some maximum number, without waiting for previous commands to complete. The device can then perform some tricks to increase the overall I/O performance. Empirically, SSDs that lack NCQ support do not perform nearly as well as those which include this support.
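One rough way to see whether commands are actually being queued is to look at the queue depth the kernel reports for the device. The sketch below assumes a SATA device that exposes /sys/block/sdX/device/queue_depth, where a depth of 1 generally means no queuing is in effect. The function name ncq_in_use is made up for illustration.

```shell
#!/bin/sh
# ncq_in_use: hypothetical helper, not part of any Fedora tool.
# Succeeds if the device reports a queue depth greater than 1.
# The sysfs root defaults to /sys/block; override it only for testing.
ncq_in_use() {
    dev=$1
    sysfs_root=${2:-/sys/block}
    depth=$(cat "$sysfs_root/$dev/device/queue_depth")
    [ "$depth" -gt 1 ]
}

# Example, replacing X with the letter of your device:
# if ncq_in_use sdX; then echo "queuing enabled"; else echo "queue depth is 1"; fi
```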

TRIM (or discard, in Linux terminology) is a command that the operating system can send down to the SSD to tell the disk that a range of blocks is no longer in use. For example, if a file is removed from the file system, the file system driver can send a TRIM command to the device to mark the blocks as unused. The SSD can use
this information to free up internal erase blocks for wear leveling. In general, this is a good thing, as it allows the storage to continue remapping blocks, avoiding costly updates in place. However, the benefits of sending TRIM commands vary from device to device. On some devices, it is essential to send down TRIMs in order to get good performance from the drive, while on others, there is little or no advantage (and sometimes even a performance loss).

The downside of TRIM, in its current form, is that it is not a queued command. When a TRIM command is issued to an SSD, the operating system first has to wait for all currently outstanding commands (I/Os) to finish, then it can send the TRIM, and then it can go about sending more I/O to the device. Given the speeds of some of these devices, that can cause a big hiccup in performance.

Another technique SSD manufacturers use to provide good performance is over-provisioning. That is, SSDs are often shipped with some number of blocks that are not reported in the overall size of the device. This is key to maintaining good performance as the amount of data stored on the disk approaches the reported capacity of the disk. As you can imagine, the device firmware and the amount of over-provisioning play important roles in the overall performance of the device. Because there is limited visibility into some core pieces of SSDs that impact performance, I highly recommend reading reviews before purchasing a device.

Frequently Asked Questions

What is the erase block size for my SSD?
Good question! This information can be hard to find. It depends on the flash parts used in the SSD, which is not always advertised. You
really only need to know this for partitioning, and for that, you can use a number that should cover all SSDs. See below.

Do I need to align my partitions to the erase block size?
It is generally considered a good idea to align partitions to the erase block size of your SSD. If you do not know the erase block
size, simply align the first partition to 1MB, and the ends of your partitions to 1MB boundaries. If you don't do this, some SSDs will
perform poorly.
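
To check an existing partition, the start sector that the kernel reports can be tested against the 1MB boundary. This is only a sketch: it relies on the sysfs start attribute being expressed in 512-byte sectors, and the function name is_aligned_1mb is made up for illustration.

```shell
#!/bin/sh
# is_aligned_1mb: hypothetical helper, not part of any Fedora tool.
# Takes a partition start sector (in 512-byte units, as reported by
# /sys/block/sdX/sdXN/start) and succeeds if it falls on a 1MB boundary.
is_aligned_1mb() {
    start_sector=$1
    # 1MB = 1048576 bytes = 2048 sectors of 512 bytes each
    [ $((start_sector % 2048)) -eq 0 ]
}

# Example: check the first partition of sdX (replace X with your device letter)
# if is_aligned_1mb "$(cat /sys/block/sdX/sdX1/start)"; then
#     echo "aligned"
# else
#     echo "not aligned"
# fi
```

A start sector of 2048 passes the check, while the traditional DOS-style start at sector 63 does not.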

What file systems support discard?
btrfs, ext4, fat, gfs2, nilfs2

How do I enable discard support?
File systems supporting discard accept the 'discard' mount option. During mkfs, a discard is done on all of the free space if the device supports it (no need to pass in any options).
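
As a sketch, the option can be given on the mount command line or in the options column of /etc/fstab, and the result can be confirmed by looking at the mounted file system's options. The helper name has_discard and the device name /dev/sdX1 below are placeholders for illustration.

```shell
#!/bin/sh
# Mounting with discard (as root); /dev/sdX1 and /mnt are placeholders:
#   mount -o discard /dev/sdX1 /mnt
# or add "discard" to the options column of the entry in /etc/fstab.

# has_discard: hypothetical helper for illustration.
# Succeeds if the given mount point lists "discard" among its mount options.
# Reads /proc/mounts by default; a different file can be passed for testing.
has_discard() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass
    awk -v mp="$mount_point" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "discard") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$mounts_file"
}

# Example:
# if has_discard /mnt; then echo "discard is on"; else echo "discard is off"; fi
```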

Should I turn on discard for my device?
This question can be answered by running the discard test kit written by Lukas Czerner (WARNING: it may brick your drive!).[1] It creates a metadata intensive workload and measures performance both with and without TRIM enabled. It also measures the performance of the disk as the number of blocks written increases without doing any TRIMming. In this manner you can see how the device degrades over time without TRIM. If your device does not degrade over time, then there is no need to enable TRIM (remember, enabling it at mount time means that you may suffer a performance penalty). Further, some device firmwares react very poorly to TRIM, in some instances rendering the device inoperable. If you want to be on the safe side, do not enable TRIM.

What about hdparm/wiper.sh?
hdparm[2] does have a mechanism to TRIM all of the free space for a file system (see wiper.sh). However, the hdparm author does not recommend its use without first backing up data to another device. It is worth noting that this tool allows the TRIMming of free space on some file systems which do not support discard natively. Its supported file system list includes:
online support: ext4, xfs
offline support: ext2, ext3, and reiserfs

What tuning can I perform to make my disk go faster?
The default I/O scheduler (cfq) has optimizations built-in for SSDs. Thus, it should perform well for most users. It is worth noting, though, that not all devices advertise the proper rotational rate. You can check to see if your SSD advertises it correctly by doing the following:
# cat /sys/block/sdX/queue/rotational
replacing the 'X' with the letter of your device. If that reports a '1', then the drive is advertising itself as a rotational device, which is wrong for an SSD. You can echo a 0 into that sysfs file in order to change it, and this will enable the non-rotational media optimizations in the I/O scheduler.
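
The check and the fix can be combined into one small function. This is a minimal sketch; the name mark_non_rotational is made up for illustration, and the optional second argument exists only so the function can be tried against a scratch directory instead of the real /sys/block.

```shell
#!/bin/sh
# mark_non_rotational: hypothetical helper for illustration.
# Writes 0 to the rotational flag of a block device if it currently reads 1.
# The sysfs root defaults to /sys/block; override it only for testing.
mark_non_rotational() {
    dev=$1
    sysfs_root=${2:-/sys/block}
    flag="$sysfs_root/$dev/queue/rotational"
    if [ "$(cat "$flag")" = "1" ]; then
        echo 0 > "$flag"
        echo "$dev: rotational flag changed from 1 to 0"
    else
        echo "$dev: already marked non-rotational"
    fi
}

# Example (as root), replacing X with the letter of your device:
# mark_non_rotational sdX
```

Values written to sysfs do not survive a reboot, so the change has to be reapplied at boot if the drive keeps misreporting itself.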

In rare instances, the optimal I/O size for a device may be the erase block size. Typically, the larger the I/O size, the better the performance you get from a device. However, I've seen exactly one case where sending I/Os larger than the erase block size actually resulted in decreased performance from the drive. To restrict the size of the I/Os sent to the device, you can echo a value (in kilobytes) to /sys/block/sdX/queue/max_sectors_kb. For example, if the performance drops off for I/Os larger than 128KB, you would do:
# echo 128 > /sys/block/sdX/queue/max_sectors_kb
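
A slightly more careful version of the same step might first compare the requested value with the hardware limit that the kernel reports in max_hw_sectors_kb, which sits alongside max_sectors_kb in the queue directory. This is a sketch only; the function name limit_io_size is made up for illustration, and the optional third argument exists only for trying it against a scratch directory.

```shell
#!/bin/sh
# limit_io_size: hypothetical helper for illustration.
# Sets max_sectors_kb for a device, refusing values above the hardware
# limit the kernel reports in max_hw_sectors_kb.
limit_io_size() {
    dev=$1
    kb=$2
    sysfs_root=${3:-/sys/block}
    queue="$sysfs_root/$dev/queue"
    hw_max=$(cat "$queue/max_hw_sectors_kb")
    if [ "$kb" -gt "$hw_max" ]; then
        echo "$dev: $kb exceeds hardware limit of $hw_max KB" >&2
        return 1
    fi
    # Report the old and new values, then apply the new one
    echo "$dev: max_sectors_kb $(cat "$queue/max_sectors_kb") -> $kb"
    echo "$kb" > "$queue/max_sectors_kb"
}

# Example (as root), replacing X with the letter of your device:
# limit_io_size sdX 128
```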

Is it okay to put my swap device on an SSD?
Yes. SSDs are rated for an extremely high number of program/erase cycles. Plus, the sometimes random access patterns of swap can benefit greatly from being on an SSD.

Does swap support TRIM?
Yes. When enabling swap, the entire swap space is discarded.

I heard that indexing software can cause an untimely death of my SSD. Should I disable beagle, for example?
Again, NAND flash is rated for a set number of program/erase cycles. By constantly writing/overwriting/TRIMming sectors on the device, the lifetime of the device is shortened. Wear-leveling can help increase the longevity of the device, but that is controlled entirely by the drive firmware. As such, the impact of indexing software on SSDs will vary from device to device. In all cases, it will have some negative impact, but it's not clear the degree to which the software bundled with Fedora will impact the lifetime of an SSD.

Should I disable the journal on my file system?
No. Ted Ts'o did a nice write-up of the overhead of journaling in his blog.[3] It turns out that the journal does not generate enough I/O to be worried about in most cases.

Device-specific Recommendations*:

Intel SSDs: defaults
SanDisk G3: set /sys/block/sdX/queue/max_sectors_kb to 128
OCZ Vertex: defaults
WD SE Blue: use ext4 or btrfs with the discard mount option

* Recommendations are based on a broad range of testing performed
across several different workloads. For a specific workload, users
are encouraged to do their own testing.

Further reading:
Anandtech has a nice write-up on how SSDs work:
http://www.anandtech.com/show/2738

[1] http://sourceforge.net/projects/test-discard/
[2] http://hdparm.sourceforge.net/
[3] http://thunk.org/tytso/blog/2009/03/01/ssds-journaling-and-noatimerelatime/