Preliminary TRIM support

Tim Bisson bissont at mac.com
Mon May 2 00:37:53 PDT 2011


Hi All,

I wanted to send an update saying that there is preliminary support for TRIM. I'm hoping to get some feedback on the design and implementation. I do apologize for the long e-mail.

Here's what I've done:
-nata/ata-disk.c now supports TRIM commands.
-UFS has support for TRIM:
	*newfs -E /dev/device will erase all the blocks on that device using the TRIM command (it can be used as a general tool to TRIM the device...)
	*newfs -R /dev/device will set a TRIM flag so that block frees will execute TRIM commands

Here's the code:
https://github.com/bissont/DragonFlyBSD/commit/03b9669852c02de9dc9bb6ed38e1d76362019d67

Description of the file changes by file:
https://github.com/bissont/DragonFlyBSD/wiki/Changes

To validate performance, I used iot.c:
http://www.corpit.ru/mjt/iot.c

And here's the basic test with an OCZ Vertex 2 (http://pastebin.com/10E5ytm9):
1. Clean FS ( newfs -E /dev/ad4)
2. Dirty device for 15 minutes w/ 4k random writes
3. Average sequential Read Perf (125 MB/S) <-----

4. Clean FS ( newfs -E /dev/ad4)
5. Average sequential Read Perf (231 MB/S) <-----


---------------------
High Level logic (largely pulled from FreeBSD implementation):
http://lists.freebsd.org/pipermail/svn-src-head/2009-December/012975.html
http://lists.freebsd.org/pipermail/svn-src-all/2010-December/033560.html
http://lists.freebsd.org/pipermail/svn-src-all/2010-December/033562.html

Support for TRIM commands has been added to nata/ata-disk.c, nata/ata-queue.c.

SSDs support TRIM in the form of a TRIM command whose argument is a list of LBA ranges. Each range is 8 bytes, and the ranges are passed to the device in 512-byte blocks. For instance, an OCZ Vertex 2 supports a 1-block TRIM command, so 64 LBA ranges can be passed to it at once. An LBA range contains the starting LBA (first 6 bytes) followed by 2 bytes for the number of LBAs.
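To make the range format concrete, here is a minimal user-space sketch of filling one slot of a 512-byte payload block. It is not taken from the commit: the names trim_set_entry and TRIM_* are made up for illustration, and the little-endian byte order is an assumption based on how ATA data structures are normally laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIM_BLOCK_SIZE        512  /* one TRIM payload block */
#define TRIM_ENTRY_SIZE        8    /* 6 bytes of LBA + 2 bytes of count */
#define TRIM_ENTRIES_PER_BLOCK (TRIM_BLOCK_SIZE / TRIM_ENTRY_SIZE)  /* 64 */

/*
 * Write one LBA range into slot `idx` of a payload block.
 * Hypothetical helper; byte order is assumed to be little-endian.
 */
static void
trim_set_entry(uint8_t block[TRIM_BLOCK_SIZE], size_t idx,
    uint64_t lba, uint16_t count)
{
	uint8_t *p;
	int i;

	assert(idx < TRIM_ENTRIES_PER_BLOCK);
	assert(lba < (1ULL << 48));	/* the LBA must fit in 6 bytes */

	p = block + idx * TRIM_ENTRY_SIZE;
	for (i = 0; i < 6; i++)		/* bytes 0-5: starting LBA */
		p[i] = (uint8_t)(lba >> (8 * i));
	p[6] = (uint8_t)(count & 0xff);	/* bytes 6-7: number of LBAs */
	p[7] = (uint8_t)(count >> 8);
}
```

Because the count field is only 2 bytes, a single entry can describe at most 65535 LBAs, so a larger range has to be spread over several entries.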

In ad_strategy() we add support for the BUF_CMD_FREEBLKS command. This command ends up generating a TRIM request. We have created a separate queue for TRIM requests so that we can coalesce them quickly.

In ata_start(), we first check whether a trim command is already running and whether there are trim requests on the trim queue. If a trim command is running, we process a normal request (only one trim command can be running on an SSD at a time). If no trim command is running and there are trim requests queued, we generate an actual trim command. We try to pack as many trim requests into a single trim command as possible because the actual trim execution time is on the order of 10s to 100s of milliseconds. Once we've generated the actual command, we put the request on the ata_queue() and let it be processed as a normal request on the ata-queue.
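The packing step can be sketched in user space as follows. This is only an illustration of the idea, not the code from the commit: struct trim_req, struct trim_entry and trim_coalesce are hypothetical names, and the policy of splitting a request whose length exceeds the 2-byte count field is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 64       /* 512-byte block / 8-byte range entries */
#define MAX_COUNT   0xFFFFu  /* largest value the 2-byte count can hold */

struct trim_req {            /* hypothetical queued trim request */
	uint64_t lba;
	uint64_t count;
};

struct trim_entry {          /* one range entry of the final command */
	uint64_t lba;
	uint16_t count;
};

/*
 * Pack queued requests, in order, into at most MAX_ENTRIES entries.
 * A request longer than MAX_COUNT LBAs is split across entries.
 * Stops before the first request that does not fit completely.
 * Returns the number of requests consumed; *nout is the entry count.
 */
static size_t
trim_coalesce(const struct trim_req *pending, size_t npending,
    struct trim_entry out[MAX_ENTRIES], size_t *nout)
{
	size_t used = 0, i;

	assert(pending != NULL || npending == 0);

	for (i = 0; i < npending; i++) {
		uint64_t lba = pending[i].lba;
		uint64_t left = pending[i].count;
		uint64_t need = left / MAX_COUNT + (left % MAX_COUNT != 0);

		if (need > MAX_ENTRIES - used)
			break;		/* leave it for the next command */
		while (left > 0) {
			uint16_t n = left > MAX_COUNT ?
			    (uint16_t)MAX_COUNT : (uint16_t)left;

			out[used].lba = lba;
			out[used].count = n;
			used++;
			lba += n;
			left -= n;
		}
	}
	*nout = used;
	return i;
}
```

A caller would remove the returned number of requests from the trim queue and keep them attached to the generated command so they can be completed together. Note that this sketch never consumes a single request that needs more than 64 entries on its own; a real implementation would have to break such a request up before queueing it.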

I implemented the code this way because the alternative seems worse: add trim requests to the ata_queue like normal and, when we encounter a trim request to run (in ata_start()), scan the rest of the ata queue in search of additional trim requests to coalesce with the current one. There is probably a better way to queue trim requests.

In ad_done(), we process TRIM requests separately because we can potentially have coalesced struct requests that need to be freed.


I created an IOCTL (IOCATADELETE) which takes as input:
off_t[2] range;

It specifies a start LBA and the number of LBAs to trim. It creates a new buf with the command BUF_CMD_FREEBLKS. This is used by "newfs -E" to tell the device which blocks to trim; this is also how FreeBSD does it in newfs, in the form of erfs(). This is one place where I'm not sure I have it right in terms of creating a bp-based command, but it's basically lifted from other parts of the tree:
 bp = getnewbuf(0, 0, 0, 1);

 bp->b_cmd = BUF_CMD_FREEBLKS;
 bp->b_bio1.bio_offset = block_start * 512;
 bp->b_bcount = count * 512;
 bp->b_bio1.bio_flags |= BIO_SYNC;
 bp->b_bio1.bio_done = biodone_sync;
 dev_dstrategy(ap->a_head.a_dev, &bp->b_bio1);

 if (biowait(&bp->b_bio1, "TRIM")) {
   kprintf("Error:%d\n", bp->b_error);
   return(bp->b_error ? bp->b_error : EIO);
 }
 brelse(bp);


If we pass -R to newfs, we end up setting a file system flag that says this file system and the underlying device support TRIM. At mount time we verify the flag and check that the device actually supports TRIM.

We modified ffs_blkfree() to check if the fs supports TRIM. If it does, we issue a TRIM command for the corresponding LBAs before clearing the bitmaps.

This is another potential problem. It's currently implemented as a synchronous call, like the bp command above. Because it's synchronous, it bogs down delete performance, as a delete operation now takes milliseconds. I tried to make the call asynchronous by returning from this function and specifying a callback that would finally call ffs_freeblk_cg(), but when I did that, the pointers were invalid in ffs_freeblk_cg() (for instance the inodep). I'll probably need to reinvestigate this and make it work, as performance really bottlenecks on deletes.

---------------------

Aborted TRIM commands.

If I issue back-to-back trim requests, for instance through an ioctl (http://pastebin.com/C1jAPMQL) or multiple deletes on a UFS, I get the following:

ad4: FAILURE - DATA_SET_MANAGEMENT status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=4194256

I've tried inserting ata_udelay()s, but that hasn't seemed to help. I've verified that there is only one outstanding trim request. I'm going to get an Intel SSD 320 to verify the behavior on another device. Any suggestions regarding this issue are much appreciated.
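For anyone reading the failure line above: status=51 and error=4 are hexadecimal register values, and the names in angle brackets are the bits that are set. A small user-space sketch of that decoding follows; the S_* and E_* macro names and the ata_cmd_aborted helper are made up for this example, while the bit values are the standard ATA status and error register bits.

```c
#include <assert.h>
#include <stdint.h>

/* ATA status register bits (local names, for illustration only) */
#define S_ERROR 0x01	/* ERR: the command ended in error */
#define S_DSC   0x10	/* DSC: device seek complete */
#define S_READY 0x40	/* DRDY: device ready */

/* ATA error register bit */
#define E_ABORT 0x04	/* ABRT: the device aborted the command */

/* Return 1 if the register pair describes an aborted command. */
static int
ata_cmd_aborted(uint8_t status, uint8_t error)
{
	assert((S_READY | S_DSC | S_ERROR) == 0x51);

	return (status & S_ERROR) != 0 && (error & E_ABORT) != 0;
}
```

With the values from the log, ata_cmd_aborted(0x51, 0x04) returns 1: the device reported itself ready, flagged an error, and gave ABORTED as the reason for rejecting the DATA_SET_MANAGEMENT command.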

---------------------

What's next:
*Hammer TRIM support?
*Swapcache TRIM support?
*Swap TRIM support?
*An actual TRIM utility for devices, not newfs -E masquerading as one...
*Fix ABORT bugs
*Asynchronous ext free trim commands?

Tim







