Three more items brought into -release (fix for paging-to-swap bug, ahci polling, improved read/write fairness for hard drives)

Matthew Dillon dillon at backplane.com
Fri Jun 28 18:56:09 PDT 2019


The release branch's kernel has two more bug fixes and a new
read/write fairness feature for (primarily) hard drive accesses.

The first bug fix deals with a paging issue.  A machine which pages
heavily to swap can wind up looping on pages in the inactive queue,
writing the same pages out over and over again without making
progress.   This can eventually make a machine unusable.  This bug has
been fixed.

The second item is a mitigation for a possible AHCI chipset bug (AHCI
is how most SATA drives are attached).  The mitigation adds a poll
every 10 seconds in case the chipset somehow misses an interrupt.
We've had a number of reports of SATA drives deadlocking for no
apparent reason.  This mitigation is an attempt to narrow down the
problem.
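
To illustrate the idea only (this is not the driver code), here is a
small Python simulation of a periodic poll acting as a safety net for
a missed interrupt.  FakePort, hw_finish() and run() are hypothetical
names invented for this sketch; only the 10 second interval comes from
the description above.

```python
POLL_INTERVAL = 10  # seconds, the interval described above


class FakePort:
    """Hypothetical stand-in for an AHCI port, not the real driver structure."""

    def __init__(self):
        self.active = set()    # tags issued to the device, not yet completed
        self.hw_done = set()   # tags the simulated hardware has finished
        self.completed = []    # tags the driver has handed back to the caller

    def issue(self, tag):
        self.active.add(tag)

    def hw_finish(self, tag, raise_interrupt=True):
        # The hardware finishes a command; the interrupt may or may not arrive.
        self.hw_done.add(tag)
        if raise_interrupt:
            self.intr()

    def intr(self):
        # Completion path: reap every active tag the hardware reports as done.
        for tag in sorted(self.active & self.hw_done):
            self.active.discard(tag)
            self.hw_done.discard(tag)
            self.completed.append(tag)


def run(port, seconds):
    """Advance simulated time one second at a time, polling periodically."""
    for now in range(1, seconds + 1):
        if now % POLL_INTERVAL == 0:
            port.intr()  # same completion path, driven by a timer, not an IRQ
```

A command whose interrupt is lost stays in port.active until the next
poll, at which point it completes through the normal path instead of
hanging indefinitely.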

The third item is a modification to the 'da*' disk device attachment
which balances read and write I/O when both operations are present.
Hard drives have large write buffers and even though the driver makes
sure that both reads and writes get tags, a hard drive can wind up
starving read requests due to its write buffer filling up.  A single
tag is all that is needed to fill up a hard drive's write buffer.  The
new feature detects this situation and temporarily blocks writes so
that reads get a fair share of the drive's TPS (transactions per second).
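
A rough Python sketch of this kind of balancing follows.  It is an
illustration under stated assumptions, not the actual 'da' driver
logic: the BalancedQueue class and the max_consecutive_writes
threshold are hypothetical, and the real driver's detection criteria
are not described here.

```python
from collections import deque


class BalancedQueue:
    """Hypothetical dispatcher that holds back writes when reads are waiting."""

    def __init__(self, max_consecutive_writes=4):
        self.reads = deque()
        self.writes = deque()
        # Assumed threshold: writes allowed in a row while a read is pending.
        self.max_consecutive_writes = max_consecutive_writes
        self.writes_since_read = 0

    def submit(self, op, block):
        queue = self.reads if op == "read" else self.writes
        queue.append((op, block))

    def next_request(self):
        # Writes are blocked only when a read is waiting AND writes have
        # already had their run.  With no reads pending, writes flow freely.
        writes_blocked = (bool(self.reads) and
                          self.writes_since_read >= self.max_consecutive_writes)
        # Writes are deliberately served first to model a write-heavy load.
        if self.writes and not writes_blocked:
            self.writes_since_read += 1
            return self.writes.popleft()
        if self.reads:
            self.writes_since_read = 0
            return self.reads.popleft()
        return None
```

With a threshold of 2, a backlog of five writes and two reads is
dispatched as write, write, read, write, write, read, write, so a
pending read never waits behind more than two writes.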

This last item significantly improves concurrent reads and writes to a
hard drive, particularly one used for swap (NOTE: we recommend only
using SSDs for swap, but yes, some people still use hard drives for
swap).  It may also avoid a possible read starvation issue caused by
the hard drive itself, in which a read command tag is not processed
in a reasonable period of time, potentially for long enough that the
driver reports a CMD TIMEOUT.

--

We are tracking several bug reports related to "indefinite wait
buffer" messages on the console, typically related to heavy paging
to/from swap.  The AHCI polling mitigation and the TPS balancing
feature are an attempt to narrow down the possible problem source and
possibly even fix the issue.

-Matt

