Stability status update 22 Jul 2005
Matthew Dillon
dillon at apollo.backplane.com
Fri Jul 22 09:32:39 PDT 2005
Hello everyone! Just in case people have been wondering about the
recent commits, four of us (Peter, David, Tomaz, YONETANI Tomokazu)
have been making a hard push to stabilize the system on SMP boxes.
Peter, David, and I have been focusing on an SMP crash that has plagued
us for over two months, even longer in fact, because we have had a hard
time getting core dumps out of the SMP systems involved. Tomaz and
YONETANI Tomokazu and I have been working on issues with the IPS driver.
Most of the commits in the last two days have been related to making
SMP systems behave better during panics. There have been many commits
in the last few weeks to add KTR logging to critical subsystems, and
this has proved *invaluable* in helping us track down the bugs.
In the last week we have made significant progress on the SMP
crashes. There turned out to be three significant bugs: the TCP
sockbuf issues, bugs in the LWKT token code, and a nasty bug in the
LWKT IPIQ (IPI messaging) code.
The token code has been fixed. The TCP mbuf/sockbuf code is going
to get a patch committed tonight for stabilization purposes and will then
be reworked to clean it up. And, just yesterday, I believe I have
*finally* found the smoking gun related to the IPIQ crashes. It looks
like an index comparison bug was resulting in old IPIQ message entries
being re-executed on the target cpu. You can imagine the absolute
havoc that this would cause on a system(!).
It takes a good week's worth of testing to detect that particular bug
because it can only occur in certain heavily-loaded cases when the
IPIQ's software FIFO fills up, so we won't know if we've nailed it for
sure for a few days. However, I will be committing the fix for it
tonight anyway.
This isn't to say that the kernel is bug-free, and in fact Tomaz just
located another (hopefully unrelated) bug. But once these main line
items are fixed I believe we will be well positioned to move forward
with new work. The kernel is far better instrumented now than it was
a month ago, that's for sure!
If all goes well the 'preview' tag will also be moved early next week.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>
More information about the Kernel mailing list