[issue1556] many processes stuck in "hmrrcm", system unusable

Steve O'Hara-Smith steve at sohara.org
Mon Oct 5 03:24:12 PDT 2009


On Mon, 05 Oct 2009 11:33:47 +0200
"Simon 'corecode' Schubert" <corecode at fs.ei.tum.de> wrote:

> I AM NOT TALKING ABOUT THROUGHPUT!  This is all about latency.  Opening 
> xterm and waiting 5 or 10 seconds for the prompt is not acceptable.

	That is certainly true, although I can slug down just about any
system to the point where that happens if I pile on enough load.

> So to anybody who "can not reproduce" this issue:  stop being such 
> obnoxious smartasses.

	I for one am not trying to be an obnoxious smartass. I am observing
that my experience and yours differ - perhaps due to your load pattern,
perhaps due to other factors. Let's find out which.

> There ARE serious problems, and anybody who is 
> just trying to reproduce this will notice issues at once.

	No they will not - I have not been able to reproduce these issues.
I can certainly make my system sluggish but nowhere near as bad as you are
reporting. I could keep on adding load until it happens but by then it'll
probably be because the system is swap thrashing.

> I think it's as simple as that:  If you don't see problems, you're not 
> qualified to talk.

	It is not as simple as that - there is something in common between
your experience and Hasso's that is not present in my experience - even
when I try to stress the system with random file IO - which I am doing as I
type this. Negative evidence is useful.

	I do know from experience that getting to the bottom of problems
like this on high load systems can be difficult and can produce surprising
results.

	Hasso may have hit the nail on the head with the combination of
AHCI and HAMMER - an indication of this is that I do not see the problem
and I do not have AHCI. This possibility could do with confirmation (which
I cannot provide) - are you using AHCI? If so, can you try reverting to NATA
and seeing if it makes the problem go away? Might you and Hasso be using
similar hardware and exposing a hardware or driver problem or limitation?

> The best featureful file system is worth nothing if nobody can use it 
> because it just performs badly.  Now is the time to address this before 
> we complicate the system even more and make it even harder to diagnose 
> and fix.

	Significant point - it does not perform badly for everybody, and
pinning down that difference is likely to go a long way towards identifying
the problem. Those of us who do not suffer the problem can usefully help
with that - for example, if you were to say "run this script and watch
it fall down around your ears" I would be happy to do so and report on
whether or not it did indeed fall down around my ears.
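
	Something along these lines might do as a starting point for such a
script. It is only a sketch and has not been shown to trigger the problem:
the function names random_write_load and probe_latency, the file sizes and
the operation counts are all made up for illustration. It writes blocks at
random offsets with fsync in one thread while timing small create/read
operations in the same directory, which should be on the filesystem under
test.

```python
import os
import random
import sys
import tempfile
import threading
import time


def random_write_load(path, file_size=1 << 20, block=4096, ops=200, seed=0):
    """Write `ops` blocks at random offsets in one file, fsyncing each write."""
    rng = random.Random(seed)
    payload = b"x" * block
    # Pre-size the file so every random offset lands inside it.
    with open(path, "wb") as f:
        f.truncate(file_size)
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(ops):
            offset = rng.randrange(0, file_size - block + 1)
            os.lseek(fd, offset, os.SEEK_SET)
            os.write(fd, payload)
            os.fsync(fd)
    finally:
        os.close(fd)
    return ops


def probe_latency(directory, samples=20):
    """Time create + fsync + read-back of a tiny file; return seconds per sample."""
    timings = []
    for i in range(samples):
        name = os.path.join(directory, "probe-%d" % i)
        start = time.perf_counter()
        with open(name, "wb") as f:
            f.write(b"ping")
            f.flush()
            os.fsync(f.fileno())
        with open(name, "rb") as f:
            f.read()
        timings.append(time.perf_counter() - start)
        os.remove(name)
    return timings


if __name__ == "__main__":
    # Optional first argument: a directory on the filesystem under test.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    load_file = os.path.join(target, "load.bin")
    load = threading.Thread(
        target=random_write_load, args=(load_file,), kwargs={"ops": 2000}
    )
    load.start()
    t = probe_latency(target)   # probe while the background writes are running
    load.join()
    os.remove(load_file)
    print("max %.4fs  mean %.4fs" % (max(t), sum(t) / len(t)))
```

	Comparing the maximum probe time with and without the background
load, on an affected and an unaffected machine, would give a number to
argue about rather than an impression.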

	It would be a great shame for Matt to spend several months
reworking the background flusher only to find that it wasn't causing the
problems in the first place, even more so if it didn't cure them.

-- 
Steve O'Hara-Smith                          |   Directable Mirror Arrays
C:>WIN                                      | A better way to focus the sun
The computer obeys and wins.                |    licences available see
You lose and Bill collects.                 |    http://www.sohara.org/




