three kernel patches for review
Raphael Marmier
raphael at marmier.net
Wed Apr 20 22:56:18 PDT 2005
That's interesting, but how do you know the location of the bad
addresses? Run memtest at boot?
Run memtest manually and export a table of bad addresses to a file?
As the hardware ages further into its product life, more and more bits are
going to fail. How do we handle that?
That said, I like the idea of being able to survive with bad ram for a
while.
Raphael Marmier
Chris Pressey wrote:
On Tue, 19 Apr 2005 18:48:02 -0700 (PDT)
Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:
[...]
Otherwise, the ANSIfication patch generally looks good, go ahead
and commit #1 after you clean up that comment.
On #2 ... looks reasonable. Commit away!
OK, committed!
On #3 ... that doesn't look so reasonable. I suppose on a machine
with huge amounts of memory one might want such a mechanism, but
frankly if memory is bad (especially if it is ECC'd memory), the
only correct solution is to replace it.
I'm going to call you on that one, Matt - _why_ do you say that is the
only correct solution?
My understanding of the service curve of RAM is that it is not like that
of disks. Entropy does affect RAM, but at a much longer time-scale, so
the first few bad bits you see are much more likely to be flukes than an
indication that the RAM stick is reaching the end of its useful life.
Also, the conventional wisdom that you should replace the entire stick
when it has a few bad bits seems to stem from the fact that the OS has
no way of remapping those bad bits (as it does with a disk). Of course,
with this patch that would no longer be true, and that wisdom wouldn't
hold water anymore.
On Wed, 20 Apr 2005 10:24:37 +0200
Joerg Sonnenberger <joerg at xxxxxxxxxxxxxxxxx> wrote:
I'm also split on the badram patch. I have some RAM modules which have
static bad bits, so they could be used with the bad ram patch. On the
other hand, such modules should be replaced and burned :)
Same question to you, Joerg - _why_ should they be replaced and burned?
When I consider the sheer amount of resources that go into manufacturing
a stick, and that there are typically millions of good bits on a "bad"
one that could still be put to use, I'd consider it a rotten shame to
just throw it out.
The Linux BadRAM project's website also lists some sound motivations,
including a commercial one:
http://rick.vanrein.org/linux/badram/
Anyway, that's my case for including this patch. If you still don't
think it should go in, I won't say anything further, but please do at
least consider the reasons I've given.
-Chris