three kernel patches for review

Raphael Marmier raphael at marmier.net
Wed Apr 20 22:56:18 PDT 2005


That's interesting, but how do you know the location of the bad 
addresses? Run memtest at boot?
Run memtest manually and export a table of bad addresses to a file?

As we push further into the product's life, more and more bits are 
going to break. How do we handle that?

That said, I like the idea of being able to survive with bad RAM for a 
while.

Raphael Marmier

Chris Pressey wrote:
On Tue, 19 Apr 2005 18:48:02 -0700 (PDT)
Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:

[...]
   Otherwise, the ANSIfication patch generally looks good, go ahead
   and commit #1 after you clean up that comment.
   On #2 ... looks reasonable.  Commit away!


OK, committed!


   On #3 ... that doesn't look so reasonable.  I suppose on a machine
   with huge amounts of memory one might want such a mechanism, but
   frankly if memory is bad (especially if it is ECC'd memory), the
   only correct solution is to replace it.  


I'm going to call you on that one, Matt - _why_ do you say that is the
only correct solution?

My understanding of the service curve of RAM is that it is not like that
of disks.  Entropy does affect RAM, but at a much longer time-scale, so
the first few bad bits you see are much more likely to be flukes than an
indication that the RAM stick is reaching the end of its useful life.

Also, the conventional wisdom that the thing you should do when you have
bad bits in a stick of RAM is to replace the entire stick sounds like
it stems from the fact that the OS has no way of remapping those bad
bits (as it has with a disk).  Of course, with this patch, that fact
would no longer be a fact, and that wisdom wouldn't hold water anymore.

On Wed, 20 Apr 2005 10:24:37 +0200
Joerg Sonnenberger <joerg at xxxxxxxxxxxxxxxxx> wrote:

I'm also split on the badram patch. I have some RAM modules which have
static bad bits, so they could be used with the bad ram patch. On the
other hand, such modules should be replaced and burned :)


Same question to you, Joerg - _why_ should they be replaced and burned?

When I consider the sheer amount of resources that go into manufacturing
a stick, and that there are typically millions of still-good bits that
could still be put to use on a "bad" one, I'd consider it a rotten shame
to just throw it out.

The Linux BadRAM project's website also lists some sound motivations,
including a commercial one:
  http://rick.vanrein.org/linux/badram/

Anyway, that's my case for including this patch.  If you still don't
think it should go in, I won't say anything further, but please do at
least consider the reasons I've given.

-Chris




