Interrupt recursion smashes kernel memory

Simon 'corecode' Schubert corecode at fs.ei.tum.de
Sun Jan 13 09:07:46 PST 2008


Hey,

in the process of getting the nvidia driver to work, I'm experiencing kernel memory corruption.  This is seemingly due to interrupts filling some kernel thread's stack and then overwriting kernel memory once the stack overflows.

Now I have three questions:

1. how can this recursion happen?  Shouldn't interrupts be disabled once the interrupt arrives and only be re-enabled when we leave the interrupt frame?

2. do you think we should change the segment descriptor to throw some sort of exception, so that we get feedback immediately instead of waiting until some operation works on corrupted memory and panics?

3. I suspect this recursion is due to an interrupt storm.  How does this usually happen?  I know that I am working on a blob and hardware without specs, which makes reasoning even more complicated, but I'm interested in the general case.

I seem to remember that I suffered from something like this before, but I can't remember when and why this was and how it got fixed.

Checking the return addresses, most frames have return addresses of:

0xc028fc90 <doreti+0>:  pop    %eax
0xc028fc91 <doreti+1>:  mov    $0x0,%eax
0xc028fc9d <doreti+13>: cli    

or

0xc029774f <Xicu_slowintr11+143>:       jmp    0xc028fc90 <doreti>

For some reason however the interrupt flag seems to be set in all trap frames.  For me, this leads to the conclusion that this recursion happens *after* processing the interrupt handler.  I can read the comment:

2: ;                                                                    \
       /* set running bit, clear pending bit, run handler */           \
       andl    $~IRQ_LBIT(irq_num), PCPU(ipending) ;                   \
       sti ;                                                           \
       pushl   $irq_num ;                                              \
       call    sched_ithd ;                                            \
       addl    $4,%esp ;                                               \

But I can't see where the running bit is set.  Actually, I have the feeling that there is no protection against re-entrancy for the period between sti and the end of doreti.  Wouldn't we want to bump some sort of "in interrupt processing" flag until we're in CLI protection again?  This way we would avoid this recursion.
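
To make the idea concrete, here is a rough user-space sketch in C of the kind of flag I mean.  It is not the real interrupt path: struct pcpu_sim, raise_irq, storm_handler and run_storm are names made up for this illustration, and the "device" is simply a handler that re-raises its own IRQ while it runs.  The only thing it shows is that a per-CPU "in interrupt processing" flag turns nested arrivals into pending bits instead of new stack frames.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state for this simulation (not the real globaldata). */
struct pcpu_sim {
    uint32_t ipending;   /* IRQs raised but not yet dispatched */
    int      guarded;    /* 1: use the "in interrupt processing" flag */
    int      in_intr;    /* set from the simulated sti until dispatch is done */
    int      depth;      /* dispatch frames currently on the stack */
    int      max_depth;  /* deepest nesting seen so far */
    int      handled;    /* total handler invocations */
    int      storm_left; /* how often the device will still re-assert */
};

static void raise_irq(struct pcpu_sim *cpu, int irq);

/* Stand-in for the handler: the device re-asserts its line while we run. */
static void storm_handler(struct pcpu_sim *cpu, int irq)
{
    cpu->handled++;
    if (cpu->storm_left > 0) {
        cpu->storm_left--;
        raise_irq(cpu, irq);
    }
}

/* Stand-in for the interrupt entry path. */
static void raise_irq(struct pcpu_sim *cpu, int irq)
{
    assert(irq >= 0 && irq < 32);
    cpu->ipending |= UINT32_C(1) << irq;

    /* With the guard, a nested arrival only stays pending. */
    if (cpu->guarded && cpu->in_intr)
        return;

    int was_in_intr = cpu->in_intr;
    cpu->in_intr = 1;
    while (cpu->ipending != 0) {
        int n = 0;
        while ((cpu->ipending & (UINT32_C(1) << n)) == 0)
            n++;                          /* lowest pending IRQ */
        cpu->ipending &= ~(UINT32_C(1) << n);
        cpu->depth++;
        if (cpu->depth > cpu->max_depth)
            cpu->max_depth = cpu->depth;
        storm_handler(cpu, n);
        cpu->depth--;
    }
    cpu->in_intr = was_in_intr;
}

/* Run a storm of `reasserts` extra interrupts on IRQ 11; return max nesting. */
int run_storm(int reasserts, int guarded)
{
    struct pcpu_sim cpu = {0};
    cpu.guarded = guarded;
    cpu.storm_left = reasserts;
    raise_irq(&cpu, 11);
    assert(cpu.handled == reasserts + 1);
    return cpu.max_depth;
}
```

In this toy model run_storm(1000, 0) nests 1001 dispatch frames deep, while run_storm(1000, 1) handles the same 1001 interrupts without ever going deeper than one frame.  Whether the real code can afford to defer like this between sti and doreti is exactly what I'm asking about.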

cheers
 simon
--
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \



