Interrupt recursion smashes kernel memory

Matthew Dillon dillon at apollo.backplane.com
Sun Jan 13 11:56:01 PST 2008


:Hey,
:
:in the process of getting the nvidia driver to work, I'm experiencing kernel memory corruption.  This is seemingly due to interrupts filling some kernel thread's stack and then overwriting kernel memory once the stack overflows.
:
:Now I have two questions:
:
:1. how can this recursion happen?  Shouldn't interrupts be disabled once the interrupt arrives and only be re-enabled when we leave the interrupt frame?

    It can't.  Only a FAST interrupt (of which there are almost none) 
    actually runs in the current kernel stack frame, and even FAST interrupts
    are masked and only re-enabled upon completion.  All other interrupts
    are threaded... the physical interrupt does nothing more than schedule
    the appropriate interrupt thread.
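
    To make that flow concrete, here is a hedged userspace sketch of the
    model (irq_masked, ithread_main and physical_interrupt are names made
    up for illustration; the real code is the Xicu_slowintr vectors plus
    the interrupt threads, and the "thread" here is just a function call):

        /* Userspace sketch of the threaded interrupt model.  All names
         * are illustrative; this only shows the masking discipline. */
        #include <stdio.h>

        static int irq_masked;          /* hardware mask state for one IRQ */
        static int irq_pending;         /* the ipending bit for that IRQ */

        static void run_handler(void)
        {
            printf("driver handler runs in ithread context\n");
        }

        /* What the interrupt thread does: run the handler, then unmask. */
        static void ithread_main(void)
        {
            if (irq_pending) {
                irq_pending = 0;
                run_handler();
            }
            irq_masked = 0;             /* only now can the IRQ fire again */
        }

        /* What the low-level vector does: almost nothing. */
        static void physical_interrupt(void)
        {
            if (irq_masked)             /* a masked IRQ is never delivered */
                return;
            irq_masked = 1;             /* mask at the ICU first */
            irq_pending = 1;
            ithread_main();             /* stands in for sched_ithd() */
        }

        int main(void)
        {
            physical_interrupt();
            physical_interrupt();       /* runs only because we unmasked */
            return 0;
        }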

:2. do you think we should change the segment descriptor to throw some sort of exception so that we will get feedback immediately instead of waiting until some operation works on corrupted memory and panics?

    It can't be done with a segment descriptor, but it is possible to put
    a guard page on the kernel stack.  A guard page is a page marked as 
    invalid that forces a double fault to occur if the current thread's
    kernel stack is exhausted.  Double faults can be caught, but they
    are fairly difficult to debug.
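
    As a rough userspace analogue of what a guard page buys you (this uses
    mmap/mprotect in an ordinary process, not the actual kernel pmap code):

        /* Reserve a region, then revoke all access to its lowest page so
         * that running off the end of the "stack" faults immediately
         * instead of silently corrupting adjacent memory. */
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
            size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
            size_t stksz = 3 * pgsz;            /* a small 12K "stack" */

            char *stk = mmap(NULL, stksz, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANON, -1, 0);
            if (stk == MAP_FAILED) {
                perror("mmap");
                return 1;
            }

            /* The lowest page becomes the guard page. */
            if (mprotect(stk, pgsz, PROT_NONE) == -1) {
                perror("mprotect");
                return 1;
            }

            memset(stk + pgsz, 0, stksz - pgsz);    /* normal use: fine */
            printf("touching the guard page now...\n");
            stk[0] = 1;                             /* faults here */
            return 0;
        }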

:3. I suspect this recursion is due to an interrupt storm.  How does this usually happen?  I know that I am working on a blob and hardware without specs, which makes reasoning even more complicated, but I'm interested in the general case.

    Unlikely.  An interrupt storm should not be able to stack recursively.

    The kernel stack is rather small.  I think it's only 8K or 12K.  It is
    possible that the nvidia driver is exhausting it just with its normal
    operation.
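
    For scale, the arithmetic is brutal (the 2K frames below are
    illustrative, not measured from the driver):

        /* With 2K of locals per frame, four or five nested calls are
         * enough to blow an 8K stack. */
        #include <stdio.h>

        static void frame(int depth, int budget)
        {
            char buf[2048];                 /* one large local variable */

            buf[0] = (char)depth;           /* keep buf from being elided */
            if ((depth + 1) * (int)sizeof(buf) > budget) {
                printf("frame %d overflows a %d-byte stack\n",
                       depth + 1, budget);
                return;
            }
            frame(depth + 1, budget);
        }

        int main(void)
        {
            frame(0, 8192);                 /* pretend we have only 8K */
            return 0;
        }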

:I seem to remember that I suffered from something like this before, but I can't remember when and why this was and how it got fixed.
:
:Checking the return addresses, most frames have return addresses of:
:
:0xc028fc90 <doreti+0>:  pop    %eax
:0xc028fc91 <doreti+1>:  mov    $0x0,%eax
:0xc028fc9d <doreti+13>: cli    
:
:or
:
:0xc029774f <Xicu_slowintr11+143>:       jmp    0xc028fc90 <doreti>
:
:For some reason, however, the interrupt flag seems to be set in all trap frames.  For me, this leads to the conclusion that this recursion happens *after* processing the interrupt handler.  In the code I can see the comment:
:
:2: ;                                                                    \
:        /* set running bit, clear pending bit, run handler */           \
:        andl    $~IRQ_LBIT(irq_num), PCPU(ipending) ;                   \
:        sti ;                                                           \
:        pushl   $irq_num ;                                              \
:        call    sched_ithd ;                                            \
:        addl    $4,%esp ;                                               \
:
:But I can't see where the running bit is set.  Actually, I have the feeling that there is no protection against re-entrancy for the period between sti and the end of doreti.  Wouldn't we want to bump some sort of "in interrupt processing" flag until we're in CLI protection again?  This way we would avoid this recursion.
:
:cheers
:  simon

    Reentrancy is protected.  The interrupt is masked when taken and only
    unmasked after the interrupt handler has completed.  In the case of
    scheduled interrupts the interrupt is masked when it is taken and
    unmasked by the interrupt thread after it finishes processing it.

    sched_ithd() is responsible for setting the running bit.
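
    A sketch of that protocol (field and function names here are
    illustrative, not the actual DragonFly source):

        /* The running bit prevents a second activation for the same IRQ;
         * a new request while the thread runs only sets the pending bit. */
        #include <stdio.h>

        #define IRQ_LBIT(n)     (1 << (n))

        static unsigned int ipending;   /* IRQs waiting for their thread */
        static unsigned int irunning;   /* IRQ threads already active */

        static void wakeup_ithread(int irq)
        {
            printf("waking interrupt thread for irq %d\n", irq);
        }

        static void sched_ithd_sim(int irq)
        {
            if (irunning & IRQ_LBIT(irq)) {
                /* Thread already active: leave the request pending; no
                 * second activation, hence no recursion. */
                ipending |= IRQ_LBIT(irq);
                return;
            }
            irunning |= IRQ_LBIT(irq);  /* the "running bit" */
            ipending &= ~IRQ_LBIT(irq);
            wakeup_ithread(irq);
        }

        int main(void)
        {
            sched_ithd_sim(11);
            sched_ithd_sim(11);         /* second request only marks pending */
            return 0;
        }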

    Is IRQ11 the video interrupt during your tests?  It kinda sounds like
    normal calls to the nvidia driver are causing the problem.

						-Matt





