can't boot with kern.mmxopt=1

Matthew Dillon dillon at apollo.backplane.com
Mon May 3 14:30:30 PDT 2004


:
:How do I go about viewing the 'trace' that isn't tracing the handler for 
:ctrl-alt-esc?  
:
:Dave

    That's a bit more involved but if you play around with it you can 
    figure it out.

    Basically the 'trace' command takes an argument, which can be a PID
    or a thread pointer.  The 'ps' command will list all threads and 
    processes (there will be duplications because it lists the processes
    and then it lists all the threads, including those with processes
    attached to them).

    It is likely that the system is in the interrupt hook code waiting for
    interrupt driven hooks to complete.

    Ah!  Wait a minute.  I think I may have figured it out.  Please try
    this out and tell me if you are able to boot with mmx turned on.
    Remove the two lines marked REMOVE below, in i386/i386/bcopy.s:

#define MMX_SAVE_BLOCK(missfunc)                \
        cmpl    $2048,%ecx ;                    \
        jb      missfunc ;                      \
        movl    MYCPU,%eax ;                    \
        btsl    $1,GD_FPU_LOCK(%eax) ;          \
        jc      missfunc ;                      \
        pushl   %ebx ;                          \
        movl    GD_CURTHREAD(%eax),%edx ;       \
        movl    TD_SAVEFPU(%edx),%ebx ;         \
        addl    $TDPRI_CRIT,TD_PRI(%edx) ;      \	<<<<<<< REMOVE
        cmpl    %edx,GD_NPXTHREAD(%eax) ;       \
        jne     100f ;                          \
        fxsave  0(%ebx) ;                       \
100: ;                                          \
        movl    %edx,GD_NPXTHREAD(%eax) ;       \
        leal    GD_SAVEFPU(%eax),%eax ;         \
        movl    %eax,TD_SAVEFPU(%edx) ;         \
        clts ;                                  \
        fninit ;                                \
        subl    $TDPRI_CRIT,TD_PRI(%edx) ;      \	<<<<<<< REMOVE
        pushl   $mmx_onfault


    Now this is important... if you CAN boot with mmx turned on with these
    lines removed, you must *IMMEDIATELY* reboot with mmx turned off,
    because removing these lines removes the interrupt race protection, and
    the system will slowly corrupt itself when the race occurs.

    What I think may be happening is that an interrupt is occurring here,
    and since I am exiting the critical section without checking for pending
    interrupts, and the clocks aren't operational yet, the system 'misses'
    the interrupt and just goes into a wait state waiting for the
    already-pending interrupt to occur.
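
    The suspected failure mode can be sketched with a small C model.  This
    is only an illustration under stated assumptions, not the kernel code:
    every name in it (crit_count, pending, crit_exit_unchecked,
    crit_exit_checked, scenario) is hypothetical and invented for the sketch.
    An interrupt that arrives inside a critical section is only recorded as
    pending.  If the exit path never looks at that flag, nothing runs the
    handler until some later event happens to notice it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names come from the kernel. */
static int  crit_count;   /* > 0 while inside a critical section */
static bool pending;      /* an interrupt arrived while it was deferred */
static int  handled;      /* number of times the handler actually ran */

static void run_handler(void)
{
    handled++;
    pending = false;
}

/* Simulated interrupt arrival: deferred when inside a critical section. */
static void interrupt_arrives(void)
{
    if (crit_count > 0)
        pending = true;
    else
        run_handler();
}

static void crit_enter(void)
{
    crit_count++;
}

/* Exit without checking the pending flag: the deferred interrupt is stranded. */
static void crit_exit_unchecked(void)
{
    assert(crit_count > 0);
    crit_count--;
}

/* Exit that checks: run the deferred work once the outermost section ends. */
static void crit_exit_checked(void)
{
    assert(crit_count > 0);
    crit_count--;
    if (crit_count == 0 && pending)
        run_handler();
}

/* Run one scenario and return how many times the handler ran. */
static int scenario(bool check_on_exit)
{
    crit_count = 0;
    pending = false;
    handled = 0;

    crit_enter();
    interrupt_arrives();          /* arrives while deferred */
    if (check_on_exit)
        crit_exit_checked();
    else
        crit_exit_unchecked();
    return handled;
}
```

    In this model scenario(false) returns 0, because the interrupt stays
    pending and is never serviced, while scenario(true) returns 1.  With no
    clock interrupt to come along later and flush the pending work, the
    first case corresponds to the hang described above.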

    If you can verify that this is the problem I will put in a permanent
    fix, which is really only another three lines of assembly.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




