can't gdb vkernel
Nicolas Thery
nthery at gmail.com
Fri Jul 11 15:37:11 PDT 2008
2008/7/11 Simon 'corecode' Schubert <corecode at fs.ei.tum.de>:
> Nicolas Thery wrote:
>>
>> I'm looking into this. There is a deadlock involving the gdb lwp and
>> 2 vkernel lwps. I hope to have a clearer understanding and a fix this
>> week-end.
>
> Great! When you're saying you are looking into this, I don't have a doubt
> that you will find the cause :)
Thanks for your trust ;-)
There is indeed a deadlock:
- The initial vkernel thread is sleeping on the user mutex associated with the
vkd cothread.
- The vkd cothread sends a SIGIO (lwp_kill(2)) to the initial thread to
simulate an interrupt. The initial thread's sleep is interrupted and it is
made runnable.
- When the cothread is about to return to userland from lwp_kill(2), it is
preempted in userexit() and the initial thread runs.
- The initial thread handles the signal (issignal() called from tsleep()). As
the process is being debugged, proc_stop() is called, the process moves to
SSTOP and the initial thread is stopped (tstop()).
- The cothread is then awakened and goes back to userland (that's a bug; it
should stop too).
- The cothread eventually waits on its condition variable.
- Meanwhile GDB blocks on wait(2) forever because only one lwp out of two is
stopped (p_nstopped < p_nthreads).
The kernel tests if the lwp should be stopped in userret():
	if (p->p_stat == SSTOP) {
		get_mplock();
		tstop();
		rel_mplock();
		goto recheck;
	}
However, userret() is called *before* userexit(), so the check has already
passed by the time the cothread is preempted, and the cothread is not stopped.
To confirm this hypothesis, I added the above code (with the if turned into a
while) to userexit() after the preemption points, and the vkernel booted fine.
I observed another hang during shutdown, though, so I'm not sure this is a
correct fix. I'll study this in more detail tomorrow.
Another mystery remains: what change caused this regression? Running cvs
annotate on various files didn't point to the culprit.