can't gdb vkernel
Nicolas Thery
nthery at gmail.com
Sun Jul 13 03:37:57 PDT 2008
2008/7/12 Nicolas Thery <nthery at gmail.com>:
> There is indeed a deadlock:
>
> - The initial vkernel thread is sleeping on the user mutex associated with the
> vkd cothread.
>
> - The vkd cothread sends a SIGIO (lwp_kill(2)) to the initial thread to
> simulate an interrupt. The initial thread's sleep is interrupted and it is
> made runnable.
>
> - When the cothread is about to return to userland from lwp_kill(2), it is
> preempted in userexit() and the initial thread runs.
>
> - The initial thread handles the signal (issignal() called from tsleep()). As
> the process is being debugged, proc_stop() is called, the process moves to
> SSTOP and the initial thread is stopped (tstop()).
>
> - The cothread is then awakened and goes back to userland (that's a bug, it
> should stop too).
>
> - The cothread eventually waits on its condition variable.
>
> - Meanwhile GDB blocks on wait(2) forever because only one lwp out of two is
> stopped (p_nstopped < p_nthreads).
>
> The kernel tests if the lwp should be stopped in userret():
>
>     if (p->p_stat == SSTOP) {
>         get_mplock();
>         tstop();
>         rel_mplock();
>         goto recheck;
>     }
>
> However, userret() is called *before* userexit() and the cothread is not
> stopped.
>
> To confirm this hypothesis, I added the above code (with the if turned into a
> while) in userexit() after the preemption points and the vkernel booted fine.
> I observed another hang during shutdown though. I'm not sure this is a correct
> fix. I'll study this in more detail tomorrow.
>
> Another mystery remains: what change caused this regression? Some cvs annotate
> on various files didn't point to the culprit.
>
I committed a fix.
It adds yet another location where the kernel tstop()s lwps. Some
factoring may be possible, but that will have to wait until after 2.0.
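
For illustration only, the ordering problem described in the quoted analysis can be replayed in a small user-space model. This is a sketch, not the committed fix and not kernel code: struct proc_model, stop_check() and replay() are hypothetical names, and the interleaving of the two lwps is hard-coded rather than produced by a real scheduler.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's process state. */
enum proc_state { MODEL_ACTIVE, MODEL_SSTOP };

struct proc_model {
    enum proc_state p_stat;
    int p_nthreads;   /* lwps in the process */
    int p_nstopped;   /* lwps currently stopped */
};

/* Models the "if (p->p_stat == SSTOP) tstop();" check for one lwp.
 * Returns true if the lwp stopped. */
static bool stop_check(struct proc_model *p)
{
    if (p->p_stat == MODEL_SSTOP) {
        p->p_nstopped++;
        return true;
    }
    return false;
}

/* Replays the sequence from the report for a two-lwp process.
 * check_in_userexit selects whether the stop check is repeated after
 * the preemption point in the userexit() step.
 * Returns true if every lwp ends up stopped, i.e. the debugger's
 * wait(2) would return. */
bool replay(bool check_in_userexit)
{
    struct proc_model p = { MODEL_ACTIVE, 2, 0 };

    /* Cothread returns from lwp_kill(2): the userret() step runs its
     * stop check first, while the process is still active. */
    bool cothread_stopped = stop_check(&p);

    /* Cothread is preempted in the userexit() step. The initial thread
     * handles SIGIO, the process moves to SSTOP and that thread stops. */
    p.p_stat = MODEL_SSTOP;
    p.p_nstopped++;

    /* Cothread resumes after the preemption point. Without a second
     * check here it goes straight back to userland. */
    if (!cothread_stopped && check_in_userexit)
        cothread_stopped = stop_check(&p);

    return p.p_nstopped == p.p_nthreads;
}
```

In this model replay(false) leaves p_nstopped < p_nthreads, which is the condition that keeps GDB blocked, while replay(true) stops both lwps. It says nothing about the separate hang seen during shutdown.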