cvs commit: src/lib/libthread_xu Makefile pthread.map src/lib/libthread_xu/arch Makefile.inc src/lib/libthread_xu/arch/alpha Makefile.inc src/lib/libthread_xu/arch/alpha/alpha pthread_md.c src/lib/libthread_xu/arch/alpha/include pthread_md.h src/lib/libthread_xu/arch/amd64 ...

Tue Feb 1 13:45:35 PST 2005

On Tue, Feb 01, 2005 at 10:41:24AM -0800, Matthew Dillon wrote:
> :We have to touch the page tables for a process switch anyway, it's not
> :that expensive to have a thread-local page mapping there I think.
> :On the other hand the LDT approach costs for every access.
> :
> :Joerg
> 
>     The rfork'd processes are sharing the same page table, and the switch
>     code detects this and does not bother to reload %cr3 (which saves a lot
>     of cpu cycles), so you can't create a thread-local page mapping that way.

This is an optimisation for the intra-program switch, if you want to call it
that. I would argue that the normal switch case is either user -> kernel or
to a different kernel thread.

>     As far as I know the LDT is the only way to create uniqueness between
>     different processes sharing the same page table.  It's expensive, but 
>     probably not as expensive as reloading %cr3.
> 
>     It would be interesting to test that hypothesis... what is more expensive?
>     reloading %cr3 on every switch or reloading the LDT on every switch ?

Which of the two solutions is faster depends on the type of program you run.
If a program uses a lot of thread-local storage but is mostly CPU-bound,
the page table approach should be faster. If it is mostly IO-bound, with
a lot of context switches between threads, the LDT approach is better.

Can we measure the context switch overhead between two processes sharing
the page table and two processes not sharing the page table?
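
Something like the following could be a starting point: a rough sketch, not
a finished benchmark. Two peers bounce one byte back and forth over a pair
of pipes, so each round trip forces a switch to the peer and back. The
function name pingpong_usec and its parameters are made up for illustration.
It assumes a 1:1 pthread is an acceptable stand-in for an rfork'd process
sharing the page table, and that both peers end up on the same CPU (on SMP
they would have to be pinned, which is not done here).

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct echo_args { int rfd, wfd, rounds; };

/* Echo one byte back per round; stops early on EOF or error. */
static void echo_loop(int rfd, int wfd, int rounds)
{
    char c;
    for (int i = 0; i < rounds; i++) {
        if (read(rfd, &c, 1) != 1 || write(wfd, &c, 1) != 1)
            break;
    }
}

static void *echo_thread(void *p)
{
    struct echo_args *a = p;
    echo_loop(a->rfd, a->wfd, a->rounds);
    return NULL;
}

/*
 * Hypothetical helper: average microseconds per round trip, or -1.0 on
 * error. share_address_space != 0 uses a pthread as the peer (assumed to
 * share the page table); 0 uses fork() (separate page table).
 */
double pingpong_usec(int rounds, int share_address_space)
{
    int to_peer[2], from_peer[2], done = 0;
    struct timeval t0, t1;
    pthread_t tid;
    pid_t pid = -1;
    char c = 'x';

    if (rounds <= 0 || pipe(to_peer) < 0 || pipe(from_peer) < 0)
        return -1.0;
    struct echo_args args = { to_peer[0], from_peer[1], rounds };

    if (share_address_space) {
        if (pthread_create(&tid, NULL, echo_thread, &args) != 0)
            return -1.0;
    } else {
        pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0) {
            /* Child: drop the parent's pipe ends so EOF can propagate. */
            close(to_peer[1]);
            close(from_peer[0]);
            echo_loop(to_peer[0], from_peer[1], rounds);
            _exit(0);
        }
    }

    gettimeofday(&t0, NULL);
    for (int i = 0; i < rounds; i++) {
        if (write(to_peer[1], &c, 1) != 1 || read(from_peer[0], &c, 1) != 1)
            break;
        done++;
    }
    gettimeofday(&t1, NULL);

    /* Closing the write end unblocks a peer still waiting in read(). */
    close(to_peer[1]);
    if (share_address_space)
        pthread_join(tid, NULL);
    else
        waitpid(pid, NULL, 0);
    close(to_peer[0]);
    close(from_peer[0]);
    close(from_peer[1]);

    if (done != rounds)
        return -1.0;
    double usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    return usec / rounds;
}
```

Calling pingpong_usec(100000, 1) and pingpong_usec(100000, 0) and comparing
the two results would give a first estimate. The numbers include the pipe
read/write cost in both cases, so only the difference between them says
anything about the page table reload.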

Joerg