[alc at FreeBSD.org: cvs commit: src/sys/vm vm_page.c vm_pageout.c]
Matthew Dillon
dillon at apollo.backplane.com
Sat Feb 14 21:21:53 PST 2004
:This is correct. Our shootdown code has a race. The old Mach pmaps
:have a correct implementation and as you conclude it can't be done
:entirely asynchronously. (I imagine that Darwin has it right as
:well.) For an overview, take a look at the algorithm labeled "CMU
:Mach" on http://www.cs.rochester.edu/u/www/courses/456/spring99/lecture/lecture9.html. (The other algorithms are for NUMA architectures and don't
:apply to us.)
:
:I expect that I'll address this when I complete the pmap locking.
:Nobody else appears to be in a hurry to fix this. :-)
:
:Regards,
:Alan
I successfully implemented a generic cpu rendezvous API in DFly and
started in on the pmap code. It does not look like it is going to be
easy at all.
It kinda looks like we need to have a pte tracking API for the pmap
code. Something like this maybe? What do you think?
#define MAXPIR 16

struct pmap_inval_info {
    vm_offset_t pir_ary[MAXPIR];        /* individual vma's */
    int pir_npages;                     /* MAXPIR + 1 if invltlb */
    struct lwkt_cpusync pir_cpusync;    /* DFly cpusync API */
};
pmap_inval_init(&info);

    Set up the info structure for use.  Low cost.  No IPIs are issued.
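    Roughly, just a sketch (nothing here is final), initialization would
    boil down to clearing the structure; the cpusync field needs nothing
    expensive done up front:

        void
        pmap_inval_init(struct pmap_inval_info *info)
        {
            /*
             * No IPIs are sent here.  The cpusync in pir_cpusync is only
             * armed by the first pmap_inval_add() call that actually has
             * to interlock other cpus.
             */
            info->pir_npages = 0;
        }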
pmap_inval_add(&info, pmap, va);
Add a (pmap,va) to the info structure and force foreign cpus in
pm_active to enter into a rendezvous spin loop by sending appropriate
IPIs to them. This call does not return until the cpus have entered
the spin loop, but is also optimizes (is a nop) for those cpus that
are already in a spin loop from prior calls.
The contents of pte's in the pmap can be safely read and, if the
pmap lock (or MP lock in DFly's case) is held, modified after
making this call.
Invalidation of the va is deferred until pmap_inval_flush() is
called.
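    Something like this, maybe (just a sketch -- the actual cpusync call
    is still up in the air, so it is only shown as a comment):

        void
        pmap_inval_add(struct pmap_inval_info *info, pmap_t pmap,
                       vm_offset_t va)
        {
            /*
             * Pull the other cpus in pmap->pm_active into the rendezvous
             * spin loop.  This is a nop for cpus that are already spinning
             * from a prior pmap_inval_add() on this info structure.
             *
             * ... lwkt_cpusync_add(pmap->pm_active, &info->pir_cpusync) ...
             */

            /*
             * Record the va.  Overflowing the array, or passing a va of
             * (vm_offset_t)-1, degrades to a full tlb invalidation.
             */
            if (va == (vm_offset_t)-1 || info->pir_npages >= MAXPIR)
                info->pir_npages = MAXPIR + 1;  /* means: cpu_invltlb() */
            else
                info->pir_ary[info->pir_npages++] = va;
        }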
pmap_inval_flush(&info);

    This cleans up.  Terminate the spin loops of any remote cpus and have
    them issue the invalidations for all va's registered with
    pmap_inval_add() on their way out of the rendezvous.  If too many va's
    were registered, or a va of (vm_offset_t)-1 was registered, the remote
    cpus simply call cpu_invltlb().  The same cpu_invl1pg() or
    cpu_invltlb() calls are issued on the current cpu as well.

    You must call pmap_inval_flush() prior to any thread block, switch,
    or yield to ensure that all target cpus have left the spin loop.
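    The flush side would then look roughly like this, with a little helper
    that every participating cpu, including the current one, runs (again
    just a sketch; the cpusync release call is a placeholder and the
    function names/prototypes are not final):

        /*
         * Invalidation work run by each cpu participating in the
         * rendezvous, remote and local.
         */
        static void
        pmap_inval_doit(struct pmap_inval_info *info)
        {
            int i;

            if (info->pir_npages > MAXPIR) {
                cpu_invltlb();
            } else {
                for (i = 0; i < info->pir_npages; ++i)
                    cpu_invl1pg((void *)info->pir_ary[i]);
            }
        }

        void
        pmap_inval_flush(struct pmap_inval_info *info)
        {
            /*
             * Release the spinning remote cpus and have them run
             * pmap_inval_doit() on the way out of the rendezvous, e.g.:
             *
             * ... lwkt_cpusync_finish(&info->pir_cpusync) ...
             */
            pmap_inval_doit(info);      /* invalidate on the current cpu too */
            info->pir_npages = 0;       /* ready for reuse */
        }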
That API seems to fit fairly well into the pmap code though my first
attempt to do it in DFly still resulted in somewhat of a mess.
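To give an idea of the calling pattern, the body of a pmap_remove(pmap,
sva, eva) style function would wind up looking something like this
(hypothetical fragment, just to show the shape of it):

        struct pmap_inval_info info;
        pt_entry_t *pte;
        vm_offset_t va;

        pmap_inval_init(&info);
        for (va = sva; va < eva; va += PAGE_SIZE) {
            if ((pte = pmap_pte(pmap, va)) == NULL || *pte == 0)
                continue;
            pmap_inval_add(&info, pmap, va);
            /* remote cpus are spinning now, safe to modify the pte */
            *pte = 0;
        }
        pmap_inval_flush(&info);        /* must occur before we can block */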
I have also refined the cpu synchronization API that I recently added
to DragonFly, which you can get from the DragonFly source base
(see www.dragonflybsd.org).  kern/lwkt_ipiq.c contains the meat of
the MI code for both the IPI messaging and the cpusync APIs.

I have measured the synchronous IPI overhead on a DELL-2550
(2xCPU P3 @ 1.1GHz), call to completion, at around 5 us.  Some of that
is probably syscall overhead, since I dummied up a syscall to test the
API, so call it 4-5 us.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>