[alc at FreeBSD.org: cvs commit: src/sys/vm vm_page.c vm_pageout.c]

Sat Feb 14 21:21:53 PST 2004

:This is correct.  Our shootdown code has a race.  The old Mach pmaps
:have a correct implementation and as you conclude it can't be done
:entirely asynchronously.  (I imagine that Darwin has it right as
:well.)  For an overview, take a look at the algorithm labeled "CMU
:Mach" on http://www.cs.rochester.edu/u/www/courses/456/spring99/lecture/lecture9.html.
:(The other algorithms are for NUMA architectures and don't
:apply to us.)
:
:I expect that I'll address this when I complete the pmap locking.
:Nobody else appears to be in a hurry to fix this.  :-)
:
:Regards,
:Alan

    I successfully implemented a generic cpu rendezvous API in DFly and
    started in on the pmap code.  It does not look like it is going to be
    easy at all.

    It kinda looks like we need to have a pte tracking API for the pmap
    code.  Something like this maybe?   What do you think?

    #define MAXPIR	16

    struct pmap_inval_info {
	vm_offset_t	pir_ary[MAXPIR];	/* individual vma's */
	int		pir_npages;		/* MAXPIR + 1 if invltlb */
	struct lwkt_cpusync pir_cpusync;	/* DFly cpusync API */
    };

    pmap_inval_init(&info);

	Setup info structure for use.  Low cost.  No IPIs are issued.

    pmap_inval_add(&info, pmap, va);

	Add a (pmap,va) to the info structure and force foreign cpus in
	pm_active to enter into a rendezvous spin loop by sending appropriate
	IPIs to them.  This call does not return until the cpus have entered
	the spin loop, but it is also optimized (is a nop) for those cpus that
	are already in a spin loop from prior calls.

	The contents of pte's in the pmap can be safely read and, if the
	pmap lock (or MP lock in DFly's case) is held, modified after
	making this call.

	Invalidation of the va is deferred until pmap_inval_flush() is
	called.

    pmap_inval_flush(&info);

	This cleans up.  Any spinning remote cpus are allowed to issue 
	the appropriate invalidations that have been built up in the 
	info structure and exit the rendezvous.  Appropriate invalidations
	are issued on the current cpu as well.

	Terminate any remote cpu's spin loops and cause them to invalidate
	all VA's registered with pmap_inval_add().  If too many VA's were
	registered or a va of (vm_offset_t)-1 was registered, the
	remote cpus will simply call cpu_invltlb().

	The current cpu also issues similar cpu_invl1pg() or cpu_invltlb() calls.

	You must call pmap_inval_flush() prior to any thread block, switch,
	or yield to ensure that all target cpus have left the spin loop.
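
    To make the bookkeeping concrete, here is a rough single-threaded
    user-space mock of the three calls.  It is only a sketch under stated
    assumptions, not the DFly code: the IPI rendezvous is reduced to a
    flag (pir_spinning, a hypothetical stand-in for the lwkt_cpusync
    state), the TLB instructions are replaced by counters
    (mock_cpu_invlpg() and mock_cpu_invltlb() are made-up names), the
    pmap argument is ignored, and vm_offset_t is typedef'd locally.  What
    it does show is the overflow rule: more than MAXPIR va's, or a va of
    (vm_offset_t)-1, collapses into one full invalidation at flush time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t vm_offset_t;	/* assumption: stand-in for the kernel type */
struct pmap;			/* opaque here; the mock never looks inside */

#define MAXPIR	16

struct pmap_inval_info {
	vm_offset_t	pir_ary[MAXPIR];	/* individual va's */
	int		pir_npages;		/* MAXPIR + 1 if invltlb */
	int		pir_spinning;		/* mock of the cpusync state */
};

/* Counters standing in for the real per-page and full TLB flushes. */
static int mock_invlpg_calls;
static int mock_invltlb_calls;

static void mock_cpu_invlpg(vm_offset_t va) { (void)va; ++mock_invlpg_calls; }
static void mock_cpu_invltlb(void) { ++mock_invltlb_calls; }

/* Set up the info structure.  Low cost, nothing is "sent". */
static void
pmap_inval_init(struct pmap_inval_info *info)
{
	info->pir_npages = 0;
	info->pir_spinning = 0;
}

/* Register a (pmap,va).  Real code would IPI the cpus in pm_active here. */
static void
pmap_inval_add(struct pmap_inval_info *info, struct pmap *pmap, vm_offset_t va)
{
	(void)pmap;
	info->pir_spinning = 1;		/* remote cpus are now held */
	if (va == (vm_offset_t)-1 || info->pir_npages >= MAXPIR)
		info->pir_npages = MAXPIR + 1;	/* fall back to full flush */
	else
		info->pir_ary[info->pir_npages++] = va;
}

/* Issue the deferred invalidations and release the rendezvous. */
static void
pmap_inval_flush(struct pmap_inval_info *info)
{
	int i;

	assert(info->pir_npages >= 0 && info->pir_npages <= MAXPIR + 1);
	if (info->pir_npages > MAXPIR) {
		mock_cpu_invltlb();
	} else {
		for (i = 0; i < info->pir_npages; ++i)
			mock_cpu_invlpg(info->pir_ary[i]);
	}
	info->pir_npages = 0;
	info->pir_spinning = 0;		/* remote cpus may leave the loop */
}
```

    A caller would do init, one or more adds around its pte updates, and
    then flush before it can block.  Note that in this mock a flush with
    nothing registered issues no invalidations at all.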

    That API seems to fit fairly well into the pmap code though my first
    attempt to do it in DFly still resulted in somewhat of a mess.

    I have also refined the cpu synchronization API that I recently added
    to DragonFly, which you can get from the DragonFly source base
    (see www.dragonflybsd.org).  kern/lwkt_ipiq.c contains the meat of
    the MI code for both the IPI messaging code and the cpusync API.

    I have measured the synchronous IPI overhead on a DELL-2550 
    (2xCPU P3 @1.1GHz), call to completion, at around 5 uS.  Some of that
    is probably syscall overhead since I dummied up a syscall to test the API,
    so call it 4-5 uS.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>