contig memory allocator fix (vm_contig.c)

Andrew Atrens atrens at nortelnetworks.com
Sat Feb 19 06:42:31 PST 2005


Hi All,

Here's the problem. The iwi driver wants to allocate a ~160k chunk of memory
in which to place firmware which it then DMA's onto wireless nic. This
normally works but every once in a while gets stuck in vm_contig_pg_alloc() 
and never leaves. If you look at top it looks to be in 'swwrt' in the 
swap_pager, but it's actually stuck a couple of levels up in a goto again1: 
loop, in vm_contig_pg_alloc(), forever.

I can trigger the problem by doing a 'make clean && make depend && make' on 
the kernel. Following this command if I ifconfig up/down the interface,
triggering a firmware reload, ifconfig looks like it's stuck in a 'swwrt
and there's lots of disk activity. After adding trace code I noted that
the SAME bp is getting VOP_STRATEGY'd (line 1427 or swap_pager.c), and the
ata driver is writing it and issuing a wakeup() via biodone() so that 
swp_pager_sync_iodone() is getting called and kicking us out of 
tsleep("swwrt") correctly. But this is happening ad infinitum.


So I've isolated the problem to a few lines in vm_contig.c.


               if ((i == vmstats.v_page_count) ||
                        ((VM_PAGE_TO_PHYS(&pga[i]) + size) > high)) {

again1:
                        if (vm_contig_pg_clean(PQ_INACTIVE))
                                goto again1;
                        if (vm_contig_pg_clean(PQ_ACTIVE))
                                goto again1;

                        crit_exit();
                        continue;       /* next pass */
                }


The four things that I've tried to add are -

1. In the main for loop in vm_contig_pg_alloc() there are two exit paths -
one  inside a critical section, and the other outside a critical section. On
the failure path, if the second pass allocation attempt fails, crit_exit()
gets called twice and the kernel panics because td_pri has gone negative. So
I've put a quick conditional check to catch this. This was definitely a bug.

2. I've added another crit_exit()/crit_enter() pair to give interrupts
another chance to run. Not sure how helpful this is.

3. The cleaning loop is now bounded - I've used a for loop instead of the
goto again1: thing. So we can't get caught here forever. Theoretically
this could result in a 'contig_malloc index < 0' failure, but I've not
been able to reproduce that - so that's encouraging. An on failure cases
now the thread won't wedge, like the behaviour that I've been seeing.
( Incidentally I think this has also been hitting the nvidia module - as
the thread will wedge regardless of the wait options given to malloc.
On restart whenever savecore runs (saving a core) I find that X11
startup will often appear to freeze and the disk activity light will go
solid. )

4. Finally, after the cleanup we do a 'continue'. At this point 'start'
should be reset to zero, otherwise on the next iteration we could be
starting too high up in the page list to ever succeed, that could happen
if (start + size) > high. This also looks like a bug.

Andrew.




--- vm_contig.c 2004-11-10 15:19:51.000000000 -0500
+++ vm_contig.c.aa      2005-02-19 09:08:50.000000000 -0500
@@ -202,6 +202,8 @@
        vm_page_t pga = vm_page_array;
        vm_page_t m;
        int pqtype;
+       int in_crit = 0;
+       int clean_iters;
 
        size = round_page(size);
        if (size == 0)
@@ -212,8 +214,10 @@
                panic("vm_contig_pg_alloc: boundary must be a power of 2");
 
        start = 0;
+
        for (pass = 0; pass <= 1; pass++) {
                crit_enter();
+               in_crit = 1; /* mark our state as _in_ a critical section */
 again:
                /*
                 * Find first page in array that is free, within range, aligned, and
@@ -243,14 +247,35 @@
                 */
                if ((i == vmstats.v_page_count) ||
                        ((VM_PAGE_TO_PHYS(&pga[i]) + size) > high)) {
-
-again1:
-                       if (vm_contig_pg_clean(PQ_INACTIVE))
-                               goto again1;
-                       if (vm_contig_pg_clean(PQ_ACTIVE))
-                               goto again1;
-
+                       /*
+                        * Best effort clean up - bound on v_page_count.
+                        */
+                       for (clean_iters = 0 ;
+                               clean_iters < vmstats.v_page_count;
+                               clean_iters++) {
+                               crit_exit(); /* give interrupts a chance */
+                               crit_enter();
+                               if (!vm_contig_pg_clean(PQ_INACTIVE) &&
+                                   !vm_contig_pg_clean(PQ_ACTIVE))
+                                       break;
+                       }
+                       /*
+                        * When we hit this code on the final pass we'll
+                        * kick out of the loop outside of the critical
+                        * section. Use the in_crit flag to mark this
+                        * state to avoid calling crit_exit() a second
+                        * time and panicking the kernel.
+                        */
                        crit_exit();
+                       in_crit = 0;
+                       /*
+                        * We got here because we couldn't find a 
+                        * contiguous chunk of 'size' below address
+                        * 'high'. Reset the start marker to give us
+                        * the best chance of succeeding on the next
+                        * iteration.
+                        */
+                       start = 0;
                        continue;       /* next pass */
                }
                start = i;
@@ -305,9 +330,10 @@
        }
 
        /*
-        * Failed.
+        * Failed. Check and if necessary release the critical section.
         */
-       crit_exit();
+       if (in_crit)
+               crit_exit();
        return (-1);
 }
 





More information about the Submit mailing list