panic: assertion: _ifac->ifa_magic == IFA_CONTAINER_MAGIC in _IFAFREE

Mon Mar 17 17:54:40 PDT 2008

On Mon, Mar 17, 2008 at 09:59:30PM +0800, Sepherosa Ziehau wrote:
> On Mon, Mar 17, 2008 at 11:06 AM, YONETANI Tomokazu
> <qhwt+dfly at les.ath.cx> wrote:
> > On Sun, Mar 16, 2008 at 08:09:17PM +0800, Sepherosa Ziehau wrote:
> >  > On Sun, Mar 16, 2008 at 6:07 PM, YONETANI Tomokazu <qhwt+dfly at les.ath.cx> wrote:
> >  > > Hello.
> >  > >  Just caught a panic while playing with NFS mounted git tree
> >  > >  (but I cannot reliably reproduce it after that):
> >  >
> >  > The dst address of a UDP packet is changed, which changes the
> >  > port/addr hash too, but the old route entry was not allocated on the
> >  > current CPU.  Since the box only contains 2 CPUs, after the
> >  > l{port,addr}/f{port,addr} hash, the problem probably will not show
> >  > itself ;).  Please run the following test program several times and
> >  > then unload the NIC module to see whether you can reproduce the
> >  > problem (if you have TCP connections too, you will have to wait 2MSL):
> >  > http://leaf.dragonflybsd.org/~sephe/test_udp.c
> >
> >  I've been trying to reproduce it, but so far without success...
> 
> I have changed this test program a little bit.  Run it in the following way:
> ./test_udp remote_ip
> 
> If it pauses, then in another terminal:
> ifconfig iface_local down
> Then kill test_udp; if you don't have any TCP connections, the panic
> should happen immediately.

Yes!  But this time at a different place:
#9  0xc0328dc2 in rtrequest1 (req=11, rtinfo=0xca77ec9c, ret_nrt=0xca77ecf0)
    at /home/dfly/current/sys/net/if_var.h:445
#10 0xc032915f in rtrequest (req=11, dst=0xca77ed14, gateway=0x0, netmask=0x0,
    flags=0, ret_nrt=0xca77ecf0) at /home/dfly/current/sys/net/route.c:637
#11 0xc0329386 in _rtlookup (dst=0xca77ed14, generate_report=1, ignore=0)
    at /home/dfly/current/sys/net/route.c:275
#12 0xc0343642 in arplookup (addr=30085292, create=1, proxy=-1067129044)
    at /home/dfly/current/sys/net/route.h:364
#13 0xc034371a in arp_update_oncpu (m=<value optimized out>, saddr=30085292,
    create=28790, dologging=0) at /home/dfly/current/sys/netinet/if_ether.c:563
#14 0xc0343f66 in arp_update_msghandler (netmsg=0xc985bd08)
    at /home/dfly/current/sys/netinet/if_ether.c:877
#15 0xc03284a2 in rtable_service_loop (dummy=0x0)
    at /home/dfly/current/sys/net/route.c:178
#16 0xc02c12e5 in lwkt_deschedule_self (td=Cannot access memory at address 0x8
)
    at /home/dfly/current/sys/kern/lwkt_thread.c:214

So, can I start testing your patch, or do you need an update to it?

Thanks.

> Though the panic happens in a different place, the root cause should be
> the same: initially the UDP socket is neither connected nor bound, thus
> f{addr,port} and l{addr,port} in the pcb are 0.  Sending the first
> datagram on this socket will be dispatched to CPU0.  Then lport is chosen
> in udp_output() and the route entry is allocated on CPU0.  If
> (lport >> 8 & 1) is 1 on a 2 CPU box, then the rest of the socket
> operations will happen on CPU1, e.g. sending to a different faddr will
> cause the route entry allocated on CPU0 to be freed on CPU1.
> 
> I think we need to fix the following entry points:
> 1) udp_output(): after ip_output(), if old_lport==0 and
> mycpuid!=udp_addrcpu(inpcb addr/port pair), then free the inpcb's route
> entry
> 2) udp_connect(): this function always runs on CPU0, so if
> udp_addrcpu(inpcb addr/port pair)!=mycpuid (0 here), then we need to
> free the inpcb's route entry
> 3) udp_disconnect(): at the end of it, if mycpuid!=udp_addrcpu(inpcb
> addr/port pair), then free the inpcb's route entry
> 
> I may have missed some entry points here, so please point them out to me
> if any come to mind.
> 
> Best Regards,
> sephe
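
For anyone following the thread, the ownership problem and the proposed
check can be modelled in a few lines of userland C.  This is only a sketch
under stated assumptions, not kernel code: toy_udp_cpu() is a hypothetical
stand-in that uses nothing but the (lport >> 8 & 1) expression quoted above
(generalized to a power-of-two CPU count, with lport in host byte order),
and the toy_* structures and helpers are made-up names, not the real
udp_addrcpu(), inpcb or route types.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the CPU dispatch hash.  Assumption: on a
 * 2 CPU box it reduces to (lport >> 8) & 1, as quoted in the thread.
 */
static int toy_udp_cpu(uint16_t lport, int ncpus)
{
    /* The mask trick below only works for a power-of-two CPU count. */
    assert(ncpus > 0 && (ncpus & (ncpus - 1)) == 0);
    return (lport >> 8) & (ncpus - 1);
}

/* Made-up model of a route entry cached in a pcb. */
struct toy_route {
    int alloc_cpu;   /* CPU on which the entry was allocated */
    int valid;       /* non-zero while the pcb still caches it */
};

struct toy_pcb {
    uint16_t lport;
    struct toy_route ro;
};

/* Non-zero if the cached entry lives on a CPU other than the pcb's owner. */
static int toy_route_is_foreign(const struct toy_pcb *pcb, int ncpus)
{
    return pcb->ro.valid &&
        pcb->ro.alloc_cpu != toy_udp_cpu(pcb->lport, ncpus);
}

/*
 * Model of the proposed check: while still running on the allocating
 * CPU (mycpu), drop the cached entry if later operations on this pcb
 * will be dispatched to a different CPU.
 */
static void toy_release_if_moving(struct toy_pcb *pcb, int mycpu, int ncpus)
{
    if (pcb->ro.valid && mycpu != toy_udp_cpu(pcb->lport, ncpus)) {
        assert(pcb->ro.alloc_cpu == mycpu); /* freed where allocated */
        pcb->ro.valid = 0;
    }
}
```

With lport 0x0100 on a 2 CPU box the toy hash yields CPU1, so an entry
allocated on CPU0 counts as foreign until toy_release_if_moving(&pcb, 0, 2)
drops it on CPU0.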