Replicating your tests

Wed Jan 26 11:56:36 PST 2005

On Wed, 2005-01-26 at 21:43 +0100, Devon H. O'Dell wrote:
> On Wed, 2005-01-26 at 20:17 +0100, Felix von Leitner wrote: 
[snip]
> > > Yeah. It's quite strange: http://www.rafb.net/paste/results/Pghall74.html
> > 
> > Mhh, that trace is not helpful at all and looks like stack corruption.
> > Maybe you could do a ktrace and look what the last syscall was?
> > 
> > Felix
> 
> Strangely, when I run it under ktrace, it doesn't seem to die. It times
> out; going through the kdump gives me:
> 
> 59095 gatling  CALL  accept(0x4,0xbfbfd68c,0xbfbfd688)
> 59095 gatling  RET   accept -1 errno 35 Resource temporarily unavailable

This ktrace happened after it had already segfaulted, so it was running
in its own ``quirks mode'', if you will (as I discussed below, in my
previous email). I just rebooted and caught it dumping core under ktrace.
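For reference, the quoted accept line is what a non-blocking listening socket normally returns when no connection is pending: -1 with errno 35, which is EAGAIN (EWOULDBLOCK) on FreeBSD and DragonFly. The numeric value differs elsewhere (11 on Linux), so code should compare against the symbolic names. A minimal C sketch of that pattern, assuming a loopback listener on a kernel-chosen port; make_listener() and try_accept() are hypothetical helpers for illustration, not gatling's code:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking TCP listener on 127.0.0.1,
   letting the kernel pick the port. Returns the fd, or -1 on failure. */
static int make_listener(void)
{
    struct sockaddr_in sin;
    int fd, flags;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(0);              /* 0 = any free port */
    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0 ||
        listen(fd, 16) < 0)
        goto fail;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}

/* Hypothetical helper: accept one connection if one is pending.
   Returns the new fd, -1 if the backlog is empty (accept failed with
   EAGAIN/EWOULDBLOCK, errno 35 on FreeBSD/DragonFly), or -2 on any
   other error. */
static int try_accept(int lfd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    int cfd = accept(lfd, (struct sockaddr *)&peer, &len);

    if (cfd >= 0)
        return cfd;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -1;                        /* nothing to accept right now */
    return -2;
}
```

Once make_listener() succeeds, calling try_accept() before any client connects returns -1, which mirrors the "RET accept -1 errno 35" line; an event loop would simply go back to waiting in kevent().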

> It then times out and closes several (hundred) file descriptors and
> loops around doing:
> 
> 59095 gatling  CALL  gettimeofday(0xbfbfd6a0,0)
> 59095 gatling  RET   gettimeofday 0
> 59095 gatling  CALL  gettimeofday(0xbfbfd660,0)
> 59095 gatling  RET   gettimeofday 0
> 59095 gatling  CALL  gettimeofday(0xbfbfd640,0)
> 59095 gatling  RET   gettimeofday 0
> 59095 gatling  CALL  kevent(0x7,0,0,0xbfbfce78,0x64,0xbfbfce70)
> 59095 gatling  RET   kevent 0

This seems to be what it does normally when it waits.

> This seems to be what it does when idling, even before accepting
> connections (after it has segfaulted?). After a ktrace, I can't reliably
> get it to segfault, or even accept/service connections. I was lucky to
> get it to segfault a second time afterwards. If I kill it in GDB and
> look at the stack, it looks about the same as the one I placed on rafb:
> 
> #0  0x280afdc4 in kevent () from /usr/lib/libc.so.4
> #1  0x08054dfd in ?? ()
> #2  0x0000000a in ?? ()
> #3  0x00000000 in ?? ()
> #4  0x00000000 in ?? ()
> #5  0xbfbfce48 in ?? ()
> [snip]
> #2308 0x08056494 in __divdi3 ()
> #2309 0x00000001 in ?? ()
> 
> I say ``(after it has segfaulted?)'' above because I do not know whether
> this behavior occurs before the segfault happens, since gatling behaves
> differently after that point: in how many connections it accepts, which
> files it serves, and what it tries to do. Since I can't reliably get it
> to segfault, I'm having trouble ktracing it in its ``virgin'' state.
> I'll reboot in a second and see what happens.

So, after a reboot, the ktrace of the crash ends with:

   652 gatling  CALL  write(0x1,0x805bca0,0x1e)
   652 gatling  GIO   fd 1 wrote 30 bytes
       "accept 597 127.0.0.1 1910 590
       "
   652 gatling  RET   write 30/0x1e
   652 gatling  CALL  fcntl(0x255,0x3,0)
   652 gatling  RET   fcntl 6
   652 gatling  CALL  fcntl(0x255,0x4,0x6)
   652 gatling  RET   fcntl 0
   652 gatling  PSIG  SIGSEGV SIG_DFL
   652 gatling  NAMI  "gatling.core"
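Decoding the numbers above: the 30-byte (0x1e) write is the "accept 597 ..." log line, and 0x255 is 597, so both fcntl calls operate on the descriptor that was just accepted. Commands 3 and 4 are F_GETFL and F_SETFL, and the returned flags, 6, are O_RDWR (2) | O_NONBLOCK (4) on FreeBSD/DragonFly, so the descriptor was already non-blocking and the F_SETFL call wrote back the same value. No other system call appears before the PSIG line, which suggests (but does not prove) that the fault happened in user code shortly after this setup. The read-modify-write idiom that produces exactly this GETFL/SETFL pair can be sketched as follows; set_nonblocking() and is_nonblocking() are hypothetical names, and this is the common idiom rather than necessarily the code gatling uses:

```c
#include <fcntl.h>

/* Hypothetical helper: put fd into non-blocking mode.
   F_GETFL fetches the file status flags; F_SETFL stores them back with
   O_NONBLOCK or'd in. If O_NONBLOCK is already set, the value written is
   identical to the value read (6 -> 6 in the trace above).
   Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/* Hypothetical helper: report whether fd has O_NONBLOCK set.
   Returns 1 if set, 0 if not, -1 on error. */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}
```

Calling set_nonblocking() twice on the same descriptor is harmless; the second call just repeats the F_GETFL/F_SETFL pair with unchanged flags, as seen in the trace.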

> Hi, list, I've CC'd you. Perhaps other people (Matt, Jeff, Simon,
> Joerg, Jeroen) might have ideas about what the problem could be.
> 
> I wonder whether this would behave differently with a non-SMP kernel.
> The machine is a dual-processor P3.
> 
> Kind regards,
> 
> Devon H. O'Dell

--Devon