postfix causes hangs

Rob Schmuloff rrschoolie at yahoo.com
Sat Jun 26 20:03:47 PDT 2004


Hello Matt,

  First, apologies for the poor formatting of this
email...

   I have some interesting information regarding the
postfix problem.  It seems to affect several of the
postfix daemons: smtpd, local, and cleanup.  The
debugging output seems to indicate a race between two
processes calling flock() and requesting an exclusive
lock on the same file.  When the machine wedges, the
network stack still seems to be working (i.e. TCP
connects and pings succeed), but the console is locked
up and nothing seems to be running in userland.  I can
reproduce this problem by receiving email from the
freebsd-current mailing list (I'm not sure why, other
than the high volume of mail and the fact that their
server is running postfix too).

The debugging output from the lockf code (hand
copied):

pid 5831 (cleanup) lf_destroy_range: -2401050962867404578..-2401050962867404578
pid 5831 (cleanup) lf_create_range: 0..9223372036854775807
pid 5832 (cleanup) lf_destroy_range: 0..9223372036854775807
pid 5832 (cleanup) lockf 0xd7c22bd4
        0..9223372036854775807 type exclusive owned by -1
blocked locks 9223372036854775807 type exclusive waiting on 0xc0861ec0

pid 5832 (cleanup) lf_destroy_range: -2401050962867404578..-2401050962867404578
pid 5832 (cleanup) lf_create_range: 0..9223372036854775807
pid 5831 (cleanup) lf_destroy_range: 0..9223372036854775807
pid 5831 (cleanup) lockf 0xd7c22bd4
        0..9223372036854775807 type exclusive owned by -1
blocked locks 9223372036854775807 type exclusive waiting on 0xc0861ec0


I wrote a quick program that seems to replicate the
problem when run with stdout/stderr redirected to
/dev/null (or with the printf calls commented out):

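/*---------------------------------------------------*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>

int main(void)
{
   int fd, status;
   /* flock() operations and their names, kept in the same order */
   int ops[4] = { LOCK_SH, LOCK_EX, LOCK_SH|LOCK_NB,
                  LOCK_EX|LOCK_NB };
   char *opstr[4] = { "LOCK_SH", "LOCK_EX",
                      "LOCK_SH|LOCK_NB", "LOCK_EX|LOCK_NB" };
   pid_t pid;
   int i;

   /* two forks give four processes contending for the same file */
   fork();
   fork();

   pid = getpid();
   /* i = pid % 2; */
   i = 1;            /* every process asks for LOCK_EX */
   printf("pid %d i = %d\n", (int)pid, i);
   for (;;) {
        /* O_CREAT needs a mode argument */
        fd = open("testfile", O_RDWR|O_CREAT, 0644);
        if (fd < 0) {
             perror("open");
             exit(1);
        }
        /* retry after a random pause if flock() fails */
        while ((status = flock(fd, ops[i])) != 0)
             usleep(rand() / 20000);

        printf("PID %d got %s lock -- %d\n",
               (int)pid, opstr[i], status);
        usleep(rand() / 20000);
        status = flock(fd, LOCK_UN);
        printf("PID %d released lock -- %d\n", (int)pid, status);
        usleep(rand() / 20000);
        close(fd);
   }
}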




--- Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:
>     Rob, I have committed a change to kern/kern_lockf.c that puts a very
>     short wait in the lockf retry loop in an attempt to prevent the system
>     from locking up when it gets into this livelock.
> 
>     I don't think this will actually fix the problem (or at least I hope
>     it doesn't).  I am instead hoping that it will be possible to ktrace
>     the processes involved and/or otherwise track the problem down when
>     it occurs without the whole machine going down.
> 
> 						-Matt
> 
> 
> :Hello,
> :
> :  I've experienced periodic hangs on my system
> :since mid-May.  Lately, my terminal is unresponsive as
> :soon as I start postfix and receive incoming mail.
> :When I drop into DDB, the stack trace shows:
> :
> :scgetc()
> :sckbdevent()
> :atkbd_intr()
> :atkbd_isa_intr()
> :intr_mux()
> :ithread_handler()
> :(Perhaps this is the ctl-alt-esc trace)
> :
> :This is with a kernel/world built June 23rd.  I
> :*think* the problem is somewhere in the lockf code
> :because I glanced at 'ps' from DDB. Also, I'm using
> :procmail for local delivery, so that's also a
> :possibility.  Sorry for the sparse information.  I'm
> :trying to get more data for you..
> :
> :Thanks,
> :
> :Rob
> 


	
		
Attachment: tt.c
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bin00000.bin
Type: application/octet-stream
Size: 763 bytes
Desc: "Description: tt.c"
URL: <http://lists.dragonflybsd.org/pipermail/kernel/attachments/20040626/87a4fdc5/attachment-0015.bin>

