nfs (client) directory caching bug
Andrew Atrens
atrens at nortel.com
Wed Dec 13 13:25:00 PST 2006
Hi Folks,
I'm running 1.6.x on my desktop box these days (recently upgraded from 1.4.1)
and am experiencing some weirdness around my clearcase view_server linux binaries
wrt nfs ..
I've tried various mount options (v2, v3, soft, udp, tcp) with little effect ..
What *did* make a noticeable improvement however was switching from an SMP to a
UP kernel...
So here's the behaviour, as best as I can describe it ..
Whenever I want to create an element (file) in clearcase, I run mkelem on an
existing file in my view.
mkelem consults a magic file to determine the type of the file and in turn
invokes a file type 'manager' that looks in a directory containing executable
methods for dealing with that type of file -
Here are the 'handlers' for the text_file_delta manager -
$ ls -l /opt/rational/clearcase/lib/mgrs/text_file_delta/
total 34
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 annotate@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 compare@ -> ../../../bin/cleardiff
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 construct_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_branch@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 15:59 create_element@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 delete_branches_versions@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 get_cont_info@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 merge@ -> ../../../bin/cleardiff
-r-xr-xr-x 1 root bin 27770 Jun 1 2005 tfdmgr*
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xcompare@ -> ../../../bin/xcleardiff
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xmerge@ -> ../../../bin/xcleardiff
When I invoke my command, the type handler consults the vob database, which is nfs mounted, for
a temporary file that it never sees... even though the file exists.
Here's the invocation of mkelem -
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:02) --
$ cleartool mkelem -nc LIB.SUB
Since this is failing more regularly, I stubbed in my own create_element handler to see what's going on -
$ ls -l /opt/rational/clearcase/lib/mgrs/text_file_delta/
total 34
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 annotate@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 compare@ -> ../../../bin/cleardiff
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 construct_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_branch@ -> tfdmgr
-rwxrwxr-x 1 root wheel 5032 Dec 13 15:59 create_element*
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 delete_branches_versions@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 get_cont_info@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 merge@ -> ../../../bin/cleardiff
-r-xr-xr-x 1 root bin 27770 Jun 1 2005 tfdmgr*
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xcompare@ -> ../../../bin/xcleardiff
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xmerge@ -> ../../../bin/xcleardiff
Here's the source code for the create_element stub -
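#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp) {
    int x;
    struct stat sb;

    /* Dump the arguments the type manager passes in. */
    for (x = 0; x < argc; x++)
        puts(argv[x]);

    /* Bail out rather than dereference a missing argument. */
    if (argc < 5)
        return 1;

    /* Retry until the path in argv[4] can be stat'ed. */
    while (stat(argv[4], &sb) == -1) {
        perror("stat failed");
        sync();
        sleep(1);
    }

    /* Hand off to the real handler; execve only returns on failure. */
    execve("/opt/rational/clearcase/lib/mgrs/text_file_delta/tfdmgr", argv, envp);
    perror("execve failed");
    return 1;
}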
Now, when I invoke it I see this -
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:16) --
$ cleartool mkelem -nc LIB.SUB
/opt/rational/clearcase/lib/mgrs/text_file_delta/create_element
45806de2
5241b78a.8af011db.8c99.ac:e6:df:90:6a:9f
5241b78e.8af011db.8c99.ac:e6:df:90:6a:9f
5241b792.8af011db.8c99.ac:e6:df:90:6a:9f
/net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
^Z
[1]+ Stopped cleartool mkelem -nc LIB.SUB
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ ls /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
/net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ ls -l /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
-rw-rw-rw- 1 vobroot opt_ne 0 Dec 13 16:17 /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ cp /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1 /tmp
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ cat /tmp/tmp_12902.1
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:18) --
$ cat /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
-- atrens at atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:18) --
$ fg
cleartool mkelem -nc LIB.SUB
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
^C
Interrupt
Interesting, eh? The file it's looking for is now there but the process can't see it! It really seems like a
race condition, but I don't understand why the process never recovers after I've resumed it...
Heh, as interesting as this problem is, it's really starting to get annoying. :) I wonder if there's any relation
to the infamous gmake bug?
Andrew.