Bug #409

NFS (client) directory caching bug

Added by andrew_atrens about 8 years ago. Updated about 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
% Done: 0%
Category: -
Target version: -
Start date:
Due date:

Description

Hi Folks,

I'm running 1.6.x on my desktop box these days (recently upgraded from 1.4.1)
and am experiencing some weirdness with my ClearCase view_server Linux binaries
over NFS.

I've tried various mount options (v2, v3, soft, udp, tcp) with little effect.

What *did* make a noticeable improvement, however, was switching from an SMP to a
UP kernel.

So here's the behaviour, as best as I can describe it ..

Whenever I want to create an element (file) in ClearCase, I run mkelem on an
existing file in my view.

mkelem consults a magic file to determine the type of the file and in turn
invokes a file type 'manager' that looks in a directory containing executable
methods for dealing with that type of file -

Here are the 'handlers' for the text_file_delta manager -

$ ls -l /opt/rational/clearcase/lib/mgrs/text_file_delta/
total 34
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 annotate@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 compare@ -> ../../../bin/cleardiff
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 construct_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_branch@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 15:59 create_element@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 delete_branches_versions@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 get_cont_info@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 merge@ -> ../../../bin/cleardiff
-r-xr-xr-x 1 root bin 27770 Jun 1 2005 tfdmgr*
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xcompare@ -> ../../../bin/xcleardiff
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xmerge@ -> ../../../bin/xcleardiff

When I invoke my command, the type handler consults the vob database, which is nfs mounted, for
a temporary file that it never sees... even though the file exists.

Here's the invocation of mkelem -

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:02) --
$ cleartool mkelem -nc LIB.SUB

Since this is failing more regularly, I stubbed in my own create_element handler to see what's going on -

$ ls -l /opt/rational/clearcase/lib/mgrs/text_file_delta/
total 34
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 annotate@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 compare@ -> ../../../bin/cleardiff
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 construct_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_branch@ -> tfdmgr
-rwxrwxr-x 1 root wheel 5032 Dec 13 15:59 create_element*
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 create_version@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 delete_branches_versions@ -> tfdmgr
lrwxrwxrwx 1 root bin 6 Dec 13 00:34 get_cont_info@ -> tfdmgr
lrwxrwxrwx 1 root bin 22 Dec 13 00:35 merge@ -> ../../../bin/cleardiff
-r-xr-xr-x 1 root bin 27770 Jun 1 2005 tfdmgr*
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xcompare@ -> ../../../bin/xcleardiff
lrwxrwxrwx 1 root bin 23 Dec 13 00:35 xmerge@ -> ../../../bin/xcleardiff

Here's the source code for the create_element stub -

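#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp) {
    int x;
    struct stat sb;

    /* Print every argument the type manager passes in. */
    for (x = 0; x < argc; x++)
        puts(argv[x]);

    /* Poll until the temp file passed by the manager can be stat()ed. */
    while (stat(argv[4], &sb) == -1) {
        perror("stat failed");
        sync();
        sleep(1);
    }

    /* Hand off to the real manager with the same arguments. */
    execve("/opt/rational/clearcase/lib/mgrs/text_file_delta/tfdmgr", argv, envp);
    perror("execve");  /* only reached if execve fails */
    return 1;
}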

Now, when I invoke it I see this -

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:16) --
$ cleartool mkelem -nc LIB.SUB
/opt/rational/clearcase/lib/mgrs/text_file_delta/create_element
45806de2
5241b78a.8af011db.8c99.ac:e6:df:90:6a:9f
5241b78e.8af011db.8c99.ac:e6:df:90:6a:9f
5241b792.8af011db.8c99.ac:e6:df:90:6a:9f
/net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
^Z
[1]+ Stopped cleartool mkelem -nc LIB.SUB

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ ls /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
/net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ ls -l /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1
-rw-rw-rw- 1 vobroot opt_ne 0 Dec 13 16:17 /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ cp /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1 /tmp

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:17) --
$ cat /tmp/tmp_12902.1

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:18) --
$ cat /net/zcars0xx/export/vobstore/disk2/OM5K/bcs.vbs/s/sdft/1f/15/tmp_12902.1

-- atrens@atrens: /localdisk/viewstore/atrens_VxWorks-5.5.2/vobs/bcs/Tornado-2.2.x/docs/bspkit (16:18) --
$ fg
cleartool mkelem -nc LIB.SUB
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
stat failed: No such file or directory
^C
Interrupt

Interesting, eh? The file it's looking for is now there but the process can't see it! It really seems like a
race condition, but I don't understand why the process never recovers after I've resumed it...
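
One way to tell a stale name-cache entry apart from some other failure is to look for the file two ways from inside the same process: a direct stat() of the full path, and a scan of the parent directory. The sketch below is only an illustration; check_visibility() and the SEEN_* flags are made-up names, not part of ClearCase or the NFS client, and I'm assuming a 4096-byte path buffer is enough.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Result bits for check_visibility() (hypothetical helper). */
#define SEEN_BY_STAT    0x1   /* stat() on the full path succeeded */
#define SEEN_BY_READDIR 0x2   /* name was found by scanning the directory */

/*
 * Look for `name` inside `dir` two ways: a direct stat() of dir/name,
 * and a scan of the directory entries. Returns a bitmask of the SEEN_*
 * flags, or -1 if the directory cannot be opened or the path is too long.
 */
int check_visibility(const char *dir, const char *name)
{
    char path[4096];   /* assumed large enough for the paths involved */
    struct stat sb;
    struct dirent *de;
    DIR *dp;
    int result = 0;

    if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int)sizeof(path))
        return -1;
    if (stat(path, &sb) == 0)
        result |= SEEN_BY_STAT;

    dp = opendir(dir);
    if (dp == NULL)
        return -1;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, name) == 0) {
            result |= SEEN_BY_READDIR;
            break;
        }
    }
    closedir(dp);
    return result;
}
```

If the directory scan finds the name while stat() keeps failing, that would point at a cached negative lookup; if neither finds it, the process is probably looking at the wrong path.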

Heh, as interesting as this problem is it's really starting to get annoying. :) I wonder if there's any relation
to the infamous gmake bug ?

Andrew.

History

#1 Updated by andrew_atrens about 8 years ago

Oops.. little bug in my stub ... argv[4] should be argv[5] ... hmmm ... a little closer to
figuring this out I suppose. :)
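
For reference, a rough sketch of the stub with that index fixed. In the output above the sixth line printed (argv[5]) is the temp file path, so that is what gets polled here. The names wait_for_path() and run_create_element() and the 30-try limit are my own assumptions for illustration; the original stub loops forever and does not check argc.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Poll until `path` can be stat()ed, sleeping one second between tries.
 * Returns 0 once the file is visible, or -1 after max_tries failures.
 * (Hypothetical helper; the original stub has no retry limit.)
 */
int wait_for_path(const char *path, int max_tries)
{
    struct stat sb;
    int i;

    for (i = 0; i < max_tries; i++) {
        if (stat(path, &sb) == 0)
            return 0;
        perror("stat failed");
        if (i + 1 < max_tries)
            sleep(1);
    }
    return -1;
}

/*
 * Handler body: check the argument count before touching argv[5],
 * wait for the temp file, then hand off to the real manager.
 */
int run_create_element(int argc, char **argv, char **envp)
{
    if (argc < 6) {
        fprintf(stderr, "expected at least 6 arguments, got %d\n", argc);
        return 2;
    }
    if (wait_for_path(argv[5], 30) == -1)   /* 30 tries is an arbitrary limit */
        return 1;
    execve("/opt/rational/clearcase/lib/mgrs/text_file_delta/tfdmgr", argv, envp);
    perror("execve");   /* only reached if execve fails */
    return 1;
}
```

A real handler's main() would just return run_create_element(argc, argv, envp). Checking argc first means a wrong index shows up as an error message rather than a stat() that fails forever.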

#2 Updated by dillon about 8 years ago

If this is running over NFS you should be able to use tcpdump to trace
out the NFS operations. Use a UDP mount to make the tracing work more
easily.

tcpdump -i <interface> -s 4096 -v port 2049

And see if you can localize the operations related to the problem.

-Matt
