Bug #1554

nfs stall

Added by thomas.nikolajsen over 4 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%

Description

dfly 2.4.0

'git gc' stalled (the host didn't freeze) after several
'nfs server .. not responding' / '.. is alive again' messages.
The git repo is in an NFS-mounted directory on the local host.
Other commands using the NFS mount also stall.

Same experience with NFS mounted on a remote host.
(I have seen this a few times over the last ~2 months;
I was guessing the problem was network hardware, although ping did respond.)

(I ran `shutdown' shortly before escaping to the debugger;
it didn't seem to shut down and returned to the shell prompt.
I can produce another core dump if needed.)

'git gc' did succeed using the local fs (HAMMER) directly (no NFS).
I can reproduce this, as the previous state of the git repo is in a snapshot.

Core dump *.39 uploading to leaf.

History

#1 Updated by dillon over 4 years ago


You have a ton of NFS mounts here. Hmm. The NFS client is stuck
waiting for a response from the NFS server (on the same host). The
NFS server threads (the nfsds) are stuck on a vnode lock on the HAMMER
filesystem, waiting for the buffer cache.

This looks like another HAMMER buffer cache exhaustion deadlock,
again probably due to the 128M of RAM in the machine. However,
it looks like a different issue than the one from your other
bug report.

I dug into why HAMMER was stalling in the core, and it looked like
it shouldn't be. HAMMER was only reserving one buffer.
The bufdaemon and bufdaemon_hw threads are both in wdrn1, which implies
they were flushing data to disk.
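The chain read out of the core (NFS client waiting on nfsd, nfsd stuck on a HAMMER vnode lock waiting for the buffer cache, bufdaemon flushing to disk) can be thought of as a wait-for graph. A true deadlock needs that chain to loop back on itself; a chain that ends at something still doing work only means the system is slow. The Python sketch below is illustrative only: the node labels are made-up descriptions rather than kernel symbols, and the edges paraphrase this comment rather than anything extracted from the dump.

```python
from typing import Dict, List

# Illustrative wait-for edges paraphrasing the core dump analysis
# (labels are made up, not kernel symbols): each key waits on its value.
WAITS_ON: Dict[str, str] = {
    "nfs client (git gc)": "nfsd",
    "nfsd": "HAMMER vnode lock / buffer cache",
    # Assumption: buffers become available as bufdaemon flushes them.
    "HAMMER vnode lock / buffer cache": "bufdaemon (wdrn1)",
    "bufdaemon (wdrn1)": "disk writes",
}


def follow_chain(waits_on: Dict[str, str], start: str) -> List[str]:
    """Follow wait-for edges from start until a node waits on nothing
    or a node repeats (meaning the chain has closed into a cycle)."""
    chain = [start]
    seen = {start}
    node = start
    while node in waits_on:
        node = waits_on[node]
        chain.append(node)
        if node in seen:
            break
        seen.add(node)
    return chain


def is_deadlock(chain: List[str]) -> bool:
    """A chain whose last node already appeared earlier is a cycle:
    every member waits on another member, so none can proceed."""
    return chain[-1] in chain[:-1]


if __name__ == "__main__":
    chain = follow_chain(WAITS_ON, "nfs client (git gc)")
    print(" -> ".join(chain))
    print("deadlock" if is_deadlock(chain) else "stall: chain ends at active work")
```

With these edges the chain ends at disk writes, so the model reports a stall rather than a deadlock; whether the real chain instead closes into a cycle is what the core dump has to settle.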

It could be that the issue here is not an actual deadlock but simply
a great deal of disk write activity causing long stalls in the
system. Did you notice a significant amount of hard drive activity
while the system was in this state? The only thing you are running
is the 'git gc'. It could be that write activity from the git gc
is creating long stalls and causing NFS to report the problem.

If that is the case, the issue is probably more that HAMMER is
massively inefficient due to the tiny buffer cache, but otherwise
still operating.
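The distinction drawn here, a real deadlock versus a system merely stalled behind heavy write activity, can be checked by sampling a cumulative disk-write counter at fixed intervals during the hang: if it keeps rising, work is still completing. This is a minimal Python sketch; the input list is hypothetical and would come from whatever I/O statistics tool is at hand (for example iostat(8)), and the function name is made up for illustration.

```python
from typing import Sequence


def classify_stall(write_counts: Sequence[int]) -> str:
    """Classify a hang from cumulative disk-write counters sampled at
    fixed intervals (hypothetical input gathered during the stall)."""
    if len(write_counts) < 2:
        raise ValueError("need at least two samples")
    # Any increase between consecutive samples means writes are completing.
    progressing = any(b > a for a, b in zip(write_counts, write_counts[1:]))
    return "slow but progressing" if progressing else "no progress (possible deadlock)"


if __name__ == "__main__":
    print(classify_stall([100, 250, 400]))  # counters rising
    print(classify_stall([500, 500, 500]))  # counters flat
```

Comment #3 reports no disk or noticeable CPU activity for several minutes during the stall, which would fall into the second case.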

-Matt
Matthew Dillon

#2 Updated by dillon over 4 years ago

It's worth retesting with the latest code in the master branch.
I fixed a couple of NFS issues related to systems with small
amounts of memory.

-Matt

#3 Updated by thomas.nikolajsen over 4 years ago

During the stall there was no disk or (noticeable) CPU activity for several minutes.

There are ~20 NFS mounts, for exports of HAMMER PFSs.
(NFS is used for local access because maproot is used.)

I will retest with the NFS fixes merged into 2.4.

#4 Updated by thomas.nikolajsen over 4 years ago

Same result with 2.4.1 plus the recent NFS fixes (b58aa6d0, b9a7a2bd) merged.

'git gc' seems to progress somewhat longer, but eventually stalls.

Core dump *.43 uploading to leaf.

As mentioned in the original report, I have experienced stalls on NFS mounts
for some time on a client with more memory (physmem=512M), especially under
heavy traffic. Typically this is 'make [quick]world' with /usr/{src,obj}
NFS-mounted; cross-building amd64 seems to hit the stall more often.

'git gc' seems to stall every time, even on the local server,
i.e. without network hardware to blame.

#5 Updated by tuxillo about 1 year ago

  • Status changed from New to Closed
  • Assignee deleted (0)

Hi,

I doubt this applies anymore.

antonioh@devel01:/home/source/dfbsd> git gc
Counting objects: 409066, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (96871/96871), done.
Writing objects: 100% (409066/409066), done.
Total 409066 (delta 309637), reused 405329 (delta 306363)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 409066, done.
antonioh@devel01:/home/source/dfbsd> df -h .
Filesystem Size Used Avail Capacity Mounted on
nas:/home 1.8T 692G 1.1T 37% /home

Cheers,
Antonio Huete
