Bug #2499

open

DRAGONFLY_3_2 lockd not responding correctly

Added by Nerzhul over 9 years ago. Updated over 9 years ago.

Status: In Progress
Priority: Urgent
Assignee: -
Category: -
Target version:
Start date: 01/22/2013
Due date:
% Done: 0%
Estimated time:

Description

Hello,
I need lockd for concurrent access on a web server that uses NFS for extended storage. Under concurrent access, lockd does not respond correctly.

On the NFSv3 client, timeouts occur and the console logs:
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd is alive again
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd is alive again
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd is alive again
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd is alive again
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd is alive again
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd not responding
nfs server A.B.C.65:/nfs/fbsd_pkg: lockd is alive again

Running "netstat -an -f inet" on the server shows data queued on the RPC sockets (non-zero Recv-Q on UDP ports 1017 and 1018):

netstat -an -f inet

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address     Foreign Address         (state)
tcp4       0      0 A.B.C.65.nfsd     WebCluster1.977         ESTABLISHED
tcp4       0      0 A.B.C.65.nfsd     WebCluster1.611         ESTABLISHED
tcp4       0      0 localhost.smtp    *.*                     LISTEN
tcp4       0      0 *.ssh             *.*                     LISTEN
tcp4       0      0 *.1017            *.*                     CLOSED
tcp4       0      0 *.1020            *.*                     LISTEN
tcp4       0      0 *.nfsd            *.*                     LISTEN
tcp4       0      0 *.1023            *.*                     LISTEN
tcp4       0      0 *.1022            *.*                     LISTEN
tcp4       0      0 *.sunrpc          *.*                     LISTEN
tcp4       0      0 A.B.C.65.nfsd     A.B.C.96.811            ESTABLISHED
tcp4       0      0 A.B.C.65.nfsd     WebCluster1.972         ESTABLISHED
tcp4       0     48 A.B.C.65.ssh      129.175.196.190.60067   ESTABLISHED
udp4       0      0 *.918             *.*
udp4       0      0 A.B.C.65.1028     ntp.u-psud.fr.ntp
udp4     456      0 *.1017            *.*
udp4   18656      0 *.1018            *.*
udp4       0      0 *.nfsd            *.*
udp4       0      0 *.1021            *.*
udp4       0      0 *.1020            *.*
udp4       0      0 *.sunrpc          *.*
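
To make the backlog easier to spot on a busy server, the netstat output can be filtered for UDP sockets whose Recv-Q is non-zero. The following is only a minimal sketch: the function name `backlogged_sockets` and the `SAMPLE` text are hypothetical, and the sample rows are adapted from the output above with the wildcard addresses written as `*.port`.

```python
def backlogged_sockets(netstat_text, proto_prefix="udp"):
    """Return (proto, recv_q, local_address) for sockets with queued input."""
    result = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: proto recv-q send-q local foreign [state]
        if len(fields) < 5 or not fields[0].startswith(proto_prefix):
            continue
        # Skip header rows, where the queue columns are not numbers.
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        recv_q = int(fields[1])
        if recv_q > 0:
            result.append((fields[0], recv_q, fields[3]))
    return result


# Sample rows adapted from the netstat output in this report.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 0 48 A.B.C.65.ssh 129.175.196.190.60067 ESTABLISHED
udp4 0 0 *.918 *.*
udp4 456 0 *.1017 *.*
udp4 18656 0 *.1018 *.*
udp4 0 0 *.nfsd *.*
"""

if __name__ == "__main__":
    for proto, recv_q, local in backlogged_sockets(SAMPLE):
        print(f"{proto} {local}: {recv_q} bytes waiting in Recv-Q")
```

Feeding it the real output of "netstat -an -f inet" at intervals would show whether the queues on ports 1017 and 1018 keep growing or drain over time.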

Seeing that, I ran tcpdump -nni em0 to see what was happening:

22:12:42.781597 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212
22:12:48.801935 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212
22:12:54.669917 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212
22:13:00.148965 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212
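
The capture shows the same client port sending a 212-byte datagram to port 1017 at regular intervals, which is what one would expect from a client retransmitting an unanswered request. Port 1017 is presumably one of the lock-related RPC daemons, but the report does not confirm that. As a rough sketch, the spacing between the datagrams can be computed from the timestamps; the helper name `retransmit_gaps` is hypothetical.

```python
from datetime import datetime


def retransmit_gaps(tcpdump_lines):
    """Return the seconds elapsed between consecutive tcpdump lines."""
    stamps = [
        datetime.strptime(line.split()[0], "%H:%M:%S.%f")
        for line in tcpdump_lines
        if line.strip()
    ]
    # Assumes all packets fall on the same day (no wrap past midnight).
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(stamps, stamps[1:])
    ]


# The four lines captured in this report.
CAPTURE = [
    "22:12:42.781597 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212",
    "22:12:48.801935 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212",
    "22:12:54.669917 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212",
    "22:13:00.148965 IP 10.117.100.95.961 > 10.117.100.65.1017: UDP, length 212",
]

if __name__ == "__main__":
    for gap in retransmit_gaps(CAPTURE):
        print(f"{gap:.3f} s since previous datagram")
```

For the four lines above, the gaps are all between roughly 5.5 and 6 seconds, and no reply from the server appears in the capture.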

After a while, lockd responds to all the requests, but many have already failed because of timeouts.

On the DragonFly BSD server, I see this in /var/log/messages:

Jan 21 22:14:19 webfiler1 rpc.lockd: duplicate lock from WebCluster1.srv.
Jan 21 22:14:19 webfiler1 last message repeated 3 times
Jan 21 22:14:19 webfiler1 rpc.lockd: no matching entry for WebCluster1.srv.
Jan 21 22:14:29 webfiler1 dntpd[571]: issuing offset adjustment: 0.026637s
Jan 21 22:14:44 webfiler1 rpc.lockd: rpc to statd failed: RPC: Timed out
Jan 21 22:14:44 webfiler1 rpc.lockd: duplicate lock from WebCluster1.srv.
Jan 21 22:14:44 webfiler1 last message repeated 3 times
Jan 21 22:14:44 webfiler1 rpc.lockd: no matching entry for WebCluster1.srv.

I think there is a problem in DragonFly BSD that causes many lockd requests to queue up instead of being answered in time.
