Bug #273

Sendmail rset command hangs socket on 1.6-Release

Added by sven over 7 years ago. Updated over 7 years ago.

Status:ClosedStart date:
Priority:UrgentDue date:
Assignee:-% Done:

0%

Category:-
Target version:-

Description

We have a cluster of mail servers, one of which we have upgraded to
1.6-Release (from 1.2) and then subsequently did a fresh install of 1.6
on after the issue first arose.

In essence, using the supplied sendmail and having the connected
sendmail session issue a rset command results in that sendmail process
hanging (only by issuing the telnet ^] command to get to a telnet prompt
and issuing quit can the session be killed). If instead of an rset
command the quit command is issued, the session will end (i.e. the
telnet session terminates) but the sendmail process with "quit" appears
in the process list and will not go away unless killed.

Has anyone seen this odd behavior? At first I thought it was due to a
milter, so I removed it from the .mc file and rebuilt sendmail.cf but no
change in socket behavior. How can I help in terms of troubleshooting?

Sven

History

#1 Updated by dillon over 7 years ago

:We have a cluster of mail servers, one of which we have upgraded to
:1.6-Release (from 1.2) and then subsequently did a fresh install of 1.6
:on after the issue first arose.
:
:In essence, using the supplied sendmail and having the connected
:sendmail session issue a rset command results in that sendmail process
:hanging (only by issuing the telnet ^] command to get to a telnet prompt
:and issuing quit can the session be killed). If instead of an rset
:command the quit command is issued, the session will end (i.e. the
:telnet session terminates) but the sendmail process with "quit" appears
:in the process list and will not go away unless killed.
:
:Has anyone seen this odd behavior? At first I thought it was due to a
:milter, so I removed it from the .mc file and rebuilt sendmail.cf but no
:change in socket behavior. How can I help in terms of troubleshooting?
:
:Sven

That is really odd. I wonder if it is related to the posix issues
that were reported recently.

If you can repeat it easily with telnet, could you use tcpdump
and ktrace (on the forked sendmail demon after you connect) to
figure out what happens when you try to send it the 'rset' ?
With tcpdump it should be possible to determine if the 'rset' gets
transmitted, and with ktrace it should be possible to determine
if sendmail sees it.

-Matt
Matthew Dillon
<>

#2 Updated by sven over 7 years ago

On Tue, 2006-08-01 at 18:24 -0700, Matthew Dillon wrote:
> :We have a cluster of mail servers, one of which we have upgraded to
> :1.6-Release (from 1.2) and then subsequently did a fresh install of 1.6
> :on after the issue first arose.
> :
> :In essence, using the supplied sendmail and having the connected
> :sendmail session issue a rset command results in that sendmail process
> :hanging (only by issuing the telnet ^] command to get to a telnet prompt
> :and issuing quit can the session be killed). If instead of an rset
> :command the quit command is issued, the session will end (i.e. the
> :telnet session terminates) but the sendmail process with "quit" appears
> :in the process list and will not go away unless killed.
> :
> :Has anyone seen this odd behavior? At first I thought it was due to a
> :milter, so I removed it from the .mc file and rebuilt sendmail.cf but no
> :change in socket behavior. How can I help in terms of troubleshooting?
> :
> :Sven
>
> That is really odd. I wonder if it is related to the posix issues
> that were reported recently.
>
> If you can repeat it easily with telnet, could you use tcpdump
> and ktrace (on the forked sendmail demon after you connect) to
> figure out what happens when you try to send it the 'rset' ?
> With tcpdump it should be possible to determine if the 'rset' gets
> transmitted, and with ktrace it should be possible to determine
> if sendmail sees it.
>
> -Matt
> Matthew Dillon
> <>

Based on the tcpdump it would appear that the command does get sent:
Transmission Control Protocol, Src Port: 38093 (38093), Dst Port: smtp
(25), Seq: 67, Ack: 357, Len: 6
Source port: 38093 (38093)
Destination port: smtp (25)
Sequence number: 67 (relative sequence number)
Next sequence number: 73 (relative sequence number)
Acknowledgement number: 357 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 6912 (scaled)
Checksum: 0xfd64 [correct]
Options: (12 bytes)
Simple Mail Transfer Protocol
Command: rset\r\n
Command: rset

Based on the ktrace it would appear that sendmail sees it, but fails to
write to the sendmail.st file:

50367 sendmail RET sendto 163/0xa3
50367 sendmail CALL gettimeofday(0xbfbfba38,0)
50367 sendmail RET gettimeofday 0
50367 sendmail CALL lstat(0xbfbfb570,0xbfbfb9c0)
50367 sendmail NAMI "/var/log/sendmail.st"
50367 sendmail RET lstat 0
50367 sendmail CALL lstat(0xbfbeb020,0xbfbeafb0)
50367 sendmail NAMI "/"
50367 sendmail RET lstat 0
50367 sendmail CALL geteuid
50367 sendmail RET geteuid 0
50367 sendmail CALL lstat(0xbfbeb020,0xbfbeafb0)
50367 sendmail NAMI "/var"
50367 sendmail RET lstat 0
50367 sendmail CALL geteuid
50367 sendmail RET geteuid 0
50367 sendmail CALL lstat(0xbfbeb020,0xbfbeafb0)
50367 sendmail NAMI "/var/log"
50367 sendmail RET lstat 0
50367 sendmail CALL geteuid
50367 sendmail RET geteuid 0
50367 sendmail CALL open(0x283778a0,0x2,0x180)
50367 sendmail NAMI "/var/log/sendmail.st"
50367 sendmail RET open 7
50367 sendmail CALL fstat(0x7,0xbfbfb910)
50367 sendmail RET fstat 0
50367 sendmail CALL flock(0x7,0x2)

Compare this to a "normal" situation (taken from a freebsd box):

52523 sendmail RET sendto 160/0xa0
52523 sendmail CALL gettimeofday(0x7fffffffa3d0,0)
52523 sendmail RET gettimeofday 0
52523 sendmail CALL lstat(0x7fffffff9df0,0x7fffffffa340)
52523 sendmail NAMI "/var/log/sendmail.st"
52523 sendmail RET lstat 0
52523 sendmail CALL lstat(0x7fffffff1d70,0x7fffffff1cf0)
52523 sendmail NAMI "/"
52523 sendmail RET lstat 0
52523 sendmail CALL geteuid
52523 sendmail RET geteuid 0
52523 sendmail CALL lstat(0x7fffffff1d70,0x7fffffff1cf0)
52523 sendmail NAMI "/var"
52523 sendmail RET lstat 0
52523 sendmail CALL geteuid
52523 sendmail RET geteuid 0
52523 sendmail CALL lstat(0x7fffffff1d70,0x7fffffff1cf0)
52523 sendmail NAMI "/var/log"
52523 sendmail RET lstat 0
52523 sendmail CALL geteuid
52523 sendmail RET geteuid 0
52523 sendmail CALL open(0x5eb5c0,0x2,0x180)
52523 sendmail NAMI "/var/log/sendmail.st"
52523 sendmail RET open 6
52523 sendmail CALL fstat(0x6,0x7fffffffa270)
52523 sendmail RET fstat 0
52523 sendmail CALL flock(0x6,0x2)
52523 sendmail RET flock 0
52523 sendmail CALL fstat(0x6,0x7fffffffa2a0)
52523 sendmail RET fstat 0
52523 sendmail CALL read(0x6,0x7fffffffa3f0,0x5a8)
52523 sendmail GIO fd 6 read 1448 bytes
0x0000 deb1 0100 0400 0000 06e8 ce44 0000 0000 a805 0000 0000
0000 3100 0000 0000 0000 |...........D............1.......|
0x0020 6a00 0000 0000 0000 0300 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 |j...............................|
0x0040 0000 0000 0000 0000 0e00 0000 0000 0000 0000 0000 0000
0000 2300 0000 0000 0000 |........................#.......|

< --chop -->

52523 sendmail RET read 1448/0x5a8
52523 sendmail CALL lseek(0x6,0,0,0)
52523 sendmail RET lseek 0
52523 sendmail CALL write(0x6,0x7fffffffa3f0,0x5a8)
52523 sendmail GIO fd 6 wrote 1448 bytes
0x0000 deb1 0100 0400 0000 06e8 ce44 0000 0000 a805 0000 0000
0000 3200 0000 0000 0000 |...........D............2.......|

and so on ...

Sven

#3 Updated by dillon over 7 years ago

:Based on the ktrace it would appear that sendmail sees it, but fails to
:write to the sendmail.st file:
:
: 50367 sendmail RET sendto 163/0xa3
:...
: 50367 sendmail RET fstat 0
: 50367 sendmail CALL flock(0x7,0x2)

This is great news! It's stuck on the flock() call, which is the
same problem that was reported with postfix.

Now do this. With sendmail stuck on the flock call, compile up
/usr/src/test/debug/vnodeinfo.c and run vnodeinfo -a out to a file.
Find the file the flock is stuck on and lets see if there is anyone
else holding the lock.

NOTE! You will have to use the vnodeinfo.c from HEAD. The one in
REL does not have the lock tracing code in it. And you may have
to make some modifications if there are #include file snafu's.

I feel confident that we can track down this problem and fix it. What
I need to know is whether that flock is being held on the vnode's blocked
list without any locks held on the locked list, which would indicate a
tsleep/wakeup race, or whether there is someone else holding the lock.

-Matt

#4 Updated by justin over 7 years ago

Appears to be fixed in the 1.6.1 release.

Also available in: Atom PDF