Bug #715

panic at shutdown using comconsole

Added by thomas.nikolajsen over 7 years ago. Updated over 5 years ago.

Status:ClosedStart date:
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:-

Description

Using HEAD I get this panic on shutdown,
when using serial console ('console="comconsole"' in /boot/loader.conf).

-thomas

dfly-1.9.0.20070627.panic.VOP_STDCLOSE (1.89 KB) thomas.nikolajsen, 06/28/2007 10:41 PM

dfly-2.3.0.20090212.panic.VOP_STDCLOSE (14.1 KB) thomas.nikolajsen, 02/16/2009 08:58 PM

History

#1 Updated by dillon over 7 years ago

:New submission from Thomas Nikolajsen <>:
:
:Using HEAD I get this panic on shutdown,
:when using serial console ('console=3D"comconsole"' in /boot/loader.conf).
:
: -thomas
:
:----------
:files: dfly-1.9.0.20070627.panic.VOP_STDCLOSE

There was a bug a month ago related to stdclose but it has long since
been fixed. Tripple check that you are actually running the latest
HEAD. Do a complete recompile of your kernel.

There are currently some issues with shutdown, but not related to
close(). They are mostly related to unmounting.

-Matt

#2 Updated by thomas.nikolajsen over 7 years ago

It is full build of HEAD from June 27th.
Updated source & rebuild today, but it didn't change panic.
I am going to make a virgin checkout & full build tomorrow; to triple check ;-)

No problem with reboot, halt, or shutdown -o from serial console.
No problem running shutdown -p from ssh login.
Panic is seen every time shutdown without -o is used on serial console.

-thomas

#3 Updated by dillon over 7 years ago

:Thomas Nikolajsen <> added the comment:
:
:It is full build of HEAD from June 27th.
:Updated source & rebuild today, but it didn't change panic.
:I am going to make a virgin checkout & full build tomorrow; to triple check ;-)
:
:No problem with reboot, halt, or shutdown -o from serial console.
:No problem running shutdown -p from ssh login.
:Panic is seen every time shutdown without -o is used on serial console.
:
: -thomas

I'm going to need a kernel core dump. If I do
'shutdown -h how "shutting down"' on a serial console on my test box it
shuts down ok.

I will note that there are definitely some shutdown issues in HEAD
at the moment. These issues are primarily related to assertions that
have been added to validate the shutdown code (meaning that the problems
have always existed, but now we actually detect them).

I really need core dumps to track them all down.

-Matt

#4 Updated by thomas.nikolajsen over 7 years ago

Hi Matt,

I do have core dump.

If I could have an account on leaf,
it would be easiest way to give you access.

Otherwise I could put it encrypted on my server
and give you password to decrypt.

-thomas

#5 Updated by thomas.nikolajsen over 7 years ago

Crash dump uploaded to leaf.

I did try to narrow down when I do/don't see crash:
- no crash from single user (boot -s)
- I tried disabling services:
reboot after rename /etc/rc.conf & no nfs-mounts: still crash.
- GENERIC do crash

I did check /usr/src: no problem.
(comparing to virgin checkout from primary cvs-src)

-thomas

#6 Updated by dillon over 7 years ago

:Crash dump uploaded to leaf.
:
:I did try to narrow down when I do/don't see crash:
: - no crash from single user (boot -s)
: - I tried disabling services:
: reboot after rename /etc/rc.conf & no nfs-mounts: still crash.
: - GENERIC do crash
:
:I did check /usr/src: no problem.
:(comparing to virgin checkout from primary cvs-src)
:
: -thomas

Please try this patch and tell me if it fixes the crash problem.
There is definitely a race in the code where blocking on the vnode
lock can lead to a duplicate operation.

-Matt

Index: tty_tty.c
===================================================================
RCS file: /cvs/src/sys/kern/tty_tty.c,v
retrieving revision 1.18
diff -u -p -r1.18 tty_tty.c
--- tty_tty.c 10 Sep 2006 01:26:39 -0000 1.18
+++ tty_tty.c 2 Jul 2007 22:32:29 -0000
@@ -84,17 +84,30 @@ struct vnode *ttyvp;
int error;

KKASSERT(p);
+again:
ttyvp = cttyvp(p);
if (ttyvp) {
if (ttyvp->v_flag & VCTTYISOPEN) {
error = 0;
} else {
- vsetflags(ttyvp, VCTTYISOPEN);
+ vhold(ttyvp);
vn_lock(ttyvp, LK_EXCLUSIVE | LK_RETRY);
+ /*
+ * We may have lost the tty while we were blocked
+ * on the lock.
+ */
+ if (cttyvp(p) != ttyvp ||
+ (ttyvp->v_flag & VCTTYISOPEN)) {
+ vn_unlock(ttyvp);
+ vdrop(ttyvp);
+ goto again;
+ }
+ vsetflags(ttyvp, VCTTYISOPEN);
error = VOP_OPEN(ttyvp, FREAD|FWRITE, ap->a_cred, NULL);
if (error)
vclrflags(ttyvp, VCTTYISOPEN);
vn_unlock(ttyvp);
+ vdrop(ttyvp);
}
} else {
error = ENXIO;
@@ -115,6 +128,7 @@ struct vnode *ttyvp;
int error;

KKASSERT(p);
+again:
ttyvp = cttyvp(p);
if (ttyvp == NULL) {
/*
@@ -124,12 +138,24 @@ */
/* error = EIO; */
error = 0;
} else if (ttyvp->v_flag & VCTTYISOPEN) {
- vclrflags(ttyvp, VCTTYISOPEN);
+ vhold(ttyvp);
error = vn_lock(ttyvp, LK_EXCLUSIVE | LK_RETRY);
if (error == 0) {
+ /*
+ * We may have lost the tty while we were blocked
+ * on the lock.
+ */
+ if (cttyvp(p) != ttyvp ||
+ (ttyvp->v_flag & VCTTYISOPEN) == 0) {
+ vn_unlock(ttyvp);
+ vdrop(ttyvp);
+ goto again;
+ }
+ vclrflags(ttyvp, VCTTYISOPEN);
error = VOP_CLOSE(ttyvp, FREAD|FWRITE);
vn_unlock(ttyvp);
}
+ vdrop(ttyvp);
} else {
error = 0;
}

#7 Updated by thomas.nikolajsen over 7 years ago

It does help some, but I still see panic ~90% of shutdown's.

-thomas

#8 Updated by dillon over 7 years ago

:Thomas Nikolajsen <> added the comment:
:
:It does help some, but I still see panic ~90% of shutdown's.
:
: -thomas

Upload another core from a crashed kernel with the patch. There
may be multiple issues here.

-Matt
Matthew Dillon
<>

#9 Updated by thomas.nikolajsen over 7 years ago

New core dump uploaded to
crash/VOP_STDCLOSE-BAD_OPENCOUNT/2

-thomas
PS: I will be off for two weeks.

#10 Updated by dillon over 7 years ago

:Thomas Nikolajsen <> added the comment:
:
:New core dump uploaded to
:crash/VOP_STDCLOSE-BAD_OPENCOUNT/2
:
: -thomas
:PS: I will be off for two weeks.

Ok. Hmm. email me your /etc/fstab and any manual commands you are
running to set up NFS and MFS mounts. You have some NFS and MFS mounts
which could be contributing to the bug, I need to try to set up something
similar to try to reproduce the problem.

-Matt
Matthew Dillon
<>

#11 Updated by thomas.nikolajsen over 7 years ago

NFS / MFS mounts aren't needed for panic;
I also see panic with UFS mounts only & empty /etc/rc.conf.

I haven't succeded reproducing panic on other (UP) host.
On (SMP (P4 HTT)) host where panic is seen,
I will install dfly on spare slice, to see if panic will reproduce.

-thomas
- fstab
/dev/ad4s1a / ufs rw 1 1
/dev/ad4s1b none swap sw 0 0
/dev/ad4s1d /var ufs rw 2 2
/dev/ad4s1e /usr ufs rw 2 2
/dev/acd0a /media/cd cd9660 ro,noauto

#12 Updated by thomas.nikolajsen over 5 years ago

New crash dump using HEAD of Feb 12, 2009.

panic at shutdown; only seen once in recent time. no comconsole this
time.
other host than earlier (don't use old host much in meantime),
now on other SMP (no APIC_IO) host, atom330, 2 cores, 2 HTT on each.

crash dump is rather big, 600MB, worth uploading to leaf?
(host has 2GB RAM, didn't limit RAM use by hw.physmem (yet))
(I have complete dump, with kernel and all modules in use).

-thomas

#13 Updated by dillon over 5 years ago

:Thomas Nikolajsen <> added the comment:
:
:New crash dump using HEAD of Feb 12, 2009.
:
:panic at shutdown; only seen once in recent time. no comconsole this=20
:time.
:other host than earlier (don't use old host much in meantime),
:now on other SMP (no APIC_IO) host, atom330, 2 cores, 2 HTT on each.
:
:crash dump is rather big, 600MB, worth uploading to leaf?
:(host has 2GB RAM, didn't limit RAM use by hw.physmem (yet))
:(I have complete dump, with kernel and all modules in use).
:
: -thomas

What is the panic message / backtrace ?

-Matt
Matthew Dillon
<>

#14 Updated by thomas.nikolajsen over 5 years ago

See below, it's also attached as file in the bug tool.

-thomas
-
panic: VOP_STDCLOSE: BAD OPENCOUNT 0xe992fd68 0 type=8 ops=0xc0388ba0 flgs=00005008

mp_lock = 00000002; cpuid = 2
Trace beginning at frame 0xe8d8b880
panic(e8d8b8a4,e992fd68,c40ae978,0,e8d8b8b8) at panic+0x14d
panic(c035a718,e992fd68,0,8,c0388ba0) at panic+0x14d
vop_stdclose(e8d8b8f8,e8d8b8f8,e8f0d218,e992fd68,e8d8b8e0) at vop_stdclose+0x26
spec_close(e8d8b8f8,e992fd68,e8d8b8ec,c02bc287,e8d8b8f8) at spec_close+0x12b
ufsspec_close(e8d8b8f8,e8d8b918,c01faf8e,e8d8b8f8,c03889f0) at ufsspec_close+0x24
ufs_vnoperatespec(e8d8b8f8,c03889f0,d3beb4f0,20002,e8d8b920) at ufs_vnoperatespec+0x11
vop_close(d3beb4f0,e992fd68,3,e992fd68,10) at vop_close+0x26
cttyclose(e8d8b94c,c037ac30,c40a9fc8,3,2000) at cttyclose+0xbf
dev_dclose(c40a9fc8,3,2000,e992eaa8,e8d8b9a8) at dev_dclose+0x29
spec_close(e8d8b9a8,0,e8d8b99c,c02bc287,e8d8b9a8) at spec_close+0xbd
ufsspec_close(e8d8b9a8,e8d8b9c8,c01faf8e,e8d8b9a8,c03889f0) at ufsspec_close+0x24
ufs_vnoperatespec(e8d8b9a8,c03889f0,d3beb4f0,e8d8b9cc,c01f9cc2) at ufs_vnoperatespec+0x11
vop_close(d3beb4f0,e992eaa8,3,e90c3d80,e90c3dc0) at vop_close+0x26
vn_close(e992eaa8,3,e90c3d80,e8d8ba24,c01904aa) at vn_close+0x2a
vn_closefile(e90c3d80,c030e6e3,e91d984c,c030e21c,802c7415) at vn_closefile+0x1e
fdrop(e90c3d80) at fdrop+0xfa
closef(e90c3d80,e8ded100) at closef+0x135
fdfree(e8f0d218,e8f0d310,e8f59ef0,802c7415,e8d8bc1c) at fdfree+0x2c7
exit1(1,e8d8bae0,c0388a20,d3beb4f0,c01fbec8) at exit1+0x19e
sigexit(e8f59ef0,1) at sigexit+0x616
postsig(1,e8f0d218,1,0,0) at postsig+0x111
userret(6,1500d,0,e8f0d218,c03780e4) at userret+0x129
syscall2(e8d8bd40) at syscall2+0x332
Xint0x80_syscall() at Xint0x80_syscall+0x36

#15 Updated by nthery over 5 years ago

It looks like s_ttyvp has been vgone_vxlocked() as it is set to VBAD
and has v_opencount==0.

I don't know if the bug comes from this call to vgone_vxlocked(). If
not, we could lazily check for VBAD in ctty*() and reset s_ttyvp if it
is BAD.

#16 Updated by nthery over 5 years ago

Um, there is a comment in spec_close() that says that s_ttyvp may
have v_opencount==0 (if the tty was revoked). But then vop_stdclose()
is called unconditionally at the end of spec_close(). It seems to me
that it should be called if and only if v_opencount > 0.

Thomas,

Could you check in the crash dump the value of p->p_session->s_ttyvp
in the innermost call to spec_close() please?

Cheers,
Nicolas

#17 Updated by thomas.nikolajsen over 5 years ago

(kgdb) bt
#0 dumpsys () at ./machine/thread.h:83
#1 0xc01a6599 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:375
#2 0xc01a685e in panic (fmt=0xc0337b38 "from debugger") at /usr/src/sys/kern/
kern_shutdown.c:800
#3 0xc014bcc9 in db_panic (addr=-1070598172, have_addr=0, count=-1,
modif=0xe8d8b730 "") at /usr/src/sys/ddb/db_command.c:447
#4 0xc014c334 in db_command_loop () at /usr/src/sys/ddb/db_command.c:343
#5 0xc014e93c in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_trap.c:71
#6 0xc02ff9ee in kdb_trap (type=3, code=0, regs=0xe8d8b82c) at /usr/src/sys/
platform/pc32/i386/db_interface.c:148
#7 0xc0310634 in trap (frame=0xe8d8b82c) at /usr/src/sys/platform/pc32/i386/
trap.c:815
#8 0xc03006e7 in calltrap () at /usr/src/sys/platform/pc32/i386/exception.s:785
#9 0xc02ff7e4 in Debugger (msg=0xc033e01d "panic") at ./cpu/cpufunc.h:73
#10 0xc01a6855 in panic (fmt=0xc035a718 "VOP_STDCLOSE: BAD OPENCOUNT %p %d
type=%d ops=%p flgs=%08x\n") at /usr/src/sys/kern/kern_shutdown.c:798
#11 0xc01e9d06 in vop_stdclose (ap=0xe8d8b8f8) at /usr/src/sys/kern/
vfs_default.c:1206
#12 0xc01fff73 in spec_close (ap=0xe8d8b8f8) at /usr/src/sys/vfs/specfs/
spec_vnops.c:763
#13 0xc02be245 in ufsspec_close (ap=0xe8d8b8f8) at /usr/src/sys/vfs/ufs/
ufs_vnops.c:1968
#14 0xc02bc287 in ufs_vnoperatespec (ap=0xe8d8b8f8) at /usr/src/sys/vfs/ufs/
ufs_vnops.c:2453
#15 0xc01faf8e in vop_close (ops=0xd3beb4f0, vp=0xe992fd68, fflag=3) at /usr/
src/sys/kern/vfs_vopops.c:258
#16 0xc01d3b23 in cttyclose (ap=0xe8d8b94c) at /usr/src/sys/kern/tty_tty.c:151
#17 0xc018e22e in dev_dclose (dev=0xc40a9fc8, fflag=3, devtype=8192) at /usr/
src/sys/kern/kern_device.c:124
#18 0xc01fff05 in spec_close (ap=0xe8d8b9a8) at /usr/src/sys/vfs/specfs/
spec_vnops.c:740
#19 0xc02be245 in ufsspec_close (ap=0xe8d8b9a8) at /usr/src/sys/vfs/ufs/
ufs_vnops.c:1968
#20 0xc02bc287 in ufs_vnoperatespec (ap=0xe8d8b9a8) at /usr/src/sys/vfs/ufs/
ufs_vnops.c:2453
#21 0xc01faf8e in vop_close (ops=0xd3beb4f0, vp=0xe992eaa8, fflag=3) at /usr/
src/sys/kern/vfs_vopops.c:258
#22 0xc01fa14c in vn_close (vp=0xe992eaa8, flags=3) at /usr/src/sys/kern/
vfs_vnops.c:415
#23 0xc01fa184 in vn_closefile (fp=0xe90c3d80) at /usr/src/sys/kern/
vfs_vnops.c:1109
#24 0xc01904aa in fdrop (fp=0xe90c3d80) at /usr/src/sys/sys/file2.h:120
#25 0xc01907a0 in closef (fp=0xe90c3d80, td=0xe8ded100) at /usr/src/sys/kern/
kern_descrip.c:2080
#26 0xc0190b98 in fdfree (p=0xe8f0d218) at /usr/src/sys/kern/kern_descrip.c:1734
#27 0xc019691a in exit1 (rv=1) at /usr/src/sys/kern/kern_exit.c:352
#28 0xc01a7081 in sigexit (lp=0xe8f59ef0, sig=1) at /usr/src/sys/kern/
kern_sig.c:1913
#29 0xc01a90d4 in postsig (sig=1) at /usr/src/sys/kern/kern_sig.c:1795
#30 0xc030fa0e in userret (lp=0xe8f59ef0, frame=<value optimized out>,
sticks=<value optimized out>) at /usr/src/sys/platform/pc32/i386/trap.c:289
#31 0xc0310c36 in syscall2 (frame=0xe8d8bd40) at /usr/src/sys/platform/pc32/
i386/trap.c:1411

(kgdb) frame 18
#18 0xc01fff05 in spec_close (ap=0xe8d8b9a8) at /usr/src/sys/vfs/specfs/
spec_vnops.c:740
740 error = dev_dclose(dev, ap->a_fflag, S_IFCHR);
(kgdb) print p
$1 = (struct proc *) 0xe8f0d218
(kgdb) print p->p_session
There is no member named p_session.
(kgdb) print p->p_pgrp
$2 = (struct pgrp *) 0xe9167ec0
(kgdb) print p->p_pgrp->pg_session
$3 = (struct session *) 0xd3ba1a60
(kgdb) print p->p_pgrp->pg_session->s_ttyvp
$4 = (struct vnode *) 0x0

-
$ grep p_session /usr/include/sys/proc.h
#define p_session p_pgrp->pg_session

-thomas

#18 Updated by nthery over 5 years ago

Sorry I forgot that p_session is a macro.

Cool. That's consistent with my hypothesis. I reckon it was
nullified when executing the body of the 2nd if statement in
spec_close(). I'll fix it tonight.

Thanks a lot.

Nicolas

#19 Updated by nthery over 5 years ago

I've committed a potential fix.

#20 Updated by thomas.nikolajsen over 5 years ago

Closing issue as I have only seen this panic rarely, and I guess
your commit 8e0ede57151dfffe3d785fc1f3c1e8bcf2c848dc fixes problem.

BTW: no problem for me looking up p_session macro.

-thomas

Also available in: Atom PDF