Bug #1177

All new processes stuck in "flstik" state

Added by rumcic about 6 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: dillon
% Done: 0%
Category: -
Target version: -
Start date:
Due date:

Description

Most of the time when I see processes in the flstik state, the machine recovers,
but I have managed to find a way to reproduce a situation where all new
processes get stuck in the flstik state (even ssh-ing into the machine creates a
new sshd process which gets stuck in flstik and won't continue, I can't
log in locally, etc.).

The machine has /boot on UFS and / on HAMMER with several PFSes (set up using
/usr/share/examples/rconfig/hammer.sh, except that my /var/tmp is not a PFS but a
symlink to /tmp).

If I run http://pastebin.ca/1279548 (with http://pastebin.ca/1279549 as the
pkgsrc.img.label) 2, sometimes 3, times, all new processes get stuck in the
flstik state during installworld.
I panicked the machine and uploaded the cores (the first time it happened I
panicked the machine and made the memory dumps, and after successfully
reproducing the situation I did it again) to
leaf:~rumko/crash/{kernel,vmcore}.{0,1}
--
Regards,
Rumko

History

#1 Updated by dillon about 6 years ago


I'm guessing that the problem is due to running HAMMER on the VN
backed by a file on another HAMMER filesystem.

The kgdb on leaf is unable to list the threads in your dump, probably
due to structural mismatches. Could you do it on your box and post a
backtrace of the threads stuck in flstik? I am going to guess that
the bd_wait() they are stuck in is deep inside VN, probably the
path: VNDEVICE->VOP_WRITE->(HAMMER)->bwillwrite()->bd_wait().

Run 'kgdb kernel.1 vmcore.1' and do an 'info threads'. The threads
we are interested in are 0xd8fa2000, 0xd8fa2600, 0xd8fb5300,
0xdce45400, 0xd8fb4800, and 0xdce46d00. For each one find the
thread number and do 'thread <number>' and 'back' (an abbreviation
for backtrace).
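The requested session would look roughly like this (the numbers next to the thread addresses vary per dump, so '<n>' below is a placeholder, not an actual value):

```
# Open the debugger against a matching kernel/core pair.
kgdb kernel.1 vmcore.1

# Inside kgdb: list all threads and note the number shown beside
# each thread address of interest (0xd8fa2000, 0xd8fa2600, ...).
(kgdb) info threads

# For each address of interest, switch to its thread and
# print a backtrace.
(kgdb) thread <n>
(kgdb) back
```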

If that is the problem the solution is really simple, I can just
pass a flag in the VOP_WRITE to tell HAMMER not to call bwillwrite().

-Matt
Matthew Dillon

#2 Updated by rumcic about 6 years ago

Matthew Dillon wrote:

0xd8fa2000:
#0  0xc01cb977 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:725
#1  0xc01d123a in tsleep (ident=0xc0456234, flags=0, wmesg=0xc03cd749 "flstik",
timo=100) at /usr/src/sys/kern/kern_synch.c:489
#2  0xc01fe282 in bd_wait (totalspace=16384) at /usr/src/sys/kern/vfs_bio.c:409
#3  0xc01c8be8 in bwillwrite (bytes=65536)
at /usr/src/sys/kern/kern_iosched.c:66
#4  0xc0311e2d in hammer_vop_write (ap=0xdaa2b858)
at /usr/src/sys/vfs/hammer/hammer_vnops.c:393
#5  0xc0218ffe in vop_write (ops=0xd2744e90, vp=0xdd29c068, uio=0xdaa2b8f0,
ioflag=256, cred=0xdd04f1e8) at /usr/src/sys/kern/vfs_vopops.c:351
#6  0xc0196af7 in vnstrategy (ap=0xdaa2b930)
at /usr/src/sys/dev/disk/vn/vn.c:396
#7  0xc01ad0c2 in dev_dstrategy_chain (dev=0xdcef9c58, bio=0xc3ced39c)
at /usr/src/sys/kern/kern_device.c:250
#8  0xc021c8eb in spec_strategy (ap=0xdaa2b97c)
at /usr/src/sys/vfs/specfs/spec_vnops.c:506
#9  0xc021c6b9 in spec_vnoperate (ap=0xdaa2b97c)
at /usr/src/sys/vfs/specfs/spec_vnops.c:136
#10 0xc0218fa2 in vop_strategy (ops=0xd2744fb0, vp=0xdd28f868, bio=0xc3ced39c)
at /usr/src/sys/kern/vfs_vopops.c:659
#11 0xc01fd492 in vn_strategy (vp=0x0, bio=0x0)
at /usr/src/sys/kern/vfs_bio.c:3082
#12 0xc030184f in hammer_io_direct_write (hmp=0xdd569000, record=0xc3a17c80,
bio=0xc3ced32c) at /usr/src/sys/vfs/hammer/hammer_io.c:1211
#13 0xc0310642 in hammer_vop_strategy (ap=0xdaa2bb34)
at /usr/src/sys/vfs/hammer/hammer_vnops.c:2663
#14 0xc0218fa2 in vop_strategy (ops=0xd274b190, vp=0xdf14ee68, bio=0xc3ced32c)
at /usr/src/sys/kern/vfs_vopops.c:659
#15 0xc01fd492 in vn_strategy (vp=0x0, bio=0x0)
at /usr/src/sys/kern/vfs_bio.c:3082
#16 0xc0200ef4 in bwrite (bp=0xc3ced2fc) at /usr/src/sys/kern/vfs_bio.c:790
#17 0xc0201c1d in bawrite (bp=0xc3ced2fc) at /usr/src/sys/kern/vfs_bio.c:964
#18 0xc020f1fa in vfsync_bp (bp=0xc3ced2fc, data=0xdaa2bc00)
at /usr/src/sys/kern/vfs_subr.c:828
#19 0xc020c38a in buf_rb_tree_RB_SCAN (head=0xdf14eecc, scancmp=0xc020d3ce
<vfsync_data_only_cmp>, callback=0xc020f030 <vfsync_bp>, data=0xdaa2bc00)
    at /usr/src/sys/kern/vfs_subr.c:139
#20 0xc020f33c in vfsync (vp=0xdf14ee68, waitfor=2, passes=1, checkdef=0,
waitoutput=0) at /usr/src/sys/kern/vfs_subr.c:678
#21 0xc03115dd in hammer_vop_fsync (ap=0xdaa2bc58)
at /usr/src/sys/vfs/hammer/hammer_vnops.c:199
#22 0xc02182fe in vop_fsync (ops=0xd274b190, vp=0xdf14ee68, waitfor=2)
at /usr/src/sys/kern/vfs_vopops.c:449
#23 0xc03082a2 in hammer_sync_scan2 (mp=0xd8f651d8, vp=0x0, data=0xdaa2bcf0)
at /usr/src/sys/vfs/hammer/hammer_ondisk.c:1518
#24 0xc0210898 in vmntvnodescan (mp=0xd8f651d8, flags=17, fastfunc=0xc030814f
<hammer_sync_scan1>, slowfunc=0xc030826e <hammer_sync_scan2>,
    data=0xdaa2bcf0) at /usr/src/sys/kern/vfs_mount.c:1005
#25 0xc030820c in hammer_sync_hmp (hmp=0xdd569000, waitfor=4)
at /usr/src/sys/vfs/hammer/hammer_ondisk.c:1474
#26 0xc030e909 in hammer_vfs_sync (mp=0xd8f651d8, waitfor=4)
at /usr/src/sys/vfs/hammer/hammer_vfsops.c:907
#27 0xc02116dc in sync_fsync (ap=0xdaa2bd40)
at /usr/src/sys/kern/vfs_sync.c:410
#28 0xc02182fe in vop_fsync (ops=0xc041fa20, vp=0xddb42168, waitfor=4)
at /usr/src/sys/kern/vfs_vopops.c:449
#29 0xc02118b8 in sched_sync () at /usr/src/sys/kern/vfs_sync.c:214
#30 0xc01b750b in suspend_kproc (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/kern_kthread.c:158

0xd8fa2600:
#0  0xc01cb977 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:725
#1  0xc01d120c in tsleep (ident=0xc0456234, flags=0, wmesg=0xc03cd749 "flstik",
timo=100) at /usr/src/sys/kern/kern_synch.c:478
#2  0xc01fe282 in bd_wait (totalspace=16384) at /usr/src/sys/kern/vfs_bio.c:409
#3  0xc01c8be8 in bwillwrite (bytes=16384)
at /usr/src/sys/kern/kern_iosched.c:66
#4  0xc0311e2d in hammer_vop_write (ap=0xdce6ab5c)
at /usr/src/sys/vfs/hammer/hammer_vnops.c:393
#5  0xc0218ffe in vop_write (ops=0xd2744e90, vp=0xd2589968, uio=0xdce6ac98,
ioflag=8323075, cred=0xc39a58c8) at /usr/src/sys/kern/vfs_vopops.c:351
#6  0xc021801e in vn_write (fp=0xd61e7c80, uio=0xdce6ac98, cred=0xc39a58c8,
flags=0) at /usr/src/sys/kern/vfs_vnops.c:715
#7  0xc01e2dd8 in kern_pwritev (fd=9, auio=0xdce6ac98, flags=0, res=0xdce6acf0)
at /usr/src/sys/sys/file2.h:72
#8  0xc01e3414 in sys_writev (uap=0xdce6acf0)
at /usr/src/sys/kern/sys_generic.c:389
#9  0xc0393bce in syscall2 (frame=0xdce6ad40)
at /usr/src/sys/platform/pc32/i386/trap.c:1386
#10 0xc037dab6 in Xint0x80_syscall ()
at /usr/src/sys/platform/pc32/i386/exception.s:876
#11 0x280c5654 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

0xd8fb5300:
#0  0xc01cb977 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:725
#1  0xc01d120c in tsleep (ident=0xc0456234, flags=0, wmesg=0xc03cd749 "flstik",
timo=100) at /usr/src/sys/kern/kern_synch.c:478
#2  0xc01fe282 in bd_wait (totalspace=16384) at /usr/src/sys/kern/vfs_bio.c:409
#3  0xc01c8bc8 in bwillinode (n=1) at /usr/src/sys/kern/kern_iosched.c:87
#4  0xc021744a in vn_open (nd=0xdd4c6c80, fp=0xd61e8d60, fmode=514, cmode=420)
at /usr/src/sys/kern/vfs_vnops.c:159
#5  0xc021486d in kern_open (nd=0xdd4c6c80, oflags=513, mode=420,
res=0xdd4c6cf0) at /usr/src/sys/kern/vfs_syscalls.c:1724
#6  0xc0214a42 in sys_open (uap=0xdd4c6cf0)
at /usr/src/sys/kern/vfs_syscalls.c:1834
#7  0xc0393bce in syscall2 (frame=0xdd4c6d40)
at /usr/src/sys/platform/pc32/i386/trap.c:1386
#8  0xc037dab6 in Xint0x80_syscall ()
at /usr/src/sys/platform/pc32/i386/exception.s:876
#9  0x280e3f54 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

0xdce45400:
#0  0xc01cb977 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:725
#1  0xc01d123a in tsleep (ident=0xdd569194, flags=0, wmesg=0xc03d56c5 "hmrfl1",
timo=0) at /usr/src/sys/kern/kern_synch.c:489
#2  0xc0301fc0 in hammer_io_wait_all (hmp=0xdd569000,
ident=0xc03d56c5 "hmrfl1") at /usr/src/sys/vfs/hammer/hammer_io.c:160
#3  0xc02fbf22 in hammer_flusher_finalize (trans=0xdd569114, final=1)
at /usr/src/sys/vfs/hammer/hammer_flusher.c:641
#4  0xc02fc7e6 in hammer_flusher_master_thread (arg=0xdd569000)
at /usr/src/sys/vfs/hammer/hammer_flusher.c:360
#5  0xc01cb142 in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:228
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

0xd8fb4800:
#0  0xc01cb977 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:725
#1  0xc01d120c in tsleep (ident=0xc0456234, flags=0, wmesg=0xc03cd749 "flstik",
timo=100) at /usr/src/sys/kern/kern_synch.c:478
#2  0xc01fe282 in bd_wait (totalspace=16384) at /usr/src/sys/kern/vfs_bio.c:409
#3  0xc01c8bc8 in bwillinode (n=1) at /usr/src/sys/kern/kern_iosched.c:87
#4  0xc021744a in vn_open (nd=0xdd41cc80, fp=0xd61e8b68, fmode=514, cmode=420)
at /usr/src/sys/kern/vfs_vnops.c:159
#5  0xc021486d in kern_open (nd=0xdd41cc80, oflags=513, mode=420,
res=0xdd41ccf0) at /usr/src/sys/kern/vfs_syscalls.c:1724
#6  0xc0214a42 in sys_open (uap=0xdd41ccf0)
at /usr/src/sys/kern/vfs_syscalls.c:1834
#7  0xc0393bce in syscall2 (frame=0xdd41cd40)
at /usr/src/sys/platform/pc32/i386/trap.c:1386
#8  0xc037dab6 in Xint0x80_syscall ()
at /usr/src/sys/platform/pc32/i386/exception.s:876
#9  0x280c1f54 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

0xdce46d00:
#0  0xc01cb977 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:725
#1  0xc01d120c in tsleep (ident=0xc0456234, flags=0, wmesg=0xc03cd749 "flstik",
timo=100) at /usr/src/sys/kern/kern_synch.c:478
#2  0xc01fe282 in bd_wait (totalspace=16384) at /usr/src/sys/kern/vfs_bio.c:409
#3  0xc01c8bc8 in bwillinode (n=1) at /usr/src/sys/kern/kern_iosched.c:87
#4  0xc021744a in vn_open (nd=0xddc6dc80, fp=0xdccc40e0, fmode=1538, cmode=420)
at /usr/src/sys/kern/vfs_vnops.c:159
#5  0xc021486d in kern_open (nd=0xddc6dc80, oflags=1537, mode=438,
res=0xddc6dcf0) at /usr/src/sys/kern/vfs_syscalls.c:1724
#6  0xc0214a42 in sys_open (uap=0xddc6dcf0)
at /usr/src/sys/kern/vfs_syscalls.c:1834
#7  0xc0393bce in syscall2 (frame=0xddc6dd40)
at /usr/src/sys/platform/pc32/i386/trap.c:1386
#8  0xc037dab6 in Xint0x80_syscall ()
at /usr/src/sys/platform/pc32/i386/exception.s:876
#9  0x28144f54 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
--
Regards,
Rumko

#3 Updated by corecode almost 6 years ago

Matt,

did this get fixed?

#4 Updated by rumcic over 4 years ago

I have not seen this for quite some time, so I'm setting this as resolved.
