Bug #2092

Panic: Bad link elm 0x... next->prev != elm

Added by masterblaster over 3 years ago. Updated almost 3 years ago.

Status:NewStart date:
Priority:NormalDue date:
Assignee:dillon% Done:

0%

Category:-
Target version:-

Description

Hi all,

i'm running v2.11.0.397.g4b06a0-DEVELOPMENT x86_64

I got this kind of panic on a daily basis, always at night (probably
while hammer does reblocking?):

Jun 20 03:09:14 randy kernel: ahci0.1: disk_rw: unknown state 6
Jun 20 03:09:14 randy last message repeated 18 times
Jun 20 03:09:14 randy kernel: panic: Bad link elm 0xffffffe01e019e78
next->prev != elm
Jun 20 03:09:14 randy kernel: cpuid = 0
Jun 20 03:09:14 randy kernel: Trace beginning at frame 0xffffffe01de1fa00
Jun 20 03:09:14 randy kernel: panic() at panic+0x1ed
Jun 20 03:09:14 randy kernel: panic() at panic+0x1ed
Jun 20 03:09:14 randy kernel: camisr_runqueue() at camisr_runqueue+0x87
Jun 20 03:09:14 randy kernel: swi_cambio() at swi_cambio+0x169
Jun 20 03:09:14 randy kernel: ithread_handler() at ithread_handler+0x1ba
Jun 20 03:09:14 randy kernel: boot() called on cpu#0

After applying these two patches it's running stable since 4 days.

The difference is now that TAILQ_EMPTY (patch 0002) is in
camisr_runqueue() *and* is protected by a spinlock too. The last maybe
fixes the panic? Does it makes sense?

ByE!

0001-Optimize-isr-routine-from-FreeBSD-proved-to-work.patch Magnifier (1.32 KB) masterblaster, 06/23/2011 06:19 PM

0002-TAILQ_EMPTY-now-done-in-camisr_runqueue-so-useless-h.patch Magnifier (1.04 KB) masterblaster, 06/23/2011 06:19 PM

0003-Fix-race-condition-on-cam-simqueue.patch Magnifier (1.2 KB) masterblaster, 07/04/2011 10:08 AM

History

#1 Updated by masterblaster over 3 years ago

To revise my comment:

TAILQ_EMPTY was already in camisr_runqueue(), so
0001-Optimize-isr-routine-from-FreeBSD-proved-to-work.patch has really nothing
to do with locking issues, just a drop in optimization from FreeBSD

Nevertheless 0002-TAILQ_EMPTY-now-done-in-camisr_runqueue-so-useless-h.patch
still makes sense to me:
in camisr_runqueue() 'while ((ccb_h = TAILQ_FIRST(&sim->sim_doneq)) != NULL)'
lets TAILQ_FIRST != NULL make the same job as TAILQ_EMPTY, and here the queue
macro is do protected by a spinlock

#2 Updated by masterblaster over 3 years ago

A further possible source of problems:

in sys/bus/cam/cam_xpt.c:camisr() a TAILQ_REMOVE is done before CAM_SIM_LOCK(sim).

With the attached patch it's now running stable since a week.
As i'm new to dfly kernel, could please someone at least say if these changes
are meaningful at all?

#3 Updated by alexh over 3 years ago

Not sure, in principle the changes look okish. I'd be willing to commit them to
master, but no MFC.

Thanks,
Alex

#4 Updated by masterblaster over 3 years ago

Nope, still same problem after 6 days... (sic)
Anyway, if the changes make sense, i would be glad if the are committed :)

What really disturbing is the log entry: 'ahci0.1: disk_rw: unknown state 6'
immediately before 'bad link elm'.
I changed the 'kprintf' at that point in code with 'panic' in the hope to get a
meaningful backtrace and eventually a core dump.

Thanks for your attention, Alex!

#5 Updated by masterblaster about 3 years ago

After a stress2 disk test, got panic below (of course no kernel dump).
Timeout msg look like a well known problem (logged either by FreeBSD and Linux
kernel msg) of WD Raptor hard disk series (in my case a WD3000HLFS).

I'm migrating right now to an SSD installation, i'm looking forward to see if
it's HD fault...

Anyway it would be nice to force a kernel dump (as a list elem contention
problem is still present); but here the point of failure is the very piece of
hardware where the dump should be written... Should i dump, say, to an usb
stick? Any ideas?

ahci0.1: CMD TIMEOUT state=5 slot=20
cmd-reg 0xd417<CR,FR,FRE,POD,SUD,ST>
sactive=00100000 active=00000000 expired=00000000
sact=00000000 ci=00000000
STS=40
ahci0.1: disk_rw: timeout
ahci0.1: CMD TIMEOUT state=5 slot=9
cmd-reg 0xc917<CR,FR,FRE,POD,SUD,ST>
sactive=00000200 active=00000000 expired=00000000
sact=00000000 ci=00000000
STS=40
ahci0.1: disk_rw: timeout
[diagnostic] cache_lock: blocked on 0xffffffe04d25bca8 "stressX"
[diagnostic] cache_lock: unblocked stressX after 1 secs
[diagnostic] cache_lock: blocked on 0xffffffe04d25bca8[diagnostic] cache_lock:
blocked on 0xffffffe04d25bca8 "stressX"
"stressX"
[diagnostic] cache_lock: blocked on 0xffffffe04d25bca8 "stressX"
[diagnostic] cache_lock: unblocked stressX after 0 secs
[diagnostic] cache_lock: unblocked stressX after 0 secs
[diagnostic] cache_lock: unblocked stressX after 0 secs
ahci0.1: CMD TIMEOUT state=5 slot=28
cmd-reg 0xdc17<CR,FR,FRE,POD,SUD,ST>
sactive=10000000 active=00000000 expired=00000000
sact=00000000 ci=00000000
STS=40
ahci0.1: disk_rw: timeout
ahci0.1: CMD TIMEOUT state=5 slot=28
cmd-reg 0xdc17<CR,FR,FRE,POD,SUD,ST>
sactive=10000000 active=00000000 expired=00000000
sact=00000000 ci=00000000
STS=40
ahci0.1: disk_rw: timeout
[diagnostic] cache_lock: blocked on 0xffffffe04d25bca8[diagnostic] cache_lock:
blocked on 0xffffffe04d25bca8 "stressX"
"stressX"
[diagnostic] cache_lock: unblocked stressX after 0 secs
[diagnostic] cache_lock: unblocked stressX after 0 secs
ahci0.1: CMD TIMEOUT state=5 slot=6
cmd-reg 0xc617<CR,FR,FRE,POD,SUD,ST>
sactive=00000040 active=00000000 expired=00000000
sact=00000000 ci=00000000
STS=40
ahci0.1: disk_rw: timeout
ahci0.1: CMD TIMEOUT state=5 slot=0
cmd-reg 0xc017<CR,FR,FRE,POD,SUD,ST>
sactive=00000001 active=00000000 expired=00000000
sact=00000000 ci=00000000
STS=40
ahci0.1: disk_rw: timeout
ahci0.1: disk_rw: unknown state 6
ahci0.1: disk_rw: unknown state 6
panic: Bad link elm 0xffffffe01e2faa78 next->prev != elm
cpuid = 0
Trace beginning at frame 0xffffffe01e10fa00
panic() at panic+0x1ed
panic() at panic+0x1ed
camisr_runqueue() at camisr_runqueue+0x87
swi_cambio() at swi_cambio+0x167
ithread_handler() at ithread_handler+0x1d4
boot() called on cpu#0
Uptime: 8h50m24s
Physical memory: 2019 MB
Dumping 1064 MB:spin_lock: 0xffffffe005e0b5a0, indefinite wait!
panic: Bad link elm 0xffffffe01e2faa78 next->prev != elm
cpuid = 0
boot() called on cpu#0

#6 Updated by alexh about 3 years ago

If you put the swap partition (or for that matter the dumpdev) on a disk with a
different driver, i.e. simple nata, you'll be able to get a dump.

Cheers,
Alex

#7 Updated by masterblaster about 3 years ago

After lots of tries, got usable kdump from a x86_64 virtualbox instance.
System fresh updated up to 481b38a3c400eabd7f5cce81a3013178a8643271 (full
rebuild world and kernel) without the patches attached to this ticket.

https://84.73.12.43/core1.tar.xz (kern and vmcore)

#8 Updated by masterblaster about 3 years ago

Got fresh, on real hardware (and usable!) kdump:
https://84.73.12.43/crash/core18.tar.xz

Same problem, different place:

#0 _get_mycpu () at ./machine/thread.h:69
#1 md_dumpsys (di=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/dump_machdep.c:263
#2 0xffffffff802c4b75 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:925
#3 0xffffffff802c51db in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xffffffff802c5484 in panic (
fmt=0xffffffff817b03e0 "Bad tailq NEXT(%p->tqh_last) != NULL")
at /usr/src/sys/kern/kern_shutdown.c:831
#5 0xffffffff817aa67a in ahci_put_ccb (ccb=0xffffffe01f272a80)
at /usr/src/sys/dev/disk/ahci/ahci.c:3122
#6 0xffffffff817aa6e5 in ahci_ata_put_xfer (xa=<unavailable>)
at /usr/src/sys/dev/disk/ahci/ahci.c:3494
#7 0xffffffff817a762f in ahci_ata_complete_disk_rw (xa=0xffffffe01f272a80)
at /usr/src/sys/dev/disk/ahci/ahci_cam.c:1719
#8 0xffffffff817ab49f in ahci_ata_cmd_done (ccb=0xffffffe01f272a80)
at /usr/src/sys/dev/disk/ahci/ahci.c:3564
#9 0xffffffff817ae08b in ahci_port_intr (ap=0xffffffe01f25c638,
blockable=<optimized out>) at /usr/src/sys/dev/disk/ahci/ahci.c:3010
#10 0xffffffff817ae2a1 in ahci_intr (arg=<optimized out>)
at /usr/src/sys/dev/disk/ahci/ahci.c:2352
#11 0xffffffff80294b24 in ithread_handler (arg=<optimized out>)

#9 Updated by dillon about 3 years ago

Ok, a couple of things.

0003 patch - there is no race condition here that I can see in camisr(). It
moves cam_simq to a local on-stack queue with the lock held. Once on the local
queue it can safely remove the sims from the queue without locking, but of
course must still obtain the CAM_SIM_LOCK() to process the sim.

0002 patch - This primarily exists to handle synchronous cases and the
TAILQ_EMPTY check does not require locking.

0001 patch - This is a very interesting patch. I don't think it fixes any bugs,
though, it's just an optimization. Is that correct or does it fix bugs? I can
commit it for its optimization I guess.

Now on to the bugs. I've gone through the code and I see one possible case in
the timeout handling. If a callout has a high latency it is possible it can
wind up running long after it's been callout_stop()'d by the driver. That is,
the callout function itself can be in-progress and stuck waiting for a lock at
the time the frontend (which owns the lock) calls callout_stop(). I think state
can get confused if this happens.

I have made some commits to try to address this potential issue. I also changed
the 'unknown state' kprintf into a panic... see if you can get a backtrace when
it occurs. State 6 is a CCB that is on the free list, meaning that all command
processing should already have been done. Whatever corruption is causing this
double-completion is also probably causing the list corruption.

I have noticed an occasional AHCI timeout while testing high latencies on our
48-core box. fsstress might introduce similar issues though it might disappear
if you update to the latest master (which you need to get the committed patches
anyway). The issues were related to heavy MP contention on a global pmap
spinlock which has since been removed.

-Matt

#10 Updated by masterblaster about 3 years ago

Unfortunately another kdump despite callout_stop_sync() modifications (few hours
after installkernel): https://84.73.12.43/crash/core19.tar.xz

As 'crash test' i'm running a 'git svn clone' of lyx svn repo (which takes
looong time)

(kgdb) bt
#0 _get_mycpu () at ./machine/thread.h:69
#1 md_dumpsys (di=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/dump_machdep.c:263
#2 0xffffffff802c4c75 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:925
#3 0xffffffff802c52db in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xffffffff802c5584 in panic (
fmt=0xffffffff817b1310 "assertion \"%s\" failed in %s at %s:%u")
at /usr/src/sys/kern/kern_shutdown.c:831
#5 0xffffffff817ac553 in ahci_ata_cmd_done (ccb=0xffffffe01f2726e0)
at /usr/src/sys/dev/disk/ahci/ahci.c:3561
#6 0xffffffff817af14b in ahci_port_intr (ap=0xffffffe01f25c638,
blockable=<optimized out>) at /usr/src/sys/dev/disk/ahci/ahci.c:3010
#7 0xffffffff817af361 in ahci_intr (arg=<optimized out>)
at /usr/src/sys/dev/disk/ahci/ahci.c:2352
#8 0xffffffff80294c14 in ithread_handler (arg=<optimized out>)
at /usr/src/sys/kern/kern_intr.c:837
#9 0xffffffff802d17ad in lwkt_deschedule_self (td=<unavailable>)
at /usr/src/sys/kern/lwkt_thread.c:355
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(kgdb) p *(struct ata_xfer *) ccb->ccb_xa
$1 = {fis = 0x4015630020618027, rfis = {type = 9 '\t', flags = 0 '\000',
status = 0 '\000', error = 0 '\000', lba_low = 216 '\330',
lba_mid = 0 '\000', lba_high = 0 '\000', device = 0 '\000',
lba_low_exp = 0 '\000', lba_mid_exp = 0 '\000', lba_high_exp = 0 '\000',
reserved0 = 0 '\000', sector_count = 0 '\000',
sector_count_exp = 0 '\000', reserved1 = 0 '\000', reserved2 = 0 '\000',
reserved3 = 0 '\000', reserved4 = 0 '\000', reserved5 = 0 '\000',
reserved6 = 0 '\000'}, packetcmd = 0x0, tag = 0 '\000', data = 0x0,
datalen = 0, resid = 0, complete = 0, timeout = 0, flags = 0, state = 0,
atascsi_private = 0x0, at = 0x0}

(kgdb) p *(*(struct ata_xfer *) ccb->ccb_xa).fis
Cannot access memory at address 0x4015630020618027

and fis is screwed up too...

#11 Updated by masterblaster about 3 years ago

  • Description updated (diff)

I got master (without the three patches attached, at v2.13.0.327.g739be-DEVELOPMENT) panic'd with http://bugs.dragonflybsd.org/pastes/11 (unfortunately no core). Btw, savecore device is a usb stick.
Crash point is now different because of commit 46528d338822ba0b7e4becd8f66c04e2d9698b24

A core (same crash point) of previous version v2.13.0.259.g56693-DEVELOPMENT is available at:
https://84.73.12.43/crash/core21.txz

#12 Updated by masterblaster almost 3 years ago

Fresh (and finally usable) core: https://84.73.12.43/crash/core22.txz

At v2.13.0.357.gcdb8af with patch (just to better isolate the problem):

diff --git a/sys/dev/disk/ahci/ahci.c b/sys/dev/disk/ahci/ahci.c
index 2752c6c..d9c4733 100644
--- a/sys/dev/disk/ahci/ahci.c
+++ b/sys/dev/disk/ahci/ahci.c
@@ -3558,7 +3558,8 @@ ahci_ata_cmd_done(struct ahci_ccb *ccb)
xa->flags &= ~(ATA_F_TIMEOUT_DESIRED | ATA_F_TIMEOUT_EXPIRED);
ccb->ccb_port->ap_expired &= ~(1 << ccb->ccb_slot);

- KKASSERT(xa->state != ATA_S_ONCHIP && xa->state != ATA_S_PUT);
+ KKASSERT(xa->state != ATA_S_ONCHIP);
+ KKASSERT(xa->state != ATA_S_PUT);
ahci_unload_prdt(ccb);

if (xa->state != ATA_S_TIMEOUT)

#13 Updated by masterblaster almost 3 years ago

https://84.73.12.43/crash/core23.txz is quite big (244M) but shows clearly data in the call level where KKASSERT(xa->state != ATA_S_PUT) fails (patch in previous comment applies too).

Core was manually generated after dropping in ddb and executing 'next', as DDB_UNATTENDED led to no dump.
Running v2.13.0.437.g2b9050-DEVELOPMENT.

(kgdb) bt

#0 _get_mycpu () at ./machine/thread.h:69
#1 md_dumpsys (di=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/dump_machdep.c:263
#2 0xffffffff802c5485 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:925
#3 0xffffffff802c5aeb in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xffffffff802c5da2 in panic (
fmt=0xffffffff805891b8 "trap: extra tokens held after trap! %ld/%ld")
at /usr/src/sys/kern/kern_shutdown.c:831
#5 0xffffffff804e5ba5 in trap (frame=0xffffffe0416dba08)
at /usr/src/sys/platform/pc64/x86_64/trap.c:814
#6 0xffffffff804de05e in calltrap ()
at /usr/src/sys/platform/pc64/x86_64/exception.S:187
#7 0xffffffff804dfa0f in breakpoint () at ./cpu/cpufunc.h:73
#8 Debugger (msg=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/db_interface.c:361
#9 0xffffffff802c5d82 in panic (
fmt=0xffffffff81823380 "assertion \"%s\" failed in %s at %s:%u")
at /usr/src/sys/kern/kern_shutdown.c:822
#10 0xffffffff8181e57f in ahci_ata_cmd_done (ccb=0xffffffe041e4ceb0)
at /usr/src/sys/dev/disk/ahci/ahci.c:3562
#11 0xffffffff8182117b in ahci_port_intr (ap=0xffffffe041e38f38,
blockable=<optimized out>) at /usr/src/sys/dev/disk/ahci/ahci.c:3010
#12 0xffffffff81821391 in ahci_intr (arg=<optimized out>)
at /usr/src/sys/dev/disk/ahci/ahci.c:2352
#13 0xffffffff80295174 in ithread_handler (arg=<optimized out>)
at /usr/src/sys/kern/kern_intr.c:837
#14 0xffffffff802d2cad in lwkt_deschedule_self (td=0x0)
at /usr/src/sys/kern/lwkt_thread.c:357

Also available in: Atom PDF