Bug #2402

Showstopper panics for Release 3.2

Added by marino over 2 years ago. Updated almost 2 years ago.

Status: New
Start date: 08/15/2012
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

This is the list of panics we've been accumulating. i386 and UFS are hit particularly hard.
These items should be fixed before the next release.

#2296: panic: assertion "m->wire_count > 0" failed in pmap_unwire_pte at /usr/src/sys/platform/pc32/i386/pmap.c:1091
core available (~marino/crash, ~thomas/crash)
(carried over from 3.0.1, 3.0.2, 3.0.3 showstopper lists)
#2364: panic: lockmgr: locking against myself
core available (~marino/crash)
#2374: Panic where softdep_update_inodeblock() called bwrite() with a NULL buffer
core available (~marino/crash) uploaded today
#2374: panic: flush_pagedep_deps: MKDIR_BODY
core available (~marino/crash)
#2370: panic: ffs_valloc: dup alloc
core available (~marino/crash)
#2350 panic: assertion "m->flags & PG_BUSY" failed in vm_page_protect at /usr/src/sys/vm/vm_page.h:532
core available (~pavalos/crash)

Leftover from 3.0.3 showstopper:
#2353 panic: assertion "gd->gd_spinlocks_wr == 0" failed in bsd4_schedulerclock
core available (~jaydg/crash)
#2388 panic: lockmgr: LK_RELEASE: no lock held
No core.
#2399 Panic on lwkt_reltoken from vm_mmap
core available (limited use)

Leftover from 3.0.1 showstopper:
#2284 panic: general protection fault (3.0 showstopper)
core available on ylem/var/crash, request to put on leaf didn't happen (?)

Other panics:
#2352 panic: Bad link elm 0xffffffe0a3775670 next->prev != elm
core available (~jaydg/crash)
#2369 panic: Bad link elm 0xffffffe07edf6068 next->prev != elm
core available (~jaydg/crash)
#2355 panic: rtrequest1_msghandler: rtrequest table error was cpu4, err 17
core available (~jaydg/crash)
#2083 panic: zone: entry not free
core might be available
#2358 panic: hammer: insufficient undo FIFO space!
NO CORE
#2345 panic: assertion "len <= nmp->nm_size" failed in nfs_writerpc_bio at ....
NO CORE
#2300 EHCI module unload panic
Core supposedly available on request


Related issues

Related to Bug #2364: panic: lockmgr: locking against myself Closed 05/09/2012
Related to Bug #2296: panic: assertion "m->wire_count > 0" failed In Progress 02/01/2012
Related to Bug #2374: softupdates locking bug Resolved 05/23/2012
Related to Bug #2370: panic: ffs_valloc: dup alloc New 05/16/2012
Related to Bug #2350: vm panic when fs is full Closed
Related to Bug #2353: panic: assertion "gd->gd_spinlocks_wr == 0" failed in bsd... In Progress 04/27/2012
Related to Bug #2388: ntfs mount problems (dfly 3.0.2 + ntfsprogs) Closed 06/18/2012
Related to Bug #2399: DFBSD v3.1.0.1249.ge27e67 - Panic on lwkt_reltoken from v... Closed 08/10/2012
Related to Bug #2284: another sysctl panic Closed
Related to Bug #2352: panic: Bad link elm 0xffffffe0a3775670 next->prev != elm Closed 04/27/2012
Related to Bug #2369: panic: Bad link elm 0xffffffe07edf6068 next->prev != elm New 05/16/2012
Related to Bug #2355: panic: rtrequest1_msghandler: rtrequest table error was c... Closed 04/27/2012
Related to Bug #2358: DFBSD v3.0.2.32.g928ca - panic: hammer: insufficient undo... In Progress 04/29/2012
Related to Bug #2345: DFBSD v3.1.0.457.gd679f - NFS panic on diskless station In Progress 04/07/2012
Related to Bug #2300: DFBSD v3.0.0.6.gbd613-RELEASE - EHCI module unload panic Feedback 02/07/2012
Related to Bug #2084: DFBSD v2.11.0.242.g4d317 - panic: zone: entry not free New
Related to Bug #2411: panic: lwkt_send_ipiq: TOO HEAVILY NESTED! Resolved 08/25/2012
Related to Bug #2436: panic: assertion "lp->lwp_qcpu == dd->cpuid" failed in df... New 10/21/2012
Related to Bug #2297: strange NFS (client) error messages / problems New 02/02/2012

History

#1 Updated by marino over 2 years ago

Typo: Issue 2083 --> #2084 panic: zone: entry not free

#2 Updated by dillon over 2 years ago

Here's a patch to try to hopefully help or narrow down some of the softupdates issues. I found two major issues perusing the softupdates code.

First, sema_get() and sema_release() are not MP safe when called without an interlock.

Second, getdirtybuf() improperly retries after releasing and reacquiring &lk. If this function cannot obtain the buffer lock prior to releasing &lk, it MUST return failure. The blocking buffer lock it obtains after releasing &lk is there simply so the caller's retry loop doesn't live-lock. Even if that second lock attempt succeeds, the buffer itself may no longer be legally associated with the softdep work item, because the instant &lk is released that work item can get ripped up. The buffer cache pointer itself is type-stable, but not work-item stable.

http://apollo.backplane.com/DFlyMisc/softdep01.patch
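
For readers following the getdirtybuf() point above, here is a minimal userland sketch of the "fail instead of retrying" rule. It is not the linked patch and not kernel code: it uses POSIX mutexes, and the names `interlock` (standing in for &lk), `struct buffer`, and `get_buffer_or_fail()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: `interlock` plays the role of &lk and
 * `buf_lock` plays the role of the buffer lock. */
struct buffer {
    pthread_mutex_t buf_lock;
};

static pthread_mutex_t interlock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Must be called with `interlock` held.
 *
 * Returns true with buf_lock held if the buffer lock could be taken
 * without ever dropping the interlock.
 *
 * Otherwise it drops the interlock, waits for the buffer lock to become
 * free, releases it again, reacquires the interlock and returns false.
 * The blocking acquire only paces the caller's retry loop; the caller
 * must look the work item up again, because it may have been torn down
 * while the interlock was released.
 */
static bool get_buffer_or_fail(struct buffer *buf)
{
    assert(buf != NULL);

    if (pthread_mutex_trylock(&buf->buf_lock) == 0)
        return true;                      /* interlock never dropped */

    pthread_mutex_unlock(&interlock);
    pthread_mutex_lock(&buf->buf_lock);   /* wait only, to avoid live-lock */
    pthread_mutex_unlock(&buf->buf_lock); /* do not keep a possibly stale buffer */
    pthread_mutex_lock(&interlock);
    return false;                         /* caller must revalidate and retry */
}
```

A caller would loop: look up the work item under the interlock, call `get_buffer_or_fail()`, and on a false return redo the lookup rather than reuse the old buffer pointer.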

#3 Updated by dillon over 2 years ago

Here is a second patch to hopefully fix the list-related panics in exit. What I believe is happening is that a threaded program is wait*()'ing for exiting children from several threads at once. This can race inside kern_wait() due to sub-tokens blocking and breaking q->p_token (on the parent). The candidate children have to be further serialized, plus we also have to double-check that the conditions are still valid and the child is still associated with the same parent.

http://apollo.backplane.com/DFlyMisc/exit02.patch
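
The "double-check after blocking" idea in the comment above can be sketched in userland C as follows. This is only an illustration under assumptions, not the linked patch: a POSIX mutex stands in for the per-process token, and `struct proc`, its fields, and `claim_zombie_child()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified process record. */
struct proc {
    pthread_mutex_t lock;   /* stands in for the per-process token */
    struct proc *parent;    /* current parent, NULL once reaped */
    bool zombie;            /* true while waiting to be reaped */
};

/*
 * Serialize on the candidate child, then re-check everything that was
 * observed before blocking. Acquiring child->lock may block, and while
 * we were blocked another waiting thread may already have reaped the
 * child or it may have been reparented.
 *
 * Returns true with child->lock held if the child is still a zombie
 * belonging to `parent`; returns false with no lock held otherwise.
 */
static bool claim_zombie_child(struct proc *parent, struct proc *child)
{
    assert(parent != NULL && child != NULL);

    pthread_mutex_lock(&child->lock);
    if (child->parent != parent || !child->zombie) {
        pthread_mutex_unlock(&child->lock);
        return false;       /* conditions changed; caller rescans */
    }
    return true;
}
```

On a false return the waiting thread would go back and rescan its list of children instead of acting on the stale candidate.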
