Bug #2402

closed

Showstopper panics for Release 3.2

Added by marino over 11 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Other
Target version:
Start date: 08/15/2012
Due date:
% Done: 0%
Estimated time:

Description

This is the list of panics we've been accumulating. i386 and UFS are hit particularly hard.
These are good candidates to fix before the next release.

#2296: panic: assertion "m->wire_count > 0" failed in pmap_unwire_pte at /usr/src/sys/platform/pc32/i386/pmap.c:1091
core available (~marino/crash, ~thomas/crash)
(carried over from 3.0.1, 3.0.2, 3.0.3 showstopper lists)
#2364: panic: lockmgr: locking against myself
core available (~marino/crash)
#2374: Panic where softdep_update_inodeblock() called bwrite() with a NULL buffer
core available (~marino/crash) uploaded today
#2374: panic: flush_pagedep_deps: MKDIR_BODY
core available (~marino/crash)
#2370: panic: ffs_valloc: dup alloc
core available (~marino/crash)
#2350 panic: assertion "m->flags & PG_BUSY" failed in vm_page_protect at /usr/src/sys/vm/vm_page.h:532
core available (~pavalos/crash)

Leftover from 3.0.3 showstopper:
#2353 panic: assertion "gd->gd_spinlocks_wr == 0" failed in bsd4_schedulerclock
core available (~jaydg/crash)
#2388 panic: lockmgr: LK_RELEASE: no lock held
No core.
#2399 Panic on lwkt_reltoken from vm_mmap
core available (limited use)

Leftover from 3.0.1 showstopper:
#2284 panic: general protection fault (3.0 showstopper)
core available on ylem/var/crash, request to put on leaf didn't happen (?)

Other panics:
#2352 panic: Bad link elm 0xffffffe0a3775670 next->prev != elm
core available (~jaydg/crash)
#2369 panic: Bad link elm 0xffffffe07edf6068 next->prev != elm
core available (~jaydg/crash)
#2355 panic: rtrequest1_msghandler: rtrequest table error was cpu4, err 17
core available (~jaydg/crash)
#2083 panic: zone: entry not free
core might be available
#2358 panic: hammer: insufficient undo FIFO space!
NO CORE
#2345 panic: assertion "len <= nmp->nm_size" failed in nfs_writerpc_bio at ....
NO CORE
#2300 EHCI module unload panic
Core supposedly available on request


Related issues: 19 (9 open, 10 closed)

Related to Bug #2364: panic: lockmgr: locking against myself [Closed, vsrinivas, 05/09/2012]
Related to Bug #2296: panic: assertion "m->wire_count > 0" failed [In Progress, 02/01/2012]
Related to Bug #2374: softupdates locking bug [Resolved, 05/23/2012]
Related to Bug #2370: panic: ffs_valloc: dup alloc [New, vsrinivas, 05/16/2012]
Related to Bug #2350: vm panic when fs is full [Closed]
Related to Bug #2353: panic: assertion "gd->gd_spinlocks_wr == 0" failed in bsd4_schedulerclock [In Progress, alexh, 04/27/2012]
Related to Bug #2388: ntfs mount problems (dfly 3.0.2 + ntfsprogs) [Closed, tuxillo, 06/18/2012]
Related to Bug #2399: DFBSD v3.1.0.1249.ge27e67 - Panic on lwkt_reltoken from vm_mmap [Closed, 08/10/2012]
Related to Bug #2284: another sysctl panic [Closed]
Related to Bug #2352: panic: Bad link elm 0xffffffe0a3775670 next->prev != elm [Closed, 04/27/2012]
Related to Bug #2369: panic: Bad link elm 0xffffffe07edf6068 next->prev != elm [New, 05/16/2012]
Related to Bug #2355: panic: rtrequest1_msghandler: rtrequest table error was cpu4, err 17 [Closed, 04/27/2012]
Related to Bug #2358: DFBSD v3.0.2.32.g928ca - panic: hammer: insufficient undo FIFO space! [In Progress, tuxillo, 04/29/2012]
Related to Bug #2345: DFBSD v3.1.0.457.gd679f - NFS panic on diskless station [In Progress, 04/07/2012]
Related to Bug #2300: DFBSD v3.0.0.6.gbd613-RELEASE - EHCI module unload panic [Closed, tuxillo, 02/07/2012]
Related to Bug #2084: DFBSD v2.11.0.242.g4d317 - panic: zone: entry not free [New]
Related to Bug #2411: panic: lwkt_send_ipiq: TOO HEAVILY NESTED! [Resolved, 08/25/2012]
Related to Bug #2436: panic: assertion "lp->lwp_qcpu == dd->cpuid" failed in dfly_acquire_curproc [New, 10/21/2012]
Related to Bug #2297: strange NFS (client) error messages / problems [New, 02/02/2012]

Actions #1

Updated by marino over 11 years ago

Typo: Issue 2083 --> #2084 panic: zone: entry not free

Actions #2

Updated by dillon over 11 years ago

Here's a patch to try to hopefully help or narrow down some of the softupdates issues. I found two major issues perusing the softupdates code.

First, sema_get() and sema_release() are not MP safe when called without an interlock.

Second, getdirtybuf() improperly retries after releasing/reacquiring &lk. If this function cannot obtain the buffer lock prior to releasing &lk it MUST return failure. The blocking buffer lock it obtains after releasing &lk is simply so the caller's retry loop doesn't live-lock... even if that second lock attempt succeeds the buffer itself may no longer be legally associated with the softdep work item because the instant &lk is released that work item can get ripped up. The buffer cache pointer itself is type-stable, but not work-item stable.

http://apollo.backplane.com/DFlyMisc/softdep01.patch
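
For readers following the getdirtybuf() reasoning, here is a minimal single-threaded sketch of the rule described above. It is not taken from the patch. Every name in it (toy_lock, toy_buf, toy_getdirtybuf, the while_blocked hook) is hypothetical, plain flags stand in for the real locks, and the hook plays the role of whatever the other lock holder does while the caller sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types: plain flags stand in for the real locks. */
struct toy_lock {
    bool held;
};

struct toy_buf {
    struct toy_lock lock;                    /* per-buffer lock */
    void *workitem;                          /* dependency attached to the buffer */
    void (*while_blocked)(struct toy_buf *); /* models the other lock holder */
};

static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Models a blocking acquire: the hook plays the thread we are waiting on. */
static void toy_lock_blocking(struct toy_buf *bp)
{
    if (bp->lock.held && bp->while_blocked != NULL)
        bp->while_blocked(bp);
    assert(!bp->lock.held); /* the toy model cannot really sleep */
    bp->lock.held = true;
}

/* Example hook: the other holder detaches the work item, then unlocks. */
void toy_holder_detaches(struct toy_buf *bp)
{
    bp->workitem = NULL;
    toy_unlock(&bp->lock);
}

/*
 * Called with the interlock held. Returns true with bp locked only if the
 * interlock was never dropped. Otherwise returns false with bp unlocked and
 * the interlock held again, so the caller must look the buffer up afresh.
 */
bool toy_getdirtybuf(struct toy_lock *interlock, struct toy_buf *bp)
{
    assert(interlock->held);
    if (toy_trylock(&bp->lock))
        return true;

    toy_unlock(interlock);
    toy_lock_blocking(bp);  /* wait only so the retry loop does not spin */
    toy_unlock(&bp->lock);  /* the association may be stale: do not keep it */

    bool relocked = toy_trylock(interlock);
    assert(relocked);
    (void)relocked;
    return false;
}
```

In this model a caller that gets false back must re-derive the buffer from its work item under the interlock before trying again; with the toy_holder_detaches hook the work item is already gone by the time the blocking lock returns, which is the case the failure return protects against.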

Actions #3

Updated by dillon over 11 years ago

Here is a second patch to hopefully fix the list-related panics in exit. What I believe is happening is that a threaded program is wait*()'ing for exiting children from several threads at once. This can race inside kern_wait() due to sub-tokens blocking and breaking q->p_token (on the parent). The candidate children have to be further serialized, plus we also have to double-check that the conditions are still valid and the child is still associated with the same parent.

http://apollo.backplane.com/DFlyMisc/exit02.patch
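
The serialize-then-recheck idea described above can be sketched as follows. This is an illustrative single-threaded model, not the contents of the patch: toy_proc, toy_try_reap, the reap_busy flag and the on_block hook are all hypothetical names, and the hook stands in for whatever another thread does while the waiter blocks on a sub-token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_RUNNING, TOY_ZOMBIE, TOY_REAPED };

/* Hypothetical toy process record. */
struct toy_proc {
    struct toy_proc *parent;
    enum toy_state state;
    bool reap_busy;                      /* one waiter per child at a time */
    void (*on_block)(struct toy_proc *); /* models activity while we block */
};

/* Models taking the child's own token, which may block the waiter. */
static void toy_acquire_child(struct toy_proc *child)
{
    if (child->on_block != NULL)
        child->on_block(child);
}

/* Example hook: the child is reparented away while the waiter is blocked. */
void toy_hook_reparent(struct toy_proc *child)
{
    child->parent = NULL;
}

/* Returns true only if this waiter reaped the child on behalf of parent. */
bool toy_try_reap(struct toy_proc *parent, struct toy_proc *child)
{
    /* First look at the candidate. */
    if (child->parent != parent || child->state != TOY_ZOMBIE)
        return false;

    /* Serialize: another waiter already claimed this child. */
    if (child->reap_busy)
        return false;
    child->reap_busy = true;

    toy_acquire_child(child); /* may block; earlier checks can go stale */

    /* Double-check: same parent and still a zombie? */
    if (child->parent != parent || child->state != TOY_ZOMBIE) {
        child->reap_busy = false;
        return false;
    }

    child->state = TOY_REAPED;
    child->reap_busy = false;
    return true;
}
```

A second call for the same child returns false because the state is no longer TOY_ZOMBIE, and a child whose hook reparents it during the simulated block is left untouched for its new parent.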

Actions #4

Updated by tuxillo almost 2 years ago

  • Description updated (diff)
  • Category set to Other
  • Status changed from New to Closed
  • Assignee set to tuxillo

3.2 was released long ago.
