Bug #2336

3.0.3 catchall

Added by vsrinivas over 2 years ago. Updated almost 2 years ago.

Status: Resolved
Start date: 03/26/2012
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

This bug tracks outstanding work for the next release on the 3-STABLE line.

Outstanding MFCs:

68cc29492751172e9897a90c303b1c7fbb18112a : MSDOSFS 32/64-bit issue (Issue 2334)
61f40eab02d073fff26f1d1cc78d5693c3bcea53 : libc: Don't allow stuck syslog to screw up system
afea4d5a3cf396976e5b778f8477d972e8c29735 : AMD Errata 721 workaround
aad00981445a4e7c60d89f07717d431f8a228e39 : Incorrect HAMMER unlock/lock sequence
3d102df7db4f442f9921757788d791d3ec0ebca0 : AHCI missed interrupts bugfix #1
d16d3400e7cd9e7c897c37abab0b8f2668fcc5c0 : AHCI too eagerly unlocks
6bac9ae49e088aafa7604314335e9822676799bb : SILI missed interrupts bugfix

Outstanding bugs:
* (leftover from 3.0.2) Bug 2296: panic: assertion "m->wire_count > 0" failed (i386)
* Bug 2291: UFS/softdep panic in lwkt_remove_tdallq


Related issues

Related to Bug #2291: Panic in lwkt_remove_tdallq (Closed, 01/26/2012)
Related to Bug #2296: panic: assertion "m->wire_count > 0" failed (In Progress, 02/01/2012)
Related to Bug #2338: vm_pageout_active_scan panic (Resolved, 03/26/2012)
Related to Bug #2335: Markus rtld ghc patch only on master? (Closed, 03/25/2012)
Related to Bug #2349: fork() after signal(SIGCHLD, SIG_IGN) produces zombies (Closed, 04/17/2012)
Related to Bug #2364: panic: lockmgr: locking against myself (Closed, 05/09/2012)
Related to Bug #2353: panic: assertion "gd->gd_spinlocks_wr == 0" failed in bsd... (In Progress, 04/27/2012)
Related to Bug #2374: softupdates locking bug (Resolved, 05/23/2012)
Related to Bug #2388: ntfs mount problems (dfly 3.0.2 + ntfsprogs) (Closed, 06/18/2012)
Related to Bug #2399: DFBSD v3.1.0.1249.ge27e67 - Panic on lwkt_reltoken from v... (Closed, 08/10/2012)

History

#1 Updated by vsrinivas over 2 years ago

Potential MFCs:

865b0477b2adbf8798d0f021782c2a9b9b61a006 : Don't process invalid SACKs
928c329102e5f0067dc0c50cf6903874efa3639e : tcp: Properly update tcps_rttcantdetect
* RTLD fix that allows GHC to run (Issue 2335)

Other Bugs (w/o issue #s):
* vm_pageout_scan_active() panic (found a PQ_ACTIVE page without a vm_object but with a pindex)
* ffs_valloc duplicate allocation seen (!)

#2 Updated by vsrinivas over 2 years ago

865b0477b2adbf8798d0f021782c2a9b9b61a006 (Don't process invalid SACKs): MFCed.
928c329102e5f0067dc0c50cf6903874efa3639e (tcps_rttcantdetect): doesn't need to be MFCed, per sephe@.
c6f37fc2ab41918c5d344ea3086df799db85bb34 (AMD #721 workaround): MFCed
83d890e1fd1d3be68acd0eaf860bda2db05872c9 (syslogd wedge fix): MFCed
82e60c53f4889f9499f9061bfbccffc35b78494f (MSDOSFS -1 32/64-bit): MFCed

Holding off on the AHCI and SILI fixes for a few more days.

#3 Updated by vsrinivas over 2 years ago

AHCI/SILI fixes have been MFC-ed. Looks like everything on the MFC list has been taken care of.

Issue 2291 has a fix in -master; will be MFCed after stewing for some time (more testing is encouraged!).

This leaves issue 2296 ('i386 pmap wire count panic') and 2338 ('vm_pageout_active_scan panic').

There are also open issues w/o tickets:
* UFS panic running fsstress; sees a VCHR vnode in ffs_sync_scan1 with a null inode.
* amd64 panic in pmap_growkernel, when interrupt reserve was drained and a page table needed allocation.
* ffs duplicate allocation panic

#4 Updated by vsrinivas over 2 years ago

MFC list:

For bug 2291: (softupdates lock panics)
8e90f899fdf61479c5e76faa87e7ff716982ed08 Replace softupdates critical sections w/ lockmgr
24624a1562837ae797e4c1b05689f6f5b56006d9 Take softupdates lock in softdep io init path
For bug 2340: (UFS fsstress panic)
609f61878d1378c3d04602cf1e581a6f57dfae47 Do not reference NULL inodes in ffs_sync_scan1
[ MFC-ready ]

6562e2d87bfc5428fec4426aa343ea315ab88941 Signal assertion panic
[Should use a few more days of testing in -master]
39312e2a708905230d943ab98542ec8d6fabb22e kernel - Do not use M_USE_INTERRUPT_RESERVE for M_NOWAIT
[Should use a few more days of testing in -master]

For bug 2341: (HAMMER multivolume panic)
41a8e51772c65775f6b34c29ececa01178e633b9 Fixes a missed check in the RB code

#5 Updated by vsrinivas over 2 years ago

Bug fixes to 2291 MFCed.

Status ===

* Bug 2296 ('pmap wire_count assertion in i386 vm') and 2338 (panic in vm_pageout_scan_active) remain open.

* ffs duplicate allocation bug remains open.

* Bug 2341 ('HAMMER multivolume panic') affects the -stable branch; there is a patch in -master (41a8e51772c65775f6b34c29ececa01178e633b9). It should be MFCed after a few more days of testing.

* Commit 6562e2d87bfc5428fec4426aa343ea315ab88941 in -master fixes a panic triggered by the signals code. After a few days of testing, it should be merged.

* Commit 39312e2a708905230d943ab98542ec8d6fabb22e in -master fixes an issue where M_NOWAIT allocations dipped into the interrupt reserve, allowing mbuf allocations to exhaust it. It should be merged after a few days.
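Commit 39312e2 is titled "Do not use M_USE_INTERRUPT_RESERVE for M_NOWAIT". The toy model below is a sketch of that policy only: a non-blocking caller fails once free memory is down to the reserve, and only a caller that explicitly requests the reserve may go below it. The struct, function and MODEL_* flag names are hypothetical; this is not the DragonFly kmalloc or VM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; not DragonFly's values. */
#define MODEL_NOWAIT		0x1	/* caller must not sleep */
#define MODEL_USE_RESERVE	0x2	/* caller may consume the reserve */

struct page_pool {
	size_t free;		/* pages currently available */
	size_t reserve;		/* pages held back for interrupt-time callers */
};

/*
 * Take one page from the pool. A caller that did not ask for the reserve
 * (including a plain MODEL_NOWAIT caller) fails once only the reserve is
 * left, instead of draining it. This model never sleeps, so every failure
 * is reported to the caller immediately.
 */
bool
pool_alloc(struct page_pool *p, int flags)
{
	size_t floor;

	assert(p != NULL);
	floor = (flags & MODEL_USE_RESERVE) ? 0 : p->reserve;
	if (p->free <= floor)
		return false;
	p->free--;
	return true;
}
```

With a pool of 10 pages and a reserve of 4, six MODEL_NOWAIT allocations succeed and the seventh fails; only a call that also passes MODEL_USE_RESERVE can take the remaining four pages.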

#6 Updated by vsrinivas over 2 years ago

Signals fix and M_NOWAIT fix have been MFCed.

Status ===

* Bug 2296 ('pmap wire_count assertion in i386 VM') is still open
* Bug 2338 ('vm_pageout_scan_active sees null object') is still open. Progress is being made; it appears vm_page_unwire() can insert a page with a null object onto the active/inactive queues.
* Bug 2341 remains open. There is a patch, committed to -master (41a8e51772c65775f6b34c29ececa01178e633b9), that should be MFCed after more testing.
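The bug 2338 note says vm_page_unwire() can put a page with a null object onto the active/inactive queues, where the pageout scan later trips over it. The toy model below, with hypothetical types and names, only illustrates the invariant involved: when the wire count drops to zero, a page is queued only if it still belongs to an object. It is not the DragonFly implementation and not the fix that was eventually committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue identifiers for this model only. */
enum model_queue { MQ_NONE, MQ_ACTIVE, MQ_INACTIVE };

struct model_page {
	void *object;			/* owning object; NULL if detached */
	unsigned wire_count;		/* number of outstanding wirings */
	enum model_queue queue;		/* paging queue the page sits on */
};

/*
 * Drop one wiring. The assertion mirrors the "wire_count > 0" check seen
 * in bug 2296; the object test keeps objectless pages off the paging
 * queues so that a later scan never encounters one.
 */
void
model_unwire(struct model_page *m, int activate)
{
	assert(m->wire_count > 0);
	if (--m->wire_count > 0)
		return;			/* still wired elsewhere */
	if (m->object == NULL) {
		m->queue = MQ_NONE;	/* nothing for the pageout scan to see */
		return;
	}
	m->queue = activate ? MQ_ACTIVE : MQ_INACTIVE;
}
```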

#7 Updated by vsrinivas over 2 years ago

Status ===

--- open:

* Bug 2296 ('pmap wire_count assertion in i386 VM') is still open, but dillon has started working on it after recently reproducing it on pkgbox32
* Bug 2349 ('Don't create zombies when SIGCHLD is sig_ign-ed') was linked because the bug affects 3.0 and there is a patchset.

--- To MFC:

* Bug 2338 ('vm_pageout_scan_active sees null object') is still open, but a fix that should resolve it has been committed to -master (19cd98ea71f56117435402874beedbdf6d399b52). After it has had more time in -master, it should be MFCed.
* Bug 2341 ('HAMMER rb comparison in multivolume incorrect') remains open. There is a patch, committed to -master (41a8e51772c65775f6b34c29ececa01178e633b9), that should be MFCed.

---
After that round of fixes, perhaps there should be a 3.0.3 release? Many worthwhile fixes have gone into the DragonFly_RELEASE_3_0 branch since the 3.0.2 release.

None of them is strictly required for a successful install, except in rare cases (specifically, the soft updates fixes; there was one report of panics in the softdep code during install). But the fixes are all 'nice to have', I think, particularly for sites that run only numbered releases rather than the head of the DragonFly_RELEASE_3_0 branch.

--- Extra:
There has now been a report of corruption on a HAMMER volume: specifically, of zero bytes (hex '0's) being appended to a file after a clean unmount, and of file changes made long before shutdown not surviving a reboot. This should be investigated in -master. Until then, I think a 3.0.3 release should be blocked, since the branch has picked up the clustering fixes from -master.

There has also been a report of a soft updates panic via the fsync path. More work is needed there; it is also time to eliminate the MPlock from soft updates.

#8 Updated by vsrinivas over 2 years ago

* A commit (799ea500fc9a83af3435056a7fbc0a1aafb6bfc4) was pushed to -master to remove the mplock from ffs_softdep and to synchronize correctly on the ffs_softdep lock only in the softdep_process_worklist path. It fixes a panic I saw when running fsstress on UFS and should be the last of the ffs_softdep locking fixes. MFC after a bit of time and some testing on -stable.

* The zombies commit (19af187033dafb57f0739a8b0b21f74d4e5373e8) was pushed to -master and should close bug 2349. MFC after a bit.
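Bug 2349 concerns the POSIX rule that a process which sets SIGCHLD to SIG_IGN should not accumulate zombie children: an exiting child is reaped immediately, and a later wait blocks until the children are gone and then fails with ECHILD. A minimal userland check of that behaviour, assuming a POSIX system, is sketched below; it is not the regression test from the patchset, and the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if, with SIGCHLD ignored, an exited child is reaped by the
 * kernel instead of being left as a zombie (waitpid() fails with ECHILD),
 * 0 otherwise, and -1 if the test could not be set up.
 */
int
sigchld_ign_reaps_children(void)
{
	void (*old)(int);
	pid_t pid, r;
	int saved_errno;

	old = signal(SIGCHLD, SIG_IGN);
	if (old == SIG_ERR)
		return -1;

	pid = fork();
	if (pid == -1) {
		signal(SIGCHLD, old);
		return -1;
	}
	if (pid == 0)
		_exit(0);		/* child: terminate immediately */

	/*
	 * Per POSIX, with SIGCHLD set to SIG_IGN this blocks until the child
	 * has terminated and then fails with ECHILD, because no status
	 * information (and hence no zombie) is kept for the child.
	 */
	do {
		r = waitpid(pid, NULL, 0);
	} while (r == -1 && errno == EINTR);
	saved_errno = errno;

	/* waitpid() on a specific pid returns either that pid or -1. */
	assert(r == pid || r == -1);

	signal(SIGCHLD, old);	/* restore the caller's disposition */
	return (r == -1 && saved_errno == ECHILD) ? 1 : 0;
}
```

On an affected kernel, the waitpid() call would instead return the child's pid, showing that a zombie had been left behind.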

#9 Updated by vsrinivas about 2 years ago

Commits 799ea500fc9a83af3435056a7fbc0a1aafb6bfc4 (lock fixes to softdep), 19af187033dafb57f0739a8b0b21f74d4e5373e8 (zombie SIG_IGN issue), 19cd98ea71f56117435402874beedbdf6d399b52 (vm page scan null objects), and 41a8e51772c65775f6b34c29ececa01178e633b9 (HAMMER multivolume assertion failure) have been MFCed.

Bug 2338 is probably resolved at this point.

----
To MFC:

b642a6c1f5bbb295e29522d99c65038f459288ac cluster_write conversion to delayed writes
66030e2b4635359f2d84f23298c9d8ce1e6af5da Only B_CLUSTEROK okay-to-cluster (64k) bufs in HAMMER

The HAMMER zero-append bug may be solved by the above two commits in -master. They also close a space-usage issue and a performance regression in HAMMER at the head of the -stable branch (neither is present in 3.0.2). These should be MFCed soon.

Open bugs:
Bug 2296 (i386 pmap assertion) remains open.

#10 Updated by vsrinivas about 2 years ago

b642a6c1f5bbb295e29522d99c65038f459288ac and 66030e2b4635359f2d84f23298c9d8ce1e6af5da were merged, resolving HAMMER cluster issues.

The only open bug is the i386 pmap assertion (Bug 2296).

#11 Updated by marino about 2 years ago

I don't know for sure that bug #2364 affects the upcoming 3.0.3, but if it does, it needs to be addressed before another release of the 3.0 branch.

Added to "related to" section.

#12 Updated by vsrinivas about 2 years ago

Bug 2364 does affect the 3.0 branch. It does need to be addressed as well.

#13 Updated by vsrinivas about 2 years ago

Bug 2353 also affects the 3.0 branch; it has been root-caused, but a fix is still pending.

#14 Updated by elekktretterr about 2 years ago

> Bug 2291: UFS/softdep panic in lwkt_remove_tdallq

When I upgraded to 3.0.2, I got that panic. Then, when I installed master,
the panic was gone, but instead I got a deadlock (no trace).

Petr

#15 Updated by vsrinivas about 2 years ago

Have created a new bug (2374) to track softupdates-related issues. The issues should currently be very similar between -master and -stable: there is only one softupdates commit in -master that is not in -stable, and it should not have a material impact here. I was delaying MFCing that panic fix until there had been more testing, but have been somewhat bandwidth-starved of late.

#16 Updated by vsrinivas almost 2 years ago

  • Status changed from New to Resolved

3.0.3 was released at the end of August/early September 2012.
