Bug #3055

HAMMER2 crash + LK_RELEASE fail

Added by arcade@b1t.name about 1 month ago. Updated 3 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: VFS subsystem
Target version: -
Start date: 09/14/2017
Due date:
% Done: 0%


Description

This happens when the last 'cleanup' run was too long ago.

The kern_rename call actually happened hours before the crash itself.

core.txt.33 (178 KB) arcade@b1t.name, 09/14/2017 12:34 PM

core.txt.34 (179 KB) arcade@b1t.name, 09/18/2017 09:07 AM

core.txt.38 (206 KB) arcade@b1t.name, 09/23/2017 02:51 PM

core.txt.39 (282 KB) arcade@b1t.name, 10/12/2017 01:10 AM

core.txt.40 (185 KB) arcade@b1t.name, 10/12/2017 01:10 AM

core.txt.44 (213 KB) arcade@b1t.name, 10/17/2017 01:21 PM

History

#1 Updated by arcade@b1t.name about 1 month ago

PS: This is without latest kern_mutex.c changes.

#2 Updated by dillon about 1 month ago

  • Status changed from New to In Progress
  • Assignee set to dillon

Was the filesystem full at the time this ran? There is an error path that is not being checked properly in hammer2_chain_indirect_maintenance() for the situation where the filesystem has become full. I will commit error processing for that part of the code right now. If it still panics (versus just throwing an error on the kernel console), I'll need a backtrace from kgdb.

-Matt

#3 Updated by arcade@b1t.name about 1 month ago

The filesystem was close to 90% full, with 20% being "jettisonable". I'm not sure about this one, actually, as the FS was created more than a few weeks ago and may contain some older discrepancies. I can recreate the FS from scratch and retest if that is required.

Anyway, running "hammer cleanup" twice makes the FS stable again. Without cleanup the host can't even boot due to problems writing data:

strategy_xop_write: error 32 loff=0000000057480000
strategy_xop_write: error 32 loff=00000000583f0000
strategy_xop_write: error 32 loff=000000005a680000
strategy_xop_write: error 32 loff=0000000063f50000
strategy_xop_write: error 32 loff=00000000677c0000
strategy_xop_write: error 32 loff=000000006aa90000
strategy_xop_write: error 32 loff=0000000073020000
strategy_xop_write: error 32 loff=0000000076cf0000
strategy_xop_write: error 32 loff=000000007f660000
strategy_xop_write: error 32 loff=0000000084430000
strategy_xop_write: error 32 loff=0000000087450000
strategy_xop_write: error 32 loff=0000000088e00000
strategy_xop_write: error 32 loff=0000000093ac0000
strategy_xop_write: error 32 loff=0000000093b90000

Or just crashes again.

#4 Updated by arcade@b1t.name about 1 month ago

Happened again. Alas, my kernel was built without DEBUG, so the kgdb output is pretty useless. Will try to reproduce it one more time.

#5 Updated by arcade@b1t.name 27 days ago

Happened again. Pool was 93% full (9G free).

#6 Updated by arcade@b1t.name 8 days ago

A few more crashes. 7G free space reached...

#7 Updated by dillon 6 days ago

I think I see what is going on. I did not completely instrument error handling for some of these failure cases (when the media becomes full). Several calls to hammer2_chain_delete() are not processing the returned error code and that may be leading to these assertions.

I am working on instrumenting these and will commit an update this afternoon to master and release.

-Matt
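
The failure mode described in #7 (a delete that can fail when the media is full, with callers that drop the returned error) can be illustrated with a minimal userland sketch. Everything here is hypothetical: `struct pool`, `demo_chain_delete()`, and the two `demo_rename_*()` callers are invented for illustration and are not the HAMMER2 structures or the real hammer2_chain_delete() signature. The only assumption carried over from the thread is that a delete may need free space and can therefore return an error.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the media; not a HAMMER2 structure. */
struct pool {
	int free_blocks;
};

/*
 * Hypothetical delete operation.  Assumption: recording a deletion
 * consumes one free block, so it fails with ENOSPC on a full pool.
 */
static int
demo_chain_delete(struct pool *p)
{
	if (p->free_blocks == 0)
		return ENOSPC;
	p->free_blocks--;
	return 0;
}

/*
 * Unchecked caller: the error from the delete is dropped, so the
 * caller reports success and later code runs on inconsistent state.
 */
static int
demo_rename_unchecked(struct pool *p)
{
	demo_chain_delete(p);	/* return value ignored */
	return 0;
}

/*
 * Checked caller: the error is logged and handed back to the caller,
 * which can fail the operation cleanly instead of asserting later.
 */
static int
demo_rename_checked(struct pool *p)
{
	int error;

	error = demo_chain_delete(p);
	if (error) {
		fprintf(stderr, "demo_rename: delete failed, error %d\n",
		    error);
		return error;
	}
	return 0;
}
```

With a pool that has no free blocks, `demo_rename_unchecked()` still returns 0 while `demo_rename_checked()` returns ENOSPC, which is the difference between a later assertion and an error reported to the caller.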

#8 Updated by arcade@b1t.name 3 days ago

Another crash...
