Bug #1298

system hang due to hammer problem?

Added by pgeorgi about 5 years ago. Updated almost 5 years ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

This morning, my server was unresponsive: no network traffic, and no console
response beyond switching terminals and entering ddb. Regular keypresses,
e.g. at the login prompt, weren't registered.
The last messages in the log were

Feb 25 03:05:23 t-stueck kernel: HAMMER(backup): Critical error
inode=24949148432 while syncing inode
Feb 25 03:05:23 t-stueck kernel: HAMMER(backup): Forcing read-only mode
Feb 25 03:05:29 t-stueck kernel: pid 532 (hammer), uid 0: exited on signal 11
(core dumped)

The HAMMER(backup) lines were also on the console.

I don't have a crash dump, and the hammer coredump isn't very useful either
(no binary with symbols around).
The system ran a "2.3.0-development" version from about a week ago. I think no
hammer-related changes have appeared since then.

The problem seems to have happened during a "hammer prune", as started by the
nightly "hammer cleanup", as the daily run mail reported:

cleanup /backup/root/home - handle PFS #2 using /backup/pfs-snapshots/home
snapshots - run
prune - Segmentation fault (core dumped)

The security mail reported that PFS as read-only, but it's not critical (it's
a pfs-slave to /home, on a different hammer partition than / and /home).
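
For anyone triaging a similar hang, the HAMMER kernel messages can be pulled out of a saved log with a short script. This is only a sketch: the regular expression is modelled on the three log lines quoted above (with the mail's line wrapping undone), and the name `hammer_events` is made up for illustration. It is not part of any HAMMER tooling.

```python
import re

# Pattern modelled on the syslog lines quoted in this report (an assumption,
# not an official format): "<date> <host> kernel: HAMMER(<label>): <message>"
HAMMER_RE = re.compile(r"kernel: HAMMER\((?P<label>[^)]*)\): (?P<msg>.*)$")


def hammer_events(lines):
    """Return (label, message) pairs for every HAMMER kernel message."""
    events = []
    for line in lines:
        match = HAMMER_RE.search(line.rstrip("\n"))
        if match:
            events.append((match.group("label"), match.group("msg")))
    return events


if __name__ == "__main__":
    # Sample input copied from this report, with wrapped lines rejoined.
    sample = [
        "Feb 25 03:05:23 t-stueck kernel: HAMMER(backup): Critical error "
        "inode=24949148432 while syncing inode",
        "Feb 25 03:05:23 t-stueck kernel: HAMMER(backup): Forcing read-only mode",
        "Feb 25 03:05:29 t-stueck kernel: pid 532 (hammer), uid 0: "
        "exited on signal 11 (core dumped)",
    ]
    for label, msg in hammer_events(sample):
        print(label, "-", msg)
```

The segfault line from the userland `hammer` utility is deliberately not matched, since it is logged by the kernel's signal handling rather than by the HAMMER filesystem code.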

History

#1 Updated by dillon about 5 years ago

:New submission from Patrick Georgi:
:
:This morning, my server was unresponsive (no network traffic, no console
:response beyond switching terminals and entering ddb. Regular keypresses,
:eg. at the login prompt weren't registered).
:The last messages in the log were
:
:Feb 25 03:05:23 t-stueck kernel: HAMMER(backup): Critical error
:inode=24949148432 while syncing inode
:Feb 25 03:05:23 t-stueck kernel: HAMMER(backup): Forcing read-only mode
:Feb 25 03:05:29 t-stueck kernel: pid 532 (hammer), uid 0: exited on signal 11
:(core dumped)
:
:The HAMMER(backup) lines were also on the console.
:
:I don't have a crash dump, and the hammer coredump isn't very useful either
:(no binary with symbols around).
:The system ran a "2.3.0-development" version from ~1 week ago or so. I think no
:hammer related changes appeared since then.
:
:The problem seems to have happened during a "hammer prune", as started by the
:nightly "hammer cleanup", as the daily run mail reported:
:
:cleanup /backup/root/home - handle PFS #2 using /backup/pfs-snapshots/home
: snapshots - run
: prune - Segmentation fault (core dumped)
:
:The security mail reported that PFS to be read-only, but it's not critical (it's
:a pfs-slave to /home, on a different hammer partition than / and /home)
:
:----------
:messages: 6248
:nosy: pgeorgi
:priority: bug
:status: unread
:title: system hang due to hammer problem?

I think the seg-fault in the utility is a different issue, and not
the cause. A critical error can only occur if either the I/O subsystem
reports an error or the filesystem fills up unexpectedly.

If your HAMMER filesystem is nowhere near full then it was probably
an I/O error. We can't really tell without more of the console history
and/or a core dump.

We'll keep an eye out for other related bug reports.

What kind of storage system is connected to the machine? Internal
hard drive? SATA or PATA? You also might as well post the dmesg
output (post reboot), just so the information is recorded in the
bug report.

-Matt
Matthew Dillon
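
To help rule out the "filesystem filled up" case mentioned in the reply above, free space can be checked after a reboot. A minimal Python sketch follows. The 90% threshold and the helper names `nearly_full` and `report` are arbitrary choices for illustration, and `/backup` is only assumed to be the mount point of the affected filesystem, based on the paths in the cleanup mail. The figures the OS reports may not match HAMMER's internal accounting exactly, so treat the result as a rough indicator.

```python
import shutil


def nearly_full(total, used, threshold=0.90):
    """Return True when used/total meets or exceeds the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total >= threshold


def report(path):
    """Print the used fraction of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    print("used fraction: %.2f" % (usage.used / usage.total))
    print("nearly full:", nearly_full(usage.total, usage.used))
```

Calling `report("/backup")` on the affected machine would print the used fraction for that mount point. The path is an assumption and should be adjusted to the filesystem in question.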

#2 Updated by pgeorgi about 5 years ago

Two internal PATA disks in a VIA Epia C3 system.

atapci0: <VIA 8235 UDMA133 controller> port
0xbc00-0xbc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: <ATA channel 0> on atapci0
ad0: 381554MB <SAMSUNG HD400LD WQ100-15> at ata0-master UDMA100
ata1: <ATA channel 1> on atapci0
ad2: 381554MB <Seagate ST3400620A 3.AAF> at ata1-master UDMA100

#3 Updated by jdc about 5 years ago

Can you get SMART statistics for these using smartmontools? smartctl -a
/dev/ad0 and smartctl -a /dev/ad2 would be sufficient.

#4 Updated by corecode almost 5 years ago

Could you confirm the cause of this issue?
