Bug #1298

closed

system hang due to hammer problem?

Added by pgeorgi almost 16 years ago. Updated almost 10 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
VFS subsystem
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:

Description

This morning, my server was unresponsive: no network traffic, and no console
response beyond switching terminals and entering ddb. Regular keypresses,
e.g. at the login prompt, weren't registered.
The last messages in the log were

Feb 25 03:05:23 t-stueck kernel: HAMMER: Critical error inode=24949148432 while syncing inode
Feb 25 03:05:23 t-stueck kernel: HAMMER: Forcing read-only mode
Feb 25 03:05:29 t-stueck kernel: pid 532 (hammer), uid 0: exited on signal 11 (core dumped)

The HAMMER lines were also on the console.
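
For anyone triaging a similar report, the critical-error message quoted above can be picked out of a saved log with a short script. This is only a sketch: the regular expression is an assumption derived from the single message in this report, not from the HAMMER source, and the function name `find_critical_inodes` is made up for illustration.

```python
import re

# Pattern assumed from the one sample message in this report; \s+ also
# tolerates the line wrap that mail clients introduce before "inode=".
CRITICAL_RE = re.compile(
    r"HAMMER: Critical error\s+inode=(\d+)\s+while syncing inode"
)


def find_critical_inodes(log_text):
    """Return the inode numbers named in HAMMER critical-error messages."""
    return [int(m.group(1)) for m in CRITICAL_RE.finditer(log_text)]


# Sample taken from the log lines in this report.
sample = (
    "Feb 25 03:05:23 t-stueck kernel: HAMMER: Critical error "
    "inode=24949148432 while syncing inode\n"
    "Feb 25 03:05:23 t-stueck kernel: HAMMER: Forcing read-only mode\n"
)

print(find_critical_inodes(sample))
```

Running it over the full log file rather than the last few lines would show whether the same inode was reported more than once.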

I don't have a crash dump, and the hammer coredump isn't very useful either
(no binary with symbols around).
The system ran a "2.3.0-development" version from ~1 week ago or so. I think no
hammer related changes appeared since then.

The problem seems to have happened during a "hammer prune", as started by the
nightly "hammer cleanup", as the daily run mail reported:

cleanup /backup/root/home - handle PFS #2 using /backup/pfs-snapshots/home
snapshots - run
prune - Segmentation fault (core dumped)

The security mail reported that PFS to be read-only, but it's not critical (it's
a pfs-slave to /home, on a different hammer partition than / and /home)

Actions #1

Updated by dillon almost 16 years ago

:New submission from Patrick Georgi:
:
:This morning, my server was unresponsive (no network traffic, no console
:response beyond switching terminals and entering ddb. Regular keypresses,
:eg. at the login prompt weren't registered).
:The last messages in the log were
:
:Feb 25 03:05:23 t-stueck kernel: HAMMER: Critical error
:inode=24949148432 while syncing inode
:Feb 25 03:05:23 t-stueck kernel: HAMMER: Forcing read-only mode
:Feb 25 03:05:29 t-stueck kernel: pid 532 (hammer), uid 0: exited on signal 11
:(core dumped)
:
:The HAMMER lines were also on the console.
:
:I don't have a crash dump, and the hammer coredump isn't very useful either
:(no binary with symbols around).
:The system ran a "2.3.0-development" version from ~1 week ago or so. I think no
:hammer related changes appeared since then.
:
:The problem seems to have happened during a "hammer prune", as started by the
:nightly "hammer cleanup", as the daily run mail reported:
:
:cleanup /backup/root/home - handle PFS #2 using /backup/pfs-snapshots/home
: snapshots - run
: prune - Segmentation fault (core dumped)
:
:The security mail reported that PFS to be read-only, but it's not critical (it's
:a pfs-slave to /home, on a different hammer partition than / and /home)
:
:----------
:messages: 6248
:nosy: pgeorgi
:priority: bug
:status: unread
:title: system hang due to hammer problem?

I think the seg-fault in the utility is a different issue, and not
the cause. A critical error can only occur if either the I/O subsystem
reports an error or the filesystem fills up unexpectedly.
If your HAMMER filesystem is nowhere near full then it was probably
an I/O error. We can't really tell without more of the console history
and/or a core dump.
We'll keep an eye out for other related bug reports.
What kind of storage system is connected to the machine?  Internal
hard drive? SATA or PATA? You also might as well post the dmesg
output (post reboot), just so the information is recorded in the
bug report.
-Matt
Matthew Dillon
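
One of the two causes mentioned above, the filesystem filling up, can be roughly ruled in or out by checking how full the mount point is. The snippet below is a minimal sketch using Python's standard `shutil.disk_usage`; the helper name `percent_used` is hypothetical, and whether these generic figures match HAMMER's own space accounting is not verified here.

```python
import shutil


def percent_used(path):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "." is a placeholder; on the reporter's machine the path of interest
    # would be the PFS mount named in the report, /backup/root/home.
    print(f"{percent_used('.'):.1f}% used")
```

A value well below 100% would point toward the I/O-error explanation rather than an unexpectedly full filesystem.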
Actions #2

Updated by pgeorgi almost 16 years ago

Two internal PATA disks in a VIA Epia C3 system.

atapci0: <VIA 8235 UDMA133 controller> port 0xbc00-0xbc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: <ATA channel 0> on atapci0
ad0: 381554MB <SAMSUNG HD400LD WQ100-15> at ata0-master UDMA100
ata1: <ATA channel 1> on atapci0
ad2: 381554MB <Seagate ST3400620A 3.AAF> at ata1-master UDMA100

Actions #3

Updated by jdc almost 16 years ago

Can you get SMART statistics for these using smartmontools? smartctl -a
/dev/ad0 and smartctl -a /dev/ad2 would be sufficient.

Actions #4

Updated by corecode over 15 years ago

could you confirm the cause of this issue?

Actions #5

Updated by tuxillo almost 10 years ago

  • Description updated (diff)
  • Category set to VFS subsystem
  • Status changed from New to Closed
  • Assignee deleted (0)
  • Target version set to 4.2

Hi,

Old bug where we can't verify the issue. This could be caused by faulty disk drives (but not necessarily).
Closing it but feel free to reopen/open a new one if you can reproduce it.

Cheers,
Antonio Huete
