Bug #1298: system hang due to hammer problem? - DragonFlyBSD - DragonFlyBSD bugtracker

Bug #1298

closed

system hang due to hammer problem?

Added by pgeorgi about 17 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Category: VFS subsystem
Target version: 4.2

Description

This morning, my server was unresponsive: no network traffic, and no console
response beyond switching terminals and entering ddb. Regular keypresses,
e.g. at the login prompt, weren't registered.
The last messages in the log were

Feb 25 03:05:23 t-stueck kernel: HAMMER: Critical error inode=24949148432 while syncing inode
Feb 25 03:05:23 t-stueck kernel: HAMMER: Forcing read-only mode
Feb 25 03:05:29 t-stueck kernel: pid 532 (hammer), uid 0: exited on signal 11 (core dumped)

The HAMMER lines were also on the console.

I don't have a crash dump, and the hammer coredump isn't very useful either
(no binary with symbols around).
The system ran a "2.3.0-development" version from ~1 week ago or so. I think no
hammer related changes appeared since then.

The problem seems to have happened during a "hammer prune", as started by the
nightly "hammer cleanup", as the daily run mail reported:

cleanup /backup/root/home - handle PFS #2 using /backup/pfs-snapshots/home
snapshots - run
prune - Segmentation fault (core dumped)

The security mail reported that PFS to be read-only, but it's not critical (it's
a pfs-slave to /home, on a different hammer partition than / and /home)

Updated by dillon about 17 years ago

:New submission from Patrick Georgi <patrick@georgi-clan.de>:
:
:This morning, my server was unresponsive (no network traffic, no console
:response beyond switching terminals and entering ddb. Regular keypresses,
:eg. at the login prompt weren't registered).
:The last messages in the log were
:
:Feb 25 03:05:23 t-stueck kernel: HAMMER: Critical error inode=24949148432 while syncing inode
:Feb 25 03:05:23 t-stueck kernel: HAMMER: Forcing read-only mode
:Feb 25 03:05:29 t-stueck kernel: pid 532 (hammer), uid 0: exited on signal 11 (core dumped)
:
:The HAMMER lines were also on the console.
:
:I don't have a crash dump, and the hammer coredump isn't very useful either
:(no binary with symbols around).
:The system ran a "2.3.0-development" version from ~1 week ago or so. I think no
:hammer related changes appeared since then.
:
:The problem seems to have happened during a "hammer prune", as started by the
:nightly "hammer cleanup", as the daily run mail reported:
:
:cleanup /backup/root/home - handle PFS #2 using /backup/pfs-snapshots/home
: snapshots - run
: prune - Segmentation fault (core dumped)
:
:The security mail reported that PFS to be read-only, but it's not critical (it's
:a pfs-slave to /home, on a different hammer partition than / and /home)
:
:----------
:messages: 6248
:nosy: pgeorgi
:priority: bug
:status: unread
:title: system hang due to hammer problem?

I think the seg-fault in the utility is a different issue, and not
   the cause.  A critical error can only occur if either the I/O subsystem
   reports an error or the filesystem fills up unexpectedly.

If your HAMMER filesystem is nowhere near full then it was probably
   an I/O error.  We can't really tell without more of the console history
   and/or a core dump.

We'll keep an eye out for other related bug reports.

What kind of storage system is connected to the machine?  Internal
   hard drive?  SATA or PATA?  You also might as well post the dmesg
   output (post reboot), just so the information is recorded in the
   bug report.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>
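
For anyone following up, here is a minimal sketch of how the two possible causes named above (an I/O error or a full filesystem) could be checked after a reboot. The function names `scan_log` and `fs_usage` are hypothetical helpers. The `ad`/`ata` patterns assume ATA-style device names in the kernel messages. Only the `kernel: HAMMER:` prefix is taken from the log lines in this report.

```sh
#!/bin/sh
# Print HAMMER messages and disk-driver lines from a saved syslog file.
# "HAMMER:" matches the lines quoted in this report; the ad*/ata* patterns
# are an assumption about how the disk driver names its devices.
scan_log() {
    grep -E 'kernel: (HAMMER:|ad[0-9]+:|ata[0-9]+:)' "$1"
}

# Show how full the filesystem containing the given path is.
fs_usage() {
    df -h "$1"
}
```

Possible usage would be `scan_log /var/log/messages` followed by `fs_usage /backup`. Both paths are assumptions about this particular machine, not something confirmed in the report.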

Updated by pgeorgi about 17 years ago

Two internal PATA disks in a VIA Epia C3 system.

atapci0: <VIA 8235 UDMA133 controller> port 0xbc00-0xbc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: <ATA channel 0> on atapci0
ad0: 381554MB <SAMSUNG HD400LD WQ100-15> at ata0-master UDMA100
ata1: <ATA channel 1> on atapci0
ad2: 381554MB <Seagate ST3400620A 3.AAF> at ata1-master UDMA100

Updated by jdc about 17 years ago

Can you get SMART statistics for these using smartmontools? smartctl -a
/dev/ad0 and smartctl -a /dev/ad2 would be sufficient.
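
The two commands above could be wrapped in a small helper so the reports end up in files that can be attached to the bug. This is only a sketch: `collect_smart` and the `smart-<dev>.txt` file names are made up for illustration, and it assumes smartmontools is installed and the script runs as root.

```sh
#!/bin/sh
# Save the output of "smartctl -a" for each named disk into a directory.
# Usage: collect_smart OUTDIR DEV...   (DEV without the /dev/ prefix)
collect_smart() {
    outdir=$1
    shift
    for dev in "$@"; do
        # -a prints all SMART information smartctl has for the device
        smartctl -a "/dev/$dev" > "$outdir/smart-$dev.txt" 2>&1
    done
}
```

For the disks in this report the call would be `collect_smart /tmp ad0 ad2`, with `/tmp` as an arbitrary output directory.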

Updated by corecode almost 17 years ago

Could you confirm the cause of this issue?

Updated by tuxillo about 11 years ago

Description updated (diff)
Category set to VFS subsystem
Status changed from New to Closed
Assignee deleted (0)
Target version set to 4.2

Hi,

Old bug where we can't verify the issue. This could be caused by faulty disk drives (but not necessarily).
Closing it but feel free to reopen/open a new one if you can reproduce it.

Cheers,
Antonio Huete
