Bug #1096

closed

Hammer: processes blocked on a file

Added by ftigeot over 15 years ago. Updated about 14 years ago.

Status:
Closed
Priority:
Normal

Description

This is on a DragonFly 2.0 machine with a freshly newfs'd /home hammer
filesystem.
So far, I have been able to reproduce this bug twice.

korganizer, the KDE agenda tool, has been unable to exit cleanly since I
upgraded to DragonFly-2.0. The process seems to stay in some kind of
zombie state.
When trying to reboot the system, the following messages are shown on the
console:

init: some processes would not die; ps axl advised

...

syncing disks... 1
done
umount of filesystem mounted from subaru4:/distfiles failed(70)
[diagnostic] cache_lock: blocked on 0xe16da678 "korganizerrcbvUSpw.new"

The machine then stays in this state forever; I have to use the reset switch.

Actions #1

Updated by mneumann over 15 years ago

Yeah, I got exactly the same with firefox3. I tried to kill it with
"kill -9", but that wasn't possible. Then rebooting showed the same
messages as above.

Regards,

Michael
Actions #2

Updated by dillon over 15 years ago

:>
:>
:> The machine then stays in this state forever; I have to use the reset switch.
:
:Yeah, I got exactly the same with firefox3. I tried to kill it with
:"kill -9", but that wasn't possible. Then rebooting showed the same
:messages as above.
:
:Regards,
:
: Michael

Michael, if you can recreate the unkillable firefox3 please force a
kernel dump, which you should be able to do by dropping into the
debugger and typing 'panic'.
-Matt
Matthew Dillon
Actions #3

Updated by ftigeot over 15 years ago

I have a dump with an unkillable korganizer (pid 922).
Every time the problem occurred, it was with a KDE application; most of the
time korganizer.

Where can I put it ? vmcore.0 is 3.0 GB uncompressed.

Actions #4

Updated by dillon over 15 years ago

:I have a dump with an unkillable korganizer (pid 922).
:Every time the problem occurred, it was with a KDE application; most of the
:time korganizer.
:
:Where can I put it ? vmcore.0 is 3.0 GB uncompressed.
:
:--
:Francois Tigeot

I'll send you private email.
-Matt
Matthew Dillon
Actions #5

Updated by mneumann over 15 years ago

It happened to me once again. I've uploaded vmcore and kernel to my
crash/ directory.

It seems that the vnode is (dead-) locked and trying to access it (e.g.
typing a "ls" in that directory) blocks that process as well.

Regards,

Michael
Actions #6

Updated by dillon over 15 years ago

:It happened to me once again. I've uploaded vmcore and kernel to my
:crash/ directory.
:
:It seems that the vnode is (dead-) locked and trying to access it (e.g.
:typing a "ls" in that directory) blocks that process as well.
:
:Regards,
:
: Michael

I've examined both your crash dump and Francois's crash dump.  Both
reveal the same bug.
What is happening is that the processes are stuck waiting for a
HAMMER inode which has a dependency to flush, but HAMMER has lost
track of the dependency, so the process winds up waiting forever.

(kgdb) back
#0 0xc02cce63 in lwkt_switch ()
#1 0xc02d0d13 in tsleep (ident=0xe1552814, flags=0, wmesg=0xc05432fc "hmrwin", timo=0)
#2 0xc042eeba in hammer_wait_inode (ip=0xe15527d0)
#3 0xc0440e89 in hammer_vop_fsync (ap=0xe112cc88)
#4 0xc0316162 in vop_fsync (ops=0xc5cc6f30, vp=0xe15a27a8, waitfor=1)
#5 0xc03144be in sys_fsync (uap=0xe112ccf0)
#6 0xc04d1e62 in syscall2 (frame=0xe112cd40)

(kgdb) print ip->flush_state
$1 = HAMMER_FST_SETUP <--- means there is stuff on target_list
(kgdb) print ip->target_list
$2 = {
tqh_first = 0x0, <--- except the target_list is empty!
tqh_last = 0xe15527f0
}
(kgdb)

I will commit a fix today to both HEAD and 2.0.  It will need some
continued testing, since the bug is fairly hard to reproduce, to make
sure I actually fixed it in the right place.
Thank you both!
-Matt
Matthew Dillon
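
The inconsistency in the kgdb output above can be written down as a small check. This is only a toy model in C: the `toy_` types and names are hypothetical stand-ins, not the real HAMMER structures, and the single rule it encodes is the one stated in the comment above (a SETUP flush state implies a non-empty target_list).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only. The names echo the kgdb output, but these are
 * NOT the real HAMMER definitions or layouts. */
enum toy_flush_state { TOY_FST_IDLE, TOY_FST_SETUP };

struct toy_dep {
    struct toy_dep *next;
};

struct toy_inode {
    enum toy_flush_state flush_state;
    struct toy_dep *target_first;  /* plays the role of target_list.tqh_first */
};

/* Rule from the thread: SETUP means there is something on target_list.
 * Other states are not constrained by this toy model. */
static bool toy_inode_consistent(const struct toy_inode *ip)
{
    if (ip->flush_state == TOY_FST_SETUP)
        return ip->target_first != NULL;
    return true;
}
```

An inode in the state shown in both dumps (SETUP with `tqh_first = 0x0`) fails this check, which matches the diagnosis: the waiter sleeps on a dependency that nothing will ever flush.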
Actions #7

Updated by ftigeot over 15 years ago

[...]

For some reason, I have no trouble on one of my machines. I haven't been
able to pinpoint an obvious cause yet, but it now takes less than one hour
after a reboot to show.
I just need to run some random applications: web browser, email, etc...
and korganizer then freezes almost every time...

Actions #8

Updated by dillon over 15 years ago

:For some reason, I have no trouble on one of my machines. I haven't been
:able to pinpoint an obvious cause yet, but it now takes less than one hour
:after a reboot to show.
:I just need to run some random applications: web browser, email, etc...
:and korganizer then freezes almost every time...
:
:--
:Francois Tigeot

Please try this kernel patch, Francois:
fetch http://apollo.backplane.com/DFlyMisc/hammer02.patch
I haven't committed it yet.  I think it will fix the problem but if
you (and Mike) could test it first I'll know whether I have to keep
looking or not.
-Matt
Matthew Dillon
Actions #9

Updated by ftigeot over 15 years ago

Thanks for the patch. My troubled machine has been running with it for
more than 8 hours now and the problem has yet to show itself.

Habitually, I would have seen the hang five or six times in this period.

Actions #10

Updated by dillon over 15 years ago

:Thanks for the patch. My troubled machine has been running with it for
:more than 8 hours now and the problem has yet to show itself.
:
:Habitually, I would have seen the hang five or six times in this period.
:
:--
:Francois Tigeot

Ok, that's good.
Side note to everyone running HAMMER:  I will be MFC'ing these fixes
to 2.0 this weekend. They are currently only in HEAD.
-Matt
Matthew Dillon
Actions #11

Updated by bastyaelvtars over 15 years ago

I just ran into this issue. Your patch fixes it.

Actions #12

Updated by bastyaelvtars over 15 years ago

On Thu, 31 Jul 2008 19:58:10 +0200
Gergo Szakal <> wrote:

I spoke too soon. It comes again and is reproducible. Steps:

1) Install monotone from pkgsrc.
2) Execute the following on a HAMMER filesystem:
mtn db init --db=somedbname.mtn
(This just creates a file.)
3) The mtn process freezes at the 'hmrwin' state.

Actions #13

Updated by dillon over 15 years ago

:On Thu, 31 Jul 2008 19:58:10 +0200
:Gergo Szakal <> wrote:
:
:> I just ran into this issue. Your patch fixes it.
:>
:
:I spoke too soon. It comes again and is reproducible. Steps:
:
:1) Install monotone from pkgsrc.
:2) Execute the following on a HAMMER filesystem:
: mtn db init --db=somedbname.mtn
: (This just creates a file.)
:3) The mtn process freezes at the 'hmrwin' state.
:
:--
:Gergo Szakal MD <>
:University Of Szeged, HU

Ok, get a kernel core onto leaf and I'll look at it.
-Matt
Matthew Dillon
Actions #14

Updated by bastyaelvtars over 15 years ago

On Thu, 31 Jul 2008 12:18:34 -0700 (PDT)
Matthew Dillon <> wrote:

I still cannot make cores (That's subject to another bugreport. :-P)

Actions #15

Updated by mneumann over 15 years ago

Have you set dumpdev in rc.conf to your swap-partition? For example:

dumpdev="/dev/ad0s1b"

Regards,

Michael
Actions #16

Updated by dillon over 15 years ago

:
:> Ok, get a kernel core onto leaf and I'll look at it.
:
:I still cannot make cores (That's subject to another bugreport. :-P)
:
:--
:Gergo Szakal MD <>

Try putting this in your /etc/sysctl.conf:
kern.sync_on_panic=0
This stops the kernel from trying to sync the buffer cache on panic.
Often when debugging VFS issues the kernel can lock up before the dump
when it tries to sync the buffer cache.
The other usual suspects are:
  • /var/crash does not have enough space.
  • swap space is not big enough (must be >= amount of main memory).
  • kernel binary does not match kernel core. Usually this means the
    /kernel file is not the kernel that was actually running.

-Matt
Matthew Dillon
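
The two size conditions in the checklist above can be sketched as a quick check in C. The function names are hypothetical, and the rule that /var/crash needs at least as much free space as main memory is an assumption made here for a worst-case dump. The thread itself only says the directory must have "enough space".

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Free bytes available to unprivileged users on the filesystem holding
 * `dir`. Returns 0 if statvfs() fails. */
static uint64_t free_bytes(const char *dir)
{
    struct statvfs sv;

    if (statvfs(dir, &sv) != 0)
        return 0;
    return (uint64_t)sv.f_bavail * (uint64_t)sv.f_frsize;
}

/* True when swap can hold the dump (swap >= main memory, as stated in the
 * checklist) and the crash directory can hold it afterwards (ASSUMED here
 * to need at least as much space as main memory). */
static bool dump_space_ok(uint64_t ram_bytes, uint64_t swap_bytes,
                          uint64_t crash_free_bytes)
{
    return swap_bytes >= ram_bytes && crash_free_bytes >= ram_bytes;
}
```

For example, `dump_space_ok(ram, swap, free_bytes("/var/crash"))` with the machine's own memory and swap sizes filled in. The third item in the checklist, a kernel binary that does not match the core, is not something this sketch can detect.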

Actions #17

Updated by aoiko over 15 years ago

Still waiting for a core... Can anyone else reproduce the bug?

Actions #18

Updated by alexh about 14 years ago

Has anyone seen this happening recently? If not, I think it's safe to close this
:)

Cheers,
Alex Hornung

Actions #19

Updated by ftigeot about 14 years ago

On Fri, Apr 02, 2010 at 10:50:29PM +0000, Alex Hornung (via DragonFly issue tracker) wrote:

Alex Hornung <> added the comment:

Has anyone seen this happening recently? If not, I think it's safe to close this
:)

I have not seen this since Matt patched DragonFly-2.0 back in 2008 :-)
