Bug #1096: Hammer: processes blocked on a file - DragonFlyBSD - DragonFlyBSD bugtracker

Added by ftigeot over 17 years ago. Updated almost 16 years ago.

Status: Closed
Priority: Normal

Description

This is on a DragonFly 2.0 machine with a freshly newfs'd /home hammer
filesystem.
So far, I have been able to reproduce this bug twice.

korganizer, the KDE agenda tool, has been unable to close properly since I
upgraded to DragonFly-2.0. The process seems to stay in some kind of
zombie state.
When trying to reboot the system, the following messages are shown on the
console:

init: some processes would not die; ps axl advised

...

syncing disks... 1
done
umount of filesystem mounted from subaru4:/distfiles failed(70)
[diagnostic] cache_lock: blocked on 0xe16da678 "korganizerrcbvUSpw.new"

The machine then stays in this state forever; I have to use the reset switch.

Updated by mneumann over 17 years ago

Yeah, I got exactly the same with firefox3. I tried to kill it with
"kill -9", but that wasn't possible. Then rebooting showed the same
messages as above.

Regards,

Michael

Updated by dillon over 17 years ago

:>
:>
:> The machine then stays in this state forever; I have to use the reset switch.
:
:Yeah, I got exactly the same with firefox3. I tried to kill it with
:"kill -9", but that wasn't possible. Then rebooting showed the same
:messages as above.
:
:Regards,
:
: Michael

Michael, if you can recreate the unkillable firefox3 please force a
    kernel dump, which you should be able to do by dropping into the
    debugger and typing 'panic'.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>

Updated by ftigeot over 17 years ago

I have a dump with an unkillable korganizer (pid 922).
Every time the problem occurred, it was with a KDE application; most of the
time korganizer.

Where can I put it? vmcore.0 is 3.0 GB uncompressed.

Updated by dillon over 17 years ago

:I have a dump with an unkillable korganizer (pid 922).
:Every time the problem occurred, it was with a KDE application; most of the
:time korganizer.
:
:Where can I put it? vmcore.0 is 3.0 GB uncompressed.
:
:--
:Francois Tigeot

I'll send you private email.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>

Updated by mneumann over 17 years ago

It happened to me once again. I've uploaded vmcore and kernel to my
crash/ directory.

It seems that the vnode is (dead)locked, and trying to access it (e.g.
typing "ls" in that directory) blocks that process as well.

Regards,

Michael

Updated by dillon over 17 years ago

:It happened to me once again. I've uploaded vmcore and kernel to my
:crash/ directory.
:
:It seems that the vnode is (dead)locked, and trying to access it (e.g.
:typing "ls" in that directory) blocks that process as well.
:
:Regards,
:
: Michael

I've examined both your crash dump and Francois's crash dump.  Both
    reveal the same bug.

What is happening is that the processes are stuck waiting for a
    HAMMER inode which has a dependency to flush, but HAMMER has lost
    track of the dependency, so the process winds up waiting forever.

(kgdb) back
#0 0xc02cce63 in lwkt_switch ()
#1 0xc02d0d13 in tsleep (ident=0xe1552814, flags=0, wmesg=0xc05432fc "hmrwin", timo=0)
#2 0xc042eeba in hammer_wait_inode (ip=0xe15527d0)
#3 0xc0440e89 in hammer_vop_fsync (ap=0xe112cc88)
#4 0xc0316162 in vop_fsync (ops=0xc5cc6f30, vp=0xe15a27a8, waitfor=1)
#5 0xc03144be in sys_fsync (uap=0xe112ccf0)
#6 0xc04d1e62 in syscall2 (frame=0xe112cd40)

(kgdb) print ip->flush_state
$1 = HAMMER_FST_SETUP <--- means there is stuff on target_list
(kgdb) print ip->target_list
$2 = {
tqh_first = 0x0, <--- except the target_list is empty!
tqh_last = 0xe15527f0
}
(kgdb)
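
To make the inconsistency in the kgdb output above concrete, here is a minimal sketch in C using the TAILQ macros from <sys/queue.h>. Every name in it (inode_model, dep, FST_SETUP, the helper functions) is hypothetical and greatly simplified. It is not the HAMMER source and says nothing about where the real bug lives. It only models the state the dump shows: a flush state that promises pending dependencies while the list that should hold them is empty. As a side observation, tqh_last (0xe15527f0) is 0x20 bytes past ip (0xe15527d0), which would be consistent with it pointing back at the list head inside the inode, as an empty TAILQ does; the field offset is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins; NOT the real HAMMER definitions. */
enum flush_state { FST_IDLE, FST_SETUP, FST_FLUSH };

struct dep {
    TAILQ_ENTRY(dep) entry;          /* linkage on the owning inode's list */
};

TAILQ_HEAD(dep_list, dep);

struct inode_model {
    enum flush_state flush_state;    /* stands in for ip->flush_state */
    struct dep_list  target_list;    /* stands in for ip->target_list */
};

/* Start with no pending dependencies. */
void inode_model_init(struct inode_model *ip)
{
    assert(ip != NULL);
    ip->flush_state = FST_IDLE;
    TAILQ_INIT(&ip->target_list);
}

/* Record a dependency and mark the inode as having work set up. */
void inode_model_add_dep(struct inode_model *ip, struct dep *d)
{
    assert(ip != NULL && d != NULL);
    TAILQ_INSERT_TAIL(&ip->target_list, d, entry);
    ip->flush_state = FST_SETUP;
}

/*
 * True when the state claims there are dependencies (SETUP) but the
 * list is empty: nothing is left that could ever wake a waiter.
 */
bool inode_model_lost_dependency(struct inode_model *ip)
{
    return ip->flush_state == FST_SETUP && TAILQ_EMPTY(&ip->target_list);
}

/* An empty TAILQ has tqh_first == NULL and tqh_last == &tqh_first. */
bool dep_list_looks_empty(struct dep_list *head)
{
    return head->tqh_first == NULL && head->tqh_last == &head->tqh_first;
}
```

In this model, removing the last entry with TAILQ_REMOVE without resetting flush_state leaves inode_model_lost_dependency() returning true, which is the shape of the state printed by kgdb.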

I will commit a fix today to both HEAD and 2.0.  It will need some
    continued testing, since the bug is fairly hard to reproduce, to make sure
    I actually fixed it in the right place.

Thank you both!

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>

Updated by ftigeot over 17 years ago

[...]

For some reason, I have no trouble on one of my machines. I haven't been
able to pinpoint an obvious cause yet, but it now takes less than one hour
after a reboot to show.
I just need to run some random applications: web browser, email, etc...
and korganizer then freezes almost every time...

Updated by dillon over 17 years ago

:For some reason, I have no trouble on one of my machines. I haven't been
:able to pinpoint an obvious cause yet, but it now takes less than one hour
:after a reboot to show.
:I just need to run some random applications: web browser, email, etc...
:and korganizer then freezes almost every time...
:
:--
:Francois Tigeot

Please try this kernel patch, Francois:

fetch http://apollo.backplane.com/DFlyMisc/hammer02.patch

I haven't committed it yet.  I think it will fix the problem but if
    you (and Mike) could test it first I'll know whether I have to keep
    looking or not.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>

Updated by ftigeot over 17 years ago

Thanks for the patch. My troubled machine has been running with it for
more than 8 hours now and the problem has yet to show itself.

Normally, I would have seen the hang five or six times in this period.

#10

Updated by dillon over 17 years ago

:Thanks for the patch. My troubled machine has been running with it for
:more than 8 hours now and the problem has yet to show itself.
:
:Normally, I would have seen the hang five or six times in this period.
:
:--
:Francois Tigeot

Ok, that's good.

Side note to everyone running HAMMER:  I will be MFC'ing these fixes
    to 2.0 this weekend.  They are currently only in HEAD.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>

#11

Updated by bastyaelvtars over 17 years ago

I just ran into this issue. Your patch fixes it.

#12

Updated by bastyaelvtars over 17 years ago

On Thu, 31 Jul 2008 19:58:10 +0200
Gergo Szakal <bastyaelvtars@gmail.com> wrote:

I spoke too soon. It comes again and is reproducible. Steps:

1) Install monotone from pkgsrc.
2) Execute the following on a HAMMER filesystem:
   mtn db init --db=somedbname.mtn
   (This just creates a file.)
3) The mtn process freezes in the 'hmrwin' state.
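
For anyone without monotone installed, the steps above can be approximated with a small C helper that creates a file and calls fsync() on it. This is only a sketch under assumptions: the backtrace earlier in this thread shows a hung process sleeping in 'hmrwin' underneath sys_fsync, but whether "mtn db init" reaches the same path is a guess, and the function name and payload here are made up for illustration. It is not known to trigger the bug.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) `path`, write a few bytes, then fsync() it.
 * Returns 0 on success and -1 on any failure.
 * On an affected kernel the fsync() call is where a process might
 * block forever (assumption based on the backtrace in this thread).
 */
int create_and_fsync(const char *path)
{
    static const char payload[] = "hammer fsync probe\n";
    const size_t len = sizeof(payload) - 1;   /* drop the trailing NUL */

    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (write(fd, payload, len) != (ssize_t)len)
        rc = -1;
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling create_and_fsync() with a path on the HAMMER filesystem and watching the process state with ps axl would show whether it gets stuck.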

#13

Updated by dillon over 17 years ago

:On Thu, 31 Jul 2008 19:58:10 +0200
:Gergo Szakal <bastyaelvtars@gmail.com> wrote:
:
:> I just ran into this issue. Your patch fixes it.
:>
:
:I spoke too soon. It comes again and is reproducible. Steps:
:
:1) Install monotone from pkgsrc.
:2) Execute the following on a HAMMER filesystem:
: mtn db init --db=somedbname.mtn
: (This just creates a file.)
:3) The mtn process freezes at the 'hmrwin' state.
:
:--
:Gergo Szakal MD <bastyaelvtars@gmail.com>
:University Of Szeged, HU

Ok, get a kernel core onto leaf and I'll look at it.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>

#14

Updated by bastyaelvtars over 17 years ago

On Thu, 31 Jul 2008 12:18:34 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:

I still cannot make cores (that is the subject of another bug report :-P)

#15

Updated by mneumann over 17 years ago

Have you set dumpdev in rc.conf to your swap partition? For example:

dumpdev="/dev/ad0s1b"

Regards,

Michael

#16

Updated by dillon over 17 years ago

:
:> Ok, get a kernel core onto leaf and I'll look at it.
:
:I still cannot make cores (That's subject to another bugreport. :-P)
:
:--
:Gergo Szakal MD <bastyaelvtars@gmail.com>

Try putting this in your /etc/sysctl.conf:

kern.sync_on_panic=0

This stops the kernel from trying to sync the buffer cache on panic.
    When debugging VFS issues, the kernel can often lock up before the dump
    when it tries to sync the buffer cache.

The other usual suspects are:

* /var/crash does not have enough space.

* Swap space is not big enough (must be >= amount of main memory).

* The kernel binary does not match the kernel core. Usually this means the
  /kernel file is not the kernel that was actually running.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>
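
The two size checks in the list above can be written down as a tiny C predicate. This is a sketch only: the function name and parameters are hypothetical, and the idea that /var/crash needs roughly as much free space as main memory is an assumption (the thread only says it must have "enough"). The third item, a mismatched kernel binary, is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a crash dump plausibly fits:
 *  - swap must be at least as large as main memory (stated in the thread)
 *  - /var/crash must have at least that much free space (assumption)
 * All sizes are in bytes; ram_bytes must be non-zero.
 */
bool dump_can_be_saved(uint64_t ram_bytes,
                       uint64_t swap_bytes,
                       uint64_t crash_free_bytes)
{
    assert(ram_bytes > 0);

    bool swap_ok  = swap_bytes >= ram_bytes;
    bool crash_ok = crash_free_bytes >= ram_bytes;
    return swap_ok && crash_ok;
}
```

With the 3.0 GB vmcore mentioned earlier as a rough guide, a machine with 3 GB of memory and only 2 GB of swap would fail the first check.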

#17

Updated by aoiko about 17 years ago

Still waiting for a core... Can anyone else reproduce the bug?

#18

Updated by alexh almost 16 years ago

Has anyone seen this happening recently? If not, I think it's safe to close this
:)

Cheers,
Alex Hornung

#19

Updated by ftigeot almost 16 years ago

On Fri, Apr 02, 2010 at 10:50:29PM +0000, Alex Hornung (via DragonFly issue tracker) wrote:

Alex Hornung <ahornung@gmail.com> added the comment:

Has anyone seen this happening recently? If not, I think it's safe to close this
:)

I have not seen this since Matt patched DragonFly-2.0 back in 2008 :-)
