Bug #2249

deadlock under high i/o load (e.g. hammer reblock)

Added by rumcic almost 3 years ago. Updated over 2 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Description

Almost latest master (last commit from upstream should be
0e84c0db0d0bcd6d681bc30fd16e96c26bb92db2).

Whenever the nightly cron job runs, a few minutes after hammer cleanup starts
doing its job, the machine becomes mostly unresponsive (most processes and most
ssh connections stop responding, but strangely pf still continues doing its job).

Fortunately, debugging through the serial console still worked, so the dump is
available at leaf:~rumko/crash/deadlock/*.0 .
--
Please do not CC me, since I already receive everything from these MLs.

Regards,
Rumko

History

#1 Updated by tuxillo almost 3 years ago

Hi Rumko,

I see your kern.0 is 7.5K, bad upload or bad dump?

-rw-r--r-- 1 rumko wheel 231K Nov 28 13:51 core.txt.0
-rw------- 1 rumko wheel 465B Nov 28 13:50 info.0
-rw-r--r-- 1 rumko wheel 7.5K Nov 28 13:50 kern.0
-rw------- 1 rumko wheel 407M Nov 28 13:51 vmcore.0

Thanks,
Antonio Huete

#2 Updated by rumcic almost 3 years ago

Antonio M. Huete Jimenez via Redmine wrote:
> Hi Rumko,
>
> I see your kern.0 is 7.5K, bad upload or bad dump?

Bad dump, I'd guess. I copied the kernel over from /boot and it seems to be OK now.

#3 Updated by tuxillo almost 3 years ago

Rumko,

It's indeed a deadlock. I'm also seeing it during 'hammer cleanup', but on x86_64. Matt's explanation:

21:21 <@dillon> the pageout daemon deadlock is because the hammer backend locks hammer inodes and the pageout daemon frontend can only detect locked vnodes
21:21 <@dillon> so the pageout daemon will happily lock a vnode and then issue the pageout request to the hammer backend and cause the hammer backend to get stuck on the inode (waiting for new memory)

It doesn't seem to have an easy solution.
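
To make the cycle in Matt's explanation easier to see, here is a toy wait-for model in Python. It is only a sketch: the names `find_cycle`, `build_waits`, and `pageout_checks_inode_lock` are made up for illustration, nothing here is kernel code, and the "also check the inode lock" branch is a hypothetical comparison case, not a description of any actual fix.

```python
# Toy model of the dependency cycle described above.
# All names are hypothetical; this is not DragonFly kernel code.

def find_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start; return the cycle, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None          # chain ended, nobody is stuck in a loop
    return path[path.index(node):]

def build_waits(pageout_checks_inode_lock):
    """Build the wait-for graph for one pageout attempt."""
    # The hammer backend holds the inode lock and is waiting for new memory,
    # which in this model only the pageout daemon can free.
    waits = {"hammer_backend": "pageout_daemon"}
    inode_locked = True      # held by the hammer backend
    vnode_locked = False     # the only thing the frontend can see
    skip = vnode_locked or (pageout_checks_inode_lock and inode_locked)
    if not skip:
        # The daemon locks the vnode and issues the pageout, which then
        # needs the inode lock held by the backend.
        waits["pageout_daemon"] = "hammer_backend"
    return waits

if __name__ == "__main__":
    print(find_cycle(build_waits(False), "pageout_daemon"))
    print(find_cycle(build_waits(True), "pageout_daemon"))
```

With only the vnode check, the model reports a two-party cycle between the pageout daemon and the hammer backend. In the hypothetical case where the daemon could also see the inode lock, it would skip that page and no cycle would form.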

Cheers,
Antonio Huete

#4 Updated by tuxillo almost 3 years ago

  • Status changed from New to In Progress

Rumko, Jan,

Matt pushed a fix for this; I'm about to try it myself on my main box.

The commit is 55b50bd522537a7b4e0810aa4cab05ad355d1381

Cheers,
Antonio Huete

#5 Updated by rumcic over 2 years ago

  • Status changed from In Progress to Resolved

The fix seems to do its job; I cannot get it to deadlock again.

#6 Updated by alexh over 2 years ago

  • Status changed from Resolved to Closed
