Bug #2249
closed
deadlock under high i/o load (e.g. hammer reblock)
Description
Almost latest master (last commit from upstream should be
0e84c0db0d0bcd6d681bc30fd16e96c26bb92db2).
Whenever the nightly cron runs, a few minutes after hammer cleanup starts doing
its job, the machine becomes mostly unresponsive (most processes and most ssh
connections stop responding, but strangely pf still continues doing its job).
Fortunately debugging through serial console still worked, so the dump is
available at leaf:~rumko/crash/deadlock/*.0 .
--
Please do not CC me, since I already receive everything from these MLs.
Regards,
Rumko
Updated by tuxillo about 13 years ago
Hi Rumko,
I see your kern.0 is 7.5K, bad upload or bad dump?
-rw-r--r-- 1 rumko wheel 231K Nov 28 13:51 core.txt.0
-rw------- 1 rumko wheel 465B Nov 28 13:50 info.0
-rw-r--r-- 1 rumko wheel 7.5K Nov 28 13:50 kern.0
-rw------- 1 rumko wheel 407M Nov 28 13:51 vmcore.0
Thanks,
Antonio Huete
Updated by rumcic about 13 years ago
Antonio M. Huete Jimenez via Redmine wrote:
> Issue #2249 has been updated by Antonio M. Huete Jimenez.
> Hi Rumko,
> I see your kern.0 is 7.5K, bad upload or bad dump?
Bad dump, I'd guess. I copied over the kernel from /boot and it seems to be OK now.
> -rw-r--r-- 1 rumko wheel 231K Nov 28 13:51 core.txt.0
> -rw------- 1 rumko wheel 465B Nov 28 13:50 info.0
> -rw-r--r-- 1 rumko wheel 7.5K Nov 28 13:50 kern.0
> -rw------- 1 rumko wheel 407M Nov 28 13:51 vmcore.0
> Thanks,
> Antonio Huete
<snip>
Regards,
Rumko
Updated by tuxillo about 13 years ago
Rumko,
It's indeed a deadlock. I'm also hitting it during 'hammer cleanup', but on x86_64. Explanation from Matt:
21:21 <@dillon> the pageout daemon deadlock is because the hammer backend locks hammer inodes and the pageout
daemon frontend can only detect locked vnodes
21:21 <@dillon> so the pageout daemon will happily lock a vnode and then issue the pageout request to the hammer
backend and cause the hammer backend to get stuck on the inode (waiting for new memory)
It doesn't seem to have an easy solution.
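To make the cycle Matt describes easier to see, here is a rough toy model in Python. It is only a sketch under assumptions: the `Lock` class, the `simulate` function and the `pageout_checks_inode` switch are all made up for illustration and have nothing to do with the real kernel code. The switch only shows why being blind to the inode lock matters; it is not a description of any actual fix.

```python
class Lock:
    """Toy non-blocking lock that remembers its owner (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def try_acquire(self, who):
        # Succeeds only if nobody holds the lock.
        if self.owner is None:
            self.owner = who
            return True
        return False

    def release(self):
        self.owner = None


def simulate(pageout_checks_inode):
    """Return True if the toy system ends up deadlocked."""
    vnode_lock = Lock("vnode")
    inode_lock = Lock("hammer inode")

    # The hammer backend holds the inode lock and is waiting for new memory.
    inode_lock.try_acquire("backend")
    backend_waits_for_memory = True

    # The pageout daemon wants to free memory by paging out a dirty page.
    # Its frontend check can only see the vnode lock.
    if not vnode_lock.try_acquire("pageout"):
        return False  # vnode busy: the daemon skips this page

    # Hypothetical variant: the daemon could also see the inode lock.
    if pageout_checks_inode and inode_lock.owner is not None:
        vnode_lock.release()
        return False  # skip this page and try another one

    # The pageout request reaches the backend, which needs the inode lock.
    pageout_blocked = not inode_lock.try_acquire("pageout")

    # Cycle: pageout waits on the inode held by the backend, while the
    # backend waits for memory that only pageout can free.
    return pageout_blocked and backend_waits_for_memory


if __name__ == "__main__":
    print("vnode-only check deadlocks:", simulate(False))
    print("with inode visibility deadlocks:", simulate(True))
```

In the model, the daemon gets stuck only when it cannot see the inode lock; if it could see it, it would simply skip the page.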
Cheers,
Antonio Huete
Updated by tuxillo about 13 years ago
- Status changed from New to In Progress
Rumko, Jan,
Matt pushed a fix for this; I'm about to try it myself on my main box.
Commit is 55b50bd522537a7b4e0810aa4cab05ad355d1381
Cheers,
Antonio Huete
Updated by rumcic almost 13 years ago
- Status changed from In Progress to Resolved
The fix seems to do its job; I cannot get it to deadlock again.