Bug #2986

kernel panic when running fio (modified fio)

Added by tkusumi about 7 years ago. Updated about 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
03/23/2017
Due date:
% Done:

0%

Estimated time:

Description

After running for a minute or so, the fio threads either get killed due to OOM or the VM subsystem panics.
Both outcomes occur, and neither is rare in my environment.
I'm using master.
https://leaf.dragonflybsd.org/~tkusumi/diff/fio_bs_panic.PNG

1. git clone git://git.kernel.dk/fio
2. Make following change


    # git diff
    diff --git a/filesetup.c b/filesetup.c
    index bcf95bd5..4579a181 100644
    --- a/filesetup.c
    +++ b/filesetup.c
    @@ -243,6 +243,7 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
             bs = td->o.max_bs[DDIR_READ];
             b = malloc(bs);
    +        b[0] = 0xff;
             memset(b, 0, bs);
             if (lseek(f->fd, f->file_offset, SEEK_SET) < 0) {
-----------------

3. Run below. --bsrange= may need to be changed depending on ram/swap size. In this example 100000 MiB is the size of malloc in above diff.

    # ./configure
    # gmake -j8
    # rm ./xxxxx.*    (if exists)
    # ./fio --name=xxxxx --ioengine=sync --rw=read --bs=32k --size=10m --numjobs=10 --nrfiles=10 --unlink=1 --bsrange=1k:100000m --pre_read=1

Actions #1

Updated by tkusumi about 7 years ago

One correction.

In this example 100000 MiB is the size of malloc in above diff.

The variable for it was unsigned int, so malloc size is actually 4GiB.

Actions #2

Updated by tkusumi about 7 years ago

@dillon, thanks for this commit.
https://gitweb.dragonflybsd.org/dragonfly.git/commit/e16c650d787872d60f6ca3d20a7f1bc1e0c19261

It did fix the panic, but now the threads no longer get killed by the OOM killer, nor am I able to kill them.
It keeps printing these shortage messages, and eventually requires a reboot.
https://leaf.dragonflybsd.org/~tkusumi/diff/fio_bs_swap.PNG

Actions #3

Updated by dillon about 7 years ago

The fio program catches ^C. You should be able to kill it by ^Z'ing it to stop it and then doing a 'killall -9 fio'. Not sure why the low memory code couldn't kill it, though it could be that it is and the fio parent is re-forking or something like that. Either way the program with the arguments you specify appears to try to manipulate around 32GB of memory, so if you don't have the resources to support that the machine is not going to be happy. You can reduce the impact on the rest of the machine by setting a memoryuse resource limit (x 10 jobs). For example, from csh/tcsh: limit memoryuse 100m, and then run the fio job.

-Matt

Actions #4

Updated by tkusumi about 7 years ago

Yeah, I don't care if the arguments require more than physical RAM, because that's what I was trying to do. My expectation was that the threads would get killed by the OOM killer (or whatever it's called in DragonFly's VM), as they were before this fix.

Actions #5

Updated by tkusumi about 7 years ago

  • Status changed from New to Closed

The kernel panic is fixed.
There is still a problem (as mentioned above), but it's a separate topic from this panic, and it can't be fixed right now (according to dillon@).
