Bug #2986
Status: Closed
Kernel panic when running fio (modified fio)
Description
After running for about a minute, the fio threads either get killed due to OOM, or the VM subsystem panics.
Both outcomes occur, and neither is rare in my environment.
I'm using master.
https://leaf.dragonflybsd.org/~tkusumi/diff/fio_bs_panic.PNG
1. git clone git://git.kernel.dk/fio
2. Make the following change:
# git diff
diff --git a/filesetup.c b/filesetup.c
index bcf95bd5..4579a181 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -243,6 +243,7 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
 	bs = td->o.max_bs[DDIR_READ];
 	b = malloc(bs);
+	b[0] = 0xff;
 	memset(b, 0, bs);
 	if (lseek(f->fd, f->file_offset, SEEK_SET) < 0) {
-----------------
3. Run the commands below. The --bsrange= value may need to be adjusted depending on RAM/swap size. In this example, 100000 MiB is the size of the malloc in the above diff.
# ./configure
# gmake -j8
# rm ./xxxxx.*    (only if such files exist)
# ./fio --name=xxxxx --ioengine=sync --rw=read --bs=32k --size=10m --numjobs=10 --nrfiles=10 --unlink=1 --bsrange=1k:100000m --pre_read=1
Updated by tkusumi almost 8 years ago
One correction.
In this example 100000 MiB is the size of malloc in above diff.
The variable holding it is an unsigned int, so the malloc size is actually 4 GiB.
Updated by tkusumi almost 8 years ago
@dillon, thanks for this commit.
https://gitweb.dragonflybsd.org/dragonfly.git/commit/e16c650d787872d60f6ca3d20a7f1bc1e0c19261
It did fix the panic, but now the threads no longer get killed due to OOM, and I am not able to kill them either.
It keeps printing these shortage messages and eventually requires a reboot.
https://leaf.dragonflybsd.org/~tkusumi/diff/fio_bs_swap.PNG
Updated by dillon almost 8 years ago
The fio program catches ^C. You should be able to kill it by ^Z'ing it to stop it and then doing a 'killall -9 fio'. I'm not sure why the low-memory code couldn't kill it. It could be that it is killing it and the fio parent is re-forking, or something like that. Either way, the program with the arguments you specify appears to try to manipulate around 32GB of memory, so if you don't have the resources to support that, the machine is not going to be happy. You can reduce the impact on the rest of the machine by setting a memoryuse resource limit (x 10 jobs). For example, from csh/tcsh: limit memoryuse 100m, and then run the fio job.
-Matt
Updated by tkusumi almost 8 years ago
Yeah, I don't mind that the arguments require more than physical RAM, because that's what I was trying to do. My expectation was that the threads would get killed by the OOM killer (or whatever it's called in DragonFly's VM), as they were before this fix.
Updated by tkusumi almost 8 years ago
- Status changed from New to Closed
The kernel panic is fixed.
There is still a problem (as mentioned above). However, it is a separate issue from this panic, and according to dillon@ it cannot be fixed at the moment.