Bug #884

Performance/memory problems under filesystem IO load

Added by hasso almost 7 years ago. Updated 7 months ago.

Status: In Progress
Priority: High
Assignee: -
% Done: 0%
Category: VM subsystem
Target version: 3.8.0

Description

While testing a drive with dd I noticed serious performance problems.
Programs that need disk access block for 10 seconds or more. Sometimes they
don't resume work until dd is finished. Raw disk access (i.e. writing
directly to the disk rather than to a file) is reported to be OK (I can't
test it myself).

All tests are done with this command:
dd if=/dev/zero of=./file bs=4096k count=1000

Syncing after each dd helps to reproduce it more reliably (cache?).

There is one more strange thing in running these tests. I looked at memory
stats in top before and after running dd.

Before:
Mem: 42M Active, 40M Inact, 95M Wired, 304K Cache, 53M Buf, 795M Free
After:
Mem: 70M Active, 679M Inact, 175M Wired, 47M Cache, 109M Buf, 1752K Free

And as a side effect, I can't get my network interfaces up any more after
running dd: "em0: Could not setup receive structures".

History

#1 Updated by nthery almost 7 years ago

[...]
> There is one more strange thing in running these tests. I looked at memory
> stats in top before and after running dd.
>
> Before:
> Mem: 42M Active, 40M Inact, 95M Wired, 304K Cache, 53M Buf, 795M Free
> After:
> Mem: 70M Active, 679M Inact, 175M Wired, 47M Cache, 109M Buf, 1752K Free

FWIW, I observe similar figures. I also noticed that deleting ./file
and waiting a bit restores memory to the "before" state.

The size increase in the wired pool can be reproduced more simply with:

sysctl vm.stats.vm.v_wire_count # A
dd if=/dev/zero of=./file bs=4096k count=1
sysctl vm.stats.vm.v_wire_count # B
rm ./file
sysctl vm.stats.vm.v_wire_count # C

A == C && B == A + 1
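
nthery's check can be scripted. A minimal sketch: the DragonFly-specific part (the vm.stats.vm.v_wire_count sysctl from the report) is left commented out, and the invariant itself is factored into a small portable helper so it can be checked against sample numbers:

```shell
#!/bin/sh
# Encodes nthery's observed invariant: the wired count returns to its
# starting value after rm (A == C), and writing one 4 KiB block wires
# exactly one extra page (B == A + 1).
matches_observation() {
    a=$1; b=$2; c=$3
    [ "$a" -eq "$c" ] && [ "$b" -eq $((a + 1)) ]
}

# DragonFly-specific measurement (assumed sysctl name, from the report):
# a=$(sysctl -n vm.stats.vm.v_wire_count)
# dd if=/dev/zero of=./file bs=4096 count=1 2>/dev/null
# b=$(sysctl -n vm.stats.vm.v_wire_count)
# rm ./file
# c=$(sysctl -n vm.stats.vm.v_wire_count)

# Example with illustrative numbers of the shape seen in the thread:
if matches_observation 11486 11487 11486; then
    echo "A == C && B == A + 1"
fi
```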

I traced this with gdb. The additional wired page is part of a struct
buf (b_xio) instance tied to the ./file vnode. I reckon this vnode
stays cached (namecache?) when the dd process ends and deleting ./file
forces destruction of the vnode.

AFAIU wired pages cannot be reclaimed by the pager when memory is
low. So is it normal to keep b_xio pages wired when they are "just"
cached in a vnode (i.e. no ongoing r/w operation)?

#2 Updated by nthery almost 7 years ago

Oops. I've just noticed that the block size is 4096k in the original post,
not 4096. I did all my experiments with the latter (there was a paste
error in my previous email).

#3 Updated by corecode almost 7 years ago

I can confirm that creating a 4MB file consumes 1026 wired pages.

So this is either a) insane, b) a bug, or c) a counting issue, where the
buffer cache gets counted among the wired pages.
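
A quick sanity check of the arithmetic (assuming 4 KiB pages) shows that the file data alone accounts for 1024 of those 1026 pages:

```shell
# Data pages needed to hold a 4 MiB file with 4 KiB pages:
echo $((4 * 1024 * 1024 / 4096))    # 1024
```

So only 2 of the observed wired pages are not plain file data.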

I didn't look at the code yet, but does this mean that, in addition to
the buffer cache data, this b_xio is kept as well, holding the *same* data?

cheers
simon

#4 Updated by dillon almost 7 years ago

:I can confirm that creating a 4MB file consumes 1026 wired pages.
:
:So either this is a) insane b) a bug or c) a counting issue, where the
:buffer cache gets placed with the wired pages.

They're just wired because they're in the buffer cache. The real issue
here is buffer cache management, that's all.

:>> I traced this with gdb. The additional wired page is part of a struct
:>> buf (b_xio) instance tied to the ./file vnode. I reckon this vnode
:>> stays cached (namecache?) when the dd process ends and deleting ./file
:>> forces destruction of the vnode.
:
:I didn't yet look at the code, but does this mean that additionally to
:the buffer cache data, this b_xio is kept as well, keeping the *same* data?
:
:cheers
: simon

No, the b_xio is part of the struct buf. The pages are wired only once.

This is really just a buffer cache management issue. There's no reason
the pages have to remain wired that long... the related buffer can
certainly be destroyed after having been flushed to disk under heavy-load
situations while still leaving the backing VM pages intact in the VM
page cache. I would focus any fixes in that area.

-Matt
Matthew Dillon

#5 Updated by tuxillo 7 months ago

  • Description updated (diff)
  • Category set to VM subsystem
  • Status changed from New to In Progress
  • Assignee deleted (0)
  • Target version set to 3.8.0

Hi,

I've done the same test under a vkernel:

# sysctl vm.stats.vm.v_wire_count
vm.stats.vm.v_wire_count: 11486
# dd if=/dev/zero of=./file bs=4m count=1
1+0 records in
1+0 records out
4194304 bytes transferred in 0.011742 secs (357201747 bytes/sec)
# sysctl vm.stats.vm.v_wire_count
vm.stats.vm.v_wire_count: 11675
# rm file
# sysctl vm.stats.vm.v_wire_count
vm.stats.vm.v_wire_count: 10647

And the same test on real hardware:

antonioh@nas:~$ sysctl vm.stats.vm.v_wire_count
vm.stats.vm.v_wire_count: 379492
antonioh@nas:~$ dd if=/dev/zero of=./file bs=4m count=1
1+0 records in
1+0 records out
4194304 bytes transferred in 0.035698 secs (117494297 bytes/sec)
antonioh@nas:~$ sysctl vm.stats.vm.v_wire_count
vm.stats.vm.v_wire_count: 379500
antonioh@nas:~$ rm file
antonioh@nas:~$ sysctl vm.stats.vm.v_wire_count
vm.stats.vm.v_wire_count: 378476
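
The wired-page deltas from the two runs above (simple arithmetic on the numbers reported):

```shell
# Pages wired by the dd run: count after dd minus count before.
echo $((11675 - 11486))     # vkernel: 189
echo $((379500 - 379492))   # real hardware: 8
```

Both deltas are far below the ~1024 pages a fully buffered 4 MiB file would occupy.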

I don't see the high usage corecode showed in his test.

Matt, there has been a ton of work in the VM subsystem since then; is it possible that this is no longer the case?

Best regards,
Antonio Huete

#6 Updated by alexh 7 months ago

On 2014-02-18 14:47, wrote:
> I don't see the high usage corecode showed in his test.
>
> Matt, there was a ton of work in the VM subsystem, is it possible that
> this is not the case anymore?

Well, we most definitely have quite bad problems under heavy I/O load.
Interactive performance drops significantly. Getting BFQ or similar
dsched policy up to scratch would go a long way here.

Cheers,
Alex
