Bug #1679

Machine unresponsive with cache_lock: blocked on... message

Added by ftigeot almost 5 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%

Description

My backup machine has been rendered unresponsive.

It is a Core 2 Duo / 2GB system running DragonFly v2.5.1.766.g3da46-DEVELOPMENT
(from 2010-02-11)

I think it happened after an rsync run. There were many rsync error messages
on the console, followed by this kernel message:

[diagnostic] cache_lock : blocked on 0xe920a868 "spwd.db"

The machine did respond to ping, but nothing else:
- no nfs traffic
- no ssh response
- nothing when pressing keys on the console.

I managed to get a core dump via Ctrl-Alt-Esc; it is online
at http://www.wolfpond.org/crash.dfly/

History

#1 Updated by dillon almost 5 years ago

:
:My backup machine has been rendered unresponsive.
:
:It is a Core 2 Duo / 2GB system running DragonFly v2.5.1.766.g3da46-DEVELOPMENT
:(from 2010-02-11)

Definitely upgrade to the latest. I fixed a few NFS related deadlocks
in the namecache. The issues were related to the readdirplus change.

-Matt

#2 Updated by ftigeot almost 5 years ago

On Sat, Feb 20, 2010 at 09:01:41AM -0800, Matthew Dillon wrote:
> :
> :My backup machine has been rendered unresponsive.
> :
> :It is a Core 2 Duo / 2GB system running DragonFly v2.5.1.766.g3da46-DEVELOPMENT
> :(from 2010-02-11)
>
> Definitely upgrade to the latest. I fixed a few NFS related deadlocks
> in the namecache. The issues were related to the readdirplus change.

I just got a new crash.

The system was running DragonFly v2.5.1.902.g333f4-DEVELOPMENT from the 25th.

There were these messages on the console:

Warning: pmap_interlock 00010003
Warning: pmap_interlock 00010003

No cache_lock stuff this time.

The machine did respond to ping and to some limited keyboard action:

- I could switch virtual terminals with Alt-F2, Alt-F3, etc.
- I could toggle the Num Lock / Scroll Lock / Caps Lock keyboard LEDs
- I could scroll back through the console history with Scroll Lock and Page Up / Page Down

Apart from that, typing on the keyboard had no effect.

I got a core dump via Ctrl-Alt-Esc; you can get it here:
http://www.wolfpond.org/crash.dfly/

#3 Updated by dillon almost 5 years ago

:I just got a new crash.
:
:The system was running DragonFly v2.5.1.902.g333f4-DEVELOPMENT from the 25th.
:
:There were these messages on the console:
:
:Warning: pmap_interlock 00010003
:Warning: pmap_interlock 00010003
:
:No cache_lock stuff this time.

It looks like a deadlock between the VM system and HAMMER due to
kmalloc() blocking during a low memory situation. Your system
was paging heavily at the time and was low on free memory.

I have committed a change to master which should fix this particular
issue.

-Matt
Matthew Dillon

#4 Updated by ftigeot almost 5 years ago

On Thu, Mar 04, 2010 at 04:09:27PM -0800, Matthew Dillon wrote:
> :I just got a new crash.
> :
> :The system was running DragonFly v2.5.1.902.g333f4-DEVELOPMENT from the 25th.
> :
> :There were these messages on the console:
> :
> :Warning: pmap_interlock 00010003
> :Warning: pmap_interlock 00010003
> :
> :No cache_lock stuff this time.
>
> It looks like a deadlock between the VM system and HAMMER due to
> kmalloc() blocking during a low memory situation. Your system
> was paging heavily at the time and was low on free memory.

This particular machine has 2GB main memory.

I've noticed 2.5 has a tendency to swap more than 2.4; it seems heavy disk
activity (rsync) gets cached at the expense of running applications.

> I have committed a change to master which should fix this particular
> issue.

Thanks, I'm upgrading right now.

#5 Updated by ftigeot almost 5 years ago

On Thu, Mar 04, 2010 at 04:09:27PM -0800, Matthew Dillon wrote:
> :I just got a new crash.
> :
> :The system was running DragonFly v2.5.1.902.g333f4-DEVELOPMENT from the 25th.
> :
> :There were these messages on the console:
> :
> :Warning: pmap_interlock 00010003
> :Warning: pmap_interlock 00010003
> :
> :No cache_lock stuff this time.
>
> It looks like a deadlock between the VM system and HAMMER due to
> kmalloc() blocking during a low memory situation. Your system
> was paging heavily at the time and was low on free memory.
>
> I have committed a change to master which should fix this particular
> issue.

I got a new one.
Same symptoms, many pmap_interlock messages on the console.

System has been updated to
DragonFly v2.5.1.960.g7a6ce-DEVELOPMENT #9: Thu Mar 11 10:22:12 CET 2010

Kernel and crash dump are at the usual place:

http://www.wolfpond.org/crash.dfly/

#6 Updated by sjg almost 5 years ago

> This particular machine has 2GB main memory.
>
> I've noticed 2.5 has a tendency to swap more than 2.4; it seems heavy disk
> activity (rsync) gets cached at the expense of running applications.
>

rsync behaves a bit differently from a run of tar or cpdup: it walks
the whole directory structure, poking files and generating a list, and
then starts acting on the list. I would guess this is enough to land
the associated pages in the VM active queue, whereas cpdup, which takes
a more direct approach, would not.

http://gitweb.dragonflybsd.org/dragonfly.git/commit/0e8bd897b2ebcf1a575536f3bfdd88fe2377cc27

AFAICT the decision logic on whether to place a page on the inactive
or active queue is centered around the vfs.vm_cycle_point sysctl; you
could try raising it.

#7 Updated by dillon almost 5 years ago

:> :Warning: pmap_interlock 00010003
:> :Warning: pmap_interlock 00010003
:> :
:> :No cache_lock stuff this time.
:>
:> It looks like a deadlock between the VM system and HAMMER due to
:> kmalloc() blocking during a low memory situation. Your system
:> was paging heavily at the time and was low on free memory.
:>
:> I have committed a change to master which should fix this particular
:> issue.
:
:I got a new one.
:Same symptoms, many pmap_interlock messages on the console.
:
:System has been updated to
:DragonFly v2.5.1.960.g7a6ce-DEVELOPMENT #9: Thu Mar 11 10:22:12 CET 2010
:...
:--
:Francois Tigeot

This one revealed a different and very interesting MP race which
caused shared<->exclusive deadlock.

What is happening is that two threads are competing for an exclusive
lock on a HAMMER structure in order to load a data buffer from disk.

This occurs in the middle of the B-Tree lookup code which relies on
shared locks. When I coded it up I assumed it would be safe to
acquire an exclusive lock because there was no data loaded yet, so
there would not have been other consumers.

But I was wrong. If two threads compete to load the data buffer
then one thread will win and start using that buffer and surrounding
buffers with shared locks while the other thread will be holding
shared locks on other buffers while attempting to acquire an exclusive
lock on the new buffer (which the first thread had already loaded the
data for). The result is a deadlock.

The solution is to use an interlock instead of an exclusive lock to
test whether the structure needs data loaded from the drive or not.
This way if there is a race the second thread will not deadlock trying
to get the interlock when other threads hold the structure shared.

It is going to take me a day or two to test the fix for this.

-Matt
Matthew Dillon

#8 Updated by ftigeot almost 5 years ago

On Sat, Mar 13, 2010 at 06:33:25PM -0800, Matthew Dillon wrote:
>
> :I got a new one.
> :Same symptoms, many pmap_interlock messages on the console.
> :
> :System has been updated to
> :DragonFly v2.5.1.960.g7a6ce-DEVELOPMENT #9: Thu Mar 11 10:22:12 CET 2010
> :...
> :--
> :Francois Tigeot
>
> This one revealed a different and very interesting MP race which
> caused shared<->exclusive deadlock.

Yet another crash.

Symptoms were the same as in the previous cases.

System has been updated to
DragonFly v2.5.1.982.g5dece-DEVELOPMENT from March 15.

Kernel and crash dump are available here:

http://www.wolfpond.org/crash.dfly/

#9 Updated by dillon almost 5 years ago

:Yet another crash.
:
:Symptoms were the same as in the previous cases.
:
:System has been updated to
:DragonFly v2.5.1.982.g5dece-DEVELOPMENT from March 15.
:
:Kernel and crash dump are available here:
:
:http://www.wolfpond.org/crash.dfly/
:
:--
:Francois Tigeot

The HAMMER rebalancing code again. It is still eating too much memory
in a critical path. I'll have to think about this a bit.

-Matt
Matthew Dillon

#10 Updated by ftigeot almost 5 years ago

On Sat, Mar 20, 2010 at 11:34:19AM -0700, Matthew Dillon wrote:
>
> :Yet another crash.
> :
> :Symptoms were the same as in the previous cases.
> :
> :System has been updated to
> :DragonFly v2.5.1.982.g5dece-DEVELOPMENT from March 15.
> :
> :Kernel and crash dump are available here:
> :
> :http://www.wolfpond.org/crash.dfly/
> :
> :--
> :Francois Tigeot
>
> The HAMMER rebalancing code again. It is still eating too much memory
> in a critical path. I'll have to think about this a bit.

This machine was really fine running the same workload with 2.4.

With 2.5, simple hammer cleanup or rsync commands are enough to make
PCs with 2GB or 3GB memory swap.

#11 Updated by dillon almost 5 years ago

:This machine was really fine running the same workload with 2.4.
:
:With 2.5, simple hammer cleanup or rsync commands are enough to make
:PCs with 2GB or 3GB memory swap.
:
:--
:Francois Tigeot

The paging is due to a tradeoff for which there is no good answer.
In 2.4 we were holding 'old' cache data for much longer by trading
off against throwing away more recently cached data accessed only
once or twice. rsync and cleanup access data from the filesystem
but essentially in a single scan, so the data is accessed only once
and is thus subject to being thrown away quickly.

However this also resulted in serious breakage for another class of
program which needs a large cached data set, for example if a machine
is distributing /usr/src and /usr/src fits in memory, the above code
would actually cause /usr/src to not get entirely cached and cause
disk activity on every scan even though there is enough memory to
cache it.

The kernel can't really tell the difference between the desire to
cache a lot of data and a single-scan operation such as rdist or
hammer cleanup (and even then it has no idea whether the rdist
represents a data set small enough to be cached or not).

So it's like half a dozen of this or half a dozen of that... there is
no right answer. We would need something like an ARC cache for our
VM pages and vnodes to distinguish between the cases.
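The tradeoff can be seen in a toy simulation. The sketch below compares plain LRU with a simplified two-queue (inactive/active) policy in which a page must be touched twice before it is protected. This is not ARC and not DragonFly's pageout code; the 8-page capacity, the page numbers and all function names are made up for illustration.

```c
#include <string.h>

#define CAP 8  /* toy cache size, in pages */

/* A queue of page ids; index 0 is the most recently used entry. */
struct queue { int page[CAP]; int n; };

static int q_find(const struct queue *q, int page)
{
    for (int i = 0; i < q->n; i++)
        if (q->page[i] == page)
            return i;
    return -1;
}

static void q_remove(struct queue *q, int i)
{
    memmove(&q->page[i], &q->page[i + 1], (size_t)(q->n - i - 1) * sizeof q->page[0]);
    q->n--;
}

/* Caller guarantees q->n < CAP. */
static void q_push_front(struct queue *q, int page)
{
    memmove(&q->page[1], &q->page[0], (size_t)q->n * sizeof q->page[0]);
    q->page[0] = page;
    q->n++;
}

/* Plain LRU: every access moves the page to the front. */
static void lru_access(struct queue *q, int page)
{
    int i = q_find(q, page);

    if (i >= 0)
        q_remove(q, i);
    else if (q->n == CAP)
        q->n--;                          /* evict the least recently used */
    q_push_front(q, page);
}

/* Simplified two-queue policy: pages must be touched twice to become active. */
struct twoq { struct queue active, inactive; };

static void twoq_access(struct twoq *c, int page)
{
    int i;

    if ((i = q_find(&c->active, page)) >= 0) {
        q_remove(&c->active, i);
    } else if ((i = q_find(&c->inactive, page)) >= 0) {
        q_remove(&c->inactive, i);       /* second touch: promote */
    } else {
        if (c->active.n + c->inactive.n == CAP) {
            if (c->inactive.n > 0)
                c->inactive.n--;         /* evict one-touch data first */
            else
                c->active.n--;
        }
        q_push_front(&c->inactive, page);
        return;
    }
    q_push_front(&c->active, page);
}

/* Touch pages 0..ws-1 twice, then scan `scan` never-reused pages. */
static int survivors_lru(int ws, int scan)
{
    struct queue q = { .n = 0 };
    int alive = 0;

    for (int pass = 0; pass < 2; pass++)
        for (int p = 0; p < ws; p++)
            lru_access(&q, p);
    for (int p = 0; p < scan; p++)
        lru_access(&q, 1000 + p);
    for (int p = 0; p < ws; p++)
        alive += q_find(&q, p) >= 0;
    return alive;
}

static int survivors_twoq(int ws, int scan)
{
    struct twoq c = { .active = { .n = 0 }, .inactive = { .n = 0 } };
    int alive = 0;

    for (int pass = 0; pass < 2; pass++)
        for (int p = 0; p < ws; p++)
            twoq_access(&c, p);
    for (int p = 0; p < scan; p++)
        twoq_access(&c, 1000 + p);
    for (int p = 0; p < ws; p++)
        alive += q_find(&c.active, p) >= 0 || q_find(&c.inactive, p) >= 0;
    return alive;
}
```

With 6 hot pages and a 20-page scan, `survivors_lru(6, 20)` returns 0 while `survivors_twoq(6, 20)` returns 6. The second policy wins only because the hot pages were touched twice beforehand; on first access it still cannot tell a one-shot scan from a data set worth keeping, which is the ambiguity described above.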

-Matt
Matthew Dillon

#12 Updated by dillon almost 5 years ago

I pushed another commit to HAMMER which fixes the most egregious
kmalloc abuse by the rebalancing code. There are still other aspects
of the code which can cause problems, though. Let's see if this helps
your setup.

-Matt

#13 Updated by ftigeot almost 5 years ago

On Sat, Mar 20, 2010 at 12:52:28PM -0700, Matthew Dillon wrote:
>
> :This machine was really fine running the same workload with 2.4.
> :
> :With 2.5, simple hammer cleanup or rsync commands are enough to make
> :PCs with 2GB or 3GB memory swap.
>
> The kernel can't really tell the difference between the desire to
> cache a lot of data and a single-scan operation such as rdist or
> hammer cleanup (and even then it has no idea whether the rdist
> represents a data set small enough to be cached or not).
>
> So it's like half a dozen of this or half a dozen of that... there is
> no right answer. We would need something like an ARC cache for our
> VM pages and vnodes to distinguish between the cases.

Okay. Since this is not really something which can be resolved automatically,
is there a way for the administrator to set a hard limit to the memory used
by disk activity?
A sysctl such as kern.max_disk_cache or so?

#14 Updated by ftigeot almost 5 years ago

On Sat, Mar 20, 2010 at 01:08:54PM -0700, Matthew Dillon wrote:
> I pushed another commit to HAMMER which fixes the most egregious
> kmalloc abuse by the rebalancing code. There are still other aspects
> of the code which can cause problems, though. Let's see if this helps
> your setup.

Thanks, I'm upgrading right now.

#15 Updated by Johannes.Hofmann almost 5 years ago

Matthew Dillon <> wrote:
>
> :This machine was really fine running the same workload with 2.4.
> :
> :With 2.5, simple hammer cleanup or rsync commands are enough to make
> :PCs with 2GB or 3GB memory swap.

I'm also seeing this.

> :
> :--
> :Francois Tigeot
>
> The paging is due to a tradeoff for which there is no good answer.
> In 2.4 we were holding 'old' cache data for much longer by trading
> off against throwing away more recently cached data accessed only
> once or twice. rsync and cleanup access data from the filesystem
> but essentially in a single scan, so the data is accessed only once
> and is thus subject to being thrown away quickly.
>
> However this also resulted in serious breakage for another class of
> program which needs a large cached data set, for example if a machine
> is distributing /usr/src and /usr/src fits in memory, the above code
> would actually cause /usr/src to not get entirely cached and cause
> disk activity on every scan even though there is enough memory to
> cache it.
>
> The kernel can't really tell the difference between the desire to
> cache a lot of data and a single-scan operation such as rdist or
> hammer cleanup (and even then it has no idea whether the rdist
> represents a data set small enough to be cached or not).
>
> So it's like half a dozen of this or half a dozen of that... there is
> no right answer. We would need something like an ARC cache for our
> VM pages and vnodes to distinguish between the cases.

Hm, I understand if hammer cleanup removes other data from cache, but
why should the disk cache usage push out data to swap?
I just tried a hammer cleanup on a completely idle system with 1.5G
memory in single user mode, and during the reblocking it started to
use swap and became unresponsive - is this really working as expected?

Cheers,
Johannes

#16 Updated by dillon almost 5 years ago

:Okay. Since this is not really something which can be resolved automatically,
:is there a way for the administrator to set a hard limit to the memory used
:by disk activity?
:A sysctl such as kern.max_disk_cache or so?
:
:--
:Francois Tigeot

Being able to predict that a piece of data does not have to be cached
for very long, such as when traversing a very large data set (rdist,
rebalance, reblock) is what allows the rest of the data in the cache to
remain in the cache. This is not an easy prediction to make. Something
like an ARC implementation would do a better job making this prediction.

-Matt
Matthew Dillon

#17 Updated by dillon almost 5 years ago

:Hm, I understand if hammer cleanup removes other data from cache, but
:why should the disk cache usage push out data to swap?
:I just tried a hammer cleanup on a completely idle system with 1.5G
:memory in single user mode, and during the reblocking it started to
:use swap and became unresponsive - is this really working as expected?
:
:Cheers,
:Johannes

You couldn't ^C the reblock? Reblocking works the storage system
pretty heavily, performance issues are not necessarily going to be
related to paging activity. Anything which has to read from disk will
be slow.

How does the system know what pages are idle and what pages are not
idle when the whole system is idle? How can the system distinguish
between the one-time scan that the reblocker does versus, say, someone
rdist'ing a dataset which would easily fit in memory that we DO want
to cache?

-Matt
Matthew Dillon

#18 Updated by Johannes.Hofmann almost 5 years ago

Matthew Dillon <> wrote:
>
> :Hm, I understand if hammer cleanup removes other data from cache, but
> :why should the disk cache usage push out data to swap?
> :I just tried a hammer cleanup on a completely idle system with 1.5G
> :memory in single user mode, and during the reblocking it started to
> :use swap and became unresponsive - is this really working as expected?
> :
> :Cheers,
> :Johannes
>
> You couldn't ^C the reblock? Reblocking works the storage system
> pretty heavily, performance issues are not necessarily going to be
> related to paging activity. Anything which has to read from disk will
> be slow.
>
> How does the system know what pages are idle and what pages are not
> idle when the whole system is idle? How can the system distinguish
> between the one-time scan that the reblocker does versus, say, someone
> rdist'ing a dataset which would easily fit in memory that we DO want
> to cache?

^C did succeed, but it took quite a long time - no lockup though.
It's understood that all disk I/O will be slow. I was just surprised
that the system uses swap in this case at all. Why should the system
move data out to swap to make room for the disk cache - at least if
swap does not happen to be on an SSD?
I would have thought that only memory not otherwise needed is used to
cache disk data.

I will retry with swap disabled completely.

Johannes

#19 Updated by elekktretterr almost 5 years ago

> This machine was really fine running the same workload with 2.4.
>
> With 2.5, simple hammer cleanup or rsync commands are enough to make
> PCs with 2GB or 3GB memory swap.

Also seeing this, but no crash. The system constantly uses at least 800MB
of swap. This system has 4GB RAM, but only 3.2GB is usable because it runs a
32-bit kernel. I'm going to upgrade the system to 64-bit after the release.

Petr

#20 Updated by dillon almost 5 years ago

:Also seeing this, but no crash. The system constantly uses at least 800MB
:of swap. This system has 4GB RAM, but only 3.2GB is usable because it runs a
:32-bit kernel. I'm going to upgrade the system to 64-bit after the release.
:
:Petr

If that much is constantly swapped out (and swapcache isn't being
used) then the run-time footprint is larger than the 3G of RAM the
32bit kernel can use.

-Matt
Matthew Dillon

#21 Updated by elekktretterr almost 5 years ago

> If that much is constantly swapped out (and swapcache isn't being
> used) then the run-time footprint is larger than the 3G of RAM the
> 32bit kernel can use.

What do you mean run-time footprint?

Memory: 967M Active, 1871M Inact, 527M Wired, 72M Cache, 199M Buf, 71M Free
Swap: 4096M Total, 816M Used, 3280M Free, 19% Inuse

I cannot figure out what's using so much active and inactive memory. It
only runs a not-so-busy postgres database (although it does have a rather
large 2-3GB dataset). However, I do notice that every time hammer cleanup
runs (i.e. after a restart), memory usage goes up all the way into swap.

It kind of seems like a lot of data is cached, but never freed. This box
has been up for about 2 months without a restart. The swap usage grew
during those two months from 0% to 20%, then decreased to current 19%.

Petr

#22 Updated by elekktretterr almost 5 years ago

> If that much is constantly swapped out (and swapcache isn't being
> used) then the run-time footprint is larger than the 3G of RAM the
> 32bit kernel can use.

I turned off postgres on the box, and at that moment all the used swap
memory and about 500MB from active memory was cleared. Which raises a
question: why is HAMMER using so much memory, causing applications like
postgres to use swap (and thus run slow). The box has over 3GB of usable
RAM; this should be plenty. I suppose HAMMER is caching the
postgresql on-disk data? Should it cache it? FBSD+UFS2 with the same
postgres dataset uses way less memory.

My other box, which runs DF with HAMMER, has pretty normal memory usage, but
it doesn't run any database software.

Petr

#23 Updated by ftigeot almost 5 years ago

On Sat, Mar 20, 2010 at 11:52:51PM +0000, elekktretterr (via DragonFly issue tracker) wrote:
>
> elekktretterr <> added the comment:
>
> > If that much is constantly swapped out (and swapcache isn't being
> > used) then the run-time footprint is larger than the 3G of RAM the
> > 32bit kernel can use.
>
> What do you mean run-time footprint?
>
> Memory: 967M Active, 1871M Inact, 527M Wired, 72M Cache, 199M Buf, 71M Free
> Swap: 4096M Total, 816M Used, 3280M Free, 19% Inuse
>
> I cannot figure out what's using so much active and inactive memory. It
> only runs a not-so-busy postgres database (although it does have a rather
> large 2-3GB dataset). However, I do notice that every time hammer cleanup
> runs (i.e. after a restart), memory usage goes up all the way into swap.

This one is from a recently started workstation:

8:10AM up 17 mins, 2 users, load averages: 0.77, 0.70, 0.47

Memory: 1094M Active, 1519M Inact, 272M Wired, 114M Cache, 199M Buf, 4460K Free
Swap: 4096M Total, 13M Used, 4083M Free, 2048K Out

The only applications running were Xorg, a bunch of xterms and a web browser
(seamonkey) with 3 tabs open.

I then launched a "hammer cleanup". After a while, the system started to push
application memory to swap. I could see a delay in window redrawing when
switching from one web tab to the other.

Now that "hammer cleanup" has finished running, there is no more swapping but
there are still gobs of active and inactive memory:

Memory: 990M Active, 1609M Inact, 353M Wired, 7328K Cache, 199M Buf, 46M Free
Swap: 4096M Total, 18M Used, 4078M Free

#24 Updated by dillon almost 5 years ago

:I turned off postgres on the box, and at that moment all the used swap
:memory and about 500MB from active memory was cleared. Which raises a
:question: why is HAMMER using so much memory, causing applications like
:postgres to use swap (and thus run slow). The box has over 3GB of usable
:RAM; this should be plenty. I suppose HAMMER is caching the
:postgresql on-disk data? Should it cache it? FBSD+UFS2 with the same
:postgres dataset uses way less memory.
:
:My other box, which runs DF with HAMMER, has pretty normal memory usage, but
:it doesn't run any database software.
:
:Petr

Well, did you 'ps axl' and look at the VSZ of the processes? That
tells you essentially how much memory they have allocated. It kinda
sounds like the postgres process might have been eating a lot of
memory unrelated to the kernel's file caching.
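Pulling VSZ and RSS out of `ps axl` output is easy to script. A minimal sketch, assuming the BSD `ps -l` column order (UID PID PPID CPU PRI NI VSZ RSS ...) seen in the listings later in this thread; `struct psmem` and `parse_ps_axl()` are hypothetical names. Keep in mind that VSZ counts the whole address space, including shared memory a process has attached, so postgres backends that all map the same shared buffers will each report it.

```c
#include <stdio.h>

/* Memory columns (in KB) from one line of BSD `ps axl` output. */
struct psmem { long vsz_kb; long rss_kb; };

/*
 * Skips the first six numeric fields (UID PID PPID CPU PRI NI) and reads
 * VSZ and RSS.  Returns 0 on success, -1 if the line does not start with
 * those fields (e.g. a wrapped "postgres: ..." command line).
 */
static int parse_ps_axl(const char *line, struct psmem *out)
{
    if (sscanf(line, "%*ld %*ld %*ld %*ld %*ld %*ld %ld %ld",
               &out->vsz_kb, &out->rss_kb) != 2)
        return -1;
    return 0;
}
```

Fed the line for the busiest DragonFly backend quoted later in the thread, this yields a VSZ of 855156 KB and an RSS of 507492 KB.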

HAMMER isn't really doing anything any different from UFS when it
comes to file data caching.

-Matt
Matthew Dillon

#25 Updated by dillon almost 5 years ago

Definitely try raising sysctl vfs.vm_cycle_point as Sam suggested. This
can be done on a live system. I would like to try to avoid ripping
up the buffer cache this close to release.

Try setting it to 40 (it defaults to 23). The max value is 64
which will effectively put all non-memory-mapped file data on the
inactive queue, including any data which is repeatedly accessed.
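The effect of the knob, as described here, can be pictured as a threshold. The model below is a hypothetical illustration, not DragonFly's buffer cache code: it assumes each non-memory-mapped file page carries an activity count capped at 64 and goes to the active queue only when that count is above `vfs.vm_cycle_point`. Under that assumption, raising the value from the default 23 to 40 shrinks the set of file pages competing with program memory on the active queue, and 64 sends all of them to the inactive queue. On a live system the knob itself is changed with `sysctl vfs.vm_cycle_point=40`.

```c
/* Assumed top of the page activity scale (64 is the max value given above). */
#define TOY_ACT_MAX 64

enum toy_queue { TOY_INACTIVE, TOY_ACTIVE };

/*
 * Hypothetical rule for non-memory-mapped file data: a page goes to the
 * active queue only if its activity count is strictly above the cycle
 * point.  At cycle_point == TOY_ACT_MAX nothing qualifies, matching the
 * behaviour described for the maximum value.
 */
static enum toy_queue toy_place(int act_count, int cycle_point)
{
    if (act_count > TOY_ACT_MAX)
        act_count = TOY_ACT_MAX;
    return act_count > cycle_point ? TOY_ACTIVE : TOY_INACTIVE;
}

/* How many of n pages (with the given activity counts) land on the active queue. */
static int toy_count_active(const int act[], int n, int cycle_point)
{
    int active = 0;

    for (int i = 0; i < n; i++)
        active += toy_place(act[i], cycle_point) == TOY_ACTIVE;
    return active;
}
```

With made-up activity counts of 10, 25, 30, 45 and 64, four pages stay active at 23, two at 40 and none at 64.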

-Matt

#26 Updated by c.turner almost 5 years ago

wrote:
> question: why is HAMMER using so much memory, causing applications like
> postgres to use swap (and thus run slow).

This response isn't HAMMER-related at all, but in any case:

I'm still a bit fuzzy on some of the related issues that might be at play here,
but as I understand it, the postgres data cache uses as much shared
memory as it 'wants' (i.e. up to the configured maximum) to hold the database
data, so if this machine was up for a long time, potentially the entire
database data set is in memory.

If the data set isn't used very often, it might get swapped out by
something else running or just from inactivity.

If you want to keep this from happening, you can set:

kern.ipc.shm_use_phys=1

in /boot/loader.conf

#27 Updated by ftigeot almost 5 years ago

On Sun, Mar 21, 2010 at 11:24:35AM -0700, Matthew Dillon wrote:
> Definitely try raising sysctl vfs.vm_cycle_point as Sam suggested. This
> can be done on a live system. I would like to try to avoid ripping
> up the buffer cache this close to release.

I can understand that.

> Try setting it to 40 (it defaults to 23). The max value is 64
> which will effectively put all non-memory-mapped file data on the
> inactive queue, including any data which is repeatedly accessed.

Done. I raised it previously to 30 (I think) and still had a crash.

FWIW, this machine is also running a postgres server.

#28 Updated by dillon almost 5 years ago

:> Try setting it to 40 (it defaults to 23). The max value is 64
:> which will effectively put all non-memory-mapped file data on the
:> inactive queue, including any data which is repeatedly accessed.
:
:Done. I raised it previously to 30 (I think) and still had a crash.
:
:FWIW, this machine is also running a postgres server.
:
:--
:Francois Tigeot

It wouldn't affect the crash condition, which was a low memory
condition generated by cache+free followed by a deadlock. The
cycle point only affects active vs inactive.

But raising the cycle point should reduce the degree to which
active program memory is swapped out. On the flip side, the
cost of doing this is that file data may be thrown out too quickly.
That is, cacheable file data will be mixed in with uncacheable
file data in the inactive queue and we'll be throwing the baby out
with the bathwater, so to speak.

Still, if this fixes the rdist / reblocking issue, maybe we should just
eat the early hucking of cached file data in favor of program data not
getting paged so aggressively, since people seem to notice when the
latter happens a lot more.

I'd like to know if this solves your particular problem (the paging
issue, not the crashing issue), and if so I will change the default
cycle point for the release. So play with it.

-Matt
Matthew Dillon

#29 Updated by ftigeot almost 5 years ago

On Sun, Mar 21, 2010 at 12:14:11PM -0700, Matthew Dillon wrote:
>
> :> Try setting it to 40 (it defaults to 23). The max value is 64
> :> which will effectively put all non-memory-mapped file data on the
> :> inactive queue, including any data which is repeatedly accessed.
> :
> :Done. I raised it previously to 30 (I think) and still had a crash.
>
> It wouldn't affect the crash condition, which was a low memory
> condition generated by cache+free followed by a deadlock. The
> cycle point only affects active vs inactive.
>
> But raising the cycle point should reduce the degree to which
> active program memory is swapped out. On the flip side, the
> cost of doing this is that file data may be thrown out too quickly.
> That is, cacheable file data will be mixed in with uncacheable
> file data in the inactive queue and we'll be throwing the baby out
> with the bathwater, so to speak.
>
> I'd like to know if this solves your particular problem (the paging
> issue, not the crashing issue), and if so I will change the default
> cycle point for the release. So play with it.

We'll see. For now, there's an rsync backup running and the paging activity
seems to get more intense:

Memory: 473M Active, 1038M Inact, 441M Wired, 46M Cache, 199M Buf, 3380K Free
Swap: 4096M Total, 149M Used, 3947M Free, 3% Inuse, 4336K Out

#30 Updated by elekktretterr almost 5 years ago

> I'm still a bit fuzzy on some of the related issues that might be at play here,
> but as I understand it, the postgres data cache uses as much shared
> memory as it 'wants' (i.e. up to the configured maximum) to hold the database
> data, so if this machine was up for a long time, potentially the entire
> database data set is in memory.
>
> If the data set isn't used very often, it might get swapped out by
> something else running or just from inactivity.
>
> If you want to keep this from happening, you can set:
>
> kern.ipc.shm_use_phys=1
>
> in /boot/loader.conf

Here is the situation: Machine 1: FBSD 7.2. Machine 2: DragonFly. Both
boxes have 4GB memory. Both run postgresql 8.4 and their configuration is
the same: 800MB shared memory (actually the FBSD box has 1GB shared mem), 2GB
effective cache size, 128MB work mem.

The database data set is about 2-3GB on both boxes (actually they replicate
exactly the same database). The FBSD box also runs MySQL and Apache (and
known memory hogs like Ruby on Rails). Yet this box always has at least
a few hundred MB of free memory. I've *never* seen it swap, and if it ever
did, then only briefly.

top from the fbsd box:
Mem: 777M Active, 2365M Inact, 383M Wired, 26M Cache, 399M Buf, 395M Free
Swap: 4096M Total, 4096M Free

top from the DF box:
Memory: 878M Active, 1962M Inact, 516M Wired, 146M Cache, 188M Buf, 6132K Free
Swap: 4096M Total, 10M Used, 4086M Free

ps axl from the fbsd box:

70 740 737 0 44 0 1052056 210136 select Ss ?? 6:06.89
postgres: writer process (postgres)
70 741 737 0 44 0 1052056 6004 select Ss ?? 3:45.14
postgres: wal writer process (postgres)
70 742 737 0 44 0 1053080 5268 select Ss ?? 2:06.09
postgres: autovacuum launcher process (postgres)
70 743 737 0 44 0 22576 4796 select Ss ?? 2:56.13
postgres: stats collector process (postgres)
70 96486 737 0 4 0 1055128 17012 sbwait Is ?? 0:00.05
postgres: aubill aubill_production 203.xx.xxx.xxx(55645) (postgres)
70 96781 737 0 4 0 1054104 12152 sbwait Is ?? 0:00.01
postgres: aubill aubill_production 203.xx.xxx.xxx(61963) (postgres)
70 96784 737 0 4 0 1054104 13064 sbwait Is ?? 0:00.02
postgres: aubill aubill_production 203.xx.xxx.xxx(63374) (postgres)

ps axl from the DF box:
1002 22890 22888 0 152 0 853324 15996 select SLs ?? 0:09.07
postgres: writer process (postgres)
1002 22891 22888 0 152 0 853132 4940 select SLs ?? 0:09.15
postgres: wal writer process (postgres)
1002 22892 22888 0 152 0 853420 2064 select SLs ?? 0:02.81
postgres: autovacuum launcher process (postgres)
1002 22893 22888 0 152 0 9108 1440 poll SLMs ?? 0:10.35
postgres: stats collector process (postgres)
1002 27196 22888 0 152 0 854628 10444 sbwait ILs ?? 0:00.06
postgres: aubill aubill_production xxx.xxx.xxx.xx(56668) idle (postgres)
1002 27260 22888 0 152 0 855156 507492 sbwait ILs ?? 0:24.36
postgres: aubill aubill_production 203.xxx.xxx.xxx(58165) idle (postgres)
1002 27274 22888 0 152 0 854084 4740 sbwait ILs ?? 0:00.00
postgres: aubill aubill_production 203.xxx.xxx.xxx(64789) idle (postgres)
1002 27275 22888 1 152 0 854020 4748 sbwait ILs ?? 0:00.00
postgres: aubill aubill_production 203.xxx.xxx.xxx(60636) idle (postgres)

#31 Updated by dillon over 4 years ago

:top from the fbsd box:
:Mem: 777M Active, 2365M Inact, 383M Wired, 26M Cache, 399M Buf, 395M Free
:Swap: 4096M Total, 4096M Free
:
:top from the DF box:
:Memory: 878M Active, 1962M Inact, 516M Wired, 146M Cache, 188M Buf, 6132K Free
:Swap: 4096M Total, 10M Used, 4086M Free

Well, it kinda looks to me like there's nothing wrong. Are you seeing
excessive paging activity via systat -vm 1 ? If you don't see excessive
paging activity then the fact that data has been pushed to swap will not
have any effect on machine performance. It just means the machine flushed
dirty data to swap so it could be marked clean. The data could still
be in memory, in fact.
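The distinction here is between a level (how much swap is occupied) and a rate (how many pages move to and from swap per second). A minimal sketch of the rate, computed from two snapshots of cumulative counters; `struct vm_sample` and `paging_rate()` are hypothetical, and where the counters come from (systat -vm, vmstat, a sysctl) is left open. A machine can show hundreds of MB "Used" on top's Swap line while this rate stays at zero, which is the harmless case described above.

```c
/*
 * Snapshot of cumulative swap paging counters.  Field names are
 * hypothetical; the counters are assumed to only ever increase.
 */
struct vm_sample {
    double t_sec;             /* when the snapshot was taken */
    unsigned long pages_in;   /* total pages read back from swap so far */
    unsigned long pages_out;  /* total pages written to swap so far */
};

/* Swap paging rate in pages per second between two snapshots. */
static double paging_rate(const struct vm_sample *a, const struct vm_sample *b)
{
    double dt = b->t_sec - a->t_sec;

    if (dt <= 0.0)
        return 0.0;
    return (double)((b->pages_in - a->pages_in) +
                    (b->pages_out - a->pages_out)) / dt;
}
```

Sampling once per second, as `systat -vm 1` does, and watching for sustained non-zero values is what distinguishes real memory pressure from swap that was merely written once and left alone.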

-Matt
Matthew Dillon

#32 Updated by ftigeot over 4 years ago

On Sun, Mar 21, 2010 at 12:14:11PM -0700, Matthew Dillon wrote:
>
> :> Try setting it to 40 (it defaults to 23). The max value is 64
> :> which will effectively put all non-memory-mapped file data on the
> :> inactive queue, including any data which is repeatedly accessed.
>
> I'd like to know if this solves your particular problem (the paging
> issue, not the crashing issue), and if so I will change the default
> cycle point for the release. So play with it.

With vfs.vm_cycle_point = 40 my server was much more responsive this morning.
Almost no programs had to be paged in when first accessed.

So far, so good.

I have now decreased vm_cycle_point to 32.

#33 Updated by ftigeot over 4 years ago

On Mon, Mar 22, 2010 at 09:39:07AM +0100, Francois Tigeot wrote:
> On Sun, Mar 21, 2010 at 12:14:11PM -0700, Matthew Dillon wrote:
> >
> > I'd like to know if this solves your particular problem (the paging
> > issue, not the crashing issue), and if so I will change the default
> > cycle point for the release. So play with it.
>
> With vfs.vm_cycle_point = 40 my server was much more responsive this morning.
> Almost no programs had to be paged in when first accessed.
>
> I have now decreased vm_cycle_point to 32.

With 32 the situation is slightly worse: some processes are once again paged
out when not constantly accessed.

I'm currently testing with vfs.vm_cycle_point = 36.

#34 Updated by ftigeot over 4 years ago

On Mon, Mar 22, 2010 at 12:02:25PM +0100, Francois Tigeot wrote:
> On Mon, Mar 22, 2010 at 09:39:07AM +0100, Francois Tigeot wrote:
> > On Sun, Mar 21, 2010 at 12:14:11PM -0700, Matthew Dillon wrote:
> > >
> > > I'd like to know if this solves your particular problem (the paging
> > > issue, not the crashing issue), and if so I will change the default
> > > cycle point for the release. So play with it.
> >
> > With vfs.vm_cycle_point = 40 my server was much more responsive this morning.
> > Almost no programs had to be paged in when first accessed.
> >
> > I have now decreased vm_cycle_point to 32.
>
> With 32 the situation is slightly worse: some processes are once again paged
> out when not constantly accessed.
>
> I'm currently testing with vfs.vm_cycle_point = 36.

36 is slightly better but one of the running applications was still massively
paged out after a few hours.

I vote for setting vfs.vm_cycle_point to 40.

#35 Updated by ftigeot over 4 years ago

On Mon, Mar 22, 2010 at 10:42:49PM -0700, Matthew Dillon wrote:
> :36 is slightly better but one of the running applications was still massively
> :paged out after a few hours.
> :
> :I vote for setting vfs.vm_cycle_point to 40.
>
> Ok, so 36 or 40. I'm a bit worried about using 40, but I think it's
> worth setting it to 40 for the release and seeing how it fares. People
> can always drop it down if weird situations crop up.

I'm not completely happy either: even with 40 some applications get paged out,
which didn't occur with 2.4.

I also had a new deadlock last night (possibly during the daily periodic run).
Symptoms were the same as usual: no keyboard input, no network activity, many
pmap_interlock messages on the console...

The new crash dump is there: http://www.wolfpond.org/crash.dfly/

I have now set vfs.vm_cycle_point to 64 just in case. I may still be
positively surprised...

#36 Updated by dillon over 4 years ago

:36 is slightly better but one of the running applications was still massively
:paged out after a few hours.
:
:I vote for setting vfs.vm_cycle_point to 40.
:
:--
:Francois Tigeot

Ok, so 36 or 40. I'm a bit worried about using 40, but I think it's
worth setting it to 40 for the release and seeing how it fares. People
can always drop it down if weird situations crop up.

Theoretically the normal paging hysteresis should be sufficient but
there is still a chance that the inactive queue can wind up too small
at 40.

-Matt
Matthew Dillon
<>

#37 Updated by dillon over 4 years ago

:I'm not completely happy either: even with 40 some applications get paged out,
:which didn't occur with 2.4.

There's nothing wrong with an application getting paged out if it
is idle. It really isn't a bad thing. It frees up memory for the
apps which ARE running.

What people don't seem to like is when a lot of apps get paged out
overnight in order to satisfy certain overnight operations which
operate on a very large data set (the entire filesystem) and
can't be cached anyway.

:I also had a new deadlock last night (possibly during the daily periodic run).
:Symptoms were the same as usual: no keyboard input, no network activity, many
:pmap_interlock messages on the console...
:
:The new crash dump is there: http://www.wolfpond.org/crash.dfly/
:
:I have now set vfs.vm_cycle_point to 64 just in case. I may still be
:positively surprised...
:
:--
:Francois Tigeot

Ok, I'll look at it.

-Matt
Matthew Dillon
<>

#38 Updated by steve over 4 years ago

On Mon, 22 Mar 2010 22:42:49 -0700 (PDT)
Matthew Dillon <> wrote:

> :36 is slightly better but one of the running applications was still massively
> :paged out after a few hours.
> :
> :I vote for setting vfs.vm_cycle_point to 40.
> :
> :--
> :Francois Tigeot
>
> Ok, so 36 or 40. I'm a bit worried about using 40, but I think it's
> worth setting it to 40 for the release and seeing how it fares.
> People can always drop it down if weird situations crop up.
>
> Theoretically the normal paging hysteresis should be sufficient but
> there is still a chance that the inactive queue can wind up too small
> at 40.

I'm running at 40 at the moment on this box with 1.5GB of RAM that
used to hardly touch swap but these days is often to be seen with 300MB of
swap in use and 800MB of inactive. I think I'd rather see the swap usage go
away and the inactive queue shorter. Adding up the vsz of all the processes
running gets me to about 1GB. It would be nice not to be using swap again.

Raising the setting to 40 seems to have increased total swap usage
at the same time as reducing the tendency for things I use to be swapped
out in the morning; this puzzles me somewhat.

#39 Updated by dillon over 4 years ago

: I'm running at 40 at the moment on this box with 1.5GB of RAM that
:used to hardly touch swap but these days is often to be seen with 300MB of
:swap in use and 800MB of inactive. I think I'd rather see the swap usage go
:away and the inactive queue shorter. Adding up the vsz of all the processes
:running gets me to about 1GB. It would be nice not to be using swap again.
:
: Raising the setting to 40 seems to have increased total swap usage
:at the same time as reducing the tendency for things I use to be swapped
:out in the morning; this puzzles me somewhat.
:
:--
:Steve O'Hara-Smith | Directable Mirror Arrays

You can change the VM paging targets. You can try reducing
vm.stats.vm.v_inactive_target. Those values are in pages. Do
not make it too small.

-Matt
Matthew Dillon
<>

#40 Updated by steve over 4 years ago

On Tue, 23 Mar 2010 14:15:43 -0700 (PDT)
Matthew Dillon <> wrote:

> You can change the VM paging targets. You can try reducing
> vm.stats.vm.v_inactive_target. Those values are in pages. Do
> not make it too small.

Apparently it's read-only. Is it a boot-time setting only?

#41 Updated by dillon over 4 years ago

:I also had a new deadlock last night (possibly during the daily periodic run).
:Symptoms were the same as usual: no keyboard input, no network activity, many
:pmap_interlock messages on the console...
:
:The new crash dump is there: http://www.wolfpond.org/crash.dfly/
:
:I have now set vfs.vm_cycle_point to 64 just in case. I may still be
:positively surprised...
:
:--
:Francois Tigeot

Ok, I've pushed more fixes to head, please try it.

It's STILL the rebalancing code. This time it deadlocked waiting for
the VM system to free up memory while holding shared locks on a
HAMMER cursor.

We will keep going until these deadlocks go away, but I'm going to
have to rethink portions of the kernel memory allocator with regard
to HAMMER for sure after we roll the release.

-Matt
Matthew Dillon
<>
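Matt's diagnosis describes a classic lock/allocation deadlock: a thread sleeps waiting for memory while holding a shared lock that the code able to free memory needs exclusively. The pthreads sketch below is a user-space analogy, not HAMMER code and not the fix Matt pushed (the thread does not describe it); struct model, worker_alloc and reclaimer are hypothetical names. It shows one common way to avoid the hazard: try a non-blocking allocation under the lock, and if that fails, drop the lock before sleeping and retry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: a page pool plus a "cursor" lock the reclaimer must own exclusively. */
struct model {
    pthread_rwlock_t cursor_lock;  /* stands in for the shared cursor lock */
    pthread_mutex_t pool_mtx;      /* protects free_pages and reclaimable */
    pthread_cond_t pool_cv;        /* signalled when pages are freed */
    int free_pages;
    int reclaimable;               /* freeable only under an exclusive cursor lock */
};

/* Non-blocking allocation: take a page if one is free. */
static bool try_alloc(struct model *m)
{
    pthread_mutex_lock(&m->pool_mtx);
    bool ok = m->free_pages > 0;
    if (ok)
        m->free_pages--;
    pthread_mutex_unlock(&m->pool_mtx);
    return ok;
}

/* Sleep until at least one page is free; called with no cursor lock held. */
static void wait_for_memory(struct model *m)
{
    pthread_mutex_lock(&m->pool_mtx);
    while (m->free_pages == 0)
        pthread_cond_wait(&m->pool_cv, &m->pool_mtx);
    pthread_mutex_unlock(&m->pool_mtx);
}

/* Worker: never sleeps for memory while holding the shared cursor lock. */
static void worker_alloc(struct model *m)
{
    for (;;) {
        pthread_rwlock_rdlock(&m->cursor_lock);
        if (try_alloc(m)) {
            /* ... use the page while the shared lock is held ... */
            pthread_rwlock_unlock(&m->cursor_lock);
            return;
        }
        /* Sleeping here with the lock held is the deadlock: the reclaimer
         * could never get its exclusive lock. Drop it first. */
        pthread_rwlock_unlock(&m->cursor_lock);
        wait_for_memory(m);
    }
}

static void *worker_thread(void *arg)
{
    worker_alloc(arg);
    return NULL;
}

/* Reclaimer: frees pages, but only while holding the cursor exclusively. */
static void *reclaimer(void *arg)
{
    struct model *m = arg;
    pthread_rwlock_wrlock(&m->cursor_lock);
    pthread_mutex_lock(&m->pool_mtx);
    m->free_pages += m->reclaimable;
    m->reclaimable = 0;
    pthread_cond_broadcast(&m->pool_cv);
    pthread_mutex_unlock(&m->pool_mtx);
    pthread_rwlock_unlock(&m->cursor_lock);
    return NULL;
}
```

Starting with free_pages = 0 and reclaimable = 4, running worker_thread and reclaimer concurrently finishes with three free pages whichever thread runs first; calling wait_for_memory() before the unlock instead can hang.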

#42 Updated by dillon over 4 years ago

:On Tue, 23 Mar 2010 14:15:43 -0700 (PDT)
:Matthew Dillon <> wrote:
:
:> You can change the VM paging targets. You can try reducing
:> vm.stats.vm.v_inactive_target. Those values are in pages. Do
:> not make it too small.
:
: Apparently it's read-only. Is it a boot-time setting only?

Oops, wrong sysctl. Use sysctl vm.v_inactive_target. Don't make
it any smaller than 1/4 vm.stats.vm.v_page_count.

-Matt
Matthew Dillon
<>
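Because vm.v_inactive_target is expressed in pages, Matt's lower bound of 1/4 of vm.stats.vm.v_page_count is easy to get wrong when thinking in megabytes. The C sketch below does the arithmetic; the helper names are hypothetical and the 4 KiB page size is an assumption (true for i386 and x86_64).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on i386 and x86_64. */
#define PAGE_SIZE_BYTES 4096u

/* Convert a memory size in MiB to a page count. */
static uint64_t mib_to_pages(uint64_t mib, uint64_t page_size)
{
    assert(page_size > 0);
    return mib * 1024 * 1024 / page_size;
}

/* Rule of thumb from the thread: vm.v_inactive_target >= v_page_count / 4. */
static uint64_t inactive_target_floor(uint64_t page_count)
{
    return page_count / 4;
}

/* Raise a requested target to the floor if it is below it. */
static uint64_t clamp_inactive_target(uint64_t requested, uint64_t page_count)
{
    uint64_t floor_pages = inactive_target_floor(page_count);
    return requested < floor_pages ? floor_pages : requested;
}
```

For a 1.5GB box, mib_to_pages(1536, PAGE_SIZE_BYTES) gives 393216 pages, so the floor is 98304 pages (384MB). The real vm.stats.vm.v_page_count is somewhat lower than installed RAM, so read it from the running system rather than computing it.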

#43 Updated by elekktretterr over 4 years ago

> We will keep going until these deadlocks go away, but I'm going to
> have to rethink portions of the kernel memory allocator with regard
> to HAMMER for sure after we roll the release.

Hi Matt,

On a side note, have you finished the redo code for HAMMER? I'm not trying
to create a sense of urgency or anything, but I'm just curious, as I
remember you were working on it back in January.

Thanks,
Petr

#44 Updated by dillon over 4 years ago

:Hi Matt,
:
:On a side note, have you finished the redo code for HAMMER? I'm not trying
:to create a sense of urgency or anything, but I'm just curious, as I
:remember you were working on it back in January.
:
:Thanks,
:Petr

The media side of REDO has been finished for a while, but the
recovery code is a sticking point. I still think I can get it
operational for this release but it will be down to the wire.

-Matt
Matthew Dillon
<>

#45 Updated by steve over 4 years ago

On Tue, 23 Mar 2010 16:37:00 -0700 (PDT)
Matthew Dillon <> wrote:

> :On Tue, 23 Mar 2010 14:15:43 -0700 (PDT)
> :Matthew Dillon <> wrote:
> :
> :> You can change the VM paging targets. You can try reducing
> :> vm.stats.vm.v_inactive_target. Those values are in pages. Do
> :> not make it too small.
> :
> : Apparently it's read-only. Is it a boot-time setting only?
>
> Oops, wrong sysctl. Use sysctl vm.v_inactive_target. Don't make
> it any smaller than 1/4 vm.stats.vm.v_page_count.

That's better. It still pushed a load out to swap overnight, but
much less of it needed to come back. I suppose this box with *only* 1.5GB
is showing its age.

#46 Updated by ftigeot over 4 years ago

On Tue, Mar 23, 2010 at 11:34:16PM +0000, Matthew Dillon (via DragonFly issue tracker) wrote:
>
> Matthew Dillon <> added the comment:
>
> :I also had a new deadlock last night (possibly during the daily periodic run).
> :Symptoms were the same as usual: no keyboard input, no network activity, many
> :pmap_interlock messages on the console...
> :
> :The new crash dump is there: http://www.wolfpond.org/crash.dfly/
> :
> :--
> :Francois Tigeot
>
> It's STILL the rebalancing code. This time it deadlocked waiting for
> the VM system to free up memory while holding shared locks on a
> HAMMER cursor.

I just got another deadlock today.

It was a bit better compared to the previous ones: the ssh connection I had
in an xterm was still available; screen(1) was running and it was possible
to change virtual terminals with Ctrl-n.
No other input was taken into account.

The console had a bunch of the customary
"Debug: pmap_interlock 00010003"
messages. The last 4 lines were
"bio_page_alloc: WARNING emergency page allocation"

#47 Updated by ftigeot over 3 years ago

The problem doesn't occur anymore with recent releases.
