Bug #1912

diskless vkernel: corrupted files after "pkg_admin check"

Added by rumcic almost 4 years ago. Updated almost 4 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -


After pkg_admin check is run (it is part of the daily cronjobs), many files get
corrupted (in /var/db/pkg and /etc; other dirs are probably affected as well).
All the files are still there and have the same size, but e.g. cat outputs
nothing, while vi shows many "^@" (NUL bytes) instead of the expected content.
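
To find out which files are hit, it can help to scan for files that are non-empty but contain nothing except NUL bytes (vi renders a NUL byte as "^@"). The following is only a minimal sketch in portable C; the function name file_is_all_nul is made up for this example and is not part of pkg_admin or the base system.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not an existing tool): returns 1 if the file is
 * non-empty and consists only of NUL bytes, 0 if it is empty or contains
 * any other byte, and -1 if it cannot be opened or read.
 */
int
file_is_all_nul(const char *path)
{
	FILE *fp;
	int c;
	long nbytes = 0;

	assert(path != NULL);

	fp = fopen(path, "rb");
	if (fp == NULL)
		return -1;

	while ((c = getc(fp)) != EOF) {
		if (c != '\0') {
			/* Found real content, so this file is not zeroed out. */
			fclose(fp);
			return 0;
		}
		nbytes++;
	}

	if (ferror(fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	/* An empty file is not counted as corrupted. */
	return nbytes > 0;
}
```

Calling file_is_all_nul() from a small main() on every path under /var/db/pkg and /etc before and after the cronjob would give a list of affected files to compare.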

It seems this happens only on tmpfs mounts; files on the NFS mount do not
seem to be affected at all.
I have been trying to reproduce the behaviour on a physical box, but was unable
to, so it seems only vkernels are affected by this.
Please do not CC me, since I already receive everything from these MLs.



#1 Updated by rumcic almost 4 years ago

More info ...
After adding some swap to the vkernel, the corruption does not occur (I have not
been able to repeat it). It seems that pkg_admin check uses a lot of memory, and
when the vkernel is unable to swap, data corruption occurs, but no error can be
seen anywhere (other than the data disappearing).
In my case, ~30MB of swap was in use after pkg_admin's check had finished.

I guess the problem is in tmpfs_write/tmpfs_read (maybe even buwrite)?
swap_pager_strategy should be setting ENOMEM as the error when no swap space is
left, but I'm not quite sure how that propagates (if it does) to tmpfs' functions.

#2 Updated by alexh almost 4 years ago

No, what you should be looking at are the strategy routines, i.e.
tmpfs_strategy. At a quick glance it should actually work just fine.
tmpfs_strategy calls swap_pager_strategy, which sets the buf error to ENOMEM if
swp_pager_getswapspace returns SWAPBLK_NONE. The buf then gets biodone()d
further down, outside the loop.
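
The control flow described here can be pictured with a small userland model. This is only a sketch of the pattern (fail the allocation, record ENOMEM on the buf, complete the buf once after the loop). It is not the actual swap_pager_strategy code, and every toy_ name and TOY_SWAPBLK_NONE below is a stand-in invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_SWAPBLK_NONE (-1L)	/* stand-in for SWAPBLK_NONE */

struct toy_buf {
	int npages;	/* pages that need swap blocks */
	int error;	/* 0 or an errno value */
	int done;	/* set by toy_biodone() */
};

struct toy_swap {
	long free_blocks;	/* swap blocks still available */
	long next;		/* next block number to hand out */
};

/* Stand-in for the swap block allocator. */
static long
toy_getswapspace(struct toy_swap *sw)
{
	if (sw->free_blocks == 0)
		return TOY_SWAPBLK_NONE;
	sw->free_blocks--;
	return sw->next++;
}

/* Stand-in for biodone(): marks the I/O as completed. */
static void
toy_biodone(struct toy_buf *bp)
{
	bp->done = 1;
}

/* Model of a strategy routine: allocate per page, fail with ENOMEM. */
void
toy_strategy(struct toy_swap *sw, struct toy_buf *bp)
{
	int i;

	assert(sw != NULL && bp != NULL);
	bp->error = 0;

	for (i = 0; i < bp->npages; i++) {
		if (toy_getswapspace(sw) == TOY_SWAPBLK_NONE) {
			bp->error = ENOMEM;
			break;
		}
	}

	/* Completion happens once, outside the loop, error or not. */
	toy_biodone(bp);
}
```

In this model the error only becomes visible to whoever inspects bp->error after completion, which is why the question of how it propagates back to the tmpfs functions matters.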

Can you check the return value of swp_pager_getswapspace? Is it returning
SWAPBLK_NONE? Try kprintf'ing something there.


#3 Updated by alexh almost 4 years ago

The write path actually goes through buwrite(), which is fine, so the strategy
routine(s) don't even get called. Since buwrite() just marks pages as dirty, the
problem might be somewhere around that and its behaviour under memory pressure.
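
Why a mark-dirty-only write path can hide a failure is easier to see in a toy model. The sketch below is an assumption-laden illustration of deferred writeback in general, not of buwrite() or the DragonFly VM code; toy_write, toy_flush and struct toy_page are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

struct toy_page {
	char data[TOY_PAGE_SIZE];
	int dirty;	/* 1 if the page still has to be written out */
};

/*
 * Model of a write that only copies data and marks the page dirty.
 * It never touches backing store, so it has nothing to fail on.
 */
int
toy_write(struct toy_page *pg, const char *src, size_t len)
{
	assert(pg != NULL && src != NULL && len <= TOY_PAGE_SIZE);
	memcpy(pg->data, src, len);
	pg->dirty = 1;
	return 0;
}

/*
 * Model of a later flush under memory pressure. swap_free is the number
 * of free swap blocks. Returns 0 on success or ENOMEM if there is none.
 */
int
toy_flush(struct toy_page *pg, long *swap_free)
{
	assert(pg != NULL && swap_free != NULL);
	if (!pg->dirty)
		return 0;
	if (*swap_free == 0)
		return ENOMEM;	/* page stays dirty, nothing was reclaimed */
	(*swap_free)--;
	pg->dirty = 0;
	return 0;
}
```

Here toy_write() has already returned success by the time toy_flush() reports ENOMEM, so the original writer never sees the error. Whether the real corruption follows this pattern is exactly what still needs to be checked.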
