Bug #2542

Kernel freezes for minutes at a time

Added by ftigeot over 1 year ago. Updated 9 months ago.

Status: New
Start date: 04/07/2013
Priority: Normal
Due date:
Assignee: -
% Done: 10%
Category: -
Target version: -

Description

Running the new poudriere-devel 2.3_4 port makes the system freeze for minutes at a time after a while.

* This is not the case at first; poudriere has to run for at least one hour before the problem becomes visible
* Display is frozen in place, periodically refreshed widgets such as clocks stop moving
* Keyboard and mouse input is not processed but is handled after a few minutes when the system becomes operational again
* Launching big applications such as firefox is enough to make the system freeze
* "Warning: deep namecache recursion at (null)" messages are printed on the console

When the system becomes operational again, top(1) shows one cpu thread using 100% cpu (system time).

A core dump taken when the system is in this state will be made available.

History

#1 Updated by ftigeot over 1 year ago

This behavior was seen on 8-thread Xeon systems, with both i386 and amd64 DragonFly-3.4rc installations.

#2 Updated by ftigeot over 1 year ago

In one freeze instance an xterm was still responsive and I could run some simple utilities.

systat -pv 1:
timer ipi extint user% nice% sys% intr% idle% tokcol token
cpu0 282 8 79 0.0 0.0 0.0 0.8 99.2 0
cpu1 283 0 0 0.0 0.0 0.0 0.0 100.0 0
cpu2 302 1 0 0.0 0.0 0.0 0.0 100.0 0
cpu3 281 0 0 0.0 0.0 100.0 0.0 0.0 6504549 vmobj
cpu4 282 0 0 0.0 0.0 0.0 0.0 100.0 0
cpu5 5 391 4 0 0.8 0.0 0.0 0.0 99.2 54844 mp_toke
cpu6 288 3 0 1.5 0.0 0.0 0.0 98.5 0
cpu7 282 8 0 0.0 0.0 0.0 0.0 100.0 19 pool
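
For readers who want to pick the stuck CPU out of a capture like the one above, here is a minimal sketch in Python. It assumes the column order shown in the header and reads each row from the right, so a row with an extra leading field (as cpu5 appears to have here) still parses. The names `CpuRow`, `parse_systat_pv` and `busiest_in_kernel` are made up for this illustration and are not part of systat.

```python
from typing import List, NamedTuple, Optional

# Sample copied from the systat -pv 1 output in this comment.
SAMPLE = """\
timer ipi extint user% nice% sys% intr% idle% tokcol token
cpu0 282 8 79 0.0 0.0 0.0 0.8 99.2 0
cpu1 283 0 0 0.0 0.0 0.0 0.0 100.0 0
cpu2 302 1 0 0.0 0.0 0.0 0.0 100.0 0
cpu3 281 0 0 0.0 0.0 100.0 0.0 0.0 6504549 vmobj
cpu4 282 0 0 0.0 0.0 0.0 0.0 100.0 0
cpu5 5 391 4 0 0.8 0.0 0.0 0.0 99.2 54844 mp_toke
cpu6 288 3 0 1.5 0.0 0.0 0.0 98.5 0
cpu7 282 8 0 0.0 0.0 0.0 0.0 100.0 19 pool
"""


class CpuRow(NamedTuple):
    cpu: str
    user: float
    nice: float
    sys: float
    intr: float
    idle: float
    tokcol: int
    token: Optional[str]  # name of the contended token, if one is shown


def parse_systat_pv(text: str) -> List[CpuRow]:
    """Parse per-CPU rows; the header and unrelated lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # A trailing non-numeric field is the token name (optional).
        token = fields.pop() if not fields[-1].isdigit() else None
        if len(fields) < 7:  # cpu name + 5 percentages + tokcol
            continue
        tokcol = int(fields[-1])
        # Read the five percentage columns counting from the right.
        user, nice, sys_pct, intr, idle = (float(f) for f in fields[-6:-1])
        rows.append(CpuRow(fields[0], user, nice, sys_pct, intr, idle, tokcol, token))
    return rows


def busiest_in_kernel(rows: List[CpuRow]) -> CpuRow:
    """Return the row with the highest sys%, ties broken by token collisions."""
    return max(rows, key=lambda r: (r.sys, r.tokcol))


if __name__ == "__main__":
    hot = busiest_in_kernel(parse_systat_pv(SAMPLE))
    print(hot.cpu, hot.sys, hot.tokcol, hot.token)
```

On the sample above this singles out cpu3, which sits at 100% system time with by far the most collisions, on the vmobj token.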

top(1) also showed some 'mv' zombie processes; my window manager was frozen in 'RUN' state and didn't respond to any command.

#3 Updated by ftigeot over 1 year ago

A core dump is available in leaf:~ftigeot/crash/crash.issue2542

#4 Updated by ftigeot over 1 year ago

  • % Done changed from 0 to 10

Setting vm.read_shortcut_enable to 0 has made the freezes disappear.
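
For anyone wanting to try the same workaround, a minimal sketch in Python follows. It assumes the usual `sysctl name=value` form of sysctl(8) and an /etc/sysctl.conf entry for persistence across reboots. The helper names are hypothetical, and the command is only built, not executed, unless `dry_run=False` is passed (which needs root).

```python
import subprocess
from typing import List


def sysctl_set_cmd(name: str, value: int) -> List[str]:
    """Build the sysctl(8) command line that sets `name` to `value`."""
    return ["sysctl", f"{name}={value}"]


def sysctl_conf_line(name: str, value: int) -> str:
    """Build the line to add to /etc/sysctl.conf to keep the setting after reboot."""
    return f"{name}={value}"


def apply_sysctl(name: str, value: int, dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False (requires root)."""
    cmd = sysctl_set_cmd(name, value)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: show what would be executed and what to persist.
    print(" ".join(apply_sysctl("vm.read_shortcut_enable", 0)))
    print(sysctl_conf_line("vm.read_shortcut_enable", 0))
```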

#5 Updated by marino over 1 year ago

As an interested 3rd party: does the fact that vm.read_shortcut_enable=1 leads to freezes indicate where the problem lies? And do we know why this phenomenon is (apparently) not seen universally?

#6 Updated by marino over 1 year ago

I had a machine in the middle of a build that sat in "suspended animation" for 4 days.
It started working again after I hit "ctrl-t" to see what state poudriere was in. Apparently this stimulation kicked it back on track. The machine uses HAMMER.

ftigeot thinks this is the same thing.
If so, bug report confirmed.

#7 Updated by daniel.ramos 9 months ago

  • Description updated (diff)

I was able to reproduce this (or a very similar) bug on a dual-core KVM VPS running DFly 3.4.3. Basically, it was impossible to complete a "make buildworld" due to long and frequent freezes (clock skew, no response to keyboard input or ping). My setup does not use HAMMER, and changing vm.read_shortcut_enable did not seem to make any difference.

Since updating to 3.5 master (by waiting out the freezes), the issue is resolved - I'm now able to make buildworld without any freezing whatsoever.
