Bug #2039
closed
Sometimes, DragonFly 2.9 systems can not reboot
Description
uname -a output:
DragonFly dfly32.zefyris.com 2.9-DEVELOPMENT DragonFly
v2.9.1.1027.gb133d-DEVELOPMENT #3: Sun Apr 3 10:24:40 CEST 2011
ftigeot@dfly32.zefyris.com:/usr/obj/usr/src/sys/GENERIC_SMP i386
The problem may be present on older versions of 2.9.
After issuing a "shutdown -r now" command as superuser, the system starts the
shutdown process.
The last console messages are:
Syncing disks...
done.
The machine then waits forever; it can only be rebooted with a hard reset.
I'm setting the priority to critical, thinking about poor souls having to manage
remotely colocated servers.
Updated by sepherosa over 13 years ago
On Sun, Apr 3, 2011 at 5:00 PM, Francois Tigeot (via DragonFly issue
tracker) <sinknull@leaf.dragonflybsd.org> wrote:
New submission from Francois Tigeot <ftigeot@wolfpond.org>:
uname -a output:
DragonFly dfly32.zefyris.com 2.9-DEVELOPMENT DragonFly
v2.9.1.1027.gb133d-DEVELOPMENT #3: Sun Apr 3 10:24:40 CEST 2011
ftigeot@dfly32.zefyris.com:/usr/obj/usr/src/sys/GENERIC_SMP i386
The problem may be present on older versions of 2.9.
After issuing a "shutdown -r now" command as superuser, the system starts the
shutdown process
sysctl hw.acpi.handle_reboot=1 && shutdown -r now
Does the above help?
Best Regards,
sephe
Updated by ftigeot over 13 years ago
On Tue, Apr 05, 2011 at 09:43:36AM +0000, Sepherosa Ziehau (via DragonFly issue tracker) wrote:
Sepherosa Ziehau <sepherosa@gmail.com> added the comment:
On Sun, Apr 3, 2011 at 5:00 PM, Francois Tigeot (via DragonFly issue
tracker) <sinknull@leaf.dragonflybsd.org> wrote:
New submission from Francois Tigeot <ftigeot@wolfpond.org>:
uname -a output:
DragonFly dfly32.zefyris.com 2.9-DEVELOPMENT DragonFly
v2.9.1.1027.gb133d-DEVELOPMENT #3: Sun Apr 3 10:24:40 CEST 2011
ftigeot@dfly32.zefyris.com:/usr/obj/usr/src/sys/GENERIC_SMP i386
The problem may be present on older versions of 2.9.
After issuing a "shutdown -r now" command as superuser, the system starts the
shutdown process
sysctl hw.acpi.handle_reboot=1 && shutdown -r now
Does the above help?
Sorry, I can't test: I do not have access to this machine anymore.
Updated by sepherosa over 13 years ago
On Tue, Apr 5, 2011 at 5:40 PM, Sepherosa Ziehau <sepherosa@gmail.com> wrote:
On Sun, Apr 3, 2011 at 5:00 PM, Francois Tigeot (via DragonFly issue
tracker) <sinknull@leaf.dragonflybsd.org> wrote:
New submission from Francois Tigeot <ftigeot@wolfpond.org>:
uname -a output:
DragonFly dfly32.zefyris.com 2.9-DEVELOPMENT DragonFly
v2.9.1.1027.gb133d-DEVELOPMENT #3: Sun Apr 3 10:24:40 CEST 2011
ftigeot@dfly32.zefyris.com:/usr/obj/usr/src/sys/GENERIC_SMP i386
The problem may be present on older versions of 2.9.
After issuing a "shutdown -r now" command as superuser, the system starts the
shutdown process
I have found a reliable way to trigger it:
switch to single user mode
mount -a
cd /usr/src
make installworld && make upgrade && reboot
print_uptime() has not been called in my case.
Best Regards,
sephe
sysctl hw.acpi.handle_reboot=1 && shutdown -r now
Does the above help?
Best Regards,
sephe
--
Tomorrow Will Never Die
Updated by dillon over 13 years ago
:I have found a reliable way to trigger it:
:switch to single user mode
:mount -a
:cd /usr/src
:make installworld && make upgrade && reboot
:
:print_uptime() has not been called in my case.
:
:Best Regards,
:sephe
Hmm. If the uptime is not displayed this implies that the
vfs_unmountall() call is getting stuck.
Try with the latest master; I made some adjustments that might affect
raw device closes. If swapcache is turned on try turning it off
(though my latest patch turns it off on shutdown automatically, too).
So far I cannot replicate the issue on my test box. I did have
reboot issues in the past related to swapcache but those are gone now
w/my recent commits.
Other possible causes: tmpfs, vn, usb mounts, procfs, etc.
If you can reliably replicate the problem you may have to add a bunch
of kprintf()'s to the umountall iterator to track down which filesystem
is getting stuck. I have included a df of my test box below.
-Matt
Filesystem                          1K-blocks      Used      Avail Capacity  Mounted on
ROOT                                195452928  64848560  130604368      33%  /
devfs                                       1         1          0     100%  /dev
/dev/serno/L41K2H5G.s1a                774094    216628     495540      30%  /boot
/pfs/@@-1:00001                     195452928  64848560  130604368      33%  /var
/pfs/@@-1:00002                     195452928  64848560  130604368      33%  /tmp
/pfs/@@-1:00003                     195452928  64848560  130604368      33%  /usr
/pfs/@@-1:00004                     195452928  64848560  130604368      33%  /home
/pfs/@@-1:00005                     195452928  64848560  130604368      33%  /usr/obj
/pfs/@@-1:00006                     195452928  64848560  130604368      33%  /var/crash
/pfs/@@-1:00007                     195452928  64848560  130604368      33%  /var/tmp
BUILD                               104398848  42510688   61888160      41%  /build3
procfs                                      4         4          0     100%  /proc
apollo.backplane.com:/usr/src      1934024704 292436496 1641588208      15%  /usr/src
apollo.backplane.com:/usr/src-misc 1934024704 292436496 1641588208      15%  /usr/src-misc
apollo.backplane.com:/usr/pkgsrc   1934024704 292436496 1641588208      15%  /usr/pkgsrc
apollo.backplane.com:/netboot1     1934024704 292436496 1641588208      15%  /netboot1
test29#
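For illustration only, the suggestion above to add kprintf()'s around each unmount can be sketched in user-space C. This is not code from the DragonFly tree: struct demo_mount, demo_unmountall(), unmount_ok() and unmount_busy() are hypothetical names, and fprintf() stands in for the kernel's kprintf(). The idea is to log a line before and after every unmount call, so that if one call never returns, the last "start" line without a matching "done" line names the filesystem that is stuck.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a mounted filesystem (not the kernel's struct mount). */
struct demo_mount {
    const char *fstype;                           /* e.g. "hammer", "procfs" */
    const char *path;                             /* mount point */
    int (*unmount)(const struct demo_mount *mp);  /* returns 0 or an errno value */
};

/* Sample unmount routines used only for the demonstration. */
static int unmount_ok(const struct demo_mount *mp)   { (void)mp; return 0; }
static int unmount_busy(const struct demo_mount *mp) { (void)mp; return EBUSY; }

/*
 * Walk the array from last to first (assumed here to be reverse mount
 * order) and trace every unmount call. In a kernel the fprintf() calls
 * would be kprintf() calls.
 * Returns the number of unmounts that reported an error.
 */
static int demo_unmountall(const struct demo_mount *mounts, int count, FILE *log)
{
    int failures = 0;

    for (int i = count - 1; i >= 0; --i) {
        const struct demo_mount *mp = &mounts[i];
        int error;

        fprintf(log, "unmount start: %s on %s\n", mp->fstype, mp->path);
        fflush(log);  /* get the line out before a possible hang */
        error = mp->unmount(mp);
        fprintf(log, "unmount done:  %s on %s (error %d)\n",
                mp->fstype, mp->path, error);
        if (error != 0)
            failures++;
    }
    return failures;
}
```

Calling demo_unmountall() on an array of demo_mount entries with stderr as the log prints one start/done pair per filesystem. A start line with no done line after it marks where the walk stopped.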
Updated by sepherosa over 13 years ago
On Wed, Apr 6, 2011 at 2:16 PM, Matthew Dillon
<dillon@apollo.backplane.com> wrote:
:I have found a reliable way to trigger it:
:switch to single user mode
:mount -a
:cd /usr/src
:make installworld && make upgrade && reboot
:
:print_uptime() has not been called in my case.
:
:Best Regards,
:sephe
Hmm. If the uptime is not displayed this implies that the
vfs_unmountall() call is getting stuck.
Yes, vfs_unmountall() blocks the rebooting.
With the following patch:
http://leaf.dragonflybsd.org/~sephe/umountall_print.diff
In single user mode:
make installworld && make upgrade && reboot
...
...
hammer callback start
<---------- ("nobusy callback done" is not logged, and reboot stops here)
Best Regards,
sephe
Updated by sepherosa over 13 years ago
On Mon, Apr 11, 2011 at 4:46 PM, Sepherosa Ziehau <sepherosa@gmail.com> wrote:
On Wed, Apr 6, 2011 at 2:16 PM, Matthew Dillon
<dillon@apollo.backplane.com> wrote:
:I have found a reliable way to trigger it:
:switch to single user mode
:mount -a
:cd /usr/src
:make installworld && make upgrade && reboot
:
:print_uptime() has not been called in my case.
:
:Best Regards,
:sephe
Hmm. If the uptime is not displayed this implies that the
vfs_unmountall() call is getting stuck.
Yes, vfs_unmountall() blocks the rebooting.
With the following patch:
http://leaf.dragonflybsd.org/~sephe/umountall_print.diff
In single user mode:
make installworld && make upgrade && reboot
...
...
hammer callback start
<---------- ("nobusy callback done" is not logged, and reboot stops here)
The output of df(1):
http://leaf.dragonflybsd.org/~sephe/df.txt
One more thing, if I run several sync(8) after installworld &&
upgrade, then reboot does not seem to hang.
Best Regards,
sephe
Updated by dillon over 13 years ago
I've figured it out. I looked at the core Francois Tigeot provided
(sorry if others were provided before, it was on my list!). In
any case, there is a bug in the HAMMER flusher which can cause it
to lose track of the flush sequence number, which umountall can
trigger due to the extra flushes HAMMER does on unmount.
Commit e86903d84f840af38d1b452a6a6c624702373751 should fix it.
-Matt
Updated by sepherosa over 13 years ago
On Tue, Apr 12, 2011 at 11:30 PM, Edward O'Callaghan (via DragonFly
issue tracker) <sinknull@leaf.dragonflybsd.org> wrote:
Edward O'Callaghan <eocallaghan@auroraux.org> added the comment:
Confirm fix?
Yeah, it is fixed.
----------
status: chatting -> testing
_____________________________________________
DragonFly issue tracker <bugs@lists.dragonflybsd.org>
<http://bugs.dragonflybsd.org/issue2039>
_____________________________________________