Bug #2547

crashed while doing a dry run of pkg_rolling-replace

Added by phma about 1 year ago. Updated 12 months ago.

Status: New
Start date: 04/13/2013
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

I rebooted into a kernel compiled from sources I updated just a few days ago. I ran pkg_rolling-replace -nuv and it ran for a few hours, then crashed. The kernel dump is number 14 in my crash directory. The version is v3.5.0.25.g97861-DEVELOPMENT.

(kgdb) #0 _get_mycpu () at ./machine/thread.h:79
#1 md_dumpsys (di=0xc03e8b35)
at /usr/src/sys/platform/pc32/i386/dump_machdep.c:266
#2 0xc09646a0 in db_command_table ()
#3 0xc03e8b35 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:913
#4 0xc01968a3 in db_fncall (dummy1=-831973476, dummy2=0, dummy3=-1065827167,
dummy4=0xce691798 "(\361~\300\336L\212\300")
at /usr/src/sys/ddb/db_command.c:539
#5 0xc0196ca2 in db_command (aux_cmd_tablep_end=<optimized out>,
aux_cmd_tablep=<optimized out>, cmd_table=<optimized out>,
last_cmdp=0xc09f9318) at /usr/src/sys/ddb/db_command.c:401
#6 db_command_loop () at /usr/src/sys/ddb/db_command.c:467
#7 0xc019984b in db_trap (type=type@entry=12, code=code@entry=0)
at /usr/src/sys/ddb/db_trap.c:71
#8 0xc07903d0 in kdb_trap (type=type@entry=12, code=code@entry=0,
regs=regs@entry=0xce69190c)
at /usr/src/sys/platform/pc32/i386/db_interface.c:149
#9 0xc07c2602 in trap_fatal (frame=frame@entry=0xce69190c, eva=eva@entry=136)
at /usr/src/sys/platform/pc32/i386/trap.c:1107
#10 0xc07c27bd in trap_pfault (frame=frame@entry=0xce69190c, usermode=0,
usermode@entry=136, eva=eva@entry=136)
at /usr/src/sys/platform/pc32/i386/trap.c:1018
#11 0xc07c2dc3 in trap (frame=0xce69190c)
at /usr/src/sys/platform/pc32/i386/trap.c:695
#12 0xc0791c37 in calltrap ()
at /usr/src/sys/platform/pc32/i386/exception.s:787
#13 0xc0686b17 in fq_balance_self (tdio=0xc09edf00)
at /usr/src/sys/kern/dsched/fq/fq_core.c:351
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I also saw the system freeze the first time I started a KDE session after rebooting.

History

#1 Updated by phma about 1 year ago

The kernel crashed again at 1:08. I was not using the machine at the time; that is when the nightly periodic job runs. Maybe there's a bug in HAMMER. I rebooted into the previous kernel.

#2 Updated by phma about 1 year ago

It is now 1:29 and hammer (new world) is still running on the old kernel. The bug is therefore in the hammer code of the kernel.

#3 Updated by alexh about 1 year ago

The bug is most definitely not in HAMMER but rather in dsched or dsched fq.

Cheers,
Alex

On 2013-04-16 06:30, Pierre Abbat via Redmine wrote:
> Issue #2547 has been updated by phma.
>
>
> It is now 1:29 and hammer (new world) is still running on the old
> kernel. The bug is therefore in the hammer code of the kernel.
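
The diagnosis above follows from how the backtrace reads: frames #0 to #12 are the dump, debugger, and trap-handling path, and the code that actually faulted is the frame just past calltrap, here fq_balance_self in dsched/fq/fq_core.c. A minimal sketch of that reading in Python is below. The helper names parse_frames and faulting_frame are hypothetical, and treating "the frame after calltrap" as the faulting frame is an assumption that fits this trace, not a general rule. Since kgdb stopped at frame #13 with a possible corrupt stack, nothing deeper can be inspected either way.

```python
import re

# Tail of the backtrace from the description, copied verbatim.
BACKTRACE = """\
#11 0xc07c2dc3 in trap (frame=0xce69190c)
at /usr/src/sys/platform/pc32/i386/trap.c:695
#12 0xc0791c37 in calltrap ()
at /usr/src/sys/platform/pc32/i386/exception.s:787
#13 0xc0686b17 in fq_balance_self (tdio=0xc09edf00)
at /usr/src/sys/kern/dsched/fq/fq_core.c:351
"""

# "#N [0xADDR in ]function (" at the start of a frame line.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(")
# "at path:line", either on the frame line or on a continuation line.
AT_RE = re.compile(r"\bat (\S+):(\d+)")


def parse_frames(text):
    """Return a list of dicts with num, func, file, line for each frame."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("(kgdb) "):
            line = line[len("(kgdb) "):]
        m = FRAME_RE.match(line)
        if m:
            frames.append({"num": int(m.group(1)), "func": m.group(2),
                           "file": None, "line": None})
        if frames and frames[-1]["file"] is None:
            loc = AT_RE.search(line)
            if loc:
                frames[-1]["file"] = loc.group(1)
                frames[-1]["line"] = int(loc.group(2))
    return frames


def faulting_frame(frames, entry="calltrap"):
    """Assumption: the frame right after the trap entry stub is the one that faulted."""
    for i, frame in enumerate(frames):
        if frame["func"] == entry and i + 1 < len(frames):
            return frames[i + 1]
    return None


if __name__ == "__main__":
    frame = faulting_frame(parse_frames(BACKTRACE))
    print(frame["func"], frame["file"], frame["line"])
```

Run on the full trace from the description, the same lookup lands on the same frame, which sits under sys/kern/dsched/fq rather than in any HAMMER source file.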

#4 Updated by phma about 1 year ago

Both disks are set to fq. Should I set them to something else and see what happens?

#5 Updated by vsrinivas about 1 year ago

Yeah, you'll probably want to use the noop scheduler; it is the best-tested of all the dsched modules.

#6 Updated by phma 12 months ago

Confirmed: the bug is in fq. I rebooted it with both disks set to noop, and it's still up.
