Bug #3088: Kernel panic in syncache_add - DragonFlyBSD - DragonFlyBSD bugtracker

Bug #3088

closed

Kernel panic in syncache_add

Added by pa3k almost 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Start date: 10/21/2017

Description

Hello,
I'm using a SuperMicro A1SAi-C2750 as a small server with jails. It ran fine on the latest DragonFly BSD 4.8.1,
but since I upgraded to DragonFly BSD 5.0, I have been experiencing random crashes, always in syncache_add.

I also tried a clean install and a kernel rebuild.
The server is currently running DragonFly v5.0.0.6.g0978b-RELEASE.

The server passes 24 hours of memtest without errors.
Crash summaries from the latest crashes are attached to this issue. I cannot find a pattern that reliably crashes the server,
but core.txt.5 came right after the reboot following a panic, while I was trying to copy core.txt.4 off the server via scp.

Any help or hints on how to fix or debug this problem would be appreciated.

Also, should I be worried about messages like this on the console?
softdep_sync_metadata_bp(1): caught buf 0xffffff80af383f50 going away

Thanks in advance.

Files

core.txt.3 (119 KB), pa3k, 10/21/2017 02:17 PM
a1sai.txt (35 KB), lshw output, pa3k, 10/21/2017 02:17 PM
dfly.5.0.jpg (47.7 KB), KVM screen: Fatal trap 12, pa3k, 10/21/2017 02:32 PM
dfly.5.0.panic.jpg (60.6 KB), KVM: first panic command, pa3k, 10/21/2017 02:33 PM
dfly.5.0.trace.jpg (71.6 KB), KVM: trace, pa3k, 10/21/2017 02:33 PM
core.txt.4 (126 KB), pa3k, 10/21/2017 02:51 PM
core.txt.5 (129 KB), pa3k, 10/21/2017 02:51 PM

Updated by dillon almost 8 years ago

Status changed from New to In Progress

This appears to be a missed initialization in the kernel. I have pushed a fix to master and to the 5.0 release branch. Please pull the latest sources from the repository, rebuild the kernel, and report back if the problem recurs. Basically, there are two kmalloc() calls in netinet/tcp_syncache.c that did not specify M_ZERO when they should have.

You are getting some good dumps, so if my fix does not resolve it, we can debug the issue further by having you examine your dumps.

Thanks!

-Matt
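The missed-initialization bug described above can be illustrated with a small user-space sketch. It does not reproduce the actual tcp_syncache.c code. `hyp_kmalloc`, `HYP_M_ZERO`, `struct hyp_entry`, and the helper functions are hypothetical stand-ins. In the DragonFly kernel, kmalloc() takes a size, a malloc type, and flags such as M_WAITOK and M_ZERO. When M_ZERO is omitted, the returned memory may still hold stale data, so any field the code never sets explicitly can contain garbage. To make that effect deterministic, the sketch fills non-zeroed blocks with a 0xA5 poison byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's M_ZERO flag. */
#define HYP_M_ZERO 0x1

/*
 * Hypothetical user-space allocator. Without HYP_M_ZERO it fills the
 * block with a poison byte, mimicking memory that still holds stale
 * data from a previous user (an assumption made for illustration).
 */
static void *hyp_kmalloc(size_t size, int flags)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    memset(p, (flags & HYP_M_ZERO) ? 0x00 : 0xA5, size);
    return p;
}

/* Hypothetical cache entry; NOT the real struct syncache layout. */
struct hyp_entry {
    struct hyp_entry *next;  /* code may treat NULL as "not linked" */
    uint32_t          flags; /* code may assume 0 means "no options" */
    uint16_t          mss;
};

/* Returns 1 if every field still has its "freshly initialized" value. */
static int hyp_entry_is_clean(const struct hyp_entry *e)
{
    return e->next == NULL && e->flags == 0 && e->mss == 0;
}

/* Allocates an entry, optionally zeroed, mirroring the M_ZERO choice. */
static struct hyp_entry *hyp_entry_alloc(int flags)
{
    struct hyp_entry *e = hyp_kmalloc(sizeof(*e), flags);
    assert(e != NULL);
    return e;
}
```

Calling `hyp_entry_alloc(HYP_M_ZERO)` yields an entry for which `hyp_entry_is_clean()` returns 1. Calling `hyp_entry_alloc(0)` yields a poisoned entry for which it returns 0, and code that trusted `next` or `flags` on that entry would act on garbage. Zeroing at allocation time covers every field, including fields added to the structure later, which makes it a common choice over setting fields one by one.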

Updated by pa3k almost 8 years ago

Thank you!

The server has now been running the newly compiled kernel, DragonFly v5.0.0.7.gb21cd7-RELEASE, for several hours without a panic.
I will let it run for a few more hours, and if nothing happens, I will close this issue as resolved.

Is the "softdep_sync_metadata_bp(1): caught buf 0xffffff80af235c48 going away" message only informational and nothing to worry about,
or is there a deeper issue hidden behind it?