Bug #3088

Kernel panic in syncache_add

Added by pa3k about 1 month ago. Updated 30 days ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 10/21/2017
Due date:
% Done: 0%


Description

Hello,
I'm using a SuperMicro A1SAi-C2750 as a small server with jails. It was running fine with DragonFly BSD 4.8.1,
but since I upgraded to DragonFly BSD 5.0, I'm experiencing random crashes, always in syncache_add.

I also tried a clean install and a kernel rebuild;
the server is currently running DragonFly v5.0.0.6.g0978b-RELEASE.

The server passes 24 hours of memtest without problems.
Crash summaries of the latest server crashes are attached to this issue. I cannot find a pattern that reliably crashes the server,
but core.txt.5 was produced right after the reboot that followed a panic, while I was trying to copy core.txt.4 from the server via scp.

Any help or hint how to fix or debug this problem is appreciated.

Also, should I be worried about seeing messages like this on the console?
softdep_sync_metadata_bp(1): caught buf 0xffffff80af383f50 going away

Thanks in advance.

core.txt.3 (119 KB) pa3k, 10/21/2017 02:17 PM

a1sai.txt - lshw (35 KB) pa3k, 10/21/2017 02:17 PM

dfly.5.0.jpg - KVM screen Fatal trap 12 (47.7 KB) pa3k, 10/21/2017 02:32 PM

dfly.5.0.panic.jpg - KVM first panic command (60.6 KB) pa3k, 10/21/2017 02:33 PM

dfly.5.0.trace.jpg - KVM trace (71.6 KB) pa3k, 10/21/2017 02:33 PM

core.txt.4 (126 KB) pa3k, 10/21/2017 02:51 PM

core.txt.5 (129 KB) pa3k, 10/21/2017 02:51 PM

History

#1 Updated by dillon about 1 month ago

  • Status changed from New to In Progress

This appears to be a missed initialization in the kernel. I have pushed a fix to master and to the 5.0 release. Please try pulling the latest from the repo and rebuilding the kernel, and report back if the problem continues to occur. Basically there are two kmalloc's in netinet/tcp_syncache.c that did not specify M_ZERO when they should have.
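
To illustrate the class of bug described above, here is a minimal userland sketch in C; it is not the actual change to netinet/tcp_syncache.c. The names sketch_alloc, SKETCH_ZERO, and struct sketch_entry are hypothetical stand-ins for kmalloc, M_ZERO, and a syncache record. The allocator deliberately fills memory with 0xA5 to imitate stale heap contents in a deterministic way, which is an assumption made for the demo and not how the kernel allocator behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag standing in for the kernel's M_ZERO. */
#define SKETCH_ZERO 0x1

/* Hypothetical record; NOT the real syncache structure. */
struct sketch_entry {
    struct sketch_entry *next;  /* list linkage, expected to start as NULL */
    int                  flags; /* state bits, expected to start as 0 */
};

/*
 * Userland stand-in for kmalloc(). Without SKETCH_ZERO the block is
 * filled with 0xA5 to mimic leftover data from an earlier allocation;
 * with SKETCH_ZERO it is cleared, as M_ZERO requests in the kernel.
 */
static void *sketch_alloc(size_t size, int flags)
{
    assert(size > 0);
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    memset(p, (flags & SKETCH_ZERO) ? 0 : 0xA5, size);
    return p;
}

/* Returns 1 when every field holds the value the caller relies on. */
static int sketch_entry_is_clean(const struct sketch_entry *e)
{
    return e->next == NULL && e->flags == 0;
}
```

Code that allocates a sketch_entry without SKETCH_ZERO and then follows next, or tests flags, before assigning them is acting on garbage. That would be consistent with crashes that appear random, although the sketch does not claim to reproduce the actual panic path.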

You are getting some good dumps, so if my fix has not fixed it, we will try to debug the issue further by having you probe your dumps.

Thanks!

-Matt

#2 Updated by pa3k about 1 month ago

Thank you!

The server has now been running the newly compiled kernel DragonFly v5.0.0.7.gb21cd7-RELEASE for several hours without a panic.
I will let the server run for a few more hours and, if nothing happens, I will close this issue as resolved.

Is the message "softdep_sync_metadata_bp(1): caught buf 0xffffff80af235c48 going away" only informational and nothing to worry about,
or does it point to some deeper issue?

#3 Updated by pa3k 30 days ago

  • Status changed from In Progress to Resolved

After 24 hours with kernel DragonFly v5.0.0.7.gb21cd7-RELEASE, the server is still up and running without a panic.
Closing as resolved.

Thank you.
