Bug #2819

Random micro system freezes after a week of uptime

Added by ftigeot about 2 years ago. Updated almost 2 years ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Kernel
Target version: -
Start date: 05/21/2015
Due date:
% Done: 50%


Description

On a file server, the system freezes for a few seconds to more than a minute after approximately a week of uptime.
The longer the machine stays up, the worse the freezes become. The micro-freezes happen more often and become longer.

It is a complete kernel freeze: characters typed on the console stop appearing on the screen when it happens.
Characters don't stop being displayed when the operating system is stopped and the kernel debugger is active, which indicates the problem is not of hardware origin.

Hardware specs:

- Xeon E5-2620, 64 GB RAM
- Areca RAID controller
- 2x 500GB system disks (RAID 1)
- 11x 2 GB data disks (RAID 5)
- 1x 512 GB SSD (JBOD), used entirely for swap
- 10Gb Intel X540 ethernet adapter

Software configuration:
- swapcache enabled, up to 85% of the available swap size
- deduplication enabled on the data volume

The data volume is used for two things:
- protein sequences
- a rsnapshot backup directory for various servers

History

#1 Updated by ftigeot about 2 years ago

vmstat -m output when the machine suffers from micro-freezes:

Memory statistics by type Type Kern
Type InUse MemUse HighUse Limit Requests Limit Limit
HAMMER-inodes 4300462 3762905K 0K 134203388K 35443885 0 0
HAMMER-others 233194 40792K 0K 6498816K 525146675 0 0
tmpfs name zone 0 0K 0K 6498816K 5197 0 0
tmpfs dirent 56 5K 0K 6498816K 56 0 0
tmpfs node 64 18K 0K 6498816K 64 0 0
HAMMER-inodes 1014 888K 0K 134203388K 176625 0 0
HAMMER-others 1799 357K 0K 6498816K 2181892 0 0
pci_link 16 2K 0K 6498816K 16 0 0
acpitask 0 0K 0K 6498816K 6 0 0
acpica 42238 1467K 0K 6498816K 245264 0 0
acpidev 245 10K 0K 6498816K 245 0 0
acpisem 59 3K 0K 6498816K 59 0 0
eventhandler 35 2K 0K 6498816K 35 0 0
disk 6 1K 0K 6498816K 6 0 0
atkbddev 2 1K 0K 6498816K 2 0 0
bus 1512 191K 0K 6498816K 31013 0 0
callout 12 49152K 0K 6498816K 12 0 0
nexusdev 7 1K 0K 6498816K 7 0 0
sysctl 0 0K 0K 6498816K 1043131 0 0
sysctloid 7003 229K 0K 6498816K 7155 0 0
tslpque 11 704K 0K 6498816K 11 0 0
syscons 41 167K 0K 6498816K 41 0 0
aesni_data 1 1K 0K 6498816K 1 0 0
dsched 8366 792K 0K 6498816K 8366 0 0
lwkt message 22 21K 0K 6498816K 5574 0 0
thread 233 321K 0K 6498816K 235 0 0
scsi_da 0 0K 0K 6498816K 9 0 0
memdesc 1 4K 0K 6498816K 1 0 0
MPipe Array 2 3K 0K 6498816K 2 0 0
cache 133175 13201K 0K 6498816K 133267 0 0
devbuf 2044 2583K 0K 6498816K 2060 0 0
temp 501 129K 0K 6498816K 958039631 0 0
ip6ndp 12 1K 0K 6498816K 15 0 0
CAM queue 32 8K 0K 6498816K 1549 0 0
xform 0 0K 0K 6498816K 18666 0 0
crypto 1 1K 0K 6498816K 1 0 0
propstng 1383 55K 0K 6498816K 1383 0 0
prop string 1375 11K 0K 6498816K 1457 0 0
propnmbr 1636 90K 0K 6498816K 1636 0 0
pdict16 20 2K 0K 6498816K 20 0 0
propdict 681 128K 0K 6498816K 681 0 0
prop dictionary 677 170K 0K 6498816K 715 0 0
kbdmux 6 8K 0K 6498816K 6 0 0
isadev 20 2K 0K 6498816K 20 0 0
ZONE 1 4K 0K 6498816K 1 0 0
uidinfo 4 65K 0K 6498816K 183908 0 0
cred 14 2K 0K 6498816K 977404 0 0
pgrp 26 4K 0K 6498816K 194639 0 0
session 23 2K 0K 6498816K 194593 0 0
vmspace 144 126K 0K 6498816K 154 0 0
proc 41 71K 0K 6498816K 1175580 0 0
lwp 42 27K 0K 6498816K 981838 0 0
subproc 84 122K 0K 6498816K 983962 0 0
tmpfs mount 1 1K 0K 6498816K 1 0 0
HAMMER-mount 2 136K 0K 6498816K 2 0 0
objcache 64 43K 0K 6498816K 64 0 0
devfs 3710 592K 0K 6498816K 4185 0 0
objcache magazine 11459 11149K 0K 6498816K 11459 0 0
UFS dirhash 21 16K 0K 6498816K 21 0 0
UFS mount 3 5K 0K 6498816K 3 0 0
UFS ihash 1 16384K 0K 6498816K 1 0 0
FFS node 1184 370K 0K 134203388K 1193 0 0
pagedep 1 8192K 0K 6498816K 1 0 0
inodedep 1 65536K 0K 6498816K 1 0 0
newblk 1 1K 0K 6498816K 1 0 0
p1003.1b 1 1K 0K 6498816K 1 0 0
lockf 7 1K 0K 6498816K 3856 0 0
atexit 2 1K 0K 6498816K 2 0 0
proc-args 34 2K 0K 6498816K 641794 0 0
exec-args 20 5200K 0K 6498816K 20 0 0
kqueue 40 5K 0K 6498816K 39853084 0 0
kenv 37 6K 0K 6498816K 37 0 0
file desc 41 62K 0K 6498816K 985898 0 0
file 99 13K 0K 6498816K 83268868 0 0
sigio 1 1K 0K 6498816K 1 0 0
NFS daemon 5 18K 0K 6498816K 5 0 0
NFSV3 srvdesc 0 0K 0K 6498816K 53078720 0 0
NFS hash 1 65536K 0K 6498816K 1 0 0
NFS srvsock 2 2K 0K 6498816K 39 0 0
ip6_moptions 1 1K 0K 6498816K 1 0 0
syncache 8 96K 0K 6498816K 2652 0 0
tcptemp 25 2K 0K 6498816K 25 0 0
sblk 2 1K 0K 6498816K 34712 0 0
tseg_qent 0 0K 0K 6498816K 664 0 0
ipq 250 12K 0K 6498816K 250 0 0
kld 127 8K 0K 6498816K 134 0 0
in_multi 25 2K 0K 6498816K 25 0 0
igmp 1 1K 0K 6498816K 1 0 0
module 338 32K 0K 6498816K 338 0 0
routetbl 927 143K 0K 6498816K 10772 0 0
varsym 258 10K 0K 6498816K 272 0 0
faith 1 1K 0K 6498816K 1 0 0
CAM SIM 8 2K 0K 6498816K 8 0 0
CAM periph 13 2K 0K 6498816K 231 0 0
ISOFS mount 1 65536K 0K 6498816K 1 0 0
vn_softc 4 11K 0K 6498816K 4 0 0
clone 6 24K 0K 6498816K 6 0 0
ifaddr 109 87K 0K 6498816K 109 0 0
ether_multi 87 5K 0K 6498816K 87 0 0
ifnet 1 1K 0K 6498816K 8 0 0
BPF 8 1K 0K 6498816K 8 0 0
MSDOSFS mount 1 65536K 0K 6498816K 1 0 0
NULLFS mount 6 4K 0K 6498816K 6 0 0
vnodes 4302707 1747975K 0K 134203388K 35629122 0 0
Export Host 1 1K 0K 6498816K 1 0 0
vnodeops 22 13K 0K 6498816K 22 0 0
nameibufs 44 44K 0K 6498816K 44 0 0
mount 13 13K 0K 6498816K 16 0 0
cluster_save 0 0K 0K 6498816K 194908 0 0
vfscache 41361964 3672964K 0K 6498816K 1424298426 0 0
BIO buffer 2 3K 0K 6498816K 23 0 0
unpcb 14 3K 0K 6498816K 13786 0 0
CAM dev queue 8 1K 0K 6498816K 8 0 0
socket 37 26K 0K 6498816K 203194 0 0
soname 5 1K 0K 6498816K 216667 0 0
pcb 148 163K 0K 6498816K 272388 0 0
tag 0 0K 0K 6498816K 67129 0 0
mbuf 105755 52878K 0K 6498816K 318727634 0 0
mbufcl 44054 129120K 0K 6498816K 44054 0 0
mclmeta 44054 689K 0K 6498816K 44054 0 0
ptys 257 129K 0K 6498816K 259 0 0
ttys 879 113K 0K 6498816K 3257 0 0
shm 1 40K 0K 6498816K 1 0 0
CAM XPT 353 216K 0K 6498816K 1551 0 0
sem 1 144K 0K 6498816K 1 0 0
msg 4 27K 0K 6498816K 4 0 0
MD disk 2 2K 0K 6498816K 2 0 0
Unitno 1 1K 0K 6498816K 1 0 0
rman 168 18K 0K 6498816K 534 0 0
pipe 446 104K 0K 6498816K 1072362 0 0
ioctlops 0 0K 0K 6498816K 195930 0 0
taskqueue 28 2K 0K 6498816K 28 0 0
sbuf 0 0K 0K 6498816K 24 0 0
SWAP 2 131077K 0K 6498816K 2 0 0
kobj 234 527K 0K 6498816K 234 0 0

Memory Totals: In Use Free Requests
9915877K 0K 3486074011

#2 Updated by ftigeot about 2 years ago

The bug reporting tool sadly doesn't preserve whitespace.
The following lines are interesting and way out of line compared to other values:

Type InUse MemUse HighUse Limit Requests Limit Limit

HAMMER-inodes 4300462 3762905K 0K 134203388K 35443885 0 0

vnodes 4302707 1747975K 0K 134203388K 35629122 0 0

vfscache 41361964 3672964K 0K 6498816K 1424298426 0 0
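
For anyone who wants to rank the types by memory use instead of eyeballing the paste, here is a minimal Python sketch. It assumes the layout shown above (a type name that may contain spaces, followed by seven columns) and that whitespace has been collapsed to single spaces. The names ROW and parse_vmstat_m are made up for this example and are not part of any tool. Rows whose columns have run together cannot be parsed reliably, so they are collected separately for manual repair rather than guessed at.

```python
import re

# A row is a type name (which may contain spaces) followed by seven columns:
# InUse MemUse HighUse Limit Requests Limit Limit
ROW = re.compile(
    r"^(?P<type>.+?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K\s+\d+K\s+\d+K"
    r"\s+(?P<requests>\d+)\s+\d+\s+\d+$"
)


def parse_vmstat_m(text):
    """Return (rows, malformed) for whitespace-collapsed `vmstat -m` output."""
    rows, malformed = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("Memory ", "Type ")):
            continue  # skip blank lines and header lines
        m = ROW.match(line)
        if m is None:
            malformed.append(line)  # e.g. two columns fused together
            continue
        rows.append({
            "type": m.group("type"),
            "inuse": int(m.group("inuse")),
            "memuse_kb": int(m.group("memuse")),
            "requests": int(m.group("requests")),
        })
    return rows, malformed


# A few rows copied from the paste above; the last one has fused columns.
SAMPLE = """\
Type InUse MemUse HighUse Limit Requests Limit Limit
HAMMER-others 233194 40792K 0K 6498816K 525146675 0 0
tmpfs name zone 0 0K 0K 6498816K 5197 0 0
mbufcl 44054 129120K 0K 6498816K 44054 0 0
SWAP 2 131077K 0K 6498816K 2 0 0
vnodes43027071747975K 0K 134203388K 35629122 0 0
"""

rows, malformed = parse_vmstat_m(SAMPLE)
top = sorted(rows, key=lambda r: r["memuse_kb"], reverse=True)
for r in top:
    print(f'{r["type"]:<20} {r["inuse"]:>10} {r["memuse_kb"]:>10}K')
for line in malformed:
    print("could not parse:", line)
```

Matching the seven columns from the right-hand side is what lets type names such as "tmpfs name zone" survive the lost alignment.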

#3 Updated by ftigeot about 2 years ago

The /data filesystem contains about 5 million files.
95% of them are hard links or subdirectories in the rsnapshot directory.

#4 Updated by ftigeot about 2 years ago

  • Status changed from New to In Progress

The same system with the kern.maxvnodes sysctl set to 300,000 after a fresh reboot doesn't suffer from the micro-freezes after 15 days of uptime.
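
A rough back-of-the-envelope check of why the cap might matter, as a Python sketch. Several things here are assumptions, not measurements: the fused vmstat -m figures are read as 4302707 vnodes using 1747975K and 4300462 HAMMER inodes using 3762905K (the only split that keeps InUse below Requests), K is taken as 1024 bytes, memory is assumed to scale linearly with the vnode count, and one HAMMER inode is assumed per vnode. vfscache is ignored. The helper names bytes_per_object and projected_kib are invented for this example.

```python
KIB = 1024  # assumption: the "K" suffix in vmstat -m means 1024 bytes


def bytes_per_object(mem_use_kib, in_use):
    """Average size in bytes of one allocated object of a given type."""
    return mem_use_kib * KIB / in_use


# Assumed split of the fused InUse/MemUse columns from the vmstat -m paste.
vnode_bytes = bytes_per_object(1_747_975, 4_302_707)
inode_bytes = bytes_per_object(3_762_905, 4_300_462)


def projected_kib(max_vnodes):
    """Projected vnode + HAMMER inode memory (KiB) at a given vnode cap,
    assuming one HAMMER inode per vnode and linear scaling."""
    return max_vnodes * (vnode_bytes + inode_bytes) / KIB


observed_kib = 1_747_975 + 3_762_905
print(f"bytes per vnode:        {vnode_bytes:.1f}")
print(f"bytes per HAMMER inode: {inode_bytes:.1f}")
print(f"observed total:         {observed_kib} KiB")
print(f"projected at 300000:    {projected_kib(300_000):.0f} KiB")
```

Under these assumptions the two pools come out at about 416 and 896 bytes per object, so a cap of 300,000 vnodes would bound them at roughly 375 MiB, against about 5.3 GiB in the paste. This is only an estimate and says nothing about whether memory use itself causes the freezes.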

#5 Updated by ftigeot about 2 years ago

  • % Done changed from 0 to 50

#6 Updated by ftigeot about 2 years ago

Another data point: when the problems happened, atime was enabled on /data.

The trouble-free uptime of more than 2 weeks is with noatime=on on /data.

#7 Updated by dillon almost 2 years ago

  • Category set to Kernel
  • Assignee set to dillon

test redmine
