HAMMER data corruption with rebranded LSI RAID adapters
Two different machines running DragonFly 3.6.x had the kernel report HAMMER CRC32 data corruption and then panic.
After a reboot, the kernel couldn't repair and mount the HAMMER volumes where these CRC32 errors occurred due to massive amounts of data corruption.
The systems had these elements in common:
- DragonFly 3.6.x
- Dell rack servers
- HAMMER filesystem on a hardware RAID volume managed by the mfi(4) LSI MegaRAID SAS driver
The two different machines were apparently using the same kind of rebranded LSI RAID adapter.
PCI id from one of the cards:
mfi0@pci0:3:0:0: class=0x010400 card=0x1f341028 chip=0x005b1000 rev=0x05 hdr=0x00
#1 Updated by swildner about 2 years ago
We don't know if it is really related to the mfi(4) driver. But it might be a theory worth checking out.
I have committed a new driver from FreeBSD for Thunderbolt, Invader and Fury series adapters, mrsas(4):
If you want to try it out, make sure it is loaded, either by putting mrsas_load=yes into your loader.conf or by adding "device mrsas" to the kernel configuration you are using. Then either disable loading or compiling in of mfi(4) entirely, or set hw.mfi.mrsas_enable=1 in /boot/loader.conf, which allows mrsas(4) to take over these adapters.
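For the loader-based variant, the two settings mentioned above would look roughly like this in /boot/loader.conf. This is only a sketch for the case where mfi(4) remains in the kernel; if mfi(4) is disabled entirely, the second line should not be needed.

```
# /boot/loader.conf
# Load the mrsas(4) module at boot.
mrsas_load="YES"
# Let mrsas(4) claim the adapters instead of mfi(4).
hw.mfi.mrsas_enable=1
```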
Note that the disk device nodes for mrsas(4) follow CAM nomenclature and are /dev/da?, while the mfi(4) driver uses /dev/mfid?. It is recommended to use the corresponding nodes in /dev/serno instead, since they stay the same under either driver and allow for a smooth transition.
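As an illustration only, an /etc/fstab entry that refers to the volume through /dev/serno might look like the following. SERIALNUMBER and the s1d slice/partition suffix are placeholders, not values from the affected machines; check the actual node names under /dev/serno on your system.

```
# /etc/fstab
# Device                        Mountpoint  FStype  Options  Dump  Pass#
/dev/serno/SERIALNUMBER.s1d     /           hammer  rw       1     1
```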
Also note that there is no mfiutil(8)-like tool for mrsas(4).
If you need this driver on 3.8, do 'git cherry-pick 6d743f0468a9bd40d1cedc939569228864d0614f' in your 3.8 branch.
#5 Updated by ftigeot about 2 years ago
There was an interesting discussion thread about data corruption with a particular LSI adapter and the mfi(4) driver on FreeBSD in March.
Some of the most pertinent individual mails:
#6 Updated by ftigeot almost 2 years ago
A test server using the mrsas(4) driver is still running perfectly after having processed terabytes of database imports and NFS traffic.
At this point, it seems likely that the corruption seen with Thunderbolt LSI RAID adapters is specific to the mfi(4) driver.