Bug #3237

closed

msdosfs: can't properly read files longer than a couple of kilobytes

Added by piecuch almost 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Other
Target version:
Start date: 05/03/2020
Due date:
% Done: 0%
Estimated time:

Description

Reading a file from msdosfs returns valid data for the first couple of kilobytes; everything after that is only NUL bytes (or it might be data from a different sector that is currently zeroed).

I have tested this on the following configurations:
write on Linux - corrupted data on dfly, correct data on Linux
write on dfly - corrupted data on dfly, correct data on Linux

I am running the latest version from master.

I will check whether this bug can be reproduced on FreeBSD, and then try to fix it or port the relevant code from FreeBSD. Hints welcome.


Files

first-ununiq.svg (1.01 MB) BytesPerSec=512 SecPerClust=2 piecuch, 06/03/2020 02:38 PM
second-ununiq.svg (1.03 MB) BytesPerSec=512 SecPerClust=4 piecuch, 06/03/2020 02:38 PM
third-ununiq.svg (1010 KB) BytesPerSec=2048 SecPerClust=1 piecuch, 06/03/2020 02:38 PM
0001-msdosfs-fix-3237-where-reading-the-last-sector-of-a-.patch (1.49 KB) piecuch, 01/27/2021 03:56 PM
Actions #1

Updated by piecuch almost 4 years ago

FreeBSD can correctly read data written on both Linux and DFly; DFly cannot properly read files written on FreeBSD.

I forgot to mention that you need a reboot (or maybe some kind of cache flush) for the bug to surface.

Actions #2

Updated by tkusumi almost 4 years ago

Looks like inmemory buffer for data is somehow not landing ondisk, but I can't reproduce it myself. Try fsck_msdosfs. I would presume fsck won't complain, meaning the FAT clusters are in a consistent state.

If this consistently (100% always) reproduces on FAT32 with a certain file size,

1) If you want to investigate further, you could try old kernels (probably from before September 2019) and see, or bisect, whether it reproduces.
2) I'd recommend writing the exact repro here, e.g. the newfs_msdos output (which contains the FAT params), the exact command used to write to a file, the exact command used to find which files contain \0 data and beyond which offset, etc.
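
One possible way to answer the "beyond which offset" question is sketched below. It is only an illustration: `first_nul_offset` is a hypothetical helper name, not an existing tool, and it assumes the valid file content contains no NUL bytes of its own (true for the `yes`-generated files used later in this thread). It dumps the file with `od` as one decimal byte per line and prints the 0-based offset of the first NUL byte, or -1 if there is none.

```shell
# Hypothetical helper: print the 0-based offset of the first NUL byte
# in the file given as $1, or -1 if the file contains no NUL byte.
first_nul_offset() {
    # od -v disables "*" line folding; -An drops the address column;
    # -tu1 prints every byte as an unsigned decimal number.
    od -An -v -tu1 "$1" | tr -s ' ' '\n' | awk '
        $0 == "" { next }                          # skip empty lines left by tr
        $1 == 0  { print n + 0; found = 1; exit }  # first NUL byte
                 { n++ }                           # count non-NUL bytes seen so far
        END      { if (!found) print -1 }'
}
```

Running it over every file on the mounted filesystem and comparing the result with the expected file size would show whether the zeroed region starts at a sector or cluster boundary.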

Actions #3

Updated by piecuch almost 4 years ago

fsck doesn't complain.

> inmemory buffer for data is somehow not landing ondisk

I don't think this is true - the data written on DFly looks fine on Linux & FreeBSD. Data written on Linux cannot be read correctly on DFly.

1) Will try to bisect it

2) I've tested it the following way (using sh)

```
  # First run: BytesPerSec=512 SecPerClust=2
  1. newfs_msdos -u 1 -F 32 -c 2 /dev/vkd0s0d
    /dev/vkd0s0d: 1016030 sectors in 508015 FAT32 clusters (1024 bytes/cluster)
    BytesPerSec=512 SecPerClust=2 ResSectors=32 FATs=2 Media=0xf0 SecPerTrack=1 Heads=1 HiddenSecs=0 HugeSectors=1024000 FATsecs=3969 RootCluster=2 FSInfo=1 Backup=2
  2. mount /dev/vkd0s0d /test
  3. cd /test
  4. for i in $(seq 10000); do yes | head -n$i | tr -d '\n' > $i; done
  5. rm -r ../gut; mkdir ../gut # UFS partition
  6. reboot
  7. mount /dev/vkd0s0d /test
  8. cd /test
  9. for i in $(seq 10000); do cat $i | tr -d '\0' > ../gut/$i; done;
  10. cd ../gut/
  11. for i in $(seq 10000); do printf "expected %s, actual %s\n" $i $(cat $i | wc -c); done > all
  12. uniq -c -f 2 all > first-test
  # Second run: BytesPerSec=512 SecPerClust=4
  1. newfs_msdos -u 1 -F 32 -c 4 /dev/vkd0s0d
    /dev/vkd0s0d: 1019980 sectors in 254995 FAT32 clusters (2048 bytes/cluster)
    BytesPerSec=512 SecPerClust=4 ResSectors=32 FATs=2 Media=0xf0 SecPerTrack=1 Heads=1 HiddenSecs=0 HugeSectors=1024000 FATsecs=1993 RootCluster=2 FSInfo=1 Backup=2
  2. mount /dev/vkd0s0d /test
  3. cd /test
  4. for i in $(seq 10000); do yes | head -n$i | tr -d '\n' > $i; done
  5. rm -r ../gut; mkdir ../gut #UFS
  6. reboot
  7. mount /dev/vkd0s0d /test
  8. cd /test
  9. for i in $(seq 10000); do cat $i | tr -d '\0' > ../gut/$i; done;
  10. cd ../gut/
  11. for i in $(seq 10000); do printf "expected %s, actual %s\n" $i $(cat $i | wc -c); done > all
  12. uniq -c -f 2 all > second-test
  # Third run: BytesPerSec=2048 SecPerClust=1
  1. newfs_msdos -F 32 -S 2048 -c 1 -u 1 /dev/vkd0s0d
    /dev/vkd0s0d: 254994 sectors in 254994 FAT32 clusters (2048 bytes/cluster)
    BytesPerSec=2048 SecPerClust=1 ResSectors=8 FATs=2 Media=0xf0 SecPerTrack=1 Heads=1 HiddenSecs=0 HugeSectors=256000 FATsecs=499 RootCluster=2 FSInfo=1 Backup=2
  2. mount /dev/vkd0s0d /test
  3. cd /test
  4. for i in $(seq 10000); do yes | head -n$i | tr -d '\n' > $i; done
  5. rm -r ../gut; mkdir ../gut #UFS
  6. reboot
  7. mount /dev/vkd0s0d /test
  8. cd /test
  9. for i in $(seq 10000); do cat $i | tr -d '\0' > ../gut/$i; done;
  10. cd ../gut/
  11. for i in $(seq 10000); do printf "expected %s, actual %s\n" $i $(cat $i | wc -c); done > all
```

You can find the plotted results in the attachments.
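
For anyone repeating the test, steps 9 to 11 above could be condensed into a single check that runs directly against the mounted filesystem after the reboot. This is a sketch rather than the procedure actually used: `check_sizes` is a hypothetical name, and it relies on the same convention as the steps above, namely that each file is named after its expected size and consists only of `y` bytes.

```shell
# Hypothetical helper: for files named 1..max in directory $1, compare the
# number of non-NUL bytes with the expected size (the file name) and report
# every mismatch in the same "expected N, actual M" form used above.
check_sizes() {
    dir=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        # Strip NUL bytes, count what is left, and drop the padding that
        # some wc implementations put around the number.
        actual=$(LC_ALL=C tr -d '\0' < "$dir/$i" | wc -c | tr -d ' ')
        if [ "$actual" -ne "$i" ]; then
            printf 'expected %s, actual %s\n' "$i" "$actual"
        fi
        i=$((i + 1))
    done
}
```

For example, `check_sizes /test 10000` would print one line per affected file and nothing at all on a filesystem that reads back correctly.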

Actions #4

Updated by tkusumi almost 4 years ago

> I don't think this is true - the data written on DFly looks fine

then explain what this means.

"write on dfly - corrupted data on dfly,"

Actions #5

Updated by piecuch almost 4 years ago

> then explain what this means.
>
> "write on dfly - corrupted data on dfly,"

Apologies for the vague messages. This means that data written on dfly appears corrupted when read back on dfly. Reading the same file from another OS (Linux, FreeBSD) returns correct output.

I tried bisecting the bug but I haven't learned much:

- 5.4.0 - the bug still occurs. This is the oldest version I was able to build as a vkernel without much effort, and the only one I tested under vkernel.

The following images were downloaded from dragonflybsd.org and tested without virtualization:
- 5.0.0 - the bug still occurs
- 4.4.0 - the bug still occurs
- 4.0.2 - the bug still occurs

Anything older than that wouldn't boot on my machine, so I had to use qemu for testing. I tried 3.6.1, 2.4.1 and 2.6.3, but I couldn't get any of them as far as the filesystem for various reasons (panics, newfs_msdos crashes, the OS not detecting drives), which made me question whether these releases support qemu.

If you think there's value in trying harder with older releases, I'm happy to do it.

Actions #7

Updated by dillon about 3 years ago

  • Status changed from New to Resolved
  • Assignee set to dillon

Going to push this. There is clearly something more involved here, probably related to the bmap code or to assumptions made by this older filesystem code with regard to how buffers crossing the file EOF are laid out. But for now let's just go with this work-around.

-Matt
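
To make "buffers crossing the file EOF" concrete, the arithmetic below shows where the end of a file falls inside its last cluster for the FAT parameters used in this report. It is plain integer arithmetic and makes no claim about what the bmap code actually does; `eof_layout` is a hypothetical name, and the file size is assumed to be greater than zero.

```shell
# Hypothetical helper: eof_layout SIZE BYTES_PER_SEC SEC_PER_CLUST
# Prints the cluster size, the 0-based index of the file's last cluster,
# the number of valid bytes in that cluster, and the unused slack after EOF.
eof_layout() {
    size=$1
    cluster=$(( $2 * $3 ))             # bytes per cluster
    last=$(( (size - 1) / cluster ))   # index of the cluster holding the last byte
    used=$(( size - last * cluster ))  # valid bytes inside that cluster
    slack=$(( cluster - used ))        # bytes between EOF and the cluster end
    printf 'cluster_size=%s last_cluster=%s bytes_used=%s slack=%s\n' \
        "$cluster" "$last" "$used" "$slack"
}
```

With the first test configuration (BytesPerSec=512, SecPerClust=2), `eof_layout 3000 512 2` reports a 1024-byte cluster size, the last cluster at index 2, 952 valid bytes in it and 72 bytes of slack, so any buffer covering that whole cluster extends past the file EOF.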
