Bug #979

Failure-prone USB mass storage (SB600? msdosfs? CAM?)

Added by floid almost 7 years ago. Updated over 5 years ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%

Category: -
Target version: -

Description

I'm still very behind when it comes to keeping track of development, so pardon
me if this is already known. I always need to do more testing, but some feedback
will help me narrow down the test cases.

Symptoms:

Using 1.12.0-RELEASE, copying from a FAT-formatted 2GB CF card in a reader with
a "Genesys" chipset hangs after the first ~82MB have been copied. By "hang" I
mean iostat showing zero throughput and cp not responding to ^C.

ehci.ko is *not* loaded, so ohci alone was involved here.

Hardware considerations:

SMP kernel (Athlon 64 x2)
SB600
The reader is part of a Mitsumi floppy combo device
(A slim conventional floppy and USB card reader crammed into one 3.5" box.)

Relevant portion of usbdevs -v:
Controller /dev/usb4:
addr 1: full speed, self powered, config 1, OHCI root hub(0x0000), ATI(0x0000),
rev 1.00
port 1 addr 2: full speed, power 500 mA, config 1, USB Reader(0x070e),
Genesys(0x05e3), rev 93.25
port 2 powered

...

The following from CAM was found in dmesg. I *believe* this was printed before
I rudely pulled the card, but I cannot be sure. (Have we considered
timestamping dmesg yet?) There was nothing else new in dmesg.

(da0:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(da0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI Status: Check Condition
(da0:umass-sim0:0:0:0): UNIT ATTENTION asc:28,0
(da0:umass-sim0:0:0:0): Not ready to ready change, medium may have changed
(da0:umass-sim0:0:0:0): Retrying Command (per Sense Data)
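For anyone decoding the CAM message above: opcode 0x25 in the CDB is READ CAPACITY(10), and sense key UNIT ATTENTION with ASC/ASCQ 28h/00h is the standard "medium may have changed" report, which would fit the card being pulled. Below is a minimal Python sketch of pulling those fields out of fixed-format sense data. The lookup tables cover only the values seen in this log, and the 18-byte sense buffer in the example is reconstructed from the printed fields (hypothetical), not captured from the device.

```python
"""Decode the CDB opcode and sense fields CAM printed in this report.

Assumes fixed-format sense data (response code 0x70 or 0x71). The
lookup tables cover only the values that appear in the log.
"""

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# Only the ASC/ASCQ pair seen in this report; the full SPC table is far larger.
ASC_TEXT = {
    (0x28, 0x00): "Not ready to ready change, medium may have changed",
}

OPCODES = {0x25: "READ CAPACITY(10)"}


def decode_cdb(cdb: bytes) -> str:
    """Name the command from the first CDB byte (the operation code)."""
    return OPCODES.get(cdb[0], f"opcode 0x{cdb[0]:02x}")


def decode_sense(sense: bytes) -> tuple:
    """Return (sense key name, ASC, ASCQ, description) from fixed-format sense."""
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F            # sense key: low nibble of byte 2
    asc, ascq = sense[12], sense[13]  # additional sense code and qualifier
    return (
        SENSE_KEYS.get(key, f"sense key 0x{key:x}"),
        asc,
        ascq,
        ASC_TEXT.get((asc, ascq), "unknown"),
    )


if __name__ == "__main__":
    # "25 0 0 0 0 0 0 0 0 0" from the log, as bytes.
    print(decode_cdb(bytes([0x25] + [0] * 9)))
    # Hypothetical 18-byte buffer rebuilt from the printed fields.
    sense = bytearray(18)
    sense[0] = 0x70   # current error, fixed format
    sense[2] = 0x06   # UNIT ATTENTION
    sense[7] = 10     # additional sense length
    sense[12] = 0x28  # ASC
    sense[13] = 0x00  # ASCQ
    print(decode_sense(bytes(sense)))
```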

Similarly, cp has produced some odd output, but since the card was pulled and
the hung cp was left sitting for 24 hours before I got around to reporting this,
I'm not sure what happened here. :}

%cd /home/floid/Photos/
%cp /mnt/dcim/101olymp/* .
cp: ./pa091308.jpg: Bad address
cp: ./pa091307.jpg: Bad address
cp: /mnt/dcim/101olymp/pa091306.jpg: Input/output error
cp: /mnt/dcim/101olymp/pa091303.jpg: Input/output error
cp: /mnt/dcim/101olymp/pa091302.jpg: Cross-device link
^C

Adding to my confusion, the Linux (Ubuntu 7.10) machine I would otherwise use to
read the card has a similar hardware configuration (another SB600, another card
reader that's also Genesys Logic-based) and its own intractable problems with
USB in general! On review, I see that Linux ("2.6.22-14-generic #1 SMP" i686)
made it exactly 32MB into the card before most of its USB support locked
up. Unfortunately that happens whenever anyone breathes near that
machine, and proprietary VMware and fglrx modules are involved, so it'll be a
while before I can fsck or chkdsk the filesystem structure on the CF card itself!

(The media should be fine, since the camera has had no complaints.)

===

Should I be suspecting the filesystem, CAM (I've noticed Peter Avalos's work on
CAM locking but haven't tried it yet), or the basic hardware support?

History

#1 Updated by dillon almost 7 years ago

:Relevant portion of usbdevs -v:
:Controller /dev/usb4:
:addr 1: full speed, self powered, config 1, OHCI root hub(0x0000), ATI(0x0000),
:rev 1.00
:...
:
:Should I be suspecting the filesystem, CAM (I've noticed Peter Avalos's work on
:CAM locking but haven't tried it yet), or the basic hardware support?

Well, if Linux is also blowing up on it then it is either going to be
the USB chipset on the motherboard, or the device plugged into or
otherwise connected to the USB.

If other USB mass-storage devices work (for example, if you plug in
a usb storage key and that works ok), then it is likely the device
plugged into the usb slot and not the usb chipset that is to blame.

If no mass storage devices work then it is likely an issue with the
usb chipset or driver.

-Matt
Matthew Dillon
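Matt's rule of thumb can be written down as a small decision function. This is only an illustration of the reasoning in the comment above; the `likely_culprit` name, its return strings, and the device labels are hypothetical.

```python
def likely_culprit(results: dict, suspect: str) -> str:
    """Apply the rule of thumb: if other mass-storage devices work on the
    same controller, blame the suspect device; if none work, blame the
    USB chipset or driver.

    `results` maps each device tried to True if a large copy completed
    without hanging, False otherwise.
    """
    if results.get(suspect, True):
        return "no failure reproduced on the suspect device"
    others = [ok for dev, ok in results.items() if dev != suspect]
    if not others:
        return "inconclusive: try another mass-storage device"
    if any(others):
        return "suspect the device: " + suspect
    return "suspect the USB chipset or driver"


if __name__ == "__main__":
    print(likely_culprit({"CF reader": False, "USB stick": True}, "CF reader"))
```

The "inconclusive" branch reflects that the rule needs at least one other device tried on the same controller before it says anything.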

#2 Updated by floid almost 7 years ago

Thanks for the attention -- in the past few minutes I decided to make sure
Ubuntu had an issue open, and found some information that may be promising or
headache-inducing.

"Slow" should really be removed from the subject here since I typed it before I
remembered I was running OHCI only. Of course, 0 bytes/sec when it fails does
feel slow!

Matt said,
> Well, if Linux is also blowing up on it then it is either going to be
> the USB chipset on the motherboard, or the device plugged into or
> otherwise connected to the USB.

Naturally, but Linux support has been *so* flaky that I had no reason to believe
the DragonFly and Linux bugs were related... until now, see below.

> If other USB mass-storage devices work (for example, if you plug in
> a usb storage key and that works ok), then it is likely the device
> plugged into the usb slot and not the usb chipset that is to blame.

I think I've noticed this before with other devices but that might've been well
before 1.12. I'm doing the right thing and testing with a USB stick now. (So
far it's being well-behaved with one large file; I'm still using OHCI only for
determinism. I'll try generating the equivalent of a photo card's dcim
directory. I'll also pull out an external card reader that I forgot about and
see how that fares.)
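For the "equivalent of a photo card's dcim directory" test, something like the Python sketch below could populate a USB stick with many photo-sized files. The file count, the 1 to 4 MB size range, and the pa09NNNN.jpg naming (borrowed from the cp output above) are assumptions, and the contents are random bytes rather than real JPEGs.

```python
import os
import random
from pathlib import Path


def make_fake_dcim(root, count=300, min_kb=1024, max_kb=4096, seed=0):
    """Create root/dcim/101olymp filled with photo-sized files.

    Count and size range are guesses at a typical camera card; contents
    are random bytes, not valid JPEGs. Returns total bytes written.
    """
    rng = random.Random(seed)  # reproducible file sizes between runs
    target = Path(root) / "dcim" / "101olymp"
    target.mkdir(parents=True, exist_ok=True)
    total = 0
    for i in range(count):
        size = rng.randint(min_kb, max_kb) * 1024
        with open(target / f"pa09{i:04d}.jpg", "wb") as f:
            f.write(os.urandom(size))  # incompressible payload
        total += size
    return total
```

With the stick mounted at, say, /mnt, calling `make_fake_dcim("/mnt")` writes roughly 750 MB on average. Unmounting and remounting the stick before copying the files back should keep the read-back from being served out of the buffer cache, so the copy actually exercises the device.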

> If no mass storage devices work then it is likely an issue with the
> usb chipset or driver.

Apparently it took a while for SB600 ownership to reach critical mass.
Checking Ubuntu's tracker turned up their issue, opened this month:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/198619
...which is bursting with useful links to LKML and Kernel.org discussion and
instrumentation, including:
http://bugzilla.kernel.org/show_bug.cgi?id=8692
http://lkml.org/lkml/2008/2/19/546
(It's easiest to refer to the Launchpad entry, which provides some chronology.
It'll take a while to check that all the patches referenced made it back into
the kernel.org bugzilla.)

Apparently there are quirks or surprises in IAA ("Interrupt on Async Advance")
behavior and IAAD ("Interrupt on Async Advance Doorbell") bit handling on the
SB600, and possibly on the SB700. There also seem to be similar (but presumably
different) IAA/IAAD quirks in other vendors' USB host controllers.
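Some background on what IAA/IAAD are for: in EHCI, after unlinking a queue head from the asynchronous schedule, the driver rings the doorbell by setting IAAD (bit 6 of USBCMD) and waits for the controller to report IAA (bit 5 of USBSTS, write-1-to-clear) before it may free that queue head. The Python model below sketches that handshake with a timeout fallback. `FakeEHCI`, `unlink_async_qh`, the `drops_iaa` flag and the timeout value are hypothetical; it does not reproduce the specific SB600 behavior described in the linked threads. Also note this report was made with ehci.ko not loaded, so the handshake only matters here if a comparable quirk exists on the OHCI side.

```python
import time

# Bit positions from the EHCI 1.0 specification.
CMD_IAAD = 1 << 6  # USBCMD: Interrupt on Async Advance Doorbell
STS_IAA = 1 << 5   # USBSTS: Interrupt on Async Advance (write 1 to clear)


class FakeEHCI:
    """Hypothetical stand-in for the EHCI operational registers.

    With drops_iaa=True the model never reports the advance, mimicking
    a lost doorbell acknowledgment.
    """

    def __init__(self, drops_iaa: bool = False):
        self.usbcmd = 0
        self.usbsts = 0
        self.drops_iaa = drops_iaa

    def write_usbcmd(self, value: int) -> None:
        self.usbcmd = value
        if value & CMD_IAAD and not self.drops_iaa:
            # A well-behaved controller raises IAA, then clears the doorbell.
            self.usbsts |= STS_IAA
            self.usbcmd &= ~CMD_IAAD

    def write_usbsts(self, value: int) -> None:
        self.usbsts &= ~value  # status bits are write-1-to-clear


def unlink_async_qh(hc: FakeEHCI, timeout_s: float = 0.01) -> bool:
    """Ring the doorbell after unlinking a QH and wait for IAA.

    Returns True if the controller acknowledged within timeout_s,
    False if the timeout fallback was taken.
    """
    hc.write_usbcmd(hc.usbcmd | CMD_IAAD)
    deadline = time.monotonic() + timeout_s
    while True:
        if hc.usbsts & STS_IAA:
            hc.write_usbsts(STS_IAA)  # acknowledge
            return True
        if time.monotonic() >= deadline:
            # No acknowledgment: a real driver needs its own recovery
            # policy here (not modeled).
            return False
        time.sleep(0.001)
```

The timeout is the design point: without it, a controller that never raises IAA would leave the unlink waiting forever, which is one way a stack can appear to hang with zero throughput.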

I'm still working to understand exactly what they found and whether it even
applies outside EHCI. In the meantime, I'll do the tests mentioned above. I do
have some other CF media to try, and some NEC-based USB controllers I should
probably drop into these boxes to see whether I can copy and preserve the cards'
data, now that there's reason to suspect the controller.

#3 Updated by floid almost 7 years ago

Short update:

The USB stick (PNY Attache 2GB) seems to be working fine with 1.12. Then again,
I may not have been stressing it much: the files were copied from the HD and
immediately back, so a large portion may have been served from cache.

Using a second card reader ("Zio Corporation," detected with a generic string
per below):

Controller /dev/usb2:
addr 1: full speed, self powered, config 1, OHCI root hub(0x0000), ATI(0x0000),
rev 1.00
port 1 addr 2: low speed, power 100 mA, config 1, PS2/USB Browser Combo
Mouse(0x0011), Cypress Sem(0x05fe), rev 0.00
port 2 addr 3: full speed, power 500 mA, config 1, USB2.0 Card Reader(0x0001),
Storage Device(0x0dda), rev 1.6f

... the 2GB CF card (which may or may not have a corrupt FS at this point)
presented with a severely truncated directory listing. I believe it was trying
fsck_msdos that produced the following in dmesg:

contigmalloc_map: failed in index < 0 case!
umass1: BBB reset failed, STALLED
umass1: BBB bulk-in clear stall failed, STALLED
umass1: BBB bulk-out clear stall failed, STALLED
contigmalloc_map: failed in index < 0 case!
umass1: BBB reset failed, STALLED
umass1: BBB bulk-in clear stall failed, STALLED
umass1: BBB bulk-out clear stall failed, STALLED
contigmalloc_map: failed in index < 0 case!
umass1: BBB reset failed, STALLED
umass1: BBB bulk-in clear stall failed, STALLED
umass1: BBB bulk-out clear stall failed, STALLED
contigmalloc_map: failed in index < 0 case!
umass1: BBB reset failed, STALLED
umass1: BBB bulk-in clear stall failed, STALLED
umass1: BBB bulk-out clear stall failed, STALLED
contigmalloc_map: failed in index < 0 case!
umass1: BBB reset failed, STALLED
umass1: BBB bulk-in clear stall failed, STALLED
umass1: BBB bulk-out clear stall failed, STALLED
umass1: BBB reset failed, STALLED
umass1: BBB bulk-in clear stall failed, STALLED
umass1: BBB bulk-out clear stall failed, STALLED
(da2:umass-sim1:1:0:0): Synchronize cache failed, status == 0x4, scsi status == 0x0
umass1: BBB reset failed, STALLED
umass1: BBB bulk-in clear stall failed, IOERROR
umass1: BBB bulk-out clear stall failed, TIMEOUT

With the Zio reader, ls of a known-good 1GB CF was fine, while fsck_msdos failed
with that exact same output in dmesg.

Attempting to copy the roughly 1GB of large JPEGs from that card, the copy
proceeded for perhaps a minute, hung for about a minute, and now, to my
surprise, is proceeding again. That pattern now seems to be repeating, so I'm
logging iostat with timestamps to try to get some sense of it.
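A minimal way to get those timestamps is to wrap the command and prefix each output line with the wall-clock time, as in the Python sketch below. The script name and the `iostat -w 1 da2` invocation in the comment are illustrative assumptions. If iostat itself buffers its output when writing to a pipe, the timestamps may lag the actual intervals.

```python
import subprocess
import sys
import time


def timestamp_lines(lines, clock=time.time):
    """Yield each line prefixed with a local wall-clock timestamp."""
    for line in lines:
        text = line.rstrip("\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        yield f"{stamp} {text}"


def run(cmd):
    """Run cmd and print its output, timestamped, as it arrives."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for out in timestamp_lines(proc.stdout):
            print(out, flush=True)
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g.  python3 tsiostat.py iostat -w 1 da2 > iostat.log
    sys.exit(run(sys.argv[1:]))
```

Stalls then show up as runs of zero-throughput lines whose timestamps give the start and length of each hang.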

#4 Updated by tuxillo over 5 years ago

Can you please try with the latest ISO snapshot? There has been some work on the
USB stack and many commits to CAM.
