Bug #1240
closed
sata dvdrom leads to interrupt livelock
Description
without natapicam, system boots fine. with natapicam i get a livelock and the
following log fragment (transcribed):
(livelock irq 7, that's where atapci1 [= all my drives, including acd0] is)
waiting for the following drive to finish configuring func=0xc0132e3e
...
acd0: FAILURE - INQUIRY timed out
ata_interrupt: early interrupt
(probe0:ata3:0:0:0): Command timed out
seems that something is locking out the ata interrupt, leading to a livelock.
at some point ata decides that the command timed out and seemingly also releases
the culprit. ata_interrupt then comes through, finding an empty queue,
complaining. after that the cam receives the timeout.
Updated by corecode almost 16 years ago
Actually it turns out that I'm getting an interrupt storm also without
natapicam. Happens when I access acd0.
Updated by steve almost 16 years ago
On Fri, 23 Jan 2009 19:05:08 +0000
"Simon 'corecode' Schubert \(via DragonFly issue tracker\)"
<sinknull@crater.dragonflybsd.org> wrote:
Does it go away if you turn off DMA with hw.ata.atapi_dma=0
in /boot/loader.conf?
Updated by corecode almost 16 years ago
nope, doesn't make a difference. I set it in the running system, though.
Updated by steve almost 16 years ago
On Fri, 23 Jan 2009 20:53:01 +0000
"Simon 'corecode' Schubert \(via DragonFly issue tracker\)"
<sinknull@crater.dragonflybsd.org> wrote:
That doesn't seem to work the same way - set it in loader.conf and
reboot to be sure. I always get an interrupt storm unless I boot with dma
disabled in loader.conf.
Updated by corecode almost 16 years ago
I've tracked this down somewhat - if I access the video data (not just the
directory structure), it seems that the drive wants to report an error: the ERR
bit (0) is set in the status register. However, in the busmaster status
register, the interrupt AND the active bit are set.
As a result, ata-pci.c/ata_pci_status():475 will return 0, leading to an
unserviced interrupt. repeat. I believe we always need to pass through the
interrupt routine if the ERR bit is set, no matter whether the busmaster active
bit is still set or not.
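To make the failure mode concrete, here is a minimal sketch of the decision described above. It is not the actual ata-pci.c code: the "current" check is inferred from this thread, the bit values assume the usual bus-master IDE status layout plus the ATA status ERR bit (0), and the names claim_current and claim_with_err are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bus-master status bits (usual bus-master IDE layout). */
#define BMSTAT_ACTIVE    0x01
#define BMSTAT_ERROR     0x02
#define BMSTAT_INTERRUPT 0x04
/* ATA status register, ERR is bit 0 as stated in the comment above. */
#define ATA_STATUS_ERR   0x01

/* Check as inferred from the thread: claim the interrupt only when
 * INTERRUPT is set and ACTIVE is clear. */
static bool claim_current(uint8_t bmstat)
{
    return (bmstat & (BMSTAT_ACTIVE | BMSTAT_INTERRUPT)) == BMSTAT_INTERRUPT;
}

/* Proposed idea: if the drive reports ERR, claim the interrupt even
 * though the busmaster still shows ACTIVE. */
static bool claim_with_err(uint8_t bmstat, uint8_t ata_status)
{
    if (!(bmstat & BMSTAT_INTERRUPT))
        return false;               /* not our interrupt */
    if (ata_status & ATA_STATUS_ERR)
        return true;                /* drive wants to report an error */
    return !(bmstat & BMSTAT_ACTIVE);
}
```

With INTERRUPT and ACTIVE both set, claim_current() returns false on every pass, which matches the unserviced interrupt seen here; claim_with_err() lets that case through only when the drive's ERR bit is set.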
Updated by dillon almost 16 years ago
:Simon 'corecode' Schubert <corecode@fs.ei.tum.de> added the comment:
:
:I've tracked this down somewhat - if I access the video data (not just the
:directory structure), it seems that the drive wants to report an error: the ERR
:bit (0) is set in the status register. However, in the busmaster status
:register, the interrupt AND the active bit are set.
:
:As a result, ata-pci.c/ata_pci_status():475 will return 0, leading to an
:unserviced interrupt. repeat. I believe we always need to pass through the
:interrupt routine if the ERR bit is set, no matter whether the busmaster active
:bit is still set or not.
Were you able to test your fix? I don't have anything that livelocks
that I can test with.
Those BMSTAT bits aren't part of the standard at all (that I know of),
but they are the only way to be able to tell if an ATA device is
actually the source of an interrupt when multiple devices share the
same interrupt.
On the face of it it does seem as though we should be able to return
1 if ATA_BMSTAT_INTERRUPT is set and either ATA_BMSTAT_ERROR is
set or ATA_BMSTAT_ACTIVE is clear. It is unclear to me whether
we can safely clear the ACTIVE bit, though, which is what that
next ATA_IDX_OUTB() will do if we allow the ERROR + INTERRUPT case
through. Then again, maybe we have to. I just don't know.
-Matt
Matthew Dillon
<dillon@backplane.com>
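The rule suggested in this comment can be written as a small predicate. This is only a sketch of the condition as worded above, not a tested patch: the bit values assume the usual bus-master IDE status layout, and the function name claim_bmstat_rule is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bus-master status bits (usual bus-master IDE layout). */
#define BMSTAT_ACTIVE    0x01
#define BMSTAT_ERROR     0x02
#define BMSTAT_INTERRUPT 0x04

/* Return true ("return 1") if INTERRUPT is set and either ERROR is
 * set or ACTIVE is clear. */
static bool claim_bmstat_rule(uint8_t bmstat)
{
    if (!(bmstat & BMSTAT_INTERRUPT))
        return false;
    return (bmstat & BMSTAT_ERROR) || !(bmstat & BMSTAT_ACTIVE);
}
```

Note that this keys on the busmaster ERROR bit, not on the ERR bit in the ATA status register. With INTERRUPT and ACTIVE set but BMSTAT ERROR clear, the predicate still returns false.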
Updated by corecode almost 16 years ago
Yes, that's the busmaster interrupt status. The problem is that the drive
doesn't set the busmaster error, but instead only the ata status error.
You can't clear ACTIVE, it is readonly. We'd have to abort the busmaster
transaction, and that's where I don't know what to do. I'll have to try
and see.
cheers
simon
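For reference, a toy model of the register behaviour discussed here. It assumes the usual bus-master IDE semantics (ACTIVE is read-only and follows the START bit in the command register; ERROR and INTERRUPT are cleared by writing 1 to them). The struct and function names are hypothetical, and this is not driver code or a proposed fix.

```c
#include <stdint.h>

#define BMCMD_START      0x01   /* command register: start/stop engine */
#define BMSTAT_ACTIVE    0x01   /* status register: read-only */
#define BMSTAT_ERROR     0x02   /* status register: write 1 to clear */
#define BMSTAT_INTERRUPT 0x04   /* status register: write 1 to clear */

/* Hypothetical software model of one busmaster channel. */
struct bm_model {
    uint8_t cmd;
    uint8_t stat;
};

/* Status write: only ERROR and INTERRUPT react; ACTIVE ignores writes. */
static void bm_write_stat(struct bm_model *bm, uint8_t val)
{
    bm->stat &= (uint8_t)~(val & (BMSTAT_ERROR | BMSTAT_INTERRUPT));
}

/* Command write: clearing START aborts the transfer and drops ACTIVE. */
static void bm_write_cmd(struct bm_model *bm, uint8_t val)
{
    bm->cmd = val;
    if (val & BMCMD_START)
        bm->stat |= BMSTAT_ACTIVE;
    else
        bm->stat &= (uint8_t)~BMSTAT_ACTIVE;
}
```

In this model, writing the status register can acknowledge the interrupt but never touches ACTIVE; only stopping the engine through the command register does, which is the "abort the busmaster transaction" step mentioned above.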