SATA DVD-ROM leads to interrupt livelock
Without natapicam, the system boots fine. With natapicam I get a livelock and the
following log fragment (transcribed):
(livelock on irq 7, which is where atapci1 [= all my drives, including acd0] sits)
waiting for the following drive to finish configuring func=0xc0132e3e
acd0: FAILURE - INQUIRY timed out
ata_interrupt: early interrupt
(probe0:ata3:0:0:0): Command timed out
It seems that something is locking out the ATA interrupt, leading to a livelock.
At some point ata decides that the command timed out and apparently also releases
the culprit. ata_interrupt then comes through, finds an empty queue, and
complains. After that, CAM receives the timeout.
#4 Updated by steve over 9 years ago
On Fri, 23 Jan 2009 20:53:01 +0000
"Simon 'corecode' Schubert (via DragonFly issue tracker)"
That doesn't seem to work the same way; set it in loader.conf and
reboot to be sure. I always get an interrupt storm unless I boot with DMA
disabled in loader.conf.
#5 Updated by corecode over 9 years ago
I've tracked this down somewhat. If I access the video data (not just the
directory structure), the drive apparently wants to report an error: the ERR
bit (bit 0) is set in the status register. However, in the busmaster status
register, both the INTERRUPT and the ACTIVE bits are set.
As a result, ata-pci.c/ata_pci_status():475 returns 0, leaving the interrupt
unserviced, and the cycle repeats. I believe we always need to pass through the
interrupt routine if the ERR bit is set, regardless of whether the busmaster
ACTIVE bit is still set.
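A simplified model of the check described above may make the failure easier to see. It is a sketch, not the actual ata-pci.c source: the ATA_BMSTAT_* values are assumed to follow the Bus Master IDE (SFF-8038i) bit layout, ATA_S_ERROR is assumed to be the ERR bit (bit 0) of the ATA status register, and claims_irq_current(), claims_irq_proposed() and dispatches_until_serviced() are hypothetical helpers. The register state is held fixed at what was observed, so the loop only illustrates the "unserviced interrupt, repeat" cycle.

```c
/* Bus Master IDE status register bits (assumed SFF-8038i layout), named
 * after the ATA_BMSTAT_* constants discussed in this thread. */
#define ATA_BMSTAT_ACTIVE    0x01  /* read-only: DMA transfer in progress */
#define ATA_BMSTAT_ERROR     0x02
#define ATA_BMSTAT_INTERRUPT 0x04

/* Assumed value: ERR is bit 0 of the ATA status register. */
#define ATA_S_ERROR          0x01

/* Hypothetical model of the current test: claim the interrupt only when
 * INTERRUPT is set and ACTIVE is clear. The ATA status is not consulted. */
int
claims_irq_current(int bmstat, int ata_status)
{
    (void)ata_status;
    return (bmstat & (ATA_BMSTAT_ACTIVE | ATA_BMSTAT_INTERRUPT)) ==
        ATA_BMSTAT_INTERRUPT;
}

/* Hypothetical model of the proposal: always go through the interrupt
 * routine when the drive reports ERR, whatever the busmaster says. */
int
claims_irq_proposed(int bmstat, int ata_status)
{
    if (ata_status & ATA_S_ERROR)
        return 1;
    return claims_irq_current(bmstat, ata_status);
}

/* Hypothetical model of a level-triggered line: while no handler claims
 * the interrupt, the device stays unserviced and the interrupt is
 * dispatched again. Returns the dispatch on which the interrupt was
 * claimed, or 0 if 'limit' dispatches pass without that happening
 * (standing in for the command timeout). */
int
dispatches_until_serviced(int (*claims)(int, int), int bmstat,
    int ata_status, int limit)
{
    int n;

    for (n = 1; n <= limit; n++) {
        if (claims(bmstat, ata_status))
            return n;
    }
    return 0;
}
```

With bmstat = ATA_BMSTAT_INTERRUPT | ATA_BMSTAT_ACTIVE and ata_status = ATA_S_ERROR, the current check never claims the interrupt, so dispatches_until_serviced() runs out its limit and returns 0, while the proposed check claims it on the first dispatch.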
#6 Updated by dillon over 9 years ago
Were you able to test your fix? I don't have anything that livelocks
that I can test with.
Those BMSTAT bits aren't part of the standard at all (that I know of),
but they are the only way to tell whether an ATA device is
actually the source of an interrupt when multiple devices share the
interrupt line.
On the face of it, it does seem as though we should be able to return
1 if ATA_BMSTAT_INTERRUPT is set and either ATA_BMSTAT_ERROR is
set or ATA_BMSTAT_ACTIVE is clear. It is unclear to me whether
we can safely clear the ACTIVE bit, though, which is what the
subsequent ATA_IDX_OUTB() will do if we let the ERROR + INTERRUPT case
through. Then again, maybe we have to. I just don't know.
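The condition suggested here can be written as a small predicate over the busmaster status byte. This is a hypothetical helper, not driver code, and the ATA_BMSTAT_* values are assumed to follow the Bus Master IDE (SFF-8038i) bit layout. Note that it only looks at BMSTAT, so with the state reported in #5 (INTERRUPT and ACTIVE set, no busmaster error) it still returns 0.

```c
/* Assumed Bus Master IDE status bits (SFF-8038i layout). */
#define ATA_BMSTAT_ACTIVE    0x01
#define ATA_BMSTAT_ERROR     0x02
#define ATA_BMSTAT_INTERRUPT 0x04

/* Hypothetical helper: claim the interrupt if INTERRUPT is set and either
 * the busmaster reports an error or it is no longer active. */
int
bmstat_claims_irq(int bmstat)
{
    if (!(bmstat & ATA_BMSTAT_INTERRUPT))
        return 0;
    return (bmstat & ATA_BMSTAT_ERROR) || !(bmstat & ATA_BMSTAT_ACTIVE);
}
```

Only the busmaster ERROR bit can rescue the INTERRUPT + ACTIVE case here; an error reported solely in the ATA status register is invisible to this predicate.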
#7 Updated by corecode over 9 years ago
Yes, that's the busmaster interrupt status. The problem is that the drive
doesn't set the busmaster error bit, only the error bit in the ATA status register.
You can't clear ACTIVE; it is read-only. We'd have to abort the busmaster
transaction, and that's where I don't know what to do. I'll have to try