Bug #1240

closed

sata dvdrom leads to interrupt livelock

Added by corecode about 15 years ago. Updated about 15 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Without natapicam, the system boots fine. With natapicam I get a livelock and the
following log fragment (transcribed):

(livelock irq 7, that's where atapci1 [= all my drives, including acd0] is)

waiting for the following drive to finish configuring func=0xc0132e3e
...
acd0: FAILURE - INQUIRY timed out
ata_interrupt: early interrupt
(probe0:ata3:0:0:0): Command timed out

It seems that something is locking out the ATA interrupt, leading to a livelock.
At some point ata decides that the command timed out and apparently also releases
the culprit. ata_interrupt then comes through, finds an empty queue, and
complains. After that, CAM receives the timeout.

#1

Updated by corecode about 15 years ago

Actually it turns out that I'm getting an interrupt storm also without
natapicam. Happens when I access acd0.

#2

Updated by steve about 15 years ago

On Fri, 23 Jan 2009 19:05:08 +0000
"Simon 'corecode' Schubert (via DragonFly issue tracker)"
<> wrote:

Does it go away if you turn off DMA with hw.ata.atapi_dma=0
in /boot/loader.conf?

#3

Updated by corecode about 15 years ago

nope, doesn't make a difference. I set it in the running system, though.

#4

Updated by steve about 15 years ago

On Fri, 23 Jan 2009 20:53:01 +0000
"Simon 'corecode' Schubert (via DragonFly issue tracker)"
<> wrote:

That doesn't seem to work the same way; set it in loader.conf and
reboot to be sure. I always get an interrupt storm unless I boot with DMA
disabled in loader.conf.

#5

Updated by corecode about 15 years ago

I've tracked this down somewhat. If I access the video data (not just the
directory structure), the drive apparently wants to report an error: the ERR
bit (bit 0) is set in the status register. However, in the busmaster status
register, both the interrupt AND the active bits are set.

As a result, ata-pci.c/ata_pci_status():475 returns 0, leading to an
unserviced interrupt, and the cycle repeats. I believe we always need to pass
through the interrupt routine if the ERR bit is set, no matter whether the
busmaster active bit is still set or not.

#6

Updated by dillon about 15 years ago

:Simon 'corecode' Schubert <> added the comment:
:
:I've tracked this down somewhat - if I access the video data (not just the
:directory structure), it seems that the drive wants to report an error: the ERR
:bit (0) is set in the status register. However, in the busmaster status
:register, the interrupt AND the active bit are set.
:
:As a result, ata-pci.c/ata_pci_status():475 will return 0, leading to an
:unserviced interrupt. repeat. I believe we always need to pass through the
:interrupt routine if the ERR bit is set, no matter whether the busmaster active
:bit is still set or not.

Were you able to test your fix? I don't have anything that livelocks
that I can test with.
Those BMSTAT bits aren't part of the standard at all (that I know of),
but they are the only way to be able to tell if an ATA device is
actually the source of an interrupt when multiple devices share the
same interrupt.
On the face of it, it does seem as though we should be able to return
1 if ATA_BMSTAT_INTERRUPT is set and either ATA_BMSTAT_ERROR is
set or ATA_BMSTAT_ACTIVE is clear. It is unclear to me whether
we can safely clear the ACTIVE bit, though, which is what that
next ATA_IDX_OUTB() will do if we allow the ERROR + INTERRUPT case
through. Then again, maybe we have to. I just don't know.
-Matt
Matthew Dillon
<>

#7

Updated by corecode about 15 years ago

Yes, that's the busmaster interrupt status. The problem is that the drive
doesn't set the busmaster error, but instead only the ata status error.

You can't clear ACTIVE; it is read-only. We'd have to abort the busmaster
transaction, and that's where I don't know what to do. I'll have to try
and see.

cheers
simon

#8

Updated by corecode about 15 years ago

committed fix in 9f2212d
