Bug #566

NATA a nonstarter with ATI SB600 on MSI K9AGM-FID

Added by floid about 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee: TGEN
% Done: 0%
Category: -
Target version: -

Description

As I've been rambling about on the blawg
(http://www.shiningsilence.com/dbsdlog/index.php/2007/02/23/2143.html), it turns
out that HEAD as of February 24, 2007, with NATA enabled, refuses to boot from my
drive hooked up to the SB600's parallel ATA port.

Using the NATA kernel (nata, natadisk, natapicd, natausb, nataraid enabled; no
atapicam; options PCI_MAP_FIXUP set), the system recognizes the ATA controllers
present, including atapci0, finds and reports ad0's identification string, and
announces its attempt to mount root from ad0s1a. Then it hangs without so much as
a blink from the activity LED.

In my brief testing with the 1.9.0 kernel, I've seen boots where it quietly gets
'stuck' at that point, and boots where it announces a segmentation fault in init
every 30 seconds.

---

From my 1.8.0 'old' kernel, which isn't conflicting too badly with the 1.9.0 world:

atapci0: <Generic PCI ATA controller> port 0xff00-0xff0f,0-0x3,0-0x7,0-0x3,0-0x7
irq 2 at device 20.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ad0: 152627MB <WDC WD1600JB-98GVA0> [310101/16/63] at ata0-master BIOSDMA

1.9.0 detected things similarly, albeit with atapci0 on irq 3 (which may be
accurate, given PnP slop from toggling the SATA controller option on and off),
plus the SATA ports. I think the SATA controller might still be disabled in the
BIOS, so the absence of any further detections here shouldn't be taken as
conclusive about what the 'old' driver can and can't do.

[Of course, this machine has no useful serial port until I scrounge or build a
header, so let me know if anything's needed from NATA boot messages and I'll
pencil it down or take some photos of the whole thing.]

pciconf -lv (3.96 KB) floid, 02/25/2007 05:50 PM

sb600.diff (1.55 KB) TGEN, 02/25/2007 08:53 PM

dmesg.1.8.0 (6.73 KB) floid, 02/25/2007 10:32 PM

K9AGM NATA boot log jpegs.tar.gz (1.91 MB) floid, 03/02/2007 01:10 PM

MUSTELID_NEWATA (10.2 KB) floid, 03/02/2007 01:24 PM

dmesg.1.9PREVIEW-NATA (6.94 KB) floid, 06/03/2007 01:29 AM

dmesg.1.9PREVIEW-OLDATA (6.79 KB) floid, 06/03/2007 01:29 AM

dmesg.1.9PREVIEW-NATA2 (6.63 KB) floid, 06/03/2007 02:12 AM

dmesg.1.9PREVIEW-OLDATA2 (6.46 KB) floid, 06/03/2007 02:12 AM

MUSTELID2007 (10.2 KB) floid, 06/03/2007 02:30 AM

History

#1 Updated by TGEN about 7 years ago

Currently, the NATA atapci code does not match the SB600 correctly; I'll see
about updating it or adding support for that.

#2 Updated by TGEN about 7 years ago

Could you try the attached diff? It appears nothing fundamentally changed,
except that these parts support SATA300.

#3 Updated by floid about 7 years ago

No luck with a quickkernel, though it does properly name the device; trying
again with the regular target.

I should've said this is on SMP (Athlon 64 X2 3800+). More news after the next boot.

#4 Updated by floid about 7 years ago

No real difference with the full build. Maaaybe(?) it wasn't getting to
"WARNING: was not properly dismounted" before, but boom goes the fault
immediately after that.

#5 Updated by floid about 7 years ago

dmesg from the old (1.8) kernel attached to document any other aspects of
hardware that I've forgotten. The system does flop the controller in question
between irq2 and irq3 as other devices (the SATA side, namely) are disabled or
enabled in the BIOS setup -- which is to say I've observed it work with either
assignment under 1.8 during all these reboots.

All the BIOS knobs are back as they came, meaning the SATA controller is
enabled. (The PATA controller I'm trying to use is always-on; no option to
disable it.)

#6 Updated by TGEN about 7 years ago

By "fault", do you mean a lockup, or mountroot_try() failing to mount root?

Cheers,
--
Thomas E. Spanjaard

#7 Updated by floid about 7 years ago

Well, the same segfault in init. I'm not sure if that counts as a total lockup,
but it certainly doesn't proceed from there without a disk to talk to.

I'm going to try to unstupid myself about getting more verbose output (if such
is available), hints always appreciated. :)

#8 Updated by floid about 7 years ago

Right there in the loader menu, of course...

With verbosity on, from pencil notes:
[Messages showing the SB600 being detected fine and going into PIO4, then UDMA5]
[...]
ATA PseudoRAID Loaded
Mounting root from ufs:/dev/ad0s1a
ad0s1: type 0xa5, start 63, end = 312581807, size 312581745 : OK
start_init: trying /sbin/init
[~TIMESTAMP~] init: fatal signal: segmentation fault

As noted, the segfault repeats every 30 seconds, and FWIW, CTRL-ALT-DEL turns it
into a bus error followed by what appears to be a full lockup.

Ripping out nataraid produces the same result minus its message.

The 1.8 kernel continues to be perfectly happy with this init, confirmed to have
been built last night along with the rest of the 1.9/HEAD world.

Clarification: This isn't the "NATA" config, but my usual config with ata out
and nata in. I notice the NATA config in the tree leaves out natausb, so I'll
examine a diff and try taking that out next unless new insight strikes me.

#9 Updated by TGEN about 7 years ago

I'll see if I can reproduce anything like that tomorrow.

Well, I'll try a couple of things tomorrow and get back to you after that.

Goodnight,
--
Thomas E. Spanjaard

#10 Updated by floid about 7 years ago

Hmm! After actually doing a "make clean" in /usr/src, a -j5'd make buildkernel
died with a syntax error somewhere [that, because I'm me, I've lost off the top
of the scrollback buffer]. A serial build's now cranking along properly, news
at 11...

I'll go back and find the multithreaded-build-breaker if this works.

[See you in the morning! :)]

#11 Updated by floid about 7 years ago

...Except that was from typoing "make BUILDKERNEL=" instead of "make buildkernel
KERNCONF=", so much for false hope.

<Floid done reporting until the morning, unless Floid actually fix anything.
Sorry for all the noise, heh.>

#12 Updated by floid about 7 years ago

Figured I should test my 1.9 tree with the old ATA driver; discovered the following:

Building with "options PCI_MAP_FIXUP" *DOES NOT* enumerate ad0, so it only gets
as far as failing to find the root device.

Building WITHOUT "options PCI_MAP_FIXUP" results in a fully-working 'old ata'
kernel, which I'm using to type this.

Building a NATA kernel WITHOUT "options PCI_MAP_FIXUP" fails the same way as
those built with, except once (out of two test boots) it spat out "init: fatal
signal: illegal instruction" instead of a segmentation fault.

#13 Updated by TGEN about 7 years ago

The old ATA driver does not work with options PCI_MAP_FIXUP for ATA
controllers in legacy mode.

That it works *without* PCI_MAP_FIXUP would be odd, though, because without
that option it shouldn't be able to detect all the resources an ATA
controller in legacy mode uses.

So far, I haven't been able to reproduce those segmentation faults.

Cheers,
--
Thomas E. Spanjaard

#14 Updated by floid about 7 years ago

~2 MB of digital camera shots covering a full verbose boot attempt, now that
I've found the SC_HISTORY_SIZE kernel option.

For some reason this particular boot just stalled right after announcing that it
was loading init; no segfault or illegal instruction errors. This might have been
related to going into scrollback immediately after that message popped up, but
coming back out and waiting 30 seconds didn't produce anything.

The last image in the archive is a snapshot of the BIOS information screen from
a subsequent boot, showing IRQ11 assigned to one "PCI IDE controller" and
apparently nothing (?) assigned to another.

There're a lot of images there; atapci0 shows up around 100_0487.jpg.

The APIC_IO complaint back in 100_0476.jpg may be of interest, but it doesn't
bother a non-NATA kernel.

[The board is actually an MSI K9AGM-FID; there are non-'-FID' variants that
aren't as fully populated.]

#15 Updated by floid about 7 years ago

[config of kernel used to produce the 'log;' SMP enabled, since previous testing
confirmed going uniprocessor didn't change anything.]

#16 Updated by TGEN about 7 years ago

The UDMA133 controller uses legacy IO ports, but the channels don't have
a legacy IRQ assigned. They should show up as using IRQs 14 and 15 (not
the controller, the channels!), but they don't. I'll have to peer more
into the PCI code to get this right...

P.S.
Could you somehow post a pciconf -lv or the like, especially the pci
class/subclass codes?

Cheers,
--
Thomas E. Spanjaard

#17 Updated by floid about 7 years ago

> Could you somehow post a pciconf -lv or the like, especially the pci
> class/subclass codes?

http://bugs.dragonflybsd.org/file197/pciconf%20-lv is still available...

I'll double-check whether there's anything to be prodded in the BIOS regarding IRQ
assignments, but it's pretty spare (and I think that option, if there is one, is
already on).

-Thanks,
-Floid

#18 Updated by floid almost 7 years ago

Two new 'developments:'

-Finally gave in/gave up and ordered some PATA-to-SATA bridges last week, so
I'll be able to test the theory that this is only a problem with the legacy PATA
controller when they arrive.

-Just tried the new -PREVIEW with the big NATA update, and things haven't
improved much, if at all, for this configuration. I can "sort of" boot (if I can
avoid fsck, e.g. single-user): the disk comes up in UDMA33 with ICRC errors
galore. I couldn't convince natacontrol to raise it *above* UDMA33, but choosing
BIOSDMA took it all the way back down to PIO4, which seems stable enough for me
to build a kernel with the old driver, as I should've done before I embarked on
the test.

Pretty sure the channels still aren't showing an IRQ assigned, also pretty sure
the (cosmetic) sb600.diff ID patch in the bugtracker didn't make it into the tree.

...

Still pretty disorganized over here, I'll try to log things methodically when
the SATA parts arrive.

#19 Updated by dillon almost 7 years ago

:Joe "Floid" Kanowitz added the comment:
:
:Two new 'developments:'
:
:-Finally gave in/gave up and ordered some PATA-to-SATA bridges last week, so
:I'll be able to test the theory that this is only a problem with the legacy PATA
:controller when they arrive.

PATA to SATA bridge? You mean something that plugs into a PATA header
and provides a SATA port? That can't be right... do people actually
make those? I'd be amazed if that sort of thing worked reliably.

Or did you mean you ordered SATA controllers that plug into e.g. PCI
slots and provide SATA ports?

:-Just tried the new -PREVIEW with the big NATA update, and things haven't
:improved much if at all for this configuration. I can "sort of" boot (if I can
:avoid fsck, e.g. single-user), the disk comes up in UDMA33, ICRC errors galore,
:I couldn't convince natacontrol to raise it *above* UDMA33, but trying to choose
:BIOSDMA took it all the way back to PIO4, which seems to be stable enough for me
:to build a kernel with the old driver like I should've done before I embarked on
:the test.
:
:Pretty sure the channels still aren't showing an IRQ assigned, also pretty sure
:the (cosmetic) sb600.diff ID patch in the bugtracker didn't make it into the tree.

NATA doesn't report IRQ assignments; I haven't looked into why not.

Please post the 'atapci', 'ata', 'acd', and 'ad' lines in the boot.
e.g. you should see something like this:

atapci0: <nVidia nForce3 Pro UDMA133 controller> port ...
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acd0: CDRW <CD-W54E/1.1B> at ata1-master PIO4
atapci1: <nVidia nForce3 Pro SATA150 controller> port ...0.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ad6: 194481MB <Maxtor 6L200M0 BACE1G20> at ata3-master SATA150

-Matt
Matthew Dillon

#20 Updated by floid almost 7 years ago

--- Matthew Dillon wrote:

Hey, thanks for giving this your attention!

> : I'll be able to test the theory that this is only a problem
> : with the legacy PATA controller when they arrive.
>
> PATA to SATA bridge? You mean something that plugs into a PATA
> header and provides a SATA port? That can't be right... do people
> actually make those? I'd be amazed if that sort of thing worked
> reliably.

Yep, they're converter boards that turn a PATA disk into a SATA disk. The
chipsets come from known quantities like Silicon Image and JMicron, and they've
been around for a few years now; they've got to be at least as reliable as a
PATA-to-USB/1394 design. Someone on IRC was begging me to try one back when I
first opened this bug.

Some of the chipsets are actually bidirectional, so they really are
bridges (subject to board implementation)...

I have a big pile of brand-new PATA disks; if there's actually trouble using
the SATA ports, then I'll give up and buy a native SATA disk to eliminate
variables. :} Buying a new controller would take all the "fun" out of making
the SB600 work.

The theory was that the SATA ports are probably perfectly supported
already, and I'm the only person in BSD-land who has ever bothered
trying to use the PATA port. ;)

> : I can "sort of" boot (if I can avoid fsck, e.g. single-user), the disk comes up

TGEN was puzzled too and almost blamed that at first glance, per
http://bugs.dragonflybsd.org/msg2326 . Of course, if that were the cause, it
probably wouldn't work at all...

As I type this, that machine is limping through a buildkernel for the
old driver using the NATA kernel at PIO4. Since this makes it stable
(albeit slower even than BIOSDMA), I should be able to save and post
both NATA and decaf dmesgs once it's done. Might be an hour.

Please have a glance at the existing attachments to
http://bugs.dragonflybsd.org/issue566 ,
for instance the pciconf information is still good:
http://bugs.dragonflybsd.org/file197/pciconf%20-lv

It's a UDMA100 drive with a UDMA100 cable (it came with the drive), so I have
no idea where the UDMA33 limitation arises. Perhaps a UDMA33 drive would
actually work right and this Western Digital doesn't like being forced to
fall back that far, or the 80-wire cable is somehow out of spec for that
signaling.

Of course, it only likes to play in BIOSDMA mode with the old driver,
too, which I chalked up to the SB600 being 'too new.' The exact same disk
and cable played nice at UDMA66 with my previous system's VIA 686A.

AMD says of the SB600 (part of the "Radeon Xpress 1100"):
"ATA 133 controller support up to UDMA mode 6 with 2 drives (disk or
optical)"

That's been wordy and speculative, pardon. dmesgs should be coming.
-Joe "Floid" Kanowitz

#21 Updated by floid almost 7 years ago

Latest (non-verbose) dmesg output:
http://bugs.dragonflybsd.org/file262/dmesg.1.9PREVIEW-NATA
http://bugs.dragonflybsd.org/file263/dmesg.1.9PREVIEW-OLDATA

SNAFU here: I fiddled with disabling and re-enabling the SATA controller in the
board's BIOS earlier, then left it re-enabled in "AHCI" mode for these boots
(the choices being "Legacy IDE," "Native IDE," "AHCI," and "RAID"). To get the
SATA controller(s) to show, I should've put it back to "Native IDE."

Since that's probably of interest, expect two more shortly.

#22 Updated by floid almost 7 years ago

With SATA "Native IDE" mode selected, SATA controller (atapci0) showing:
http://bugs.dragonflybsd.org/file264/dmesg.1.9PREVIEW-NATA2
http://bugs.dragonflybsd.org/file265/dmesg.1.9PREVIEW-OLDATA2

Unplugged the USB card reader, no da* cruft in these.

The board has 4 SATA ports total, plus 1 physical PATA connector, all SB600-driven:
http://www.msicomputer.com/product/p_spec.asp?model=K9AGM-FID

The SB600 has a second PATA channel that goes nowhere on this design (ata1); no
idea if I should be seeing more than atapci0's ata2 and ata3 for the SATA side
or if some master/slave emulation occurs in this mode.

I'm uploading the NATA kernel's config ("MUSTELID2007") with this message as
well -- through the bugtracker, so it'll be attached at
http://bugs.dragonflybsd.org/issue566 .

#23 Updated by floid almost 7 years ago

One mystery solved; reapplying TGEN's sb600.diff should (re-)enable the higher
UDMA modes. So I'll now be testing the 'UDMA33 just ain't right' theory.

#24 Updated by dillon almost 7 years ago

:Joe "Floid" Kanowitz added the comment:
:
:One mystery solved; reapplying TGEN's sb600.diff should (re-)enable the higher
:UDMA modes. So I'll now be testing the 'UDMA33 just ain't right' theory.

I will include that patch in the commit I am making right now to HEAD.

-Matt
Matthew Dillon

#25 Updated by floid almost 7 years ago

--- Matthew Dillon wrote:

> I will include that patch in the commit I am making right now to HEAD.

The good news is that it now detects the drive as a UDMA100 device, geometry
looks right at first glance, etc.

The bad news is that it gets all the way to "start_init: trying /sbin/init" and
hangs there without a flicker of activity from the drive. Console scrollback
still works, no panic, just sits there.

Sad to say, I still don't have a serial port on this thing; I'll have to go pull
a 386 apart for the IDC-to-DB9. :P

#26 Updated by floid almost 7 years ago

Whups. SB600 PATA really is only a single channel, apparently. No idea if this
impacts NATA since the 'extra' channel is equally present with the old driver.

http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_14603_14620%5E14623,00.html

I'm at a loss when it comes to expressing this in ata-chipset.c.

#27 Updated by floid almost 7 years ago

After the historic NATA fix of June 2007...

Worksforme. :)

%natacontrol list
ATA channel 0:
Master: ad0 <WDC WD1600JB-98GVA0/08.02D08> ATA/ATAPI revision 6
Slave: no device present
ATA channel 1:
Master: no device present
Slave: no device present
ATA channel 2:
Master: no device present
Slave: no device present
ATA channel 3:
Master: no device present
Slave: no device present

%natacontrol mode ad0
current mode = UDMA100
