Bug #260

Kernel panic during boot due to ATA trouble

Added by nonsolosoft over 8 years ago. Updated almost 6 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Hi,

I've downloaded:
http://chlamydia.fs.ei.tum.de/pub/DragonFly/snapshots/i386/LATEST-Release-1.6.iso.bz2

and I tried it on an old PC with an AMD Athlon 700MHz and an Asus A7V motherboard.
That mainboard has 2 controllers (one Ultra66 and one Ultra100) and 4 ATA channels; I have only one ATA100 hard disk, connected as master on the ata2 channel.

It enters a kernel panic after several resets of the ata2 channel:
----------
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: 152627MB <WDC WD1600JB-00REA0> [310101/16/63] at ata2-master UDMA100
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ad4: trying fallback to PIO mode
ata2: resetting devices .. done

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x41455378
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc019cc66
stack pointer = 0x10:0xc8d25cdc
frame pointer = 0x10:0xc8d25d0c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = idle
current thread = pri 60 (CRIT)

kernel type 12 trap, code=2
stopped at ad_interrupt+0x38a: movl %eax,0x148(%edx)
db>
-------

I also tried the 1.4.4 ISO; it works and completes the boot. Here is the dmesg:

Copyright (c) 2003, 2004, 2005, 2006 The DragonFly Project.
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
DragonFly 1.4.4-RELEASE #1: Sat Apr 22 10:46:47 PDT 2006
root@:/home/usr.obj/usr/src/sys/GENERIC
TSC clock: 706217168 Hz, i8254 clock: 1193061 Hz
CPU: AMD Athlon(tm) Processor (706.29-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x642 Stepping = 2

Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
AMD Features=0xc0440000<RSVD,AMIE,DSP,3DNow!>
real memory = 134135808 (130992K bytes)
avail memory = 120131584 (117316K bytes)
Preloaded elf kernel "/kernel" at 0xc0714000.
Preloaded elf module "/modules/acpi.ko" at 0xc071421c.
Pentium Pro MTRR support enabled
md0: Malloc disk
pcibios: BIOS version 2.10
Using $PIR table, 9 entries at 0xc00f16c0
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <ASUS A7V> on motherboard
acpi0: Power Button (fixed)
Warning: ACPI is disabling APM's device. You can't run both
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: <ACPI CPU (3 Cx states)> on acpi0
acpi_button0: <Power Button> on acpi0
compare 0
legacypci0 on motherboard
pcib0: <Host to PCI bridge> on legacypci0
pci0: <PCI bus> on pcib0
agp0: <VIA 82C8363 (Apollo KT133A) host to PCI bridge> mem 0xe7000000-0xe7ffffff at device 0.0 on pci0
pcib1: <VIA 8363 (Apollo KT133) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <ATI Rage128-RF graphics accelerator> at 0.0 irq 11
isab0: <VIA 82C686 PCI-ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C686 ATA66 controller> port 0xb800-0xb80f at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: <VIA 83C572 USB controller> port 0xb400-0xb41f irq 5 at device 4.2 on pci0
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub0: port error, restarting port 1
uhub0: port error, giving up port 1
uhci1: <VIA 83C572 USB controller> port 0xb000-0xb01f irq 5 at device 4.3 on pci0
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhub1: port error, restarting port 1
uhub1: port error, giving up port 1
intr 5 at 50001 > 50000 hz, livelocked limit engaged!
uhub2: ALCOR Generic USB Hub, class 9/0, rev 1.10/1.00, addr 2
uhub2: 4 ports with 4 removable, self powered
pci0: <unknown card> (vendor=0x105a, dev=0x3d73) at 10.0 irq 12
pci0: <unknown card> (vendor=0x1274, dev=0x5880) at 11.0 irq 10
rl0: <RealTek 8139 10/100BaseTX> port 0x8400-0x84ff mem 0xdd800000-0xdd8000ff irq 11 at device 12.0 on pci0
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: MAC address: 00:05:1c:1b:6d:6f
atapci1: <Promise ATA100 controller> port 0x6800-0x683f,0x7000-0x7003,0x7400-0x7407,0x7800-0x7803,0x8000-0x8007 mem 0xdd000000-0xdd01ffff irq 10 at device 17.0 on pci0
ata2: at 0x8000 on atapci1
ata3: at 0x7400 on atapci1
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xcc000-0xcdfff,0xd0000-0xd4fff on isa0
pmtimer0 on isa0
fdc0: cannot reserve I/O port range
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
acpi_cpu: throttling enabled, 16 steps (100% to 6.2%), currently 100.0%
intr 5 at 7378 < 20000 hz, livelock removed
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: 152627MB <WDC WD1600JB-00REA0> [310101/16/63] at ata2-master UDMA100
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ad4: trying fallback to PIO mode
ata2: resetting devices .. done
acd0: CDROM <50X CD-ROM> at ata1-slave PIO4
Mounting root from cd9660:cd0c
cd0 at ata1 bus 0 target 1 lun 0
cd0: < 50X CD-ROM 5.3A> Removable CD-ROM SCSI-0 device
cd0: 16.000MB/s transfers
cd0: cd present [115810 x 2048 byte records]

I hope that this could be useful.

Bye, \fer

History

#1 Updated by dillon over 8 years ago

:Hi,
:
:I've downloaded:
:http://chlamydia.fs.ei.tum.de/pub/DragonFly/snapshots/i386/LATEST-Release-1.6.iso.bz2
:
:and I tried it on an old PC with an AMD Athlon 700MHz and an Asus A7V motherboard.
:That mainboard has 2 controllers (one Ultra66 and one Ultra100) and 4 ATA channels; I have only one ATA100 hard disk, connected as master on the ata2 channel.

Hmm. It is panicking on a wild pointer on line 739 of
sys/dev/disk/ata/ata-disk.c:

request->bio->bio_buf->b_resid = request->bytecount;

Either 'request', 'bio', or 'bio_buf' is a wild pointer.
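
As a side calculation from the panic output above (a sketch, not a diagnosis): subtracting the 0x148 displacement in the faulting instruction from the fault virtual address gives the value that was in %edx. Which of the three pointers %edx actually held is an assumption here, and the helper names are made up for illustration.

```python
# Values copied from the panic message in the description.
FAULT_ADDRESS = 0x41455378   # "fault virtual address"
DISPLACEMENT = 0x148         # from "movl %eax,0x148(%edx)"


def base_register(fault_address, displacement):
    """Recover the base register value for a store to displacement(base)."""
    return (fault_address - displacement) & 0xFFFFFFFF


def as_ascii(value):
    """Render a 32-bit little-endian value as printable ASCII ('.' otherwise)."""
    raw = value.to_bytes(4, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


edx = base_register(FAULT_ADDRESS, DISPLACEMENT)
print(hex(edx), as_ascii(edx))
```

The recovered value is 0x41455230, whose bytes read as "0REA". That sequence also occurs in the drive's model string (WD1600JB-00REA0), which may hint that the pointer was overwritten with string data, but it does not prove it.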

Could you do a 'trace' from the DDB prompt ?

:kernel type 12 trap, code=2
:stopped at ad_interrupt+0x38a: movl %eax,0x148(%edx)
:db>
:
:Bye, \fer

-Matt
Matthew Dillon

#2 Updated by nonsolosoft over 8 years ago

Hi Matt,

nice to meet you again (since Amiga DICE time). I appreciate your DFBSD project.

Matthew Dillon wrote:
> :Hi,
> :
> :I've downloaded:
> :http://chlamydia.fs.ei.tum.de/pub/DragonFly/snapshots/i386/LATEST-Release-1.6.iso.bz2
> :
> :and I tried it on an old PC with an AMD Athlon 700MHz and an Asus A7V motherboard.
> :That mainboard has 2 controllers (one Ultra66 and one Ultra100) and 4 ATA channels; I have only one ATA100 hard disk, connected as master on the ata2 channel.
>
> Hmm. It is panicking on a wild pointer on line 739 of
> sys/dev/disk/ata/ata-disk.c:
>
> request->bio->bio_buf->b_resid = request->bytecount;
>
> Either 'request', 'bio', or 'bio_buf' is a wild pointer.
>
> Could you do a 'trace' from the DDB prompt ?

ok

>
> :kernel type 12 trap, code=2
> :stopped at ad_interrupt+0x38a: movl %eax,0x148(%edx)
> :db>

ad_interrupt(c8cd0bc8,c0c91540,c8d25d84,c029d8e9,c0de4450) at ad_interrupt+0x38a

ata_intr(c0de4450,0,0,0,11) at ata_intr+0x10a
ithread_handler(a,0,0,0,0) at ithread_handler+0x9d
lwkt_exit() at lwkt_exit

I feared that the problem was caused by having SATA and parallel ATA cables in the same box, so I disconnected every SATA cable, but the error appears in the same way.

DragonFlyBSD doesn't yet detect the Promise SATA 300 TX2plus (PDC20775) PCI board or the devices connected to it.
Can I do something to have support for that controller?

Thank you in advance, \fer
--
NonSoLoSoft (Italy) - http://www.nonsolosoft.com/

#3 Updated by dillon over 8 years ago

:Hi Matt,
:
:nice to meet you again (since Amiga DICE time). I appreciate your DFBSD project.
:
:...
:> :kernel type 12 trap, code=2
:> :stopped at ad_interrupt+0x38a: movl %eax,0x148(%edx)
:> :db>
:
:ad_interrupt(c8cd0bc8,c0c91540,c8d25d84,c029d8e9,c0de4450) at ad_interrupt+0x38a
:
:ata_intr(c0de4450,0,0,0,11) at ata_intr+0x10a
:ithread_handler(a,0,0,0,0) at ithread_handler+0x9d
:lwkt_exit() at lwkt_exit
:
:
:I feared that the problem was caused by having SATA and parallel ATA cables in the same box, so I disconnected every SATA cable, but the error appears in the same way.

Hmm. It's clearly a software bug, but finding it is going to be
difficult.

:DragonFlyBSD doesn't yet detect the Promise SATA 300 TX2plus (PDC20775) PCI board or the devices connected to it.
:Can I do something to have support for that controller?
:

I did a quick google search and it looks like it ought to be fairly easy
to support the controller. A couple of PCI ID entries might do it.

What we really need to do is port the current FreeBSD ATA driver to
DragonFly. Our current code is ancient. I will spend a day to see
if I can do that.

-Matt
Matthew Dillon

#4 Updated by nonsolosoft over 8 years ago

Matthew Dillon wrote:

> Hmm. It's clearly a software bug, but finding it is going to be
> difficult.

I've seen that neither FreeSBIE 1.0 (FreeBSD 5.3) nor FreeBSD 7.0-CURRENT reports any connection problem or falls back to PIO on this PC and this HD, as DragonFlyBSD 1.4.4 does:

ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: 152627MB <WDC WD1600JB-00REA0> [310101/16/63] at ata2-master UDMA100
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ad4: trying fallback to PIO mode
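
To make the comparison between systems easier, the timeouts in a boot log can be counted mechanically. This is only a minimal sketch, assuming the log lines keep the format shown above; the `summarize` helper and its regular expressions are illustrative and not part of any DragonFly tool.

```python
import re
from collections import Counter

# Log excerpt copied from the DragonFly 1.4.4 boot on this machine.
LOG = """\
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: 152627MB <WDC WD1600JB-00REA0> [310101/16/63] at ata2-master UDMA100
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices .. done
ad4: READ command timeout tag=0 serv=0 - resetting
ad4: trying fallback to PIO mode
"""

TIMEOUT = re.compile(r"^(\w+): \w+ command timeout")
FALLBACK = re.compile(r"^(\w+): trying fallback to PIO mode")


def summarize(log):
    """Count command timeouts per device and collect devices that fell back to PIO."""
    timeouts = Counter()
    fallbacks = set()
    for line in log.splitlines():
        match = TIMEOUT.match(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = FALLBACK.match(line)
        if match:
            fallbacks.add(match.group(1))
    return timeouts, fallbacks


timeouts, fallbacks = summarize(LOG)
print(dict(timeouts), sorted(fallbacks))
```

Running the same function over a FreeBSD dmesg from this box would be expected to report no timeouts for ad4, if the observation above holds.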

>
> :DragonFlyBSD doesn't yet detect the Promise SATA 300 TX2plus (PDC20775) PCI board or the devices connected to it.
> :Can I do something to have support for that controller?
> :
>
> I did a quick google search and it looks like it ought to be fairly easy
> to support the controller. A couple of PCI ID entries might do it.

I've just tested with FreeBSD 7.0-CURRENT and this is the output; I don't know if it could be useful for you:

atapci1: <Promise PDC40775 SATA 300 Controller> port 0x947f, 0x9000-0x90ff mem 0xde800000-0xde800fff, 0xde000000-0xde01ffff irq 12 at device 10.0 on pci0

>
> What we really need to do is port the current FreeBSD ATA driver to
> DragonFly. Our current code is ancient. I will spend a day to see
> if I can do that.

Thank you, Matt, for your passion for DFBSD. \fer

#5 Updated by corecode over 8 years ago

Matthew Dillon wrote:
> What we really need to do is port the current FreeBSD ATA driver to
> DragonFly. Our current code is ancient. I will spend a day to see
> if I can do that.

thomas spanjaard is working on a patch. maybe he can publish some diffs to make the progress quicker?

cheers
simon

#6 Updated by dillon over 8 years ago

:thomas spanjaard is working on a patch. maybe he can publish some diffs
:to make the progress quicker?
:
:cheers
: simon

I took a long look at the FreeBSD code today. It is not going to be an
easy port, mainly due to a ton of API changes from FreeBSD<->DragonFly
code divergence. It isn't hard necessarily, but there will be a lot of
grunt work.

I am going to continue to work on the userland VFS API, but I'll get
some of the device API changes done first so the device API stops
being a moving target. The device work has to run along the same
lines as the VFS work. Moving arguments into argument structures
and providing an abstraction layer (which are those SYSLINK commits I
made a few days ago) that will allow them to operate in userland over
a communications link.

I'll revisit the ATA code and see how Thomas is doing after I get
the device work done (which should be by the weekend).

-Matt

#7 Updated by nonsolosoft over 8 years ago

Matthew Dillon wrote:

>
> I'll revisit the ATA code and see how Thomas is doing after I get
> the device work done (which should be by the weekend).
>
> -Matt

Hi Matt,

Have you rescheduled the ATA revisit?
When do you plan to renew ATA support?

Thank you in advance, \fer

#8 Updated by dillon over 8 years ago

:Hi Matt,
:
:Have you rescheduled the ATA revisit?
:When do you plan to renew ATA support?
:
:Thank you in advance, \fer

Well, Thomas and Simon set up a mercurial repository, but I really
don't want to try to use yet another repository to work on the driver.
I would much prefer that it simply be brought into CVS and worked on
in CVS. It would simply remain disconnected from the build until
it can be made to work.

There are only two header files outside of the nata driver code that
conflict. The nata driver versions of those header files can simply
be renamed so as not to conflict.

-Matt
Matthew Dillon

#9 Updated by TGEN over 8 years ago

Matthew Dillon wrote:
> :Have you rescheduled the ATA revisit?
> :When do you plan to renew ATA support?
> :Thank you in advance, \fer
> Well, Thomas and Simon set up a mercurial repository, but I really
> don't want to try to use yet another repository to work on the driver.
> I would much prefer that it simply be brought into CVS and worked on
> in CVS. It would simply remain disconnected from the build until
> it can be made to work.

You don't have to, it can be imported into CVS and have the existing
patches applied there, and then the remaining work.

> There are only two header files outside of the nata driver code that
> conflict. The nata driver versions of those header files can simply
> be renamed so as not to conflict.

ata.h has already been renamed nata.h, and I guess you're talking about
opt_ata.h for the other one, which can be renamed to opt_nata.h without
too much of a problem. Then the only 'conflict' keeping both drivers
from being in the same system is that the new ATA driver supports the
same hardware as the old one besides new hardware it also supports,
meaning it depends on who attaches first; we could just let config(8)
bail out as soon as it finds both ata and nata devices in the kernel
config. Also, it's possible to have both drivers attach only a set of
devices which don't overlap with each other. I'm not sure if that's
something worth having?

Cheers,
--
Thomas E. Spanjaard

#10 Updated by dillon over 8 years ago

:ata.h has already been renamed nata.h, and I guess you're talking about
:opt_ata.h for the other one, which can be renamed to opt_nata.h without
:too much of a problem. Then the only 'conflict' keeping both drivers
:from being in the same system is that the new ATA driver supports the
:same hardware as the old one besides new hardware it also supports,
:meaning it depends on who attaches first; we could just let config(8)
:bail out as soon as it finds both ata and nata devices in the kernel
:config. Also, it's possible to have both drivers attach only a set of
:devices which don't overlap with each other. I'm not sure if that's
:something worth having?
:
:Cheers,
:--
: Thomas E. Spanjaard
:

Generally speaking it should be one or the other, but it might actually
be a good idea to add an exception variable in the kernel environment
that can be specified in /boot/loader.conf to allow both drivers to
attach to different ATA controllers for testing purposes. Lots of
machines have normal devices and SATA devices (with different PCI IDs).

This way we could boot a test machine with the old driver talking to
the boot disk and the new driver talking to a disk on the other
controller. Again, just for testing.
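
A rough sketch of that selection rule, purely illustrative: the comma-separated `vendor:device` list format and the function names are hypothetical and do not correspond to an existing DragonFly tunable, and the VIA ID used in the second call is an assumed value for the boot controller.

```python
def parse_claims(value):
    """Parse a 'vvvv:dddd,vvvv:dddd' string into a set of (vendor, device) pairs."""
    claims = set()
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        vendor, device = item.split(":")
        claims.add((int(vendor, 16), int(device, 16)))
    return claims


def pick_driver(vendor, device, claims):
    """Give a controller to the new driver only if it is explicitly listed."""
    return "nata" if (vendor, device) in claims else "ata"


# Hypothetical setting: hand only the Promise SATA board to the new driver.
claims = parse_claims("105a:3d73")
print(pick_driver(0x105A, 0x3D73, claims))  # Promise board from this report
print(pick_driver(0x1106, 0x0571, claims))  # assumed ID for the VIA boot controller
```

With an empty list every controller stays with the old driver, which matches the "one or the other" default.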

-Matt

#11 Updated by corecode about 8 years ago

Does this still happen?

#12 Updated by nonsolosoft about 8 years ago

I've not worked on this since the last report.
I can retry in the near future, as soon as I get that hardware again.

Bye, \fer

#13 Updated by nonsolosoft about 8 years ago

No, it doesn't happen anymore.

But the latest HEAD, which I got last Saturday, doesn't detect the Promise SATAII 300 TX2.

Here is the output of pciconf -l -v:

none0@pci0:10:0: class=0x018000 card=0x3d73105a chip=0x3d73105a rev=0x02 hdr=0x00
vendor = 'Promise Technology Inc'
device = 'PDC20775 SATAII 300 TX2plus IDE Controller'
class = mass storage
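
For anyone matching this against a driver's ID table: the `chip` value packs the device ID in the upper 16 bits and the vendor ID in the lower 16 bits. A small sketch of the split follows; the function name is only illustrative.

```python
def split_pci_id(chip):
    """Split a pciconf 'chip' value into (vendor, device) IDs."""
    vendor = chip & 0xFFFF
    device = (chip >> 16) & 0xFFFF
    return vendor, device


vendor, device = split_pci_id(0x3D73105A)
print(f"vendor=0x{vendor:04x}, dev=0x{device:04x}")
```

This gives vendor=0x105a, dev=0x3d73, the same pair that the 1.4.4 dmesg reported as an unknown card at 10.0.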

Is the new ATA driver ready?

Bye, \fer
