Bug #1106
closed
Network-related crash on boot
Description
I'm running the unpatched 2.0 release.  My Dell Inspiron 8000 
laptop has two NICs: dc0 (CardBus) and xl0 (built-in).  When I include 
ifconfig lines in rc.conf for both of these interfaces, I get the 
following error during boot right after "setting hostname":
fatal trap 12: page fault while in kernel mode
fault virtual address = 0x4c
fault code = supervisor write, page not present
instruction pointer = 0x8 :0xc03ede5b
stack pointer = 0x10 :0xca615cac
frame pointer = 0x10 :0xca615cac
code segment = base 0x0, limit 0xfffff, type 0x1b
        = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = idle
current thread = pri 60 (CRIT)
kernel: type 12 trap, code=2
Stopped at dc_rxeof+0x223: movl %edx,0x4c(%eax)
db>
If I comment out either of the ifconfig lines, the machine boots 
successfully.  Note that I haven't tried this with previous versions of 
DragonFly, and OpenBSD works fine on this same hardware.  I've tried 
booting with ACPI both enabled and disabled.  Let me know if you need 
more info.  See dmesg below.
Tim
Last login: Sat Aug  2 16:25:27 2008
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
        The Regents of the University of California.  All rights reserved.
DragonFly 2.0.0-RELEASE (MYKERNEL) #0: Mon Jul 21 13:56:34 MST 2008
Welcome to DragonFly!
mesquite# cat dm.txt
Copyright (c) 2003-2008 The DragonFly Project.
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
DragonFly 2.0.0-RELEASE #0: Mon Jul 21 13:56:34 MST 2008
    root@df.timdarby.net:/usr/obj/usr/src/sys/MYKERNEL
TSC clock: 698428905 Hz, i8254 clock: 1193112 Hz
CPU: Intel Pentium III (698.47-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x686  Stepping = 6
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 335454208 (327592K bytes)
avail memory = 312328192 (305008K bytes)
Preloaded elf kernel "/kernel" at 0xc07cf000.
Preloaded elf module "/modules/acpi.ko" at 0xc07cf1c0.
Pentium Pro MTRR support enabled
md0: Malloc disk
pcibios: BIOS version 2.10
Using $PIR table, 10 entries at 0xc00fbc20
ACPI: RSDP @ 0x0xfde50/0x0014 (v  0 DELL  )
ACPI: RSDT @ 0x0xfde64/0x0028 (v  1 DELL    CPi R   0x27D40115 ASL  0x00000061)
ACPI: FACP @ 0x0xfde90/0x0074 (v  1 DELL    CPi R   0x27D40115 ASL  0x00000061)
ACPI: DSDT @ 0x0xfffe4000/0x2B4C (v  1 INT430 SYSFexxx 0x00001001 MSFT 0x0100000E)
ACPI: FACS @ 0x0x13fff800/0x0040
npx0: <math processor> on motherboard
npx0: INT 16 interface
Using MMX optimized bcopy/copyin/copyout
acpi0: <DELL CPi R  > on motherboard
Warning: ACPI is disabling APM's device.  You can't run both
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU (2 Cx states)> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_acad0: <AC Adapter> on acpi0
acpi_cmbat0: <Control Method Battery> on acpi0
acpi_cmbat1: <Control Method Battery> on acpi0
acpi_lid0: <Control Method Lid Switch> on acpi0
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
fdc0: <NEC 72065B or clone> port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
ppc0 port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
legacypci0 on motherboard
pcib0: <Host to PCI bridge> on legacypci0
pci0: <PCI bus> on pcib0
agp0: <Intel 82815 (i815 GMCH) host to PCI bridge> mem 
0xe4000000-0xe7ffffff at device 0.0 on pci0
pcib1: <Intel 82801BA/BAM (ICH2) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <ATI model 4d46 graphics accelerator> at 0.0 irq 11
pcib2: <PCI to PCI bridge (vendor=8086 device=2448)> at device 30.0 on pci0
pci2: <PCI bus> on pcib2
pci2: <unknown card> (vendor=0x125d, dev=0x1998) at 3.0 irq 5
xl0: <3Com 3c556 Fast Etherlink XL> port 0xe800-0xe8ff mem 
0xf8ffd800-0xf8ffd87f,0xf8ffdc00-0xf8ffdc7f irq 10 at device 6.0 on pci2
miibus0: <MII bus> on xl0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: MAC address: 00:01:03:89:85:bf
pci2: <unknown card> (vendor=0x10b7, dev=0x1007) at 6.1 irq 10
cbb0: <TI4451 PCI-CardBus Bridge> at device 15.0 on pci2
cardbus0: <CardBus bus> on cbb0
pccard0: <16-bit PCCard bus> on cbb0
pci_cfgintr: 0:30 INTC routed to irq 10
pcib2: routed slot 15 INTA to irq 10
cbb1: <TI4451 PCI-CardBus Bridge> at device 15.1 on pci2
cardbus1: <CardBus bus> on cbb1
pccard1: <16-bit PCCard bus> on cbb1
pci_cfgintr: 0:30 INTC routed to irq 10
pcib2: routed slot 15 INTA to irq 10
fwohci0: <Texas Instruments PCI4451> mem 
0xf8ff8000-0xf8ffbfff,0xf8ffc800-0xf8ffcfff irq 10 at device 15.2 on pci2
fwohci0: OHCI version 1.0 (ROM=1)
fwohci0: No. of Isochronous channel is 4.
fwohci0: EUI64 39:4f:c0:00:1a:82:d4:01
fwohci0: Phy 1394a available S400, 1 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394 bus> on fwohci0
fwe0: <Ethernet over FireWire> on firewire0
fwe0: MAC address: 3a:4f:c0:82:d4:01
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc000ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
isab0: <PCI to ISA bridge (vendor=8086 device=244c)> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH2 UDMA100 controller> port 
0xbfa0-0xbfaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
dc0: <Xircom X3201 10/100BaseTX> port 0xe000-0xe07f mem 
0xf4002000-0xf40027ff,0xf4002800-0xf4002fff irq 10 at device 0.0 on cardbus0
miibus1: <MII bus> on dc0
ukphy1: <Generic IEEE 802.3u media interface> on miibus1
ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc0: MAC address: 00:10:a4:e4:76:5f
ad0: 9590MB <IBM DJSA-210 JS2OAB8A> at ata0-master UDMA66
acd0: CDROM <TEAC CD-ROM CD-224E/3.7C> at ata0-slave UDMA33
ata1: <ATA channel 1> on atapci0
uhci0: <Intel 82801BA/BAM (ICH2) USB controller USB-A> port 
0xdce0-0xdcff irq 10 at device 31.2 on pci0
usb0: <Intel 82801BA/BAM (ICH2) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
pmtimer0 on isa0
fdc1: cannot reserve I/O port range
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio2: can't drain, serial port might not exist, disabling
ppc1: cannot reserve I/O port range
Mounting root from ufs:/dev/ad0s1a
cd0 at ata0 bus 0 target 1 lun 0
cd0: <TEAC CD-224E 3.7C> Removable CD-ROM SCSI-0 device
cd0: 33.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present
pflog0: promiscuous mode enabled
Updated by sepherosa about 17 years ago
Looks like the mbuf is NULL.
The faulting statement is probably:
m->m_pkthdr.rcvif = ifp;
on line 2571.
Please test the following patch:
http://leaf.dragonflybsd.org/~sephe/if_dc.c.diff
Best Regards,
sephe
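
The reasoning behind the NULL-mbuf guess can be sketched as follows. The trap reports a fault virtual address of 0x4c, and the faulting instruction is `movl %edx,0x4c(%eax)`, a store to `%eax + 0x4c`. Those two facts together imply `%eax` held 0, which is what a store to a structure member through a NULL pointer looks like. The structures below are hypothetical stand-ins, not the real DragonFly `struct mbuf`; the padding is chosen only so that the member lands at offset 0x4c, and the helper names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel's mbuf structures. The real
 * layout differs; the padding exists only so that rcvif sits at
 * offset 0x4c, matching the displacement in the faulting instruction.
 */
struct fake_pkthdr {
    uint32_t rcvif;     /* stands in for a 32-bit i386 pointer */
    int32_t  len;
};

struct fake_mbuf {
    unsigned char other_fields[0x4c];
    struct fake_pkthdr m_pkthdr;
};

/* Offset of m_pkthdr.rcvif from the start of the fake mbuf. */
#define RCVIF_OFFSET \
    (offsetof(struct fake_mbuf, m_pkthdr) + offsetof(struct fake_pkthdr, rcvif))

/*
 * Address written by "movl %edx,disp(%eax)": the base pointer in %eax
 * plus the constant displacement.
 */
static uintptr_t
store_address(uintptr_t base, size_t displacement)
{
    return base + displacement;
}

/* Recover the base pointer from a fault address and a displacement. */
static uintptr_t
base_from_fault(uintptr_t fault_addr, size_t displacement)
{
    return fault_addr - displacement;
}
```

Calling `base_from_fault(0x4c, 0x4c)` gives 0, i.e. a NULL base pointer. Whether offset 0x4c really corresponds to `m_pkthdr.rcvif` in the 2.0 kernel is the guess made in the comment above, not something this sketch can confirm.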
Updated by t-df about 17 years ago
Thanks for the quick reply. This didn't help, but I have more info after 
playing with it a bit.  It turns out that the crash occurs specifically 
when both interfaces are ifconfig'ed in rc.conf AND dc0 is set to use 
DHCP.  Every other combination I tried is safe.
Tim
Updated by sepherosa about 17 years ago
Same backtrace or something new? If possible, please give me a core dump.
Best Regards,
sephe
Updated by t-df about 17 years ago
It produced the same backtrace, except that frame pointer = 0x10 :0xca615cdc
How do I get the system to generate a core dump?  After it drops me into 
the debugger, the only thing I can see to do is execute reset to reboot. 
I then have to boot into single-user mode to change rc.conf so that the 
machine will boot normally again.
Tim
Updated by luxh about 17 years ago
    Try to follow this guide on the wiki:
http://wiki.dragonflybsd.org/index.cgi/HowToCreateACoreDump
Good luck.
Max
Updated by t-df about 17 years ago
    Thanks for the pointer.  I somehow failed to find that in the wiki. I 
think I succeeded, although I had to execute the panic command twice in 
the debugger before it actually dumped the kernel.  The files are 
available at http://host.timdarby.net/debug/
Thanks,
Tim
Updated by sepherosa about 17 years ago
Please test the following patch:
http://leaf.dragonflybsd.org/~sephe/if_dc.c.diff2
Best Regards,
sephe
Updated by t-df about 17 years ago
    I tried this patch and got the panic message "dc0 is not running yet".  
xl0 appears to have come up ok. I have to run off to work now, but I can 
get you more details later, if needed.
Thanks,
Tim
Updated by sepherosa about 17 years ago
    Please test the attached patch.
Best Regards,
sephe
Updated by t-df about 17 years ago
    It works! So, out of curiosity, what does this patch do?
Thanks,
Tim
Updated by sepherosa about 17 years ago
    Thank you for testing!
- The driver should check only IFF_RUNNING in the interrupt handler, not
IFF_UP.
- SIOC{ADD,DEL}MULTI will try to reprogram the hardware's multicast filter,
but that should only be done after the device is running (IFF_RUNNING).
- For some unknown reason, the Xircom multicast filter programming function
turned on IFF_RUNNING, which is wrong.  I nuked that line and added an
assertion in that function to make sure that the NIC is running, but I
forgot that dc_init() turns on IFF_RUNNING after programming the multicast
filter (that's the cause of the latest panic :P).
Best Regards,
sephe
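
The three points in the explanation above can be sketched as a small user-space model. This is not the committed patch and not the real `if_dc.c` code: the structure, the `fake_*` function names, and the counters are all hypothetical, and the flag constants are defined locally for the sketch. It only illustrates the ordering rules described: the interrupt handler and the multicast ioctl path both gate on IFF_RUNNING, the filter programming routine never sets that flag itself, and the init routine sets it only after the filter has been programmed.

```c
/* Flag bits defined locally for this sketch. */
#define IFF_UP      0x1
#define IFF_RUNNING 0x40

/* Hypothetical, heavily simplified per-device state. */
struct fake_softc {
    int if_flags;           /* IFF_* bits */
    int rx_handled;         /* interrupts actually serviced */
    int mcast_programmed;   /* times the filter was written */
};

/* Program the multicast filter. It must not touch IFF_RUNNING. */
static void
fake_setfilt(struct fake_softc *sc)
{
    sc->mcast_programmed++;
}

/* Bring the device up: program the filter first, then mark it running. */
static void
fake_init(struct fake_softc *sc)
{
    fake_setfilt(sc);
    sc->if_flags |= IFF_RUNNING;
}

/* Interrupt handler: ignore interrupts until the device is running. */
static void
fake_intr(struct fake_softc *sc)
{
    if ((sc->if_flags & IFF_RUNNING) == 0)
        return;             /* rings are not set up yet */
    sc->rx_handled++;
}

/* SIOCADDMULTI/SIOCDELMULTI path: reprogram only a running device. */
static int
fake_ioctl_multi(struct fake_softc *sc)
{
    if (sc->if_flags & IFF_RUNNING)
        fake_setfilt(sc);
    return 0;
}
```

With `if_flags` set to IFF_UP only, `fake_intr()` and `fake_ioctl_multi()` both leave the counters untouched; after `fake_init()` they take effect. Placing the running check at the ioctl call site rather than asserting inside the filter routine is a choice made for this sketch, since an assertion there would fire when the init routine programs the filter before setting IFF_RUNNING, as described above.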