Bug #2686
closed
kernel 3.8.1 freezes and panics at boot
0%
Description
DFly 3.6.x x86_64 boots and runs fine on my laptop
(macbookpro5,5), but the 3.8.1 install ISO freezes at boot. (ACPI and EHCI
are disabled in both cases.) These are the last lines before freezing
[boot_verbose=YES]:
--- ---
pci0: <processor> (vendor 0x10de, dev 0x0aa3) at device 3.5 irq 14
ohci0.pci0.pcib0.legacy0.nexus0.root0
ohci0: <nVidia nForce MCP79 USB Controller> [tentative] mem
0xd3488000-0xd3488fff irq 11 at device 4.0 on pci0
ohci0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd3488000
---FREEZE---
Since 3.6.x boots and runs fine (with ACPI disabled) on this machine,
I thought the freeze was related to the new USB stack. So I upgraded
3.6.2 to 3.8.1 via git and did buildworld and buildkernel with WANT_OLDUSB=yes
in /etc/make.conf. The system now panics at boot, showing
[boot_verbose=YES]:
---3.8.1-oldusb---
fwohci0: <1394 Open Host Controller Interface> [tentative] mem
0xd3100000-0xd3100fff irq 7 at device 0.0 on pci4
fwohci0: latency timer 0 > 32.
fwohci0: cache size 64 -> 64.
fwohci0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd3100000
---panic--
and then gives the db> prompt, but because the system freezes (or at least
the keyboard does), I can't call dumpsys. So unfortunately there's no
crash dump. I could then boot and run the system with the old 3.6.2
kernel.
Further: since fwohci0 was the last line in the previous message, I
recompiled the 3.8.1 kernel without firewire:
- FireWire support
#device firewire # FireWire bus code
#device sbp # SCSI over FireWire (Requires scbus and da)
#device fwe # Ethernet over FireWire (non-standard!)
---
This causes a kernel panic much later in the boot, at umass, and gives a db> prompt
that is functional, but call dumpsys returns only 0 and there is still no core dump.
I also tried booting FreeBSD 10.0 to see how the (new) usb stack
fares. It boots fine. I attach the verbose dmesg for both FBSD 10 and
DFly 3.6.2 just in case.
I wonder if there's a way to get a debug or dumpsys for 3.8? Could
vkernel help or is it isolated from hardware?
Thanks
Peeter
--
Files
Updated by peeter over 10 years ago
All of the following is about the new usb, u4b.
Commented out all firewire in the kernel config file and set
hw.usb.ohci.debug=15
The last lines before freeze are:
---
ohci0.pci0.pcib0.legacy0.nexus0.root0
ohci0: <nVidia nForce MCP79 USB Controller> [tentative] mem 0xd3488000-0xd3488fff irq 11 at device 4.0 pci0
ohci0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd3488000
ohci_init: start
ohci_controller_init: SMM active, request owner change
---FREEZE---
That last output is a DPRINTF at line 179 in sys/bus/u4b/controller/ohci.c. I put some kprintfs after that to see how far it gets. The output then is:
---
ohci0.pci0.pcib0.legacy0.nexus0.root0
ohci0: <nVidia nForce MCP79 USB Controller> [tentative] mem 0xd3488000-0xd3488fff irq 11 at device 4.0 pci0
ohci0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd3488000
ohci_init: start
ohci_controller_init: SMM active, request owner change
0:1:
---FREEZE---
The '1:' is a kprintf("1:"); after the OWRITE4 at line 180. It never reaches the kprintfs I put in the 'for' loop that comes next.
Updated by profmakx over 10 years ago
Hi Peeter,
Can you paste exactly the code where you put the kprintfs? I am not quite clear on where exactly the hang supposedly occurs.
Markus
Updated by peeter over 10 years ago
Hi Markus
Thanks. The code is here:
--- sys/bus/u4b/controller/ohci.c ---
static usb_error_t
ohci_controller_init(ohci_softc_t *sc, int do_suspend)
{
	struct usb_page_search buf_res;
	uint32_t i;
	uint32_t ctl;
	uint32_t ival;
	uint32_t hcr;
	uint32_t fm;
	uint32_t per;
	uint32_t desca;

	/* Determine in what context we are running. */
	ctl = OREAD4(sc, OHCI_CONTROL);
	if (ctl & OHCI_IR) {
		/* SMM active, request change */
		DPRINTF("SMM active, request owner change\n");
		kprintf("0:"); //debug
		OWRITE4(sc, OHCI_COMMAND_STATUS, OHCI_OCR);
		kprintf("1:"); //debug
		for (i = 0; (i < 100) && (ctl & OHCI_IR); i++) {
			kprintf("."); //debug
			usb_pause_mtx(NULL, hz / 1000);
			ctl = OREAD4(sc, OHCI_CONTROL);
		}
		kprintf("2:"); //debug
		if (ctl & OHCI_IR) {
			device_printf(sc->sc_bus.bdev,
			    "SMM does not respond, resetting\n");
			OWRITE4(sc, OHCI_CONTROL, OHCI_HCFS_RESET);
			goto reset;
		}
	} else {
		DPRINTF("cold started\n");
--- ---
Peeter
--
Updated by profmakx over 10 years ago
Peeter,
if you have the time could you try a current -master or -DragonFly_RELEASE_3_8 and see whether it works now? Cheers.
Updated by peeter over 10 years ago
I tried DragonFly_RELEASE_3_8. Unfortunately, now it stops quite early at booting, almost the first page after reporting the CPU etc:
--- ---
cryptosoft0: <software crypto> on motherboard
aesni0: No AESNI support.
padlock0: No ACE support.
rdrand0: No RdRand support.
pcib0: <MTABLE Host-PCI bridge> pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
MPTABLE: fixed interrupt routing for 0:3 INTA
MPTABLE: fixed interrupt routing for 0:3 INTB
MPTABLE: fixed interrupt routing for 0:4 INTA
---FREEZE---
and the same with boot_verbose=YES:
--- ---
MPTABLE: fixed interrupt routing for 0:3 INTB
MPTABLE: 0:3 INTB routed to irq 14
IOAPIC: irq 14, gsi 14 edge/high > level/low> vendor=0x10de. . .
found
. . .
map10: type Memory, range 32, base 0xd3488000, size 12, enabled
MPTABLE: fixed interrupt routing for 0:4 INTA
MPTABLE: 0:4 INTA routed to irq 11
IOAPIC: irq 11, gsi 11 edge/high > level/low
unknown: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd3488000
ohci early: SMM active, request owner change
---FREEZE--
I commented out firewire in the kern config. Should I try with firewire?
Peeter
--
Updated by profmakx over 10 years ago
This is becoming weird. We're doing exactly the same thing that FreeBSD does. Do you use any differing options (ACPI on/off or anything)? It seems that requesting the ownership change from SMM to OS from OHCI hangs the system.
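For readers following the handoff logic quoted above, here is a minimal user-space sketch of the intended sequence: set the ownership-change request bit, then poll a bounded number of times for the interrupt-routing bit to clear, and fall back to a reset if it never does. The `fake_ohci` structure and the `read_control()` and `request_ownership()` helpers are hypothetical stand-ins invented for this illustration; only the two bit values come from the OHCI specification. Nothing here touches hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined by the OHCI specification. */
#define OHCI_IR   0x00000100u  /* HcControl: InterruptRouting (SMM owns the HC) */
#define OHCI_OCR  0x00000008u  /* HcCommandStatus: OwnershipChangeRequest */

/* Hypothetical stand-in for the controller registers (not a real driver type). */
struct fake_ohci {
    uint32_t control;             /* models HcControl */
    uint32_t command_status;      /* models HcCommandStatus */
    int      polls_until_release; /* negative: firmware never releases */
};

/* Simulated register read: once OCR is set, firmware clears IR after N polls. */
static uint32_t read_control(struct fake_ohci *hc)
{
    if ((hc->command_status & OHCI_OCR) && hc->polls_until_release >= 0) {
        if (hc->polls_until_release == 0)
            hc->control &= ~OHCI_IR;
        else
            hc->polls_until_release--;
    }
    return hc->control;
}

/* Returns 0 if the OS owns the controller, -1 if a reset would be needed. */
static int request_ownership(struct fake_ohci *hc, int max_polls)
{
    uint32_t ctl;
    int i;

    assert(max_polls > 0);
    ctl = read_control(hc);
    if (!(ctl & OHCI_IR))
        return 0;                      /* cold start: nothing to request */

    hc->command_status |= OHCI_OCR;    /* ask SMM to give up the controller */
    for (i = 0; i < max_polls && (ctl & OHCI_IR); i++)
        ctl = read_control(hc);        /* the real driver pauses ~1 ms per poll */

    return (ctl & OHCI_IR) ? -1 : 0;
}
```

With a `fake_ohci` initialised as `{ OHCI_IR, 0, 3 }`, `request_ownership(&hc, 100)` returns 0; with `{ OHCI_IR, 0, -1 }` it returns -1 after 100 polls. Note that the bounded loop only protects against firmware that stays silent. In the report above the machine appears to freeze immediately after the register write, before the loop is entered, which a poll timeout cannot guard against.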
Updated by peeter over 10 years ago
Yes, I have ACPI off since DFly does not boot with ACPI: DFly 3.6, 3.8, master all freeze at boot immediately with ACPI on. When ACPI is on, the message at boot before freezing is:
--- ---
ACPI: APIC . . .
ACPI: APIC . . .
ACPI: MCFG . . .
ACPI: ASF! . . .
ACPI: SBST . . .
ACPI: ECDT . . .
ACPI: SSDT . . .
ACPI: SSDT . . .
ACPI: SSDT . . .
cryptosoft0: <software crypto> on motherboard
aesni0: No AESNI support
padlock0: No ACE support
rdrand0: No RdRand support
acpi0: <APPLE Apple00> on motherboard
ACPI: All ACPI Tables successfully acquired
---FREEZE---
FBSD 10.0 does boot with ACPI on and all is fine. However, now that you ask I turned ACPI off at FBSD 10.0 booting and it panics immediately at boot:
---FBSD 10.0 with ACPI off---
SMAP type. . .
SMAP type. . .
APIC: Could not find any APICs.
panic: running without device atpic requires a local APIC
cpuid = 0
KDB: stack backtrace:
. . .
Uptime: 1s
---FREEZE---
I also tried DFly master just as a quick consistency check that I did not miss anything while building kernel but its behavior is identical to DFly_RELEASE_3_8.
So all in all: in order for FBSD 10.0 to boot it must have ACPI on. But DFly freezes with ACPI on even before getting to pci or usb. DFly with ACPI off freezes as described above.
Questions regarding DFly ACPI would have been next on my list.
Let me know if I can send any more debug info or do anything to help further.
Peeter
--
Updated by swildner over 10 years ago
Regarding ACPI:
- Does booting verbosely with ACPI enabled give further messages?
- Does the box have the latest BIOS?
- Can you boot the box without ACPI and put up the output of acpidump -b somewhere for further inspection? Do it in an empty directory, as it will output a number of .dat files. I think it should work without acpi.ko loaded.
We really should look at the ACPI issues now because ACPI is actually mandatory on x86_64.
S.
Updated by peeter over 10 years ago
A few quick clarifications:
- No unfortunately there are no further messages. This was the verbose one that I got with "set boot_verbose=YES" and "set verbose=YES". Is there anything more that I could set specifically to make ACPI more verbose, like there is "set hw.usb.ohci.debug=15" for ohci?
- It is a macbookpro5,5, so it has an Apple EFI, not BIOS.
- I will get the ACPI dump from DFly. I can also get the dumps from FBSD if it is of any help and/or if DFly does not give acpidump without acpi.ko loaded.
Peeter
--
Updated by peeter over 10 years ago
- File dfly-3.6-x86_64-macbookpro5,5-acpidump-b.tgz added
The "acpidump -b" is attached. Does it look like one you expected? Regarding your question re the EFI's version, no I haven't updated it. It's the original one.
Peeter
--
Updated by swildner over 10 years ago
The log you gave in http://bugs.dragonflybsd.org/issues/2686#note-8 which ends with
rdrand0: No RdRand support
acpi0: <APPLE Apple00> on motherboard
ACPI: All ACPI Tables successfully acquired
does not look verbose. Can you try with 'v' from the loader menu or (to enable it permanently) putting boot_verbose=yes in /boot/loader.conf?
Sascha
Updated by peeter over 10 years ago
OK sorry, here are excerpts from booting master. (I will try to get a video shot in a couple of hours and report back.)
--- ---
SMAP. . .
SMAP. . .
Copyright. . .
Slab ZoneSize set to 128KB
Leaving critical section, allowing interrupts
ACPI SDT: RSDP not in EBDA
ACPI SDT: RSDP in BIOS mem
ACPI FADT: SCI irq 9, conform/conform
DragonFly v.3.9.0.152. . .
Calibrating clock(s) . . .
. . .
avail memory = . . .
ACPI MADT: LAPIC address 0xfee00000, flags 0x1
ACPI MADT: BSP apic id 0
ACPI MADT: cpu id 0, apic id 0
ACPI MADT: cpu id 1, apic id 1
lapic: divisor index 0, frequency 132667255 Hz
SMP: CPU0 apic_initialize():
lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
. . . not sure about this---went by too fast . . .
crypto: cryptosoft0 registers alg 14 flags 0 maxoplen 0
. . . lots of crypto here . . .
cryptosoft0: <software crypto> [attached!] on motherboard
aesni0: No AESNI support.
padlock0: No ACE support.
rdrand0: No RdRand support.
acpi0.nexus0.root0
acpi0: <APPLE Apple00> [tentative] on motherboard
ACPI: All ACPI Tables successfully acquired
PCIe: Memory Mapped configuration base @ 0xf0000000, bus [0, 255]
PCIe: Using Memory Mapped configuration
---FREEZE---
Peeter
--
Updated by swildner over 10 years ago
Can you try putting:
hw.pci.mcfg=0
in /boot/loader.conf and booting with ACPI enabled?
Updated by peeter over 10 years ago
Then it shows the same messages as a couple of posts above and freezes (verbose is on):
--- ---
ACPI: APIC . . .
ACPI: APIC . . .
ACPI: MCFG . . .
ACPI: ASF! . . .
ACPI: SBST . . .
ACPI: ECDT . . .
ACPI: SSDT . . .
ACPI: SSDT . . .
ACPI: SSDT . . .
cryptosoft0: <software crypto> on motherboard
aesni0: No AESNI support
padlock0: No ACE support
rdrand0: No RdRand support
acpi0: <APPLE Apple00> on motherboard
ACPI: All ACPI Tables successfully acquired
---FREEZE---
Updated by peeter over 10 years ago
As far as I can verify the ACPI: lines incl the addresses are identical to the ones contained in the FBSD dmesg above in the first post:
DFly:
--- ---
ACPI: APIC 0x00000000BFEEA000 000068 (v01 APPLE Apple00 00000001 Loki 0000005F)
. . .
FBSD:
--- ---
ACPI: APIC 0xbfeea000 00068 (v01 APPLE Apple00 00000001 Loki 0000005F)
ACPI: APIC 0xbfee9000 00068 (v02 APPLE Apple00 00000001 Loki 0000005F)
ACPI: MCFG 0xbfee8000 0003C (v01 APPLE Apple00 00000001 Loki 0000005F)
ACPI: ASF! 0xbfee7000 000A5 (v32 APPLE Apple00 00000001 Loki 0000005F)
ACPI: SBST 0xbfee6000 00030 (v01 APPLE Apple00 00000001 Loki 0000005F)
ACPI: ECDT 0xbfee5000 00053 (v01 APPLE Apple00 00000001 Loki 0000005F)
ACPI: SSDT 0xbfec7000 004DC (v01 APPLE CpuPm 00003000 INTL 20061109)
ACPI: SSDT 0xbfecc000 000A5 (v01 SataRe SataPri 00001000 INTL 20061109)
ACPI: SSDT 0xbfecb000 0009F (v01 SataRe SataSec 00001000 INTL 20061109)
--- ---
Updated by profmakx over 10 years ago
digging around in the big dump of things turns up things like these:
http://svnweb.freebsd.org/base?view=revision&revision=196769
http://lists.freebsd.org/pipermail/svn-src-head/2010-November/022705.html
This might be something most quickly solved by prodding matt. I will first need to understand what's going on in that part of the system.
Updated by profmakx over 10 years ago
peeter,
can you try the patch http://leaf.dragonflybsd.org/~profmakx/0001-Change-PAT-MSR-to-solve-freezes-with-MacBooks.patch
please? Boot with this patch applied to -master and ACPI enabled.
Cheers,
Markus
Updated by peeter over 10 years ago
Hi Markus
It does not hang on ACPI any more---that's massive progress! It does still freeze, but much later in the boot process, at the same place we started with. It gives a db> prompt, but unfortunately I cannot call dumpsys since the machine freezes.
---NO verbose---
pci4: <ACPI PCI bus> on pcib4
pcib4: No _PRT found, routing with pci
fwohci0: vendor=11c1, dev=5901
fwohci0: <1394 Open Host Controller Interface> mem 0xd3100000-0xd3100fff irq 19 at device 0.0 on pci4
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic->id = 00000000
fault virtual address = 0x38
fault code = . . .
. . .
CPU0 stopping CPUs: 0x00000002
stopped
Stopped at taskqueue_thread_loop+0x2f: lock addl $0x1,0x38(%rbx)
db>
---FREEZE---
I can't call dumpsys, the system is frozen.
With boot_verbose=YES and "options USB_DEBUG" in kernel I see this last message:
---boot_verbose=YES---
fwohci0: <1394 Open Host Controller Interface> [tentative] mem 0xd3100000-0xd3100fff irq 19 at device 0.0 on pci4
fwohci0: latency timer 0 -> 32.
fwohci0: cache size 64 -> 64.
fwohci0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd3100000
Fatal trap 12: page fault while in kernel mode
. . .
db>
---FREEZE---
Can't get dumpsys.
I can try and compile without firewire as above?
Peeter
--
Updated by profmakx over 10 years ago
It's worth a try. I'll have a sharp look at my patch again to make sure I didn't mess up constants.
Does the kernel print a backtrace, or at least an address at which the panic happens?
Updated by peeter over 10 years ago
Unfortunately no backtrace. The complete panic message is always this:
--- ---
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic->id = 00000000
fault virtual address = 0x38
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff806197fc
stack pointer = 0x10:0xffffffe0fbc2ba90
frame pointer = 0x10:0xffffffe0fbc2bab0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
current thread = pri 12 (CRIT)
kernel: type 12 trap, code=2
CPU0 stopping CPUs: 0x00000002
stopped
Stopped at taskqueue_thread_loop+0x2f: lock addl $0x1,0x38(%rbx)
db>
--- ---
The new things in the boot messages were these lines (with boot_verbose=NO):
---
pci4: <ACPI PCI bus> on pcib4
pcib4: No _PRT found, routing with pci
---
I had not seen this before.
I'll try and compile w/o firewire.
Peeter
--
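A note on reading the trap above: the fault virtual address (0x38) equals the displacement in the faulting instruction, `lock addl $0x1,0x38(%rbx)`. That is consistent with %rbx holding zero, i.e. a NULL structure pointer whose member at offset 0x38 was being incremented. The sketch below illustrates the arithmetic only. The `example_queue` layout is hypothetical and is not DragonFly's real taskqueue structure; it is merely arranged so that a counter lands at offset 0x38.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real kernel structure: seven 8-byte
 * fields followed by a counter, so the counter sits at offset 0x38.
 */
struct example_queue {
    uint64_t pad[7];
    int      counter;   /* offset 7 * 8 = 0x38 */
};

/* Address the CPU would access for base->counter, without dereferencing. */
static uintptr_t counter_address(uintptr_t base)
{
    return base + offsetof(struct example_queue, counter);
}

/* Base pointer implied by a fault address and an instruction displacement. */
static uintptr_t implied_base(uintptr_t fault_addr, uintptr_t displacement)
{
    assert(fault_addr >= displacement);
    return fault_addr - displacement;
}
```

Here `counter_address(0)` evaluates to 0x38 and `implied_base(0x38, 0x38)` evaluates to 0, matching the reading that the pointer in %rbx was NULL when the increment was attempted.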
Updated by peeter over 10 years ago
Without firewire support in the kernel, the picture looks a little different. Now the messages before freeze are:
---boot_verbose=NO---
cpu_cst1: <ACPI CPU C-state> on cpu1
cpu_pst1: <ACPI CPU C-state> on cpu1
clockmod1: <CPU clock modulation> on cpu1
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic->id = 00000000
fault virtual address = 0x38
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff806086cc
stack pointer = 0x10:0xffffffe0fbc2ba90
frame pointer = 0x10:0xffffffe0fbc2bab0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
current thread = pri 12 (CRIT)
kernel: type 12 trap, code=2
CPU0 stopping CPUs: 0x00000002
stopped
Stopped at taskqueue_thread_loop+0x2f: lock addl $0x1,0x38(%rbx)
db>
---FREEZE---
Unfortunately no dumpsys since all is frozen. The verbose one is:
--- ---
clockmod1: <CPU clock modulation> [tentative] on cpu1
clockmod1: <CPU clock modulation> [attached!] on cpu1
coretemp1.cpu1.acpi0.nexus0.root0
coretemp1: <CPU On-Die Thermal Sensors> [tentative] on cpu1
coretemp1: Setting TjMax=100
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic->id = 00000000
fault virtual address = 0x38
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff806086cc
stack pointer = 0x10:0xffffffe0fbc2ba90
frame pointer = 0x10:0xffffffe0fbc2bab0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
current thread = pri 12 (CRIT)
kernel: type 12 trap, code=2
CPU0 stopping CPUs: 0x00000002
stopped
Stopped at taskqueue_thread_loop+0x2f: lock addl $0x1,0x38(%rbx)
db>
---FREEZE---
Peeter
--
Updated by profmakx over 10 years ago
Hi Peeter,
can you try the following patch (with the previous one reverted)?
http://leaf.dragonflybsd.org/~profmakx/0002-Change-PAT-MSR-to-solve-freezes-with-MacBooks.patch
I can't see how it should make any difference, but since I also cannot test it myself, it's a bit of stab-and-prod now.
Updated by peeter over 10 years ago
Hi Markus
Thanks, are you sure this is the patch you meant? Diff shows the first and the second patch are identical. . .
Peeter
--
Updated by profmakx over 10 years ago
Hi Peeter,
yeah, I am sure. Sorry, I overwrote the old patch, and it's just a subtle difference.
Updated by peeter over 10 years ago
Yep sry :), just realized this. Am compiling. . .
P
--
Updated by peeter over 10 years ago
Unfortunately it freezes at the same location with similar messages (kernel is WITH firewire):
--- patch-0002 --- boot_verbose=NO ---
pci4: <ACPI PCI bus> on pcib4
pcib4: No _PRT found, routing with pci
fwohci0: vendor=11c1, dev=5901
fwohci0: <1394 Open Host Controller Interface> mem 0xd3100000-0xd3100fff irq 19 at device 0.0 on pci4
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic->id = 00000000
fault virtual address = 0x38
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff806199ec
stack pointer = 0x10:0xffffffe0fbc1aa90
frame pointer = 0x10:0xffffffe0fbc1aab0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
current thread = pri 12 (CRIT)
kernel: type 12 trap, code=2
CPU0 stopping CPUs: 0x00000002
stopped
Stopped at +0x2f: lock addl $0x1,0x38(%rbx)
db>
---FREEZE---
and the verbose:
--- patch-0002 --- boot_verbose=YES ---
fwohci0: <1394 Open Host Controller Interface> [tentative] mem 0xd3100000-0xd3100fff irq 19 at device 0.0 on pci4
fwohci0: latency timer 0 -> 32.
fwohci0: cache size 64 -> 64.
fwohci0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd3100000
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic->id = 00000000
fault virtual address = 0x38
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff806199ec
stack pointer = 0x10:0xffffffe0fbc1aa90
frame pointer = 0x10:0xffffffe0fbc1aab0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
current thread = pri 12 (CRIT)
kernel: type 12 trap, code=2
CPU0 stopping CPUs: 0x00000002
stopped
Stopped at +0x2f: lock addl $0x1,0x38(%rbx)
db>
---FREEZE---
Can't get dumpsys since kbd frozen.
Peeter
--
Updated by profmakx over 10 years ago
grml.
Any chance you can upload your kernel, or at least the kernel config, so I can dig into what's going wrong at that panic?
Updated by peeter over 10 years ago
Thanks! Patched kernels (with respectively the first and the second patch, dfly-3.9-x86_64):
https://dl.dropboxusercontent.com/u/27306304/kernel-patch/kernel-patch-0001.tgz
https://dl.dropboxusercontent.com/u/27306304/kernel-patch/kernel-patch-0002.tgz
kernconf is X86_64_GENERIC with "options USB_DEBUG"
https://dl.dropboxusercontent.com/u/27306304/kernel-patch/X86_64_GENERIC-NU
Let me know if you have problems accessing these.
Peeter
--
Updated by profmakx over 10 years ago
Peeter,
There have been a couple of commits in the taskqueue code recently in -master. Could you try current master + my patch? Are you using the built-in keyboard of the MacBook? Do you know whether that's connected by USB?
The fact that it doesn't dump is probably due to it not having set up a dumpdev yet.
Updated by peeter over 10 years ago
Thanks, pulled the latest master and tried both patch-0001 and patch-0002 but unfortunately everything remains the same: freeze, panic, no dumpsys and the same panic message.
You're right about the kbd being a USB device (https://discussions.apple.com/message/24166876, second hit when searching for "keyboard"). I use the built-in kbd yes.
I tried "set dumpdev=disk0s1b:" at the loader prompt but no dumpsys.
This is probably of no relevance, but I found
http://www.opensource.apple.com/source/xnu/xnu-1699.32.7/osfmk/i386/mtrr.c
and there are a few hits when you search PAT and setting the WC bit; I have no idea what's going on though.
Peeter
--
Updated by profmakx over 10 years ago
Hi Peeter,
this gives me a hint where to look, cheers.
Updated by peeter over 9 years ago
- File 0001-Program-PAT-indices-5-as-WP-from-default-WT-and-6-as.patch added
Hello
I've come back to this issue---and while I don't have a solution yet, it seems you were on the right track, Markus. It seems there are several issues involved. The patch you gave above is part of the solution. The discussion on the FreeBSD list
discussion:
http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021485.html
solution:
http://lists.freebsd.org/pipermail/svn-src-head/2010-November/022705.html
shows that many macbookpro models had an issue with PAT, and that changing from PAT indices (4, 5) to indices (5, 6) resolved it.
I improved Markus' patch a little bit (attached), since you also have to change the masks for the MSR accordingly [1]. With this patch the system gets ACPI working in
macbookpro5,1
macbookpro5,5
iMac9,1
as confirmed on the FreeBSD list; macbookpro5,5 I can confirm myself. Since FreeBSD with this change still boots on other systems, the patch should not affect systems that already work.
Question: the original FreeBSD commit
http://lists.freebsd.org/pipermail/svn-src-head/2010-November/022705.html
also changes the pmap_cache_bits() routine; DragonFly pmap.c does not have this routine but it should be checked if there might be similar routines that need adjusting.
The next step: I suspect there is something off with the acpi--pci link since the error before crash is that _PRT is not found. I'll dig further.
Peeter
[1] pages 11-34 to 11-35, Vol. 3A in: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
--
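To make the "change the masks accordingly" point concrete, here is a small user-space sketch of the bit arithmetic. It assumes, following the FreeBSD change linked above, that index 5 is reprogrammed as write-protected (WP) and index 6 as write-combining (WC) while indices 0-4 and 7 keep their power-on defaults. The macro names are modelled on FreeBSD's and are assumptions as far as DragonFly's sources go; the memory-type encodings and the default MSR value are from the Intel manual cited in [1]. The code only computes values and never reads or writes an MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type encodings from the Intel SDM, Vol. 3A (PAT section). */
#define PAT_UNCACHEABLE      0x00
#define PAT_WRITE_COMBINING  0x01
#define PAT_WRITE_THROUGH    0x04
#define PAT_WRITE_PROTECTED  0x05
#define PAT_WRITE_BACK       0x06
#define PAT_UNCACHED         0x07   /* UC- */

/* Each of the eight PAT entries occupies one byte of the 64-bit MSR. */
#define PAT_VALUE(i, m)  ((uint64_t)(m) << (8 * (i)))
#define PAT_MASK(i)      PAT_VALUE(i, 0xff)

/* Power-on default: WB, WT, UC-, UC for entries 0-3, repeated for 4-7. */
#define PAT_DEFAULT  0x0007040600070406ULL

/* Clear entries 5 and 6, then set them to WP and WC respectively. */
static uint64_t program_pat_5_and_6(uint64_t pat_msr)
{
    pat_msr &= ~(PAT_MASK(5) | PAT_MASK(6));
    pat_msr |= PAT_VALUE(5, PAT_WRITE_PROTECTED) |
               PAT_VALUE(6, PAT_WRITE_COMBINING);
    return pat_msr;
}

/* Extract the memory type stored in PAT entry i (0..7). */
static unsigned pat_entry(uint64_t pat_msr, int i)
{
    assert(i >= 0 && i < 8);
    return (unsigned)((pat_msr >> (8 * i)) & 0xff);
}
```

Applied to `PAT_DEFAULT`, `program_pat_5_and_6()` yields 0x0001050600070406: entries 0-4 and 7 are untouched, entry 5 reads back as 0x05 (WP) and entry 6 as 0x01 (WC). The mask must name the same indices as the values being OR-ed in; masking one pair of entries while writing another would leave stale bits behind.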
Updated by profmakx over 9 years ago
Oh my god, I am so sorry. I did exactly what your patch does, but apparently that didn't end up in the patch I posted. I'll apply this, and test it. Unfortunately it'll be too late for 4.2, but 4.2.1 will then contain this fix (as well as -master of course).
Thanks again.
Updated by profmakx over 9 years ago
Can you specify what you mean by "the crash"?
Updated by peeter over 9 years ago
Thanks! I mean the kernel still panics later in the boot process (see attached jpg) and gives the db> prompt. But since the keyboard is hung/frozen, I can't call dumpsys.
There's an error message at the top "pcib4: No _PRT found, routing with pci". This should not happen normally. I'm wondering if tweaking acpi--pci will get rid of this message and of the trap. When FreeBSD is booting, it handles all the pci devices without that error message; DragonFly should do the same.
Updated by peeter over 9 years ago
And here's the verbose boot---it's less helpful though, since the "No _PRT found" error has scrolled off the screen. I am working with the assumption that the latter is the culprit.
Ideally one would get the keyboard working but I have no idea how the keyboard is driven before the devices have been probed. I booted with "boot_ddb=1", the system drops to db prompt but the keyboard is hung/frozen. I think the internal keyboard is a usb device.
Updated by peeter over 9 years ago
It would be helpful if one could get a dumpsys. Since the keyboard is hung/frozen, I tried to "fake" a call dumpsys by brute force: I hardcoded in the
sys/ddb/db_lex.c: db_read_line()
that it returns "call dumpsys" to
sys/ddb/db_input.c: db_readline()
And so indeed it went to
sys/kern/kern_shutdown.c: dumpsys()
but alas: in that routine, the if condition at line 913
dumper.dumper != NULL
failed and no dumpsys. I do not know how to correctly initialize the dumper struct so dumpsys is executed. Any ideas how to do that?
Figuring out why the keyboard does not work might be harder, since with FreeBSD the keyboard hangs too when the kernel drops to ddb immediately after booting.
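The failed check described above fits the earlier suggestion that no dump device had been set up yet at that point in the boot. As a rough illustration only, the following user-space model shows the general pattern of a dump routine that does nothing until a handler has been registered. Every name here (`model_dumper`, `model_set_dumper()`, `model_dumpsys()`, `model_write_dump()`) is hypothetical and simplified; it is not the kernel's actual structure or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a dump-handler registration. */
typedef int (*dump_fn)(void *priv);

struct model_dumper {
    dump_fn dumper;   /* NULL until a dump device has been configured */
    void   *priv;
};

static struct model_dumper dumper;   /* zero-initialised: no handler yet */

/* Stand-in for whatever configuring a dump device would register. */
static void model_set_dumper(dump_fn fn, void *priv)
{
    dumper.dumper = fn;
    dumper.priv = priv;
}

/* Returns 1 if a dump was attempted, 0 if it was skipped. */
static int model_dumpsys(void)
{
    if (dumper.dumper == NULL)
        return 0;     /* nothing registered, so the call is a no-op */
    (void)dumper.dumper(dumper.priv);
    return 1;
}

/* Trivial handler that just counts how often it was invoked. */
static int model_write_dump(void *priv)
{
    int *calls = priv;

    assert(calls != NULL);
    (*calls)++;
    return 0;
}
```

In this model, `model_dumpsys()` returns 0 before `model_set_dumper()` has been called and 1 afterwards. If the real kernel behaves analogously, forcing "call dumpsys" from the debugger would not help on its own; the dump device would have to be configured before the panic occurs.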
Updated by peeter over 7 years ago
- File fbsd-11.dmesg added
For what it's worth, here's the FreeBSD 11 verbose boot.
Updated by peeter over 7 years ago
- File fbsd-11.dmesg added
The last attachment did not come through. Trying again.
Updated by peeter over 7 years ago
DragonFly boots on macbookpro5,5 as of these commits
https://gitweb.dragonflybsd.org/dragonfly.git/commit/f6520cce7cc497d0d35ef0fc771daf741d9134cf
https://gitweb.dragonflybsd.org/dragonfly.git/commit/704f404a0d28f2071a8c86686d90388c6ad28a26
given the kernel is compiled without bwn and siba_bwn. After boot, one can load siba_bwn and bwn without crashing the kernel; they do not attach though.
It is still open why the two drivers crash the kernel at boot but not afterwards, but this can be investigated in another PR.
Updated by daftaupe about 5 years ago
I'm adding a note as a reminder, because this ticket helped me boot and install DragonFly 5.7 on my Macbook 4,1.
I had to compile the kernel without bwn and siba_bwn and remove /boot/kernel/if_bwn.ko and /boot/kernel/siba_bwn.ko from the install media.
After that I could install DragonFly without any issue. I haven't tried loading the modules later, as I'm not using the wifi on this laptop.