Bug #3345 (open)
amdgpu crashes when loaded on Ryzen 5 PRO 3500U
Description
Hi,
I tried to load amdgpu on an HP laptop with a Ryzen 5 PRO 3500U.
Linux reports it as PCI_ID=1002:15D8, which seems to be supported according to https://gitweb.dragonflybsd.org/dragonfly.git/blob/HEAD:/sys/dev/drm/amd/amdgpu/amdgpu_drv.c#l788
However, after installing devfw-amdgpu and trying to load amdgpu, I get the following trace and the system ends up on a black screen:
[drm] amdgpu kernel modesetting enabled.
drm0 on vgapci0
[drm] pdev: vendor=0x1002 device=0x15d8 rev=0xd2
[drm] svendor=0x103c sdevice=0x8589 irq=17
vgapci0: child drm0 requested pci_enable_io
vgapci0: child drm0 requested pci_enable_io
amdgpu_driver_load_kms(): flags=131094 drm_device=0xfffff801564f4500 adev=0xfffff80364ac8000
amdgpu_device_init: start
[drm] initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x103C:0x8589 0xD2).
amdgpu_device_init: 1
[drm] register mmio base: 0xF0600000
[drm] register mmio size: 524288
pci_resource_flags: pdev=0xfffff8015646e800 bar=2 type=MEM
amdgpu_device_init: 2
amdgpu_device_init: for loop 0
amdgpu_device_init: for loop 1
amdgpu_device_init: for loop 2
amdgpu_device_init: for loop 3
amdgpu_device_init: for loop 4
amdgpu_device_init: rio_rid=32
amdgpu_device_init: mem_size=256
amdgpu_device_init: 3
[drm] add ip block number 0 <soc15_common>
[drm] add ip block number 1 <gmc_v9_0>
[drm] add ip block number 2 <vega10_ih>
[drm] add ip block number 3 <psp>
[drm] add ip block number 4 <powerplay>
[drm] add ip block number 5 <dm>
[drm] add ip block number 6 <gfx_v9_0>
[drm] add ip block number 7 <sdma_v4_0>
[drm] add ip block number 8 <vcn_v1_0>
[drm] VCN decode is enabled in VM mode
[drm] VCN encode is enabled in VM mode
[drm] VCN jpeg decode is enabled in VM mode
amdgpu_device_init: 4
amdgpu_device_init: 5
ATOM BIOS: SWBRT48929.001
amdgpu_device_init: 6
amdgpu_device_init: 6.1
amdgpu_device_init: 7
amdgpu_device_init: 8
[drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
amdgpu: No suitable DMA available.
amdgpu: No coherent DMA available.
drm0: info: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
drm0: info: GART: 1024M 0x000000F500000000 - 0x000000F53FFFFFFF
[drm] Detected VRAM RAM=512M, BAR=256M
[drm] RAM width 128bits DDR4
[TTM] Zone kernel: Available graphics memory: 65536 kiB
[TTM] Zone dma32: Available graphics memory: 65536 kiB
[TTM] Initializing pool allocator
[drm] amdgpu: 512M of VRAM memory ready
[drm] amdgpu: 3072M of GTT memory ready.
[drm] GART: num cpu pages 262144, num gpu pages 262144
[drm] PCIE GART of 1024M enabled (table at 0x000000F404000000).
ttm_check_under_lowerlimit: stub
amdgpu: [powerplay] powerplay sw init successfully
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 0 use gpu addr 0x000000f500400040, cpu addr 0x0xffffb80000101040
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 1 use gpu addr 0x000000f5004000c0, cpu addr 0x0xffffb800001010c0
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 2 use gpu addr 0x000000f500400140, cpu addr 0x0xffffb80000101140
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 3 use gpu addr 0x000000f5004001c0, cpu addr 0x0xffffb800001011c0
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 4 use gpu addr 0x000000f500400240, cpu addr 0x0xffffb80000101240
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 5 use gpu addr 0x000000f5004002c0, cpu addr 0x0xffffb800001012c0
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 6 use gpu addr 0x000000f500400340, cpu addr 0x0xffffb80000101340
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 7 use gpu addr 0x000000f5004003c0, cpu addr 0x0xffffb800001013c0
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 8 use gpu addr 0x000000f500400440, cpu addr 0x0xffffb80000101440
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 9 use gpu addr 0x000000f5004004e0, cpu addr 0x0xffffb800001014e0
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
[drm] use_doorbell being set to: [true]
drm0: debug: fence driver on ring 10 use gpu addr 0x000000f500400560, cpu addr 0x0xffffb80000101560
ttm_check_under_lowerlimit: stub
[drm] Found VCN firmware Version: 1.73 Family ID: 18
[drm] PSP loading VCN firmware
drm0: debug: fence driver on ring 11 use gpu addr 0x000000f5004005e0, cpu addr 0x0xffffb800001015e0
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 12 use gpu addr 0x000000f500400660, cpu addr 0x0xffffb80000101660
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 13 use gpu addr 0x000000f5004006e0, cpu addr 0x0xffffb800001016e0
ttm_check_under_lowerlimit: stub
drm0: debug: fence driver on ring 14 use gpu addr 0x000000f500400760, cpu addr 0x0xffffb80000101760
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
ttm_check_under_lowerlimit: stub
amdgpu: [powerplay] dpm has been enabled
[drm] DM_PPLIB: values for Invalid clock
[drm] DM_PPLIB: 400000 in kHz
[drm] DM_PPLIB: 933000 in kHz
[drm] DM_PPLIB: 1067000 in kHz
[drm] DM_PPLIB: 1200000 in kHz
[drm] DM_PPLIB: values for Invalid clock
[drm] DM_PPLIB: 300000 in kHz
[drm] DM_PPLIB: 600000 in kHz
[drm] DM_PPLIB: 626000 in kHz
[drm] DM_PPLIB: 654000 in kHz
[drm] Display Core initialized with v3.1.59!
tunable drm.video.eDP-1 is not set
[drm] SADs count is: -2, don't need to read it
tunable drm.video.DP-1 is not set
tunable drm.video.DP-2 is not set
tunable drm.video.DP-3 is not set
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
drm0: error: [gfxhub] VMC page fault (src_id:0 ring:217 vmid:0 pasid:0, for process pid 0 thread pid 0
)
drm0: error: at address 0x0000000000000000 from 27
drm0: error: VM_L2_PROTECTION_FAULT_STATUS:0x00000BB2
drm0: error: [gfxhub] VMC page fault (src_id:0 ring:217 vmid:0 pasid:0, for process pid 0 thread pid 0
)
drm0: error: at address 0x0000000000001000 from 27
drm0: error: VM_L2_PROTECTION_FAULT_STATUS:0x00000BB2
error: [drm:pid759:gfx_v9_0_kiq_kcq_enable] *ERROR* KCQ enable failed (scratch(0xC040)=0xCAFEDEAD)
error: [drm:pid759:amdgpu_device_ip_init] *ERROR* hw_init of IP block <gfx_v9_0> failed -22
drm0: error: amdgpu_device_ip_init failed
drm0: error: Fatal error during GPU init
[drm] amdgpu: finishing device.
amdgpu_device_ip_fini: 1
The crash is consistently reproducible. I discovered that the system is in fact sitting at a db prompt, so I took a dump and uploaded it to my leaf account at /build/home/daftaupe/crash/amdgpu_crash, together with this trace, the dmesg output, and the pciconf output.
If anything more is needed, just ask.
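For anyone triaging a long trace like the one above, here is a minimal sketch of how the failing lines could be pulled out of a saved dmesg capture. It is only an illustration: the helper names (error_lines, failed_ip_blocks) are made up for this example, the sample text is copied from the trace in this report, and the regular expressions assume the "drm0: error:" / "error:" prefixes seen here.

```python
import re

# Sample lines copied from the trace in this report. In practice, read the
# saved dmesg output from a file instead (the file name is up to you).
SAMPLE = """\
[drm] Driver supports precise vblank timestamp query.
drm0: error: [gfxhub] VMC page fault (src_id:0 ring:217 vmid:0 pasid:0, for process pid 0 thread pid 0
)
drm0: error: at address 0x0000000000000000 from 27
drm0: error: VM_L2_PROTECTION_FAULT_STATUS:0x00000BB2
error: [drm:pid759:gfx_v9_0_kiq_kcq_enable] *ERROR* KCQ enable failed (scratch(0xC040)=0xCAFEDEAD)
error: [drm:pid759:amdgpu_device_ip_init] *ERROR* hw_init of IP block <gfx_v9_0> failed -22
drm0: error: amdgpu_device_ip_init failed
drm0: error: Fatal error during GPU init
[drm] amdgpu: finishing device.
"""

# Assumption: error lines start with "error: ", optionally preceded by "drmN: ".
ERROR_LINE = re.compile(r"^(?:drm\d+: )?error: ")
# Assumption: IP block failures are reported in the form seen in this trace.
IP_FAIL = re.compile(r"hw_init of IP block <(\w+)> failed (-?\d+)")


def error_lines(text):
    """Return every line that carries an error prefix, in original order."""
    return [line for line in text.splitlines() if ERROR_LINE.match(line)]


def failed_ip_blocks(text):
    """Return (block_name, return_code) pairs for IP blocks whose hw_init failed."""
    return [(name, int(code)) for name, code in IP_FAIL.findall(text)]


if __name__ == "__main__":
    for line in error_lines(SAMPLE):
        print(line)
    for name, code in failed_ip_blocks(SAMPLE):
        print(f"IP block {name} failed hw_init with code {code}")
```

On this trace the sketch singles out gfx_v9_0 as the block whose hw_init failed, right after the two VMC page faults and the KCQ enable failure.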
Updated by daftaupe 12 months ago
- Status changed from New to In Progress
The card works with hjarvard's branch, currently available at https://github.com/servizig/DragonFlyBSD/tree/drm_amdgpu_v4.20_v1, with the firmware from https://leaf.dragonflybsd.org/~szi/firmware/devfw-amdgpu-20231211.tar.xz unpacked into /boot/modules.local.