Bug #3136

Xorg freezes on radeon: Xorg and kernel (ttm swap) in "lwe"

Added by peeter 4 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Kernel
Target version: -
Start date: 06/07/2018
Due date:
% Done: 0%
Estimated time:

Description

I've upgraded the CPU on my desktop from a Ryzen 2400G to a Ryzen 2600 (the graphics card is still a Radeon R7 360). I also updated to the latest master, and I'm now getting freezes that seem to be related to radeonkms. Xorg freezes, but I can still ssh into the machine, which otherwise runs fine. I tried to find the locked threads; I'm not sure I succeeded, but here's what I found: an Xorg thread and the kernel's (ttm swap) thread are both sleeping in wchan "lwe" at radeon_fence_wait_seq_timeout().

Here's the kgdb output:

* thread 1871: kernel (ttm swap)
---
(kgdb) thread 1871
[Switching to thread 1871 (kernel ttm swap)]
#0 0xffffffff80638569 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:810
810 lwkt_switch_return(td->td_switch(ntd));

(kgdb) back
#0 0xffffffff80638569 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:810
#1 0xffffffff80642788 in tsleep (ident=0xfffff80ac8417800, flags=1024, wmesg=<optimized out>, timo=2147483647)
at /usr/src/sys/kern/kern_synch.c:716
#2 0xffffffff80642e2c in lksleep (ident=ident@entry=0xfffff80ac8417800, lock=lock@entry=0xfffff80ac8417800,
flags=flags@entry=0, wmesg=wmesg@entry=0xffffffff830fb7ba "lwe", timo=timo@entry=2147483647)
at /usr/src/sys/kern/kern_synch.c:827
#3 0xffffffff8301ee85 in radeon_fence_wait_seq_timeout (rdev=0xfffff80ac8416400,
target_seq=target_seq@entry=0xfffff80ac846b8b0, intr=<optimized out>, timeout=2147483647)
at /usr/src/sys/dev/drm/radeon/radeon_fence.c:400
#4 0xffffffff8301efcd in radeon_fence_wait (fence=<optimized out>, intr=<optimized out>)
at /usr/src/sys/dev/drm/radeon/radeon_fence.c:443
#5 0xffffffff83e0e463 in ttm_bo_cleanup_refs_and_unlock (bo=bo@entry=0xfffff80ae6264718,
interruptible=interruptible@entry=false, no_wait_gpu=no_wait_gpu@entry=false)
at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:656
#6 0xffffffff83e0fc00 in ttm_bo_swapout (shrink=0xfffff8068f00bf28) at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:1886
#7 0xffffffff83e10883 in ttm_shrink (glob=0xfffff80ac51b9208, from_wq=true, extra=0)
at /usr/src/sys/dev/drm/drm/../ttm/ttm_memory.c:174
#8 0xffffffff80660639 in taskqueue_run (queue=0xfffff8068f469660, lock_held=1) at /usr/src/sys/kern/subr_taskqueue.c:338
#9 0xffffffff806607cd in taskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:498
#10 0xffffffff80635550 in ?? () at /usr/src/sys/kern/lwkt_thread.c:1748
#11 0x0000000000000000 in ?? ()
---

* thread 445: Xorg (pid 741/1)
---
(kgdb) thread 445
[Switching to thread 445 (pid 741/1, Xorg)]
#0 0xffffffff80638569 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:810
810 lwkt_switch_return(td->td_switch(ntd));

(kgdb) back
#0 0xffffffff80638569 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:810
#1 0xffffffff806423cd in tsleep (ident=0xfffff80ac8417800, flags=1024, wmesg=<optimized out>, timo=2147483647)
at /usr/src/sys/kern/kern_synch.c:703
#2 0xffffffff80642e2c in lksleep (ident=ident@entry=0xfffff80ac8417800, lock=lock@entry=0xfffff80ac8417800,
flags=flags@entry=0, wmesg=wmesg@entry=0xffffffff830fb7ba "lwe", timo=timo@entry=2147483647)
at /usr/src/sys/kern/kern_synch.c:827
#3 0xffffffff8301ee85 in radeon_fence_wait_seq_timeout (rdev=0xfffff80ac8416400,
target_seq=target_seq@entry=0xfffff80acfa1f118, intr=<optimized out>, timeout=2147483647)
at /usr/src/sys/dev/drm/radeon/radeon_fence.c:400
#4 0xffffffff8301efcd in radeon_fence_wait (fence=<optimized out>, intr=<optimized out>)
at /usr/src/sys/dev/drm/radeon/radeon_fence.c:443
#5 0xffffffff83e0e463 in ttm_bo_cleanup_refs_and_unlock (bo=bo@entry=0xfffff80ae6264718,
interruptible=interruptible@entry=false, no_wait_gpu=no_wait_gpu@entry=false)
at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:656
#6 0xffffffff83e0fc00 in ttm_bo_swapout (shrink=0xfffff8068f00bf28) at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:1886
#7 0xffffffff83e10883 in ttm_shrink (glob=glob@entry=0xfffff80ac51b9208, from_wq=from_wq@entry=false,
extra=extra@entry=5136) at /usr/src/sys/dev/drm/drm/../ttm/ttm_memory.c:174
#8 0xffffffff83e10f3c in ttm_mem_global_alloc_zone (interruptible=false, no_wait=<optimized out>, memory=4096,
single_zone=0x0, glob=0xfffff80ac51b9208) at /usr/src/sys/dev/drm/drm/../ttm/ttm_memory.c:416
#9 ttm_mem_global_alloc_page (glob=glob@entry=0xfffff80ac51b9208, vm_page=<optimized out>, no_wait=no_wait@entry=false,
interruptible=interruptible@entry=false) at /usr/src/sys/dev/drm/drm/../ttm/ttm_memory.c:449
#10 0xffffffff83e11f52 in ttm_pool_populate (ttm=ttm@entry=0xfffff806894a7f18)
at /usr/src/sys/dev/drm/drm/../ttm/ttm_page_alloc.c:853
#11 0xffffffff83031b91 in radeon_ttm_tt_populate (ttm=0xfffff806894a7f18) at /usr/src/sys/dev/drm/radeon/radeon_ttm.c:640
#12 0xffffffff83e0abe0 in ttm_tt_bind (ttm=0xfffff806894a7f18, bo_mem=bo_mem@entry=0xfffff80acfa1f478)
at /usr/src/sys/dev/drm/drm/../ttm/ttm_tt.c:263
#13 0xffffffff83e0c9df in ttm_bo_handle_move_mem (bo=bo@entry=0xfffff80ac5cbb318, mem=mem@entry=0xfffff80acfa1f478,
evict=evict@entry=false, interruptible=interruptible@entry=true, no_wait_gpu=no_wait_gpu@entry=false)
at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:463
#14 0xffffffff83e0f6ba in ttm_bo_move_buffer (no_wait_gpu=false, interruptible=true, placement=<optimized out>,
bo=0xfffff80ac5cbb318) at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:1177
#15 ttm_bo_validate (bo=0xfffff80ac5cbb318, placement=<optimized out>, interruptible=<optimized out>,
no_wait_gpu=<optimized out>) at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:1233
#16 0xffffffff83e0f9a6 in ttm_bo_init (bdev=bdev@entry=0xfffff80ac8416b68, bo=bo@entry=0xfffff80ac5cbb318,
size=size@entry=262144, type=type@entry=ttm_bo_type_device, placement=placement@entry=0xfffff80ac5cbb2f8,
page_alignment=page_alignment@entry=1, interruptible=true, persistent_swap_storage=0x0, acc_size=9344, sg=0x0,
destroy=0xffffffff8302b400 <radeon_ttm_bo_destroy>) at /usr/src/sys/dev/drm/drm/../ttm/ttm_bo.c:1340
#17 0xffffffff8302b87e in radeon_bo_create (rdev=rdev@entry=0xfffff80ac8416400, size=size@entry=262144,
byte_align=byte_align@entry=4096, kernel=kernel@entry=false, domain=domain@entry=2, flags=flags@entry=0, sg=0x0,
bo_ptr=0xfffff80acfa1f630) at /usr/src/sys/dev/drm/radeon/radeon_object.c:222
#18 0xffffffff8301fed8 in radeon_gem_object_create (rdev=rdev@entry=0xfffff80ac8416400, size=262144, alignment=4096,
initial_domain=2, flags=0, kernel=kernel@entry=false, obj=0xfffff80acfa1f6a0)
at /usr/src/sys/dev/drm/radeon/radeon_gem.c:75
#19 0xffffffff83020273 in radeon_gem_create_ioctl (dev=<optimized out>, data=0xfffff80acfa1f868, filp=0xfffff80ac6d53420)
at /usr/src/sys/dev/drm/radeon/radeon_gem.c:260
#20 0xffffffff83df4b53 in drm_ioctl (ap=<optimized out>) at /usr/src/sys/dev/drm/drm/../drm_ioctl.c:717
#21 0xffffffff805fa07c in dev_dioctl (dev=dev@entry=0xfffff8068f41dbc0, cmd=cmd@entry=3223348317,
data=data@entry=0xfffff80acfa1f868 "", fflag=<optimized out>, cred=cred@entry=0xfffff80acc2cae10,
msg=msg@entry=0xfffff80acfa1f960, fp=0xfffff80ac5bab400) at /usr/src/sys/kern/kern_device.c:245
#22 0xffffffff808abc63 in devfs_fo_ioctl (fp=0xfffff80ac5bab400, com=3223348317, data=0xfffff80acfa1f868 "",
ucred=0xfffff80acc2cae10, msg=0xfffff80acfa1f960) at /usr/src/sys/vfs/devfs/devfs_vnops.c:1545
#23 0xffffffff80663233 in fo_ioctl (msg=0xfffff80acfa1f960, cred=0xfffff80acc2cae10, data=0xfffff80acfa1f868 "",
com=3223348317, fp=0xfffff80ac5bab400) at /usr/src/sys/sys/file2.h:84
#24 mapped_ioctl (fd=9, com=<optimized out>, uspc_data=0x7fffffdfc9c0 <Address 0x7fffffdfc9c0 out of bounds>, map=0x0,
msg=0xfffff80acfa1f960) at /usr/src/sys/kern/sys_generic.c:717
#25 0xffffffff80b3a5d8 in syscall2 (frame=0xfffff80acfa1f9f8) at /usr/src/sys/platform/pc64/x86_64/trap.c:1251
#26 0xffffffff80b18eb0 in ?? () at /usr/src/sys/platform/pc64/x86_64/exception.S:434
#27 0x0000000000000009 in ?? ()
#28 0x00000000c020645d in ?? ()
#29 0x00007fffffdfc9c0 in ?? ()
---

History

#1 Updated by peeter 4 months ago

Just to add: the freeze seems to occur at random; it can take anywhere from 30 minutes to 34 hours.

#2 Updated by peeter 4 months ago

I've got another backtrace. This time the buffer objects that the threads are working on are different.

#3 Updated by ftigeot 4 months ago

  • Category set to Kernel

The "lwe" state comes from the Linux wait_event_xxx() macro family.
One of the places where its usage differs significantly from Linux is drm_read().
This function should probably be synchronized with Linux.

#4 Updated by ftigeot 2 months ago

drm_read() has just been updated by commit d420a8a88d271f244e34680d0f9bf03c7bf2f5ed in master.
This is a bit of a shot in the dark but still worth a test.

#5 Updated by peeter 2 months ago

Thanks! I've built the new kernel. Let's see how it behaves.