Bug #2635

Stabilize drm/radeon KMS drivers for new xorg

Added by tuxillo 6 months ago. Updated 17 days ago.

Status: Resolved
Start date: 02/13/2014
Priority: Normal
Due date:
Assignee: jorisgio
% Done: 100%
Category: Driver
Target version: 3.8.0

Description

Stabilize drm/radeon KMS drivers for new xorg

patch-ttm_bo.c (422 Bytes) vadaszi, 02/21/2014 03:59 PM

core.txt.8 (34.4 KB) jorisgio, 02/24/2014 09:57 AM

dmesg (5.19 KB) jorisgio, 02/24/2014 10:00 AM

ugly.diff - really ugly hack (2.72 KB) yellowrabbit2010, 07/12/2014 08:21 PM

History

#1 Updated by vadaszi 5 months ago

This patch fixes the assertion failing during "kldload radeonkms" in sys/dev/drm/ttm/ttm_bo.c line 1169.

#2 Updated by jorisgio 5 months ago

Thanks. I believed this had already been fixed. I committed your patch as ea6f52c4fb2da1957cbee60ff4daf8be83de1f8e.

I remember there were one or two of those. I think fixes have been committed, but I'm not sure anymore. After that, the major issue with ttm is a problem with the RB_TREE of buffer objects: I get a panic where some of the elements in the tree have a null refcount and invalid data in them.

I'll upload the core as soon as I can access the machine.

#3 Updated by jorisgio 5 months ago

Hi,

With kernel v3.7.1.747.g4fd11-DEVELOPMENT, I have a new panic. See the core files:

http://leaf.dragonflybsd.org/~joris/vmcore.ttm
http://leaf.dragonflybsd.org/~joris/kern.ttm

The radeonkms module loads and initializes.

Later, xorg calls mmap to get a shared mapping of the card0 device.

* vm_mmap in vm/vm_mmap.c is called to obtain a shared mapping. This is a device mapping, type is VCHR

* enters dev_dmmap_single -> drm_mmap_single -> ttm_bo_mmap_single

* ttm_bo_mmap_single looks up the pair (offset/size) in the buffer object red/black tree. bdev address in the core is 0xffffffe0e89f73a8

* offset is 4299227136>>12, and size is 8192>>12

* The lookup fails. Here is some information from the core. There are only two elements in the tree, the second being the right child of the first. The first element is at address 0xffffffe0e64b85a0 and the second element is at address 0xffffffe0e64b81c8.

* At the end of the while loop, the condition (best_bo->vm_node->start + best_bo->num_pages) < (page_start + num_pages) is false

* ttm_bo_mmap_single returns EINVAL

* vm_mmap only checks the error code against ENODEV. The object is left uninitialized, hence the panic.

Now the real question is "why does the lookup fail?" That, I don't know.
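For readers following the steps above, here is a minimal userland sketch of a range lookup of this kind and of a caller that treats every nonzero return as a failure. It is not the DragonFly or TTM code: `bo_node`, `lookup_range`, `mmap_single` and `map_checked` are hypothetical names, and a plain binary search tree stands in for the kernel's red/black tree. It does not explain why the lookup failed in the core.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages, matching the >>12 in the report */

/* Hypothetical stand-in for a buffer object keyed by its first page. */
struct bo_node {
    uint64_t start;       /* first page of the object's range */
    uint64_t num_pages;   /* length of the range in pages */
    struct bo_node *left;
    struct bo_node *right;
};

/*
 * Find the node with the greatest start <= page_start, then check that
 * its range covers [page_start, page_start + num_pages).
 */
static struct bo_node *
lookup_range(struct bo_node *root, uint64_t page_start, uint64_t num_pages)
{
    struct bo_node *cur = root;
    struct bo_node *best = NULL;

    while (cur != NULL) {
        if (page_start >= cur->start) {
            best = cur;
            if (page_start == cur->start)
                break;
            cur = cur->right;
        } else {
            cur = cur->left;
        }
    }
    if (best == NULL)
        return NULL;
    /* Reject a candidate whose range ends before the request does. */
    if (best->start + best->num_pages < page_start + num_pages)
        return NULL;
    return best;
}

/* Convert byte offset/size to pages; EINVAL when nothing covers the range. */
static int
mmap_single(struct bo_node *root, uint64_t offset, uint64_t size,
    struct bo_node **out)
{
    struct bo_node *bo;

    bo = lookup_range(root, offset >> PAGE_SHIFT, size >> PAGE_SHIFT);
    if (bo == NULL)
        return EINVAL;
    *out = bo;
    return 0;
}

/*
 * Caller sketch: initialise the result and bail out on any nonzero
 * error, not just one specific code, so it is never used unset.
 */
static struct bo_node *
map_checked(struct bo_node *root, uint64_t offset, uint64_t size)
{
    struct bo_node *bo = NULL;

    if (mmap_single(root, offset, size, &bo) != 0)
        return NULL;
    return bo;
}
```

With the values from the report, 4299227136 >> 12 is page 1049616 (0x100410) and 8192 >> 12 is 2 pages, so the tree must hold an object whose page range contains pages 0x100410 and 0x100411 for the lookup to succeed.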

#4 Updated by jorisgio 5 months ago

  • Assignee set to jorisgio

#5 Updated by yellowrabbit2010 26 days ago

Hooray! With commits 7d32ff5365a2d04afa460e716477566727967980 and f6201ebf294c18bfb1c3d199bed847ab481b3099
my Radeon 6670 has come alive! Great job.

Now I can use the new xorg with the kms driver. 2 monitors work nicely: mplayer plays video on the big one, while I can do something simple on the little one.

But I cannot do something like "make -j 3 buildworld": after 5 to 25 minutes the computer restarts,
without dumps. A few lines remain in /var/log/messages:

=== like ===
Jul 5 12:38:26 fly kernel: panic: Bad tailq NEXT(0xffffffe04d02c610->tqh_last) != NULL
Jul 5 12:38:26 fly kernel: cpuid = 3
Jul 5 12:38:26 fly kernel: Trace beginning at frame 0xffffffe10b957428
Jul 5 12:38:26 fly kernel: panic() at panic+0x21f 0xffffffff804a12b0
Jul 5 12:38:26 fly kernel: panic() at panic+0x21f 0xffffffff804a12b0
Jul 5 12:38:26 fly kernel: callout_reset() at callout_reset+0xf2 0xffffffff804c3a30
Jul 5 12:38:26 fly kernel: taskqueue_enqueue_timeout() at taskqueue_enqueue_timeout+0x102 0xffffffff804d97ce
Jul 5 12:38:26 fly kernel: ttm_bo_release() at ttm_bo_release+0x276 0xffffffff84c66de8
Jul 5 12:38:26 fly kernel: ttm_bo_unref() at ttm_bo_unref+0x4b 0xffffffff84c6440b
Jul 5 12:38:26 fly kernel: radeon_bo_unref() at radeon_bo_unref+0x44 0xffffffff840b85be
Jul 5 12:38:26 fly kernel: radeon_gem_object_free() at radeon_gem_object_free+0x21 0xffffffff840ab1f3
Jul 5 12:38:26 fly kernel: drm_gem_object_free() at drm_gem_object_free+0x70 0xffffffff84c58805
Jul 5 12:38:26 fly kernel: drm_gem_handle_delete() at drm_gem_handle_delete+0xc8 0xffffffff84c58cb4
Jul 5 12:38:26 fly kernel: drm_gem_close_ioctl() at drm_gem_close_ioctl+0x23 0xffffffff84c58e9c

=== or ===
Jul 5 13:42:25 fly kernel: panic: Bad tailq NEXT(0xffffffe04cef0810->tqh_last) != NULL
Jul 5 13:42:25 fly kernel: cpuid = 1
Jul 5 13:42:25 fly kernel: Trace beginning at frame 0xffffffe10c36f428
Jul 5 13:42:25 fly kernel: panic() at4db928
Jul 5 13:42:25 fly kernel: sys_ioctl() at sys_ioctl+0x1c 0xffffffff804db9fe
Jul 5 13:42:25 fly kernel: syscall2() at syscall2+0x3fb 0xffffffff8078fc22
Jul 5 13:42:25 fly kernel: Xfast_syscall() at Xfast_syscall+0xcb 0xffffffff807788eb

=== or ===
Jul 5 15:45:06 fly kernel: panic: Bad link elm 0xffffffff82da1fd8 next->prev != elm
Jul 5 15:45:06 fly kernel: cpuid = 0
Jul 5 15:45:06 fly kernel: Trace beginning at frame 0xffffffe04ccf7980
Jul 5 15:45:06 fly kernel: panic() at panic+0x21f 0xffffffff804a12b0
Jul 5 15:45:06 fly kernel: panic() at panic+0x21f 0xffffffff804a12b0
Jul 5 15:45:06 fly kernel: softclock_handler() at softclock_handler+0x112 0xffffffff804c3cb1

=== or really short one ===
Jul 5 16:43:18 fly kernel: panic: Bad tailq NEXT(0xffffffe04ce98640->tqh_last) != NULL
Jul 5 16:43:18 fly kernel: cpuid = 1
Jul 5 16:43:18 fly kernel: Trace beginning at frame 0xffffffe10c374428
Jul 5 16:43:18 fly kernel: panic() at panic+0x21f 0xffffffff804a12b0

My processor is an Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz (3400.02-MHz K8-class CPU). /boot/loader.conf contains vm.dma_reserved=256M.
RAM is 8 GB.
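As background on the "Bad tailq NEXT(...->tqh_last) != NULL" message: it refers to the tail-queue invariant that the head's `tqh_last` pointer must point at a NULL next link. The sketch below uses a hand-written queue with hypothetical names (`item`, `tailq`, `tailq_tail_ok`) rather than the kernel's sys/queue.h macros, and shows one way the invariant can be broken: inserting an element that is already on the queue. This is only an illustration of the check, not a diagnosis of the panics above.

```c
#include <stddef.h>

/* Hypothetical minimal tail queue, laid out like a BSD TAILQ. */
struct item {
    struct item *next;    /* next element, NULL at the tail */
    struct item **prev;   /* address of the previous next pointer */
};

struct tailq {
    struct item *first;   /* first element, NULL when empty */
    struct item **last;   /* address of the last next pointer */
};

static void
tailq_init(struct tailq *q)
{
    q->first = NULL;
    q->last = &q->first;
}

/* The invariant the panic message describes: *last must be NULL. */
static int
tailq_tail_ok(const struct tailq *q)
{
    return *q->last == NULL;
}

/* Append without any checking, the way the plain macro would. */
static void
tailq_insert_tail(struct tailq *q, struct item *it)
{
    it->next = NULL;
    it->prev = q->last;
    *q->last = it;
    q->last = &it->next;
}
```

Inserting the same element twice leaves its `next` pointer pointing at itself, so `tailq_tail_ok` returns 0 afterwards. A debug-enabled queue implementation would detect that state on the following insert.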

#6 Updated by yellowrabbit2010 18 days ago

This is not really a solution, just a hack.
But with this patch the computer can perform several "make -j 64 buildworld" runs in a row, while playing video in mplayer and compiling vlc in Debian under qemu, without crashes or reboots.

Maybe it helps somebody who fully understands locks, mutexes & tokens :)

#7 Updated by ftigeot 17 days ago

  • Status changed from New to Resolved
  • % Done changed from 20 to 100

Thanks for the report and patch.

A different fix has been committed (this FreeBSD code was replaced by Linux workqueues), ultimately fixing the locking issues.
