AMD Vega 64 crashes on kernels newer than 4.15

Date: 2019-05-22 17:29:32

Tags: linux-kernel vega

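When I try to run kernels 4.19.39, 5.0.13 and 5.1, they freeze for a few seconds after I start Steam or Overwatch (Battle.net client). I am currently running 4.15, which works fine and is stable.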

I have already done the following:

  • GRUB_CMDLINE_LINUX_DEFAULT="splash idle=nomwait"
  • Set the power option to "typical"
  • Updated the BIOS (from AGESA 1.0.0.4 to 1.0.0.6)
  • Updated the OS (Ubuntu 18.04)
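The first item passes a kernel parameter through GRUB. On Ubuntu that line lives in /etc/default/grub, and it only takes effect after the GRUB configuration is regenerated with `update-grub` and the machine is rebooted. Below is a minimal sketch of a helper that rewrites that line in a given file with GNU `sed -i`. The name `set_grub_cmdline` is hypothetical, and the sketch assumes the line already exists and that the value contains no `|`, `&` or backslash characters (those would need escaping).

```shell
#!/bin/sh
# Hypothetical helper: replace the GRUB_CMDLINE_LINUX_DEFAULT line in a
# GRUB defaults file (normally /etc/default/grub on Ubuntu).
set_grub_cmdline() {
    file="$1"   # path to the GRUB defaults file
    value="$2"  # new kernel command line, e.g. "splash idle=nomwait"
    # Rewrite the whole existing line in place (GNU sed -i). '|' is used
    # as the delimiter so values containing '/' do not break the script;
    # values containing '|', '&' or '\' are NOT handled.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$value\"|" "$file"
}
```

To use it, back up /etc/default/grub, run the function as root against that file with "splash idle=nomwait", then run `sudo update-grub`. After rebooting, `cat /proc/cmdline` should list `idle=nomwait` if the change was picked up.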

Hardware

AMD Ryzen 7 2700X Wraith Boxed
Asus Vega 64 Strix    
Gigabyte X470 AORUS ULTRA GAMING (AGESA 1.0.0.6)
G.Skill Ripjaws V 16GB DDR4 3200MHz (4 x 16GB)
Corsair CX850M 850W ATX power supply unit

screenfetch -n

OS: Ubuntu 18.04 bionic
 Kernel: x86_64 Linux 4.15.0-48-generic
 Uptime: 1h 29m
 Packages: 3497
 Shell: bash 4.4.19
 Resolution: 3840x2160
 DE: GNOME 
 WM: GNOME Shell
 WM Theme: Adwaita
 GTK Theme: Ambiance [GTK2/3]
 Icon Theme: ubuntu-mono-dark
 Font: Ubuntu 11
 CPU: AMD Ryzen 7 2700X Eight-Core @ 16x 3.7GHz [36.3°C]
 GPU: Radeon RX Vega (VEGA10, DRM 3.23.0, 4.15.0-48-generic, LLVM 9.0.0)
 RAM: 6208MiB / 64432MiB

Drivers and other info

~$ glxinfo | grep "OpenGL version"
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.2.0-devel - padoka PPA

~$ cat /etc/apt/sources.list.d/paulo-miguel-dias-ubuntu-mesa-bionic.list
deb http://ppa.launchpad.net/paulo-miguel-dias/mesa/ubuntu bionic main
# deb-src http://ppa.launchpad.net/paulo-miguel-dias/mesa/ubuntu bionic main

~$ sudo lspci -v | grep -i vga -A 10
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Vega 10 XT [Radeon RX Vega 64]
    Flags: bus master, fast devsel, latency 0, IRQ 114
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=2M]
    I/O ports at e000 [size=256]
    Memory at fcc00000 (32-bit, non-prefetchable) [size=512K]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: 

    ...

~$ apt show libdrm-amdgpu1 -a
Package: libdrm-amdgpu1
Version: 2.4.98+git1905192304.922d929~b~padoka0
Priority: optional
Section: libs
Source: libdrm
Maintainer: Debian X Strike Force <debian-x@lists.debian.org>
Installed-Size: 76,8 kB
Depends: libc6 (>= 2.17), libdrm2 (>= 2.4.82)
Download-Size: 26,9 kB
APT-Manual-Installed: yes
APT-Sources: http://ppa.launchpad.net/paulo-miguel-dias/mesa/ubuntu bionic/main amd64 Packages
Description: Userspace interface to amdgpu-specific kernel DRM services -- runtime
 This library implements the userspace interface to the kernel DRM
 services.  DRM stands for "Direct Rendering Manager", which is the
 kernelspace portion of the "Direct Rendering Infrastructure" (DRI).
 The DRI is currently used on Linux to provide hardware-accelerated

While testing with kernel 5.1, I found the following in the kernel log:

May 22 18:46:31 [HOST] kernel: [  256.354386] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354390] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354391] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0050153D
May 22 18:46:31 [HOST] kernel: [  256.354395] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354397] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354398] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354404] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354405] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354407] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354411] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354412] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354413] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354418] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354419] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354420] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354424] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354426] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354427] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354430] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354432] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354433] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354437] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354438] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354439] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354443] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354444] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354445] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354449] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354450] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354451] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:41 [HOST] kernel: [  261.469953] [drm:amdgpu_dm_commit_planes.isra.43 [amdgpu]] *ERROR* Waiting for fences timed out.
May 22 18:46:41 [HOST] kernel: [  266.593840] [drm:amdgpu_dm_commit_planes.isra.43 [amdgpu]] *ERROR* Waiting for fences timed out.
May 22 18:46:41 [HOST] kernel: [  266.599848] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=18098, emitted seq=18100
May 22 18:46:41 [HOST] kernel: [  266.599914] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575
May 22 18:46:41 [HOST] kernel: [  266.599918] amdgpu 0000:0c:00.0: GPU reset begin!
May 22 18:46:47 [HOST] kernel: [  271.709694] [drm:amdgpu_dm_commit_planes.isra.43 [amdgpu]] *ERROR* Waiting for fences timed out.
May 22 18:46:47 [HOST] kernel: [  272.165625] amdgpu 0000:0c:00.0: GPU BACO reset
May 22 18:46:47 [HOST] kernel: [  272.643907] amdgpu 0000:0c:00.0: GPU reset succeeded, trying to resume
May 22 18:46:47 [HOST] kernel: [  272.644035] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
May 22 18:46:47 [HOST] kernel: [  272.644126] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
May 22 18:46:47 [HOST] kernel: [  272.644277] [drm] PSP is resuming...
May 22 18:46:47 [HOST] kernel: [  272.790964] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
May 22 18:46:47 [HOST] kernel: [  272.801714] amdgpu: [powerplay] Failed to send message: 0x46, ret value: 0xffffffff
May 22 18:46:47 [HOST] kernel: [  272.801830] amdgpu: [powerplay] Failed to send message: 0x61, ret value: 0xffffffff
May 22 18:46:48 [HOST] kernel: [  273.172332] [drm] UVD and UVD ENC initialized successfully.
May 22 18:46:48 [HOST] kernel: [  273.271995] [drm] VCE initialized successfully.
May 22 18:46:48 [HOST] kernel: [  273.273190] [drm] recover vram bo from shadow start
May 22 18:46:48 [HOST] kernel: [  273.279784] [drm] recover vram bo from shadow done
May 22 18:46:48 [HOST] kernel: [  273.279787] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279789] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279823] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279831] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279833] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279838] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279844] amdgpu 0000:0c:00.0: GPU reset(2) succeeded!
May 22 18:46:48 [HOST] kernel: [  273.279844] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279848] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279853] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279855] [drm] Skip scheduling IBs!
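The log above repeats the same `no-retry page fault` many times before the `ring gfx timeout` and the GPU reset. A small filter like the sketch below can condense such a log: it counts page-fault lines per process and then prints only the reset-related lines. The function name `summarize_amdgpu_log` is hypothetical, and the matched message fragments are taken from the messages shown here; other kernel versions may word them differently.

```shell
#!/bin/sh
# Hypothetical filter: read a kernel log on stdin, count amdgpu
# "no-retry page fault" lines per process, then print reset-related lines.
summarize_amdgpu_log() {
    log=$(cat)   # buffer stdin so it can be scanned twice
    echo "page faults per process:"
    # Keep only the process name that follows "for process ".
    printf '%s\n' "$log" \
        | sed -n 's/.*no-retry page fault.*for process \([^ ]*\) pid.*/\1/p' \
        | sort | uniq -c
    echo "reset events:"
    # Drop everything up to the last "] " (syslog prefix, timestamp, tags).
    printf '%s\n' "$log" \
        | grep -E 'ring gfx timeout|GPU reset|GPU BACO reset|VRAM is lost' \
        | sed 's/.*] //'
}
```

For example, `journalctl -k -b -1 | summarize_amdgpu_log` reads the kernel messages of the previous boot (this requires a persistent journal), and `sudo dmesg | summarize_amdgpu_log` reads the current boot's kernel ring buffer.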

1 answer

Answer (score: 0)

Kernel 5.5 is running and stable!

uname -a

Linux patrick-X470-AORUS-ULTRA-GAMING 5.5.10-050510-generic #202003180732 SMP Wed Mar 18 07:35:23 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
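Across the question and this answer, 4.15 and 5.5.10 work while 4.19.39, 5.0.13 and 5.1 freeze. If a script needs to tell whether the running kernel is at least 5.5, a common idiom is comparing version strings with GNU `sort -V`. This is a sketch assuming GNU coreutils; `version_ge` is a hypothetical helper name, and treating every kernel from 5.5 on as fixed is an assumption, since the answer only reports 5.5.10.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# GNU "sort -V" orders version strings component by component, so the
# smaller version comes out first; $1 >= $2 exactly when that is $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the Ubuntu suffix, e.g. "5.5.10-050510-generic" -> "5.5.10".
running=$(uname -r | cut -d- -f1)

if version_ge "$running" 5.5; then
    echo "kernel $running: 5.5 or newer"
else
    echo "kernel $running: older than 5.5"
fi
```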

screenfetch -n

 patrick@patrick-X470-AORUS-ULTRA-GAMING
 OS: Ubuntu 18.04 bionic
 Kernel: x86_64 Linux 5.5.10-050510-generic
 Uptime: 17h 38m
 Packages: 3877
 Shell: bash 4.4.20
 Resolution: 3840x2160
 DE: GNOME 
 WM: GNOME Shell
 WM Theme: Adwaita
 GTK Theme: Ambiance [GTK2/3]
 Icon Theme: ubuntu-mono-dark
 Font: Ubuntu 11
 CPU: AMD Ryzen 7 2700X Eight-Core @ 16x 3.7GHz [38.8°C]
 GPU: Radeon RX Vega (VEGA10, DRM 3.36.0, 5.5.10-050510-generic, LLVM 10.0.0)
 RAM: 10126MiB / 64332MiB

Drivers and other info

$ glxinfo | grep "OpenGL version"
OpenGL version string: 4.6 (Compatibility Profile) Mesa 20.0.0-devel - padoka PPA

$ sudo lspci -v | grep -i vga -A 10
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Vega 10 XT [Radeon RX Vega 64]
    Flags: bus master, fast devsel, latency 0, IRQ 119
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=2M]
    I/O ports at e000 [size=256]
    Memory at fcc00000 (32-bit, non-prefetchable) [size=512K]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [64] Express Legacy Endpoint, MSI 00

$ apt show libdrm-amdgpu1 -a
Package: libdrm-amdgpu1
Version: 2.4.100+git2001081023.9ebfac1~b~padoka0
Priority: optional
Section: libs
Source: libdrm
Maintainer: Debian X Strike Force <debian-x@lists.debian.org>
Installed-Size: 80,9 kB
Depends: libc6 (>= 2.17), libdrm2 (>= 2.4.100)
Download-Size: 28,2 kB
APT-Manual-Installed: yes
APT-Sources: http://ppa.launchpad.net/paulo-miguel-dias/mesa/ubuntu bionic/main amd64 Packages