I'm the owner of an AMD RX580 GPU. On Linux, it uses the open-source amdgpu driver
and has been pretty nice for the occasional gaming session and playing
with Blender, even on my aging desktop (originally built in 2014 starring
an Intel i5-4570 CPU!). One area where my poor desktop shows its age is
image processing. I sometimes edit raw pictures in the wonderful darktable,
and some modules are very heavy on the CPU. The good news is darktable comes
with support for OpenCL. The bad news is although my GPU supports OpenCL,
it's quite complicated to get it working on Linux. AMD gave up on providing
open-source OpenCL support for this generation of GPU (and older). Instead,
users have to rely on the legacy, proprietary drivers provided by AMD.
Since it's possible to install only the OpenCL stack with the amdgpu-install
tool, I figured I could give it a try.
Unfortunately for me, I timed it very badly: Ubuntu 22.04.3 had been released the day before, bringing a new version of the Linux kernel (6.2), so my computer's kernel had been upgraded accordingly. The latest version of the AMD proprietary drivers had been released on July 31st and only supported Ubuntu 22.04.2… So, of course, when I tried to install it, it failed miserably while installing its DKMS module.
I used the amdgpu-uninstall script to remove the packages installed by the
AMD proprietary drivers, and called it a day. I could always revisit this once
AMD released a version of their drivers compatible with Ubuntu 22.04.3.
But the next day…
The next day, when I booted my desktop, I was back in 1998. The login screen was using a 1024×768 resolution. Same issue once logged in, and there were no other available resolutions in sight in GNOME Settings.
I checked the content of the system journal for the last boot (journalctl -b0),
and saw something strange:
(EE) open /dev/dri/card0: No such file or directory
(EE) stands for an error in X.org. /dev/dri/card0 is the device normally
created by the amdgpu driver… why wasn't it there?!
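When the journal is long, the X.org error lines can be picked out mechanically. Here is a minimal Python sketch, assuming the journal output has already been captured as text (for instance from journalctl -b0). The function name xorg_errors and the sample text are made up for illustration.

```python
def xorg_errors(journal_text: str) -> list[str]:
    """Return the lines that carry X.org's (EE) error marker."""
    return [line.strip() for line in journal_text.splitlines() if "(EE)" in line]


if __name__ == "__main__":
    # Hypothetical sample: one unrelated line plus the error quoted above.
    sample = (
        "some unrelated journal line\n"
        "(EE) open /dev/dri/card0: No such file or directory\n"
    )
    for line in xorg_errors(sample):
        print(line)
```

This only matches the literal (EE) marker, so errors logged by other components would need their own filter.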
On Ubuntu, there is another interesting log file: /var/log/gpu-manager.log.
It lists a bunch of things in a human-friendly format. A few strange things
appeared:
Is amdgpu loaded? no
Is amdgpu blacklisted? yes
Is amdgpu versioned? no
Is amdgpu pro stack? no
(...)
Error: can't access /sys/bus/pci/devices/0000:01:00.0/driver
The device is not bound to any driver.
Error : Failed to open /dev/dri
radeon is the open-source driver used for pretty old AMD GPUs. "Recent"
AMD GPUs (as in, less than 10 years old) are all compatible with the amdgpu
driver. So why was amdgpu not loaded, and even blacklisted?!
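The yes/no lines in that log are regular enough to check from a script. The following Python sketch assumes the lines always follow the "Is <module> <property>? yes|no" shape seen in the excerpt above, which I have not verified against the gpu-manager source. The function name parse_gpu_manager is hypothetical.

```python
import re

# Matches lines such as "Is amdgpu blacklisted? yes" (shape assumed from the excerpt).
_QUESTION = re.compile(r"^Is (\S+) (.+)\? (yes|no)$")


def parse_gpu_manager(log_text: str) -> dict[tuple[str, str], bool]:
    """Map (module, property) pairs to True/False for each yes/no line."""
    answers = {}
    for line in log_text.splitlines():
        match = _QUESTION.match(line.strip())
        if match:
            module, prop, answer = match.groups()
            answers[(module, prop)] = answer == "yes"
    return answers


if __name__ == "__main__":
    with open("/var/log/gpu-manager.log") as log:
        state = parse_gpu_manager(log.read())
    if state.get(("amdgpu", "blacklisted")):
        print("amdgpu is blacklisted, check /etc/modprobe.d/")
```

Lines that do not fit the pattern, such as the "Error:" ones, are simply ignored.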
Side note: I find the whole AMD naming circus extremely confusing. RX580, Polaris, Vega, GCN, RDNA… It's so confusing that Wikipedia has a whole article to track each generation of GPUs and all the codenames attached to them.
After poking around and chatting around, my colleague Daniel suggested having
a look at the list of blacklisted modules in /etc/modprobe.d/. And, sure
enough, there was a blacklist-amdgpu.conf containing the dreaded
blacklist amdgpu!
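Looking for such an entry by hand gets tedious when the directory holds many files. Here is a small Python sketch that lists the .conf files blacklisting a given module. It only recognises plain "blacklist <module>" lines and ignores anything after a #. The function name find_blacklists is hypothetical.

```python
from pathlib import Path


def find_blacklists(module: str, conf_dir: str = "/etc/modprobe.d") -> list[Path]:
    """Return the .conf files in conf_dir that contain 'blacklist <module>'."""
    hits = []
    for conf in sorted(Path(conf_dir).glob("*.conf")):
        for line in conf.read_text(errors="replace").splitlines():
            # Drop trailing comments, then compare the remaining words.
            words = line.split("#", 1)[0].split()
            if words == ["blacklist", module]:
                hits.append(conf)
                break
    return hits


if __name__ == "__main__":
    for path in find_blacklists("amdgpu"):
        print(path)
```

On my machine before the fix, this would presumably have printed the path of blacklist-amdgpu.conf.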
I deleted this file, rebooted, and my system booted into a GNOME session in full 4k glory!
Apparently, the cause of all this is a bug in the amdgpu-uninstall script…
Now, /var/log/gpu-manager.log is happier:
Is amdgpu loaded? yes
Is amdgpu blacklisted? no
(...)
Found "/dev/dri/card0", driven by "amdgpu"
output 0:
card0-DP-2
Number of connected outputs for /dev/dri/card0: 1
So here we go. When facing this kind of weird issue on Linux, the first step is almost always to have a look at the system journal. And the second step is either to have a great colleague or to have good search-fu skills! 😆
Edit: I opened an issue in their public bugtracker (#2800).