# AMD discrete GPU

hal0 supports AMD discrete GPUs through two paths: Vulkan (the default, already working in v1) and ROCm (opt-in via a separate toolbox image).
## Where this fits

AMD discrete is a supported target, not the reference platform. The Strix Halo page covers the iGPU / unified-memory story; this page is for desktop and workstation cards, where you trade unified memory for raw VRAM bandwidth.
## Tier-1 cards

- Radeon RX 7900 XTX (24 GB GDDR6)
- Radeon RX 7900 XT (20 GB GDDR6)

Both run Q4 7B–14B class models comfortably, with room for a small embed slot. A Q4 30B-A3B MoE works with shorter context; a Q4 70B needs partial CPU offload.
## Tier-2 cards

- Radeon RX 7800 XT (16 GB GDDR6)
- Radeon RX 7700 XT (12 GB GDDR6)
- Radeon PRO W7800 / W7900 (32 / 48 GB GDDR6, workstation parts)

As a rule, cards with 16 GB or less run one slot at a time: chat-only with a Q4 7B–13B, or a smaller chat model alongside an embed.
## Vulkan path (default)

The Vulkan toolbox is the same one Strix Halo uses: no ROCm headers required, no kernel modules beyond amdgpu, no version-pinning hell. The hardware probe picks Vulkan automatically when it sees an AMD discrete GPU.
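A quick way to confirm the Vulkan stack the probe will find (a sketch; assumes `vulkaninfo` from the vulkan-tools package is installed):

```sh
# List the device and driver Vulkan reports. On the default path you
# want the card handled by RADV (Mesa's Vulkan driver).
vulkaninfo --summary 2>/dev/null | grep -iE 'deviceName|driverName' \
  || echo "no Vulkan device visible -- check Mesa / vulkan-tools install"
```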
The trade is throughput: Vulkan llama.cpp on AMD discrete is solid, but typically trails ROCm builds by a meaningful margin. For a daily-driver chat box it's perfectly fine; for raw benchmarks you'll want ROCm.
## ROCm path (opt-in)

When the ROCm toolbox lands, opting in is one slot config change:

```sh
hal0 slot swap primary --provider llama-cpp-rocm
```

The slot lifecycle handles the toolbox swap atomically: unloading → starting → warming → ready.
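After a swap, the slot listing used later on this page can confirm the new provider took effect. A sketch only: the JSON field names (`slots`, `name`, `provider`, `state`) are assumptions about the output shape, not the documented schema, and `jq` is assumed installed:

```sh
# Print the primary slot's provider and state after the swap.
# Field names are guesses at hal0's JSON -- check the real output
# of `hal0 slot list --json` on your box and adjust.
if command -v hal0 >/dev/null 2>&1; then
  hal0 slot list --json \
    | jq -r '.slots[]? | select(.name == "primary") | "\(.provider): \(.state)"'
else
  echo "hal0 not on PATH on this machine"
fi
```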
## Recommended loadouts

These are repeated from the Strix Halo loadouts page, with the discrete-card framing.

### RX 7900 XTX / XT (20–24 GB VRAM)

- **primary:** `Qwen3-30B-A3B-Instruct-2507-Q4_K_M` (~18.6 GB) fits with shorter context, or `gemma-3-12b-it-Q4_K_M` (~6.6 GB) for a longer window.
- **embed:** small Q4 embed only (`nomic-embed-text-v2-moe`, ~140 MB).
- Q4 70B requires partial CPU offload; it works, but drops well below VRAM-resident speeds.
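Whether a model fits is simple arithmetic: weights plus KV cache must stay under VRAM. A back-of-envelope sketch; the ~18.6 GB weight figure is from the loadout above, but the per-1k-token KV-cache cost is an illustrative assumption you should measure on your own setup:

```sh
# Rough fit check for Qwen3-30B-A3B Q4_K_M on a 24 GB card.
vram_mb=$(( 24 * 1024 ))
weights_mb=19046      # ~18.6 GB, from the loadout above
kv_per_1k_mb=100      # ASSUMED KV-cache cost per 1k tokens; varies by model
ctx_k=8               # target context, in thousands of tokens
need_mb=$(( weights_mb + kv_per_1k_mb * ctx_k ))
echo "need ${need_mb} MB of ${vram_mb} MB VRAM"
```

With those numbers the budget comes to 19846 MB of 24576 MB, which is why the page recommends shorter context on this model.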
### RX 7800 XT / 7700 XT (12–16 GB VRAM)

- **primary:** `Hermes-4-14B-Q4_K_M` (~9 GB) or `gemma-3-12b-it-Q4_K_M` (~6.6 GB), with several GB left for context.
- **embed:** `nomic-embed-text-v2-moe-Q4_K_M` (~140 MB) on the 16 GB variant; skip it on 12 GB.
### Workstation cards (W7800 / W7900, 32–48 GB VRAM)

- Closer to Strix Halo territory: Q4 70B fits cleanly, with room for an embed. Concurrent slots become realistic.
## Installation notes

The standard one-liner from the install page handles everything for the Vulkan path:

```sh
curl -fsSL https://hal0.dev/install | bash
```

You'll want:

- A recent Mesa with RADV Vulkan installed.
- The amdgpu kernel module loaded (`dmesg | grep amdgpu`).
- The service user in the `render` group for `/dev/dri/*` access.
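The three prerequisites above can be checked read-only before installing. A sketch, assuming the service user is named `hal0` (adjust if yours differs):

```sh
# 1. amdgpu kernel module loaded?
lsmod | grep -q '^amdgpu' && echo "amdgpu: loaded" || echo "amdgpu: NOT loaded"
# 2. DRM render nodes present?
ls /dev/dri/ 2>/dev/null || echo "/dev/dri: no nodes"
# 3. Service user in the render group? ("hal0" is an assumed user name.)
id -nG hal0 2>/dev/null | grep -qw render \
  && echo "hal0: in render group" \
  || echo "hal0: not in render group (or no such user)"
```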
The hardware probe will detect the GPU’s VRAM correctly and size slot fit warnings to it.
## Troubleshooting

**No GPU detected by probe.** Run `vulkaninfo --summary` to confirm the Vulkan runtime sees the card. If that's empty, fix the host Vulkan install before troubleshooting hal0.
**Slot won't start, journal mentions permission denied on /dev/dri.** Add the service user to the `render` group:

```sh
sudo usermod -aG render hal0
sudo systemctl restart hal0-api
```

**Throughput much lower than expected.** Check that the card is not power-limited (`cat /sys/class/drm/card0/device/power_dpm_force_performance_level`), that the toolbox is the Vulkan build (`hal0 slot list --json`), and that you're not running with a larger context window than the model documentation suggests.
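For the power-limit check, the amdgpu sysfs file can be read directly and, for a benchmark run, pinned to high clocks. A sketch; `card0` may be a different index on multi-GPU machines, and the pin resets at reboot:

```sh
# Inspect the amdgpu performance level; "auto" is the default.
pl=/sys/class/drm/card0/device/power_dpm_force_performance_level
if [ -r "$pl" ]; then
  cat "$pl"
  # Pin to high clocks for a benchmark run (resets at reboot):
  # echo high | sudo tee "$pl"
else
  echo "no amdgpu card0 sysfs node on this machine"
fi
```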