Hardware matrix

hal0 targets AMD APU/GPU plus the AMD XDNA NPU, with a CPU fallback that is always available. This page lists what runs on each accelerator and the order hal0 prefers them in.

Hardware targets

Target	Backend / runtime	What runs there
AMD GPU (ROCm)	llama.cpp ROCm	chat / embed / rerank LLM inference (highest tps)
AMD GPU (Vulkan)	llama.cpp Vulkan	chat / embed / rerank LLM inference (broad-compat)
AMD XDNA NPU	FastFlowLM (FLM)	chat + speech-to-text + embeddings (one process)
CPU	llama.cpp / ONNX	fallback LLM inference; Kokoro TTS

The reference platform is AMD Strix Halo: an 8060S-class iGPU, an XDNA NPU, and a unified-memory pool the GPU shares with the host via GTT. hal0’s profile flags are bench-tuned for that target, but the GPU/Vulkan and CPU paths run on any modern Linux AMD system.

Hardware metrics panel showing GPU, NPU, and memory utilization Hardware monitoring dashboard for GPU, NPU, and system memory.

Backend availability order

hal0 probes the host and advertises the backends it can actually run, in this order:

NPU — only when an XDNA NPU is present and the FLM toolbox image is already pulled locally.
GPU (Vulkan) — whenever a GPU is detected (every modern Linux GPU has Mesa Vulkan).
GPU (ROCm) — only when the detected GPU is an AMD GPU with compute support (compute_capable).
CPU — always reachable; the guaranteed fallback.

Backend id	Label	Provider	Multiplex	Advertised when
`npu`	NPU	`flm`	yes	XDNA present + FLM image pulled
`gpu-vulkan`	GPU (Vulkan)	`llama-server`	no	any GPU detected
`gpu-rocm`	GPU (ROCm)	`llama-server`	no	AMD GPU with ROCm compute support
`cpu`	CPU	`llama-server`	no	always

The NPU is multiplex: a single flm serve process answers chat, STT, and embeddings at once, so one NPU slot can back three capabilities.

The hal0 Slots view showing iGPU and FLM (NPU) slots in the Inference Engine, above the shared GTT memory carve-out The Inference Engine slot inventory — here 4 iGPU · 3 FLM — with the shared iGPU GTT carve-out above it.

NPU host requirements

The NPU path runs FastFlowLM in a container, but the host must provide the XDNA driver and firmware:

the amdxdna kernel driver (in-tree on kernel ≥ 6.14, or via the amdxdna-dkms package on kernel ≥ 6.10), and
NPU firmware ≥ 1.1.0.0.