Skip to content

hal0 documentation

hal0 turns a Linux box into a polished, OpenAI-compatible inference appliance. It is built for the AMD Ryzen AI Max+ 395 (“Strix Halo”) — iGPU + XDNA NPU + 128 GB of unified memory — and falls back to portable hardware parsers on every other host.

Every inference workload runs as its own podman container, supervised by a hal0-slot@<name>.service systemd unit. A single control plane, hal0-api on port 8080, owns the slot state machines, dispatches OpenAI-compatible /v1/* requests to the right slot, and serves the dashboard. There is no shared inference daemon to babysit.

  • OpenAI-compatible /v1/* API — chat, completions, embeddings, rerank, audio transcription, speech, and image generation. Point any OpenAI SDK at http://localhost:8080/v1 and go.
  • Hardware-aware slots — each capability gets its own container, image, flags, port, and lifecycle, sharing only the GPU through an arbiter.
  • First-run terminal setup — the hal0 setup TUI picks a hardware-anchored tier and provisions a coherent set of slots, models, and extensions.
  • Prewired chat UI — OpenWebUI on port 3001, zero config.
  • Signed self-update — cosign-verified tarballs with one-flag rollback.

hal0 dashboard overview The hal0 dashboard — slot status, hardware metrics, and quick actions.