Choose models

You don’t have to guess which model to run. hal0 ships a curated catalogue — a hand-picked list of good defaults, each pointing at a specific GGUF (or safetensors) file in a Hugging Face repo, sized for the target’s unified-memory pool. Pull any of them by id:

hal0 model pull qwen3.6-27b

The catalogue ships with each release (it lives in code, not a remote manifest), so you can never land on a stale pick list.

Figure: The Models view renders each curated entry as a card with size, license, and minimum-memory badges.

The chat models below are surfaced first by the FirstRun wizard, ordered roughly largest to smallest. The headline picks are quantized to UD-Q4_K_XL or Q5_K_XL for the quality margin a large unified pool affords.

| Id | Model | Size | Min memory | License | Notes |
| --- | --- | --- | --- | --- | --- |
| qwen3-coder-next | Qwen3 Coder Next | ~49 GB | ~56 GB | Apache-2.0 | Frontier coding model; needs the full pool. |
| qwen3.6-27b | Qwen3.6 27B (MTP ROCmFP4) | ~17 GB | ~22 GB | other (see HF repo) | General-purpose default with MTP and vision support; ROCmFP4 build for Strix Halo. |
| gpt-oss-20b | GPT-OSS 20B | ~12 GB | ~16 GB | Apache-2.0 | OpenAI open-weights 20B. |
| qwen3.5-9b | Qwen3.5 9B | ~6 GB | ~8 GB | Apache-2.0 | Lean default; fits alongside embed + voice slots. |
| qwen3.5-0.8b | Qwen3.5 0.8B | ~0.6 GB | ~1 GB | Apache-2.0 | Tiny; sub-second cold start. Doubles as the install smoke test. |
| qwen3-4b | Qwen3 4B Instruct | ~2.5 GB | ~4 GB | Apache-2.0 | Fast all-rounder for a 4–8 GB budget. |
| llama32-3b | Llama 3.2 3B Instruct | ~2 GB | ~3 GB | Llama-3.2-Community | Small and fast; good for low-memory hosts. |
| phi3-mini | Phi-3 Mini 4K Instruct | ~2.4 GB | ~3 GB | MIT | Compact reasoning, MIT-licensed. |
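As a rough illustration of how the minimum-memory column can guide a choice, the sketch below picks the most demanding chat model that still fits a given memory budget. It is not part of hal0: the `CATALOGUE` list simply copies the approximate "Min memory" figures from the table above, and `largest_fit` and its `reserved_gb` parameter are hypothetical names introduced for this example.

```python
from typing import Optional

# (id, approximate minimum memory in GB), copied from the table above.
# This list is an illustration, not hal0's internal catalogue format.
CATALOGUE = [
    ("qwen3-coder-next", 56.0),
    ("qwen3.6-27b", 22.0),
    ("gpt-oss-20b", 16.0),
    ("qwen3.5-9b", 8.0),
    ("qwen3.5-0.8b", 1.0),
    ("qwen3-4b", 4.0),
    ("llama32-3b", 3.0),
    ("phi3-mini", 3.0),
]


def largest_fit(budget_gb: float, reserved_gb: float = 0.0) -> Optional[str]:
    """Return the id with the highest minimum memory that still fits.

    reserved_gb is memory set aside for other slots (hypothetical knob).
    Ties keep the entry that appears first in CATALOGUE.
    """
    available = budget_gb - reserved_gb
    best = None  # (model_id, min_mem) of the best candidate so far
    for model_id, min_mem in CATALOGUE:
        if min_mem <= available and (best is None or min_mem > best[1]):
            best = (model_id, min_mem)
    return best[0] if best else None


if __name__ == "__main__":
    print(largest_fit(24))                 # budget of 24 GB
    print(largest_fit(24, reserved_gb=8))  # 8 GB held back for other slots
```

The resulting id can then be passed to `hal0 model pull`. The figures are approximate, so treat the result as a starting point rather than a guarantee that the model will load.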

The embedding and reranking models below are llama.cpp-compatible GGUFs that fan out to gpu-vulkan, gpu-rocm, or cpu. Assign them to an embedding or reranking slot.

| Id | Model | Capability | Size | License |
| --- | --- | --- | --- | --- |
| nomic-embed-text-v1.5-q8_0 | Nomic Embed Text v1.5 (Q8_0) | embed | ~0.15 GB | Apache-2.0 |
| bge-base-en-v1.5-q4_k_m | BGE Base EN v1.5 (Q4_K_M) | embed | ~0.07 GB | MIT |
| bge-reranker-base-q4_k_m | BGE Reranker Base (Q4_K_M) | rerank | ~0.26 GB | MIT |
| bge-reranker-v2-m3-q4_k_m | BGE Reranker v2 M3 (Q4_K_M) | rerank | ~0.44 GB | Apache-2.0 |

The image models below are routed through ComfyUI: they land in ComfyUI’s models/checkpoints tree rather than the per-id model directory, so its loaders pick them up directly.

| Id | Model | Size | License |
| --- | --- | --- | --- |
| sdxl-turbo | SDXL Turbo | ~6.5 GB | SAI-NC-Research-Community (research only) |
| sdxl-lightning | SDXL Lightning (8-step) | ~6.8 GB | CreativeML-OpenRAIL++-M |
| sd-1.5-pruned-emaonly | Stable Diffusion 1.5 | ~4.3 GB | CreativeML-Open-RAIL-M |

Before a model is assigned to a slot, hal0 computes a fit verdict (allowed, degraded, or blocked) with stable reason strings that the dashboard renders as chips. The check runs three gates in order:

  1. Type match. The model’s classified modality must match the slot type. A chat model maps to an llm slot, embed → embedding, rerank → reranking, stt → transcription, tts → tts, image → image. A mismatch is blocked (model.slot_type_mismatch).

  2. Resolvability. The model id must resolve, meaning it is present in the local registry. An unresolvable id is blocked (model.not_resolvable).

  3. Profile + device class. The slot’s profile must support the slot type, and the device must agree with the profile’s device class (gpu-rocm/gpu-vulkan → gpu, cpu → cpu, npu → npu). NPU and image mismatches are hard failures (blocked), because they route to different runtime families. A GPU↔CPU mismatch is degraded instead: a custom image may still run, but it needs operator attention before launch.

A degraded verdict is allowed to proceed; only blocked stops the assignment. This is why the device you pick for a slot has to be consistent with its profile — see Manage slots for the --hardware flag that sets it.
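The three gates can be sketched in a few lines of Python. This is an illustrative model of the logic described above, not hal0's implementation: the `Profile` and `Verdict` shapes, the `fit_verdict` signature, and the two `profile.*` reason strings are assumptions made for the example. Only `model.slot_type_mismatch` and `model.not_resolvable` come from the text. The sketch also assumes that an unsupported slot type is blocked and that an "image mismatch" means a device-class mismatch on an image slot.

```python
from dataclasses import dataclass
from typing import Optional

# Gate 1 mapping: model modality -> slot type (from the list above).
SLOT_TYPE_FOR = {
    "chat": "llm", "embed": "embedding", "rerank": "reranking",
    "stt": "transcription", "tts": "tts", "image": "image",
}
# Gate 3 mapping: device -> device class (from the list above).
DEVICE_CLASS = {"gpu-rocm": "gpu", "gpu-vulkan": "gpu", "cpu": "cpu", "npu": "npu"}


@dataclass
class Profile:  # hypothetical shape
    device_class: str
    slot_types: frozenset


@dataclass
class Verdict:  # hypothetical shape
    status: str  # "allowed", "degraded", or "blocked"
    reason: Optional[str] = None


def fit_verdict(modality, model_id, slot_type, device, profile, registry):
    # Gate 1: the model's modality must map to the slot type.
    if SLOT_TYPE_FOR.get(modality) != slot_type:
        return Verdict("blocked", "model.slot_type_mismatch")
    # Gate 2: the id must be present in the local registry.
    if model_id not in registry:
        return Verdict("blocked", "model.not_resolvable")
    # Gate 3a: the profile must support the slot type
    # (assumed to be blocked; reason string is hypothetical).
    if slot_type not in profile.slot_types:
        return Verdict("blocked", "profile.slot_type_unsupported")
    # Gate 3b: the device must agree with the profile's device class.
    actual = DEVICE_CLASS.get(device)
    if actual != profile.device_class:
        hard = (
            actual is None  # unknown device: assumed blocked
            or "npu" in (actual, profile.device_class)
            or slot_type == "image"
        )
        # Reason string is hypothetical; GPU<->CPU is the only soft case.
        return Verdict("blocked" if hard else "degraded",
                       "profile.device_class_mismatch")
    return Verdict("allowed")


if __name__ == "__main__":
    gpu_llm = Profile("gpu", frozenset({"llm"}))
    registry = {"qwen3.5-9b"}
    print(fit_verdict("chat", "qwen3.5-9b", "llm", "gpu-rocm", gpu_llm, registry))
    print(fit_verdict("chat", "qwen3.5-9b", "llm", "cpu", gpu_llm, registry))
```

Because the gates run in order and return on the first failure, a caller sees at most one reason per check, and only a `blocked` status should stop the assignment.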

Anything not in the catalogue still works — search Hugging Face, inspect a repo’s variants, and pull by coordinates. See Pull and register models.