Choose models
You don’t have to guess which model to run. hal0 ships a curated catalogue — a hand-picked list of good defaults, each pointing at a specific GGUF (or safetensors) file in a Hugging Face repo, sized for the target’s unified-memory pool. Pull any of them by id:
hal0 model pull qwen3.6-27bThe catalogue ships with each release (it lives in code, not a remote manifest), so you can never land on a stale pick list.
The Models view renders each curated entry as a card with size, license, and minimum-memory badges.
Chat picks
Section titled “Chat picks”These are surfaced first by the FirstRun wizard, ordered roughly largest to smallest. The headline picks are quantized to UD-Q4_K_XL / Q5_K_XL for the quality margin a large unified pool affords.
| Id | Model | Size | Min memory | License | Notes |
|---|---|---|---|---|---|
qwen3-coder-next | Qwen3 Coder Next | ~49 GB | ~56 GB | Apache-2.0 | Frontier coding model; needs the full pool. |
qwen3.6-27b | Qwen3.6 27B (MTP ROCmFP4) | ~17 GB | ~22 GB | other (see HF repo) | General-purpose default with MTP and vision support; ROCmFP4 build for Strix Halo. |
gpt-oss-20b | GPT-OSS 20B | ~12 GB | ~16 GB | Apache-2.0 | OpenAI open-weights 20B. |
qwen3.5-9b | Qwen3.5 9B | ~6 GB | ~8 GB | Apache-2.0 | Lean default; fits alongside embed + voice slots. |
qwen3.5-0.8b | Qwen3.5 0.8B | ~0.6 GB | ~1 GB | Apache-2.0 | Tiny; sub-second cold start. Doubles as the install smoke test. |
qwen3-4b | Qwen3 4B Instruct | ~2.5 GB | ~4 GB | Apache-2.0 | Fast all-rounder for a 4–8 GB budget. |
llama32-3b | Llama 3.2 3B Instruct | ~2 GB | ~3 GB | Llama-3.2-Community | Small and fast; good for low-memory hosts. |
phi3-mini | Phi-3 Mini 4K Instruct | ~2.4 GB | ~3 GB | MIT | Compact reasoning, MIT-licensed. |
Embed and rerank picks
Section titled “Embed and rerank picks”llama.cpp-compatible GGUFs that fan out to gpu-vulkan / gpu-rocm /
cpu. Assign these to an embedding or reranking slot.
| Id | Model | Capability | Size | License |
|---|---|---|---|---|
nomic-embed-text-v1.5-q8_0 | Nomic Embed Text v1.5 (Q8_0) | embed | ~0.15 GB | Apache-2.0 |
bge-base-en-v1.5-q4_k_m | BGE Base EN v1.5 (Q4_K_M) | embed | ~0.07 GB | MIT |
bge-reranker-base-q4_k_m | BGE Reranker Base (Q4_K_M) | rerank | ~0.26 GB | MIT |
bge-reranker-v2-m3-q4_k_m | BGE Reranker v2 M3 (Q4_K_M) | rerank | ~0.44 GB | Apache-2.0 |
Image picks
Section titled “Image picks”Routed through ComfyUI — these land in ComfyUI’s models/checkpoints tree
rather than the per-id model directory, so its loaders pick them up
directly.
| Id | Model | Size | License |
|---|---|---|---|
sdxl-turbo | SDXL Turbo | ~6.5 GB | SAI-NC-Research-Community (research only) |
sdxl-lightning | SDXL Lightning (8-step) | ~6.8 GB | CreativeML-OpenRAIL++-M |
sd-1.5-pruned-emaonly | Stable Diffusion 1.5 | ~4.3 GB | CreativeML-Open-RAIL-M |
How model fit is evaluated
Section titled “How model fit is evaluated”Before a model is assigned to a slot, hal0 computes a fit verdict —
allowed, degraded, or blocked — with stable reason strings the
dashboard renders as chips. The check runs three gates in order:
-
Type match. The model’s classified modality must match the slot type. A chat model maps to an
llmslot, embed →embedding, rerank →reranking, stt →transcription, tts →tts, image →image. A mismatch isblocked(model.slot_type_mismatch). -
Resolvability. The model id must resolve — present in the local registry. An unresolvable id is
blocked(model.not_resolvable). -
Profile + device class. The slot’s profile must support the slot type, and the device must agree with the profile’s device class (
gpu-rocm/gpu-vulkan→gpu,cpu→cpu,npu→npu). NPU and image mismatches are hardblockedfailures — they route to different runtime families. A GPU↔CPU mismatch isdegradedinstead: a custom image may still run, but it needs operator attention before launch.
A degraded verdict is allowed to proceed; only blocked stops the
assignment. This is why the device you pick for a slot has to be
consistent with its profile — see
Manage slots for the --hardware flag that
sets it.
Off-catalogue models
Section titled “Off-catalogue models”Anything not in the catalogue still works — search Hugging Face, inspect a repo’s variants, and pull by coordinates. See Pull and register models.