
Pull and register models

hal0 keeps a local model registry — a record of every model it knows about, where its bytes live, and what it can do. You populate it two ways: pull from Hugging Face, or register a file you already have on disk.

hal0 model pull qwen3-4b

The argument is either a curated alias (see Choose models) or a model id already in the registry. The CLI starts a background pull on the daemon, then polls and renders a progress bar until the job finishes.

A pull is a background job with a small state machine:

queued → running → completed
                 ↘ failed
                 ↘ cancelled
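
The lifecycle above can be sketched as a small transition table. This is an illustrative model only, not hal0's implementation: the names `PullState`, `ALLOWED`, and `advance` are hypothetical, and allowing a queued job to be cancelled before it starts is an assumption the diagram does not confirm.

```python
from enum import Enum


class PullState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition table. The QUEUED -> CANCELLED edge is a guess;
# the other edges follow the diagram.
ALLOWED = {
    PullState.QUEUED: {PullState.RUNNING, PullState.CANCELLED},
    PullState.RUNNING: {PullState.COMPLETED, PullState.FAILED, PullState.CANCELLED},
    PullState.COMPLETED: set(),
    PullState.FAILED: set(),
    PullState.CANCELLED: set(),
}


def advance(current: PullState, target: PullState) -> PullState:
    """Return target if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_terminal(state: PullState) -> bool:
    """A state is terminal when no transition leads out of it."""
    return not ALLOWED[state]
```

A client that polls a job can stop as soon as `is_terminal` returns true for the reported state.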

The downloader streams the GGUF straight from https://huggingface.co/<repo>/resolve/main/<file> into a temporary file, computing SHA-256 as it goes. It then atomically moves the file into place and upserts the registry entry with the size and integrity digest. Because the temp staging directory lives on the same filesystem as the final path, the install is a true atomic rename even for multi-GB files.
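
The stream-hash-rename pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not hal0's downloader: `install_stream` is a hypothetical helper, it reads from any binary stream rather than an HTTP response, and it stages the temp file in the destination directory as one simple way to keep both paths on the same filesystem.

```python
import hashlib
import os
import tempfile
from typing import BinaryIO, Tuple


def install_stream(src: BinaryIO, final_path: str, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Copy src to final_path via a temp file; return (size, sha256 hex)."""
    dest_dir = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(dest_dir, exist_ok=True)
    digest = hashlib.sha256()
    size = 0
    # Staging inside dest_dir keeps the temp file and the final path on
    # one filesystem, so os.replace is a rename rather than a copy.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)  # hash while streaming, no second pass
                tmp.write(chunk)
                size += len(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, final_path)  # atomic on the same filesystem
    except BaseException:
        # Never leave a partial file behind on failure or interruption.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return size, digest.hexdigest()
```

Readers of `final_path` see either the previous file or the complete new one, never a half-written download.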

The CLI handles progress for you, but the underlying API is available directly:

# kick off the pull
curl -X POST http://localhost:8080/api/models/qwen3-4b/pull
# poll status
curl http://localhost:8080/api/models/qwen3-4b/pull/status
# live SSE progress
curl -N http://localhost:8080/api/models/qwen3-4b/pull/stream
# cancel an in-flight pull
curl -X POST http://localhost:8080/api/models/qwen3-4b/pull/cancel
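
A script can drive the same endpoints without the CLI. The sketch below polls the status endpoint until the job reaches a terminal state. It assumes the endpoint returns a JSON object with a `state` key holding one of the lifecycle names; that response shape is not documented here, so verify it against a real response. `fetch_status` and `wait_for_pull` are hypothetical helper names.

```python
import json
import time
import urllib.request

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def fetch_status(model_id: str, base_url: str = "http://localhost:8080") -> dict:
    """GET the pull status endpoint and decode the (assumed) JSON body."""
    url = f"{base_url}/api/models/{model_id}/pull/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_pull(model_id: str, fetch=fetch_status, interval: float = 1.0,
                  sleep=time.sleep) -> dict:
    """Poll until the job is terminal and return the last status payload.

    The "state" key is an assumption about the response shape.
    """
    while True:
        status = fetch(model_id)
        if status.get("state") in TERMINAL_STATES:
            return status
        sleep(interval)
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running daemon.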

Pulls are de-duplicated: POSTing a pull for a model already queued or running returns the existing job handle instead of starting a second download.
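
One way to picture the de-duplication is a job table keyed by model id. This is a toy in-memory sketch, not the daemon's code: `PullJob`, `PullQueue`, and `request_pull` are hypothetical names, and a real implementation would also need locking and persistence.

```python
import itertools
from dataclasses import dataclass
from typing import Dict

ACTIVE_STATES = {"queued", "running"}


@dataclass
class PullJob:
    job_id: int
    model_id: str
    state: str = "queued"


class PullQueue:
    """Hands back the in-flight job for a model instead of starting another."""

    def __init__(self) -> None:
        self._jobs: Dict[str, PullJob] = {}
        self._ids = itertools.count(1)

    def request_pull(self, model_id: str) -> PullJob:
        existing = self._jobs.get(model_id)
        if existing is not None and existing.state in ACTIVE_STATES:
            return existing  # same handle, no second download
        job = PullJob(job_id=next(self._ids), model_id=model_id)
        self._jobs[model_id] = job
        return job
```

Once a job is terminal, the next request for that model id starts a fresh job.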

To pull a file that isn’t in the curated catalogue, supply the repo and filename in the body. hal0 seeds a registry row for the new id so the dashboard can track progress, then pulls:

curl -X POST http://localhost:8080/api/models/my-qwen-build/pull \
-H 'content-type: application/json' \
-d '{"hf_repo":"unsloth/Qwen3.6-27B-GGUF","hf_filename":"Qwen3.6-27B-UD-Q5_K_XL.gguf","labels":["chat"]}'
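
The same request can be built from Python with only the standard library. This mirrors the curl call above; `build_pull_request` is a hypothetical helper, and the body keys are the ones shown in the example (`hf_repo`, `hf_filename`, `labels`).

```python
import json
import urllib.request
from typing import List


def build_pull_request(model_id: str, hf_repo: str, hf_filename: str,
                       labels: List[str],
                       base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) the POST that starts a custom pull."""
    body = json.dumps(
        {"hf_repo": hf_repo, "hf_filename": hf_filename, "labels": labels}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/models/{model_id}/pull",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen`.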

hal0 proxies HF’s public search and a per-repo inspector so you can find a model and see its pullable variants without leaving your host.

# free-text search (capped at 20 rows)
curl 'http://localhost:8080/api/hf/search?q=qwen3+gguf&type=text-generation'
# inspect a repo: list its GGUF variants + license + README excerpt
curl -X POST http://localhost:8080/api/models/inspect \
-H 'content-type: application/json' \
-d '{"hf_repo":"unsloth/Qwen3-8B-GGUF"}'

Inspect returns each .gguf file as a variant with its real LFS byte size; use a variant’s id (the filename) as hf_filename when you pull.
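
As a rough illustration of going from an inspect result to a pull body, the sketch below picks the largest variant that fits a size budget. The variant field names (`id`, `size`) are assumptions about the response shape, and `pick_variant` and `pull_body_for` are hypothetical helpers.

```python
from typing import Iterable, List, Optional


def pick_variant(variants: Iterable[dict], max_bytes: int) -> Optional[dict]:
    """Return the largest variant whose size fits max_bytes, or None."""
    fitting = [v for v in variants if v["size"] <= max_bytes]
    return max(fitting, key=lambda v: v["size"], default=None)


def pull_body_for(hf_repo: str, variant: dict, labels: List[str]) -> dict:
    """Use the variant id (the filename) as hf_filename in the pull body."""
    return {"hf_repo": hf_repo, "hf_filename": variant["id"], "labels": labels}
```

The resulting dictionary is what you would serialize as the JSON body of the pull request.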

If the bytes are already on the host — say you dropped a GGUF into the model store — register it without re-downloading:

hal0 model register qwen3-4b-q4_k_m \
--path /path/to/qwen3-4b-instruct-q4_k_m.gguf \
--name "Qwen3 4B Q4_K_M" \
--license Apache-2.0

The file must be readable by the hal0-api process: hal0 only records metadata and does not copy or chown the file. The API also exposes a one-shot “add by path” flow (POST /api/models/add-from-path) and a directory scan (POST /api/models/scan) that auto-detects capabilities and backends from each file.

hal0 model list
hal0 model list --json

The list aggregates the local registry with everything advertised by configured upstreams. Local files win on id collision and are flagged installed.
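
The collision rule can be expressed as a simple merge. This is a sketch of the described behaviour, not hal0's code: `merge_models` is a hypothetical function, entries are plain dictionaries with an `id` key, and sorting by id is an arbitrary choice made here for stable output.

```python
from typing import Iterable, List


def merge_models(local: Iterable[dict], upstream: Iterable[dict]) -> List[dict]:
    """Merge two model lists; local entries win on id collision."""
    merged = {}
    for entry in upstream:
        merged[entry["id"]] = {**entry, "installed": False}
    # Local entries are applied last, so they overwrite upstream ones.
    for entry in local:
        merged[entry["id"]] = {**entry, "installed": True}
    return sorted(merged.values(), key=lambda m: m["id"])
```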

Figure: the Models tab lists every installed and upstream-advertised model, with pull-state badges for in-flight downloads.

hal0 model assign qwen3.5-9b --slot chat

This sets the slot’s default model but does not load it — follow with hal0 slot load chat (see Manage slots).

Registry entries are metadata records — id, display name, path, size, SHA-256, capabilities, and the Hugging Face coordinates a pull came from. The registry is the source of truth the dispatcher resolves model ids against; the actual weights live in your configured model store.
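
For orientation, a registry entry with the fields listed above might be modelled like this. The field names and types are assumptions chosen for illustration; hal0's actual schema may differ. `RegistryEntry` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegistryEntry:
    id: str
    name: str                      # display name
    path: str                      # where the weights live in the model store
    size: int                      # bytes
    sha256: str                    # integrity digest recorded at pull time
    capabilities: List[str] = field(default_factory=list)
    hf_repo: Optional[str] = None       # set when the entry came from a pull
    hf_filename: Optional[str] = None


def resolve(registry: Dict[str, RegistryEntry], model_id: str) -> RegistryEntry:
    """Look up a model id the way a dispatcher might; raise if unknown."""
    try:
        return registry[model_id]
    except KeyError:
        raise LookupError(f"unknown model id: {model_id}") from None
```

The entry holds only metadata; the bytes themselves stay at `path`.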