Manage slots
A slot is one inference endpoint: a podman container, backed by the
hal0-slot@<name>.service systemd unit, serving a model on a loopback
port. The hal0 slot CLI is a thin client over the API — every command
hits the local daemon at http://localhost:8080, so the CLI and the
dashboard always agree.
List slots
hal0 slot list
Shows every configured slot with its state, model, backend, port, and
kind. The slot state machine moves through offline → pulling → starting →
warming → ready → serving → idle (and unloading / error). Add --json to get
the raw /api/slots payload for scripts:
hal0 slot list --json
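Because the CLI is a thin client over the daemon, a script can also read the same payload without the CLI. The sketch below assumes that /api/slots answers a plain unauthenticated GET, which the text above implies but does not state; HAL0_API and slots_raw are hypothetical names introduced for this example.

```shell
# Base URL of the local daemon. HAL0_API is a hypothetical override
# variable for this sketch; the default is the address given above.
HAL0_API="${HAL0_API:-http://localhost:8080}"

# Print the raw slot payload.
# Assumption: /api/slots responds to an unauthenticated GET.
slots_raw() {
  # -f: exit non-zero on HTTP errors; -sS: no progress bar, keep error messages
  curl -fsS "${HAL0_API}/api/slots"
}
```

Calling slots_raw should print the same JSON as hal0 slot list --json, if the assumption above holds.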
The slot list view shows live state, assigned model, device, and port for every configured slot.
Inspect one slot
hal0 slot show chat
Prints the full status and the on-disk TOML config as JSON. Use
--json for the unformatted {status, config} object.
The slot detail panel combines live status with the persisted TOML config in a single view.
Create a slot
hal0 slot create embed \
  --type embedding \
  --model nomic-embed-text-v1.5-q8_0 \
  --hardware vulkan \
  --ctx-size 8192
- name (positional): the slot id, e.g. primary, embed, stt.
- --type / -t: how the dispatcher routes requests. One of llm (default), embedding, reranking, transcription, tts, image.
- --model / -m (required): the initial model ref to assign.
- --hardware: the compute backend, one of vulkan, rocm, or cpu. When omitted it is auto-detected from the hardware probe (falls back to vulkan). The slot’s device field is derived from this: vulkan → gpu-vulkan, rocm → gpu-rocm, cpu → cpu.
- --port / -p: the slot’s port. Omit it and hal0 auto-assigns the next free port in the 8081–8099 range.
- --ctx-size: context window in tokens (default 4096).
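The two derived values in the option list, the device field and the auto-assigned port, can be sketched as small shell functions. This is an illustration of the documented rules, not hal0's own code: both function names are hypothetical, and "next free port" is assumed to mean the lowest unused port in the range.

```shell
# Map a --hardware value to the slot's device field, per the list above.
device_for_hardware() {
  case "$1" in
    vulkan) echo "gpu-vulkan" ;;
    rocm)   echo "gpu-rocm" ;;
    cpu)    echo "cpu" ;;
    *)      echo "unknown hardware: $1" >&2; return 1 ;;
  esac
}

# Print the lowest port in 8081-8099 that is not in the argument list.
# Arguments: ports already taken by other slots.
# Assumption: "next free" means lowest unused; hal0 may choose differently.
next_free_port() {
  port=8081
  while [ "$port" -le 8099 ]; do
    taken=0
    for used in "$@"; do
      if [ "$used" = "$port" ]; then
        taken=1
      fi
    done
    if [ "$taken" -eq 0 ]; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  # Every port in the range is taken.
  return 1
}
```

For example, next_free_port 8081 8082 prints 8083, and device_for_hardware rocm prints gpu-rocm.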
Creating a slot writes its config and systemd drop-in but does not
start it — follow with hal0 slot load.
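Since creation and loading are separate steps, a provisioning script will usually chain them. A minimal sketch, assuming hal0 exits non-zero when a step fails (the page does not state its exit codes); create_and_load is a hypothetical helper name.

```shell
# Create a slot and start it in one step.
# Usage: create_and_load <name> <model>
# Assumption: hal0 returns a non-zero exit status on failure, so the
# load step is skipped when create does not succeed.
create_and_load() {
  hal0 slot create "$1" --model "$2" && hal0 slot load "$1"
}
```

Usage: create_and_load embed nomic-embed-text-v1.5-q8_0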
Load, restart, unload
hal0 slot load primary
# or assign a model at load time:
hal0 slot load primary --model qwen3.6-27b
Loads the slot’s container and waits for it to go healthy. A bad
--model is rejected against the registry up front, so you don’t wait
out the health timeout on a typo.
hal0 slot restart primary
Unloads then reloads. Use it after editing config that the running container baked in (ports, backend, ctx size).
hal0 slot unload primary
Stops the container gracefully and returns the slot to offline.
Swap the model
hal0 slot swap primary --model qwen3.5-9b
A swap does two things by default:
- Hot-swap the running slot to the new model (POST /api/slots/{name}/swap).
- Persist the new default to the slot’s on-disk config so it survives a restart.
Pass --no-persist to try a model briefly without changing the default:
hal0 slot swap primary --model phi3-mini --no-persist
If the hot-swap succeeds but the persist step fails, the runtime swap is left in place and the failure is surfaced so you can retry.
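In a script, the "retry" advice for a failed swap can be wrapped in a bounded loop. This is a sketch under two assumptions the page does not confirm: that hal0 slot swap exits non-zero when either step fails, and that re-running the swap with the same model is safe. swap_with_retry is a hypothetical helper.

```shell
# Retry a persisted swap a bounded number of times.
# Usage: swap_with_retry <slot> <model> [attempts]
# Assumptions: `hal0 slot swap` exits non-zero on failure, and repeating
# the same swap is harmless.
swap_with_retry() {
  attempts="${3:-3}"
  n=1
  while [ "$n" -le "$attempts" ]; do
    if hal0 slot swap "$1" --model "$2"; then
      return 0
    fi
    echo "swap attempt $n of $attempts failed" >&2
    n=$((n + 1))
  done
  return 1
}
```

Usage: swap_with_retry primary qwen3.5-9b 5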
Edit slot config
hal0 slot edit primary --ctx-size 32768
hal0 slot edit embed --hardware rocm
Pass any subset of --model, --port, --ctx-size, --provider,
--hardware. At least one is required. Changes that the container baked
in (port, backend) take effect on the next restart.
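To apply a baked-in change right away, an edit can be followed directly by a restart. A minimal sketch, assuming hal0 exits non-zero on a failed edit; edit_and_restart is a hypothetical helper, and the restart is only useful when the slot is already running.

```shell
# Edit a slot's config, then restart it so baked-in settings take effect.
# Usage: edit_and_restart <slot> <edit flags...>
# Assumption: hal0 returns non-zero on failure, so a failed edit
# does not trigger a restart.
edit_and_restart() {
  slot="$1"
  shift
  # Mirror the CLI rule that at least one edit flag is required.
  if [ "$#" -eq 0 ]; then
    echo "usage: edit_and_restart <slot> <edit flags...>" >&2
    return 2
  fi
  hal0 slot edit "$slot" "$@" && hal0 slot restart "$slot"
}
```

Usage: edit_and_restart embed --hardware rocm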
Delete a slot
hal0 slot delete scratch
Stops the unit and removes the config. You’ll be asked to confirm; pass
--force / -f to skip the prompt.
Stream logs
# last 200 lines
hal0 slot logs primary
# follow live (SSE tail)
hal0 slot logs primary --follow
# more history
hal0 slot logs primary --lines 1000
Logs come from the slot’s hal0-slot@<name>.service journal. On a host
without journalctl the command returns an empty result with a hint
rather than an error.
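For quick triage, recent log lines can be filtered with standard tools. The sketch below assumes the logs command writes plain text lines to standard output, which this page does not spell out; slot_log_matches is a hypothetical helper.

```shell
# Print recent log lines of a slot that match a pattern (case-insensitive).
# Usage: slot_log_matches <slot> <pattern> [lines]
# Assumption: `hal0 slot logs` writes plain text lines to stdout.
slot_log_matches() {
  # Default to 200 lines, matching the command's own default history.
  hal0 slot logs "$1" --lines "${3:-200}" | grep -i -- "$2"
}
```

Usage: slot_log_matches primary error 1000. As with grep itself, the exit status is non-zero when nothing matches.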
Related
- Pull and register models: get a model into the registry before assigning it.
- Choose models: recommended picks per slot and how fit is evaluated.