Manage slots

A slot is one inference endpoint: a Podman container, backed by the hal0-slot@<name>.service systemd unit, serving a model on a loopback port. The hal0 slot CLI is a thin client over the API: every command calls the local daemon at http://localhost:8080, so the CLI and the dashboard always agree.

List slots

hal0 slot list

Shows every configured slot with its state, model, backend, port, and kind. The slot state machine moves through offline → pulling → starting → warming → ready → serving → idle (and unloading / error). Add --json to get the raw /api/slots payload for scripts:

hal0 slot list --json
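
Since --json is described as returning the raw /api/slots payload, a script could in principle fetch the same data straight from the daemon. The following is a minimal sketch under stated assumptions: that the endpoint answers a plain unauthenticated GET (the HTTP method is not stated on this page) and that curl is installed. fetch_slots and the HAL0_API override variable are hypothetical names, not part of hal0.

```shell
# Hypothetical helper, not part of the hal0 CLI.
# HAL0_API is an illustrative override; the documented daemon address
# is http://localhost:8080.
fetch_slots() {
  # -f: fail on HTTP errors; -sS: hide the progress bar but still print errors
  curl -fsS "${HAL0_API:-http://localhost:8080}/api/slots"
}
```

Calling fetch_slots prints whatever the daemon returns. The payload's field names are not specified here, so inspect the output before writing a parser against it.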

The slot list view shows live state, assigned model, device, and port for every configured slot.

Inspect one slot

hal0 slot show chat

Prints the slot’s full status together with its on-disk TOML config, rendered as formatted JSON. Use --json for the unformatted {status, config} object.

The slot detail panel combines live status with the persisted TOML config in a single view.

Create a slot

hal0 slot create embed \
  --type embedding \
  --model nomic-embed-text-v1.5-q8_0 \
  --hardware vulkan \
  --ctx-size 8192

name (positional) — the slot id, e.g. primary, embed, stt.
--type / -t — how the dispatcher routes requests. One of llm (default), embedding, reranking, transcription, tts, image.
--model / -m (required) — the initial model ref to assign.
--hardware — the compute backend: vulkan, rocm, or cpu. When omitted it is auto-detected from the hardware probe (falls back to vulkan). The slot’s device field is derived from this: vulkan → gpu-vulkan, rocm → gpu-rocm, cpu → cpu.
--port / -p — the slot’s port. Omit it and hal0 auto-assigns the next free port in the 8081–8099 range.
--ctx-size — context window in tokens (default 4096).
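
The two derivation rules in the option list (hardware to device, and port auto-assignment) can be restated as small shell helpers. This is an illustrative sketch, not hal0's own code: both function names are hypothetical, and "next free port" is assumed to mean the lowest unused port in the range, which the real daemon may or may not do.

```shell
# Hypothetical helpers that restate the documented rules; not hal0's own code.

# Map a --hardware value to the slot's derived device field.
device_for_hardware() {
  case "$1" in
    vulkan) echo "gpu-vulkan" ;;
    rocm)   echo "gpu-rocm" ;;
    cpu)    echo "cpu" ;;
    *)      echo "unknown hardware: $1" >&2; return 1 ;;
  esac
}

# Print the lowest port in 8081-8099 that is not among the arguments.
# Assumption: "next free" means lowest unused; hal0 may choose differently.
next_free_port() {
  candidate=8081
  while [ "$candidate" -le 8099 ]; do
    in_use=0
    for used in "$@"; do
      if [ "$used" -eq "$candidate" ]; then in_use=1; fi
    done
    if [ "$in_use" -eq 0 ]; then
      echo "$candidate"
      return 0
    fi
    candidate=$((candidate + 1))
  done
  echo "no free port in 8081-8099" >&2
  return 1
}
```

For example, device_for_hardware rocm prints gpu-rocm, and next_free_port 8081 8082 prints 8083.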

Creating a slot writes its config and systemd drop-in but does not start it — follow with hal0 slot load.
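
Because create and load are separate steps, a provisioning script may want to chain them. The wrapper below is a sketch: create_and_load is a hypothetical name, and it assumes hal0 exits non-zero when a command fails, which this page does not state explicitly.

```shell
# Hypothetical wrapper around two documented commands.
# Assumption: hal0 returns a non-zero exit status on failure.
create_and_load() {
  # $1 = slot name, $2 = slot type, $3 = model ref
  hal0 slot create "$1" --type "$2" --model "$3" || return 1
  hal0 slot load "$1"
}
```

A call such as create_and_load embed embedding nomic-embed-text-v1.5-q8_0 would create the slot with auto-detected hardware and the default context size, then load it only if creation succeeded.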

Load, restart, unload

hal0 slot load primary
# or assign a model at load time:
hal0 slot load primary --model qwen3.6-27b

Loads the slot’s container and waits for it to go healthy. A bad --model is rejected against the registry up front, so you don’t wait out the health timeout on a typo.

hal0 slot restart primary

Unloads and then reloads the slot. Use it after editing config that is baked into the running container (port, backend, context size).

hal0 slot unload primary

Stops the container gracefully and returns the slot to offline.

Swap the model

hal0 slot swap primary --model qwen3.5-9b

A swap does two things by default:

Hot-swap the running slot to the new model (POST /api/slots/{name}/swap).
Persist the new default to the slot’s on-disk config so it survives a restart.

Pass --no-persist to try a model briefly without changing the default:

hal0 slot swap primary --model phi3-mini --no-persist

If the hot-swap succeeds but the persist step fails, the runtime swap is left in place and the failure is surfaced so you can retry.
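
The --no-persist flag lends itself to a "try, then commit or revert" pattern. The sketch below is one way to script it, under several assumptions: trial_swap is a hypothetical helper, the caller supplies both the fallback model and a check command of their own, and re-running swap on an already-loaded model is assumed to be an acceptable way to persist it.

```shell
# Hypothetical helper: try a model without persisting, then keep or revert.
# Assumption: swapping to the model that is already loaded is harmless and
# simply persists it as the new default.
trial_swap() {
  # $1 = slot, $2 = candidate model, $3 = fallback model, rest = check command
  slot="$1"; candidate="$2"; fallback="$3"
  shift 3
  hal0 slot swap "$slot" --model "$candidate" --no-persist || return 1
  if "$@"; then
    # Check passed: swap again without --no-persist so the default is saved.
    hal0 slot swap "$slot" --model "$candidate"
  else
    # Check failed: return to the fallback; the on-disk default never changed.
    hal0 slot swap "$slot" --model "$fallback" --no-persist
    return 1
  fi
}
```

For instance, trial_swap primary phi3-mini qwen3.5-9b ./smoke-test.sh would keep phi3-mini only if the (hypothetical) smoke-test.sh script exits successfully.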

Edit slot config

hal0 slot edit primary --ctx-size 32768
hal0 slot edit embed --hardware rocm

Pass any subset of --model, --port, --ctx-size, --provider, --hardware; at least one is required. Changes to settings baked into the running container (port, backend, context size) take effect on the next restart.
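
Since some edits only apply after a restart, the two commands are often run back to back. A minimal sketch, with edit_and_restart as a hypothetical wrapper and the assumption that hal0 exits non-zero when an edit is rejected:

```shell
# Hypothetical wrapper: apply an edit, then restart so it takes effect.
# Assumption: hal0 returns a non-zero exit status when the edit fails.
edit_and_restart() {
  # $1 = slot name, remaining arguments = flags passed through to "slot edit"
  slot="$1"
  shift
  hal0 slot edit "$slot" "$@" || return 1
  hal0 slot restart "$slot"
}
```

For example, edit_and_restart primary --ctx-size 32768 edits the context size and restarts the slot only if the edit was accepted. Note that the restart interrupts whatever the slot is serving at that moment.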

Delete a slot

hal0 slot delete scratch

Stops the unit and removes the config. You’ll be asked to confirm; pass --force / -f to skip the prompt.

Stream logs

# last 200 lines
hal0 slot logs primary

# follow live (SSE tail)
hal0 slot logs primary --follow

# more history
hal0 slot logs primary --lines 1000

Logs come from the slot’s hal0-slot@<name>.service journal. On a host without journalctl the command returns an empty result with a hint rather than an error.
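
For quick triage it can help to filter a slot's recent history for a pattern. The sketch below assumes the logs command writes plain text lines to standard output, which this page does not state; grep_slot_logs is a hypothetical helper name.

```shell
# Hypothetical helper: search a slot's recent log lines for a pattern.
# Assumption: "hal0 slot logs" prints plain text lines to stdout.
grep_slot_logs() {
  # $1 = slot name, $2 = pattern, $3 = optional line count (default 200)
  hal0 slot logs "$1" --lines "${3:-200}" | grep -i -- "$2"
}
```

For example, grep_slot_logs primary error 1000 searches the last 1000 lines case-insensitively. The function returns a non-zero status when nothing matches, which is standard grep behavior.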

Pull and register models — get a model into the registry before assigning it.
Choose models — recommended picks per slot and how fit is evaluated.