
Manage slots

A slot is one inference endpoint: a podman container, backed by the hal0-slot@<name>.service systemd unit, serving a model on a loopback port. The hal0 slot CLI is a thin client over the hal0 API: every command calls the local daemon at http://localhost:8080, so the CLI and the dashboard always show the same state.

```shell
hal0 slot list
```

Shows every configured slot with its state, model, backend, port, and kind. A slot moves through the states offline → pulling → starting → warming → ready → serving → idle, and can also be unloading or in error. Add --json to get the raw /api/slots payload for scripts:

```shell
hal0 slot list --json
```

The slot list view shows live state, assigned model, device, and port for every configured slot.
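For scripting against `hal0 slot list --json`, the state order above can be encoded directly and used to filter the payload. This is a sketch under assumptions: the payload shape (a list of objects with `name` and `state`, optionally wrapped as `{"slots": [...]}`) is not documented on this page, treating `ready`, `serving`, and `idle` as "up" states is a guess, and `next_state`, `up_slots`, and `list_slots` are hypothetical helper names.

```python
import json
import subprocess

# Documented happy-path order of slot states.
LIFECYCLE = ("offline", "pulling", "starting", "warming", "ready", "serving", "idle")
# Documented side states; their transitions are not described on this page.
OFF_PATH = ("unloading", "error")
# Assumption: these states mean the slot has finished warming and can take requests.
UP_STATES = frozenset({"ready", "serving", "idle"})


def next_state(state):
    """Next state on the documented path, or None where the docs don't say."""
    if state not in LIFECYCLE:
        return None  # unloading, error, or an unknown state
    i = LIFECYCLE.index(state)
    return LIFECYCLE[i + 1] if i + 1 < len(LIFECYCLE) else None


def up_slots(payload):
    """Names of slots in an up state (assumed payload shape, see lead-in)."""
    slots = payload.get("slots", []) if isinstance(payload, dict) else payload
    return [s["name"] for s in slots if s.get("state") in UP_STATES]


def list_slots():
    """Run `hal0 slot list --json` and parse it; needs the hal0 CLI on PATH."""
    result = subprocess.run(
        ["hal0", "slot", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

`up_slots(list_slots())` returns the names of usable slots; only `list_slots` touches the CLI, so the filtering logic can be tested on saved output.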

```shell
hal0 slot show chat
```

Prints the slot’s live status and its on-disk TOML config, formatted as JSON. Add --json to get the raw, unformatted {status, config} object.

The slot detail panel combines live status with the persisted TOML config in a single view.

```shell
hal0 slot create embed \
  --type embedding \
  --model nomic-embed-text-v1.5-q8_0 \
  --hardware vulkan \
  --ctx-size 8192
```
  1. name (positional) — the slot id, e.g. primary, embed, stt.

  2. --type / -t — how the dispatcher routes requests. One of llm (default), embedding, reranking, transcription, tts, image.

  3. --model / -m (required) — the initial model ref to assign.

  4. --hardware — the compute backend: vulkan, rocm, or cpu. When omitted, it is auto-detected from the hardware probe (falling back to vulkan). The slot’s device field is derived from this: vulkan → gpu-vulkan, rocm → gpu-rocm, cpu → cpu.

  5. --port / -p — the slot’s port. Omit it and hal0 auto-assigns the next free port in the 8081–8099 range.

  6. --ctx-size — context window in tokens (default 4096).
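The defaulting rules in the list above can be modeled in a few lines. This sketches the documented behavior, not hal0’s code: the `probed` argument is a hypothetical stand-in for the hardware probe’s answer, and reading "next free port" as "lowest unused port in the range" is an assumption.

```python
# Documented --hardware -> device mapping.
DEVICE_FOR_HARDWARE = {"vulkan": "gpu-vulkan", "rocm": "gpu-rocm", "cpu": "cpu"}
# Auto-assigned ports come from 8081-8099 inclusive.
PORT_RANGE = range(8081, 8100)


def derive_device(hardware=None, probed=None):
    """Explicit --hardware wins, then the probe result, then the vulkan fallback.

    `probed` is a hypothetical stand-in for the hardware probe's answer.
    """
    backend = hardware or probed or "vulkan"
    if backend not in DEVICE_FOR_HARDWARE:
        raise ValueError(f"unknown hardware backend: {backend!r}")
    return DEVICE_FOR_HARDWARE[backend]


def next_free_port(used_ports):
    """Lowest unused port in 8081-8099 (assumes "next free" means lowest free)."""
    for port in PORT_RANGE:
        if port not in used_ports:
            return port
    raise RuntimeError("no free slot port left in 8081-8099")
```

With 19 ports in the range, a host can hold at most 19 auto-assigned slots before you have to pass --port explicitly.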

Creating a slot writes its config and systemd drop-in but does not start it — follow with hal0 slot load.
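Because create only writes config, a provisioning script has to issue both commands. A minimal sketch that builds the two argument lists from the documented flags; `provision_commands` and `provision` are hypothetical helpers, and optional flags are omitted when unset so hal0’s own defaults (type llm, auto-detected hardware, auto-assigned port, 4096-token context) apply.

```python
import shlex
import subprocess

# Documented --type values (llm is hal0's default).
SLOT_TYPES = ("llm", "embedding", "reranking", "transcription", "tts", "image")


def provision_commands(name, model, slot_type=None, hardware=None, port=None, ctx_size=None):
    """Argv lists for `hal0 slot create` followed by `hal0 slot load`."""
    if slot_type is not None and slot_type not in SLOT_TYPES:
        raise ValueError(f"unknown slot type: {slot_type!r}")
    create = ["hal0", "slot", "create", name, "--model", model]
    # Only pass optional flags that were set, so hal0 applies its own defaults.
    optional = {"--type": slot_type, "--hardware": hardware,
                "--port": port, "--ctx-size": ctx_size}
    for flag, value in optional.items():
        if value is not None:
            create += [flag, str(value)]
    # create only writes config and the systemd drop-in; load starts the container.
    return [create, ["hal0", "slot", "load", name]]


def provision(name, model, **flags):
    """Run both commands, stopping at the first failure (needs the hal0 CLI)."""
    for argv in provision_commands(name, model, **flags):
        print("+", shlex.join(argv))
        subprocess.run(argv, check=True)
```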

```shell
hal0 slot load primary
# or assign a model at load time:
hal0 slot load primary --model qwen3.6-27b
```

Starts the slot’s container and waits for it to become healthy. The --model value is checked against the registry up front, so a typo fails immediately instead of after the health timeout.
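Validating first is a question of ordering: a cheap registry lookup runs before the expensive health wait. A sketch of that pattern, assuming the registry is a plain collection of known refs and the caller supplies an `is_healthy` check; the 120-second timeout, the polling interval, and the close-match hint are illustrative choices, not documented hal0 behavior.

```python
import difflib
import time


def check_model(ref, registry):
    """Reject an unknown model ref before any waiting starts.

    `registry` is a hypothetical stand-in for hal0's model registry.
    """
    if ref in registry:
        return ref
    hints = difflib.get_close_matches(ref, list(registry), n=3, cutoff=0.6)
    hint = f" (did you mean {', '.join(hints)}?)" if hints else ""
    raise ValueError(f"unknown model {ref!r}{hint}")


def wait_healthy(is_healthy, timeout=120.0, interval=1.0, sleep=time.sleep):
    """Poll `is_healthy()` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```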

```shell
hal0 slot swap primary --model qwen3.5-9b
```

A swap does two things by default:

  1. Hot-swap the running slot to the new model (POST /api/slots/{name}/swap).
  2. Persist the new default to the slot’s on-disk config so it survives a restart.
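Step 1 is a single HTTP call to the local daemon. A sketch that builds that request with Python’s standard library; the JSON body `{"model": ...}` is an assumption (this page documents the path, not the body), and the persist step is left out because its endpoint is not documented here.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://localhost:8080"  # the local hal0 daemon


def swap_request(name, model, base=API_BASE):
    """Build, without sending, POST /api/slots/{name}/swap.

    Assumption: the endpoint accepts a JSON body of the form {"model": <ref>}.
    """
    url = f"{base}/api/slots/{urllib.parse.quote(name, safe='')}/swap"
    return urllib.request.Request(
        url,
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response, if any."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body) if body else None
```

`send(swap_request("primary", "qwen3.5-9b"))` performs only the runtime swap; unlike `hal0 slot swap` without --no-persist, it leaves the on-disk default unchanged.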

Pass --no-persist to try a model briefly without changing the default:

```shell
hal0 slot swap primary --model phi3-mini --no-persist
```
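After a --no-persist swap, the running model and the on-disk default differ, and a restart would bring back the default. A sketch that spots that gap from `hal0 slot show <name> --json`; the `{status, config}` shape is documented, but the `model` key inside each half is an assumption, so check it against your real output. `show_slot` and `unpersisted_model` are hypothetical helper names.

```python
import json
import subprocess


def show_slot(name):
    """Parse `hal0 slot show <name> --json` into its {status, config} object."""
    result = subprocess.run(
        ["hal0", "slot", "show", name, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def unpersisted_model(detail):
    """Running model ref if it differs from the persisted default, else None.

    Assumption: both halves expose the model ref under a "model" key.
    """
    running = detail["status"].get("model")
    default = detail["config"].get("model")
    return running if running != default else None
```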

If the hot-swap succeeds but the persist step fails, the runtime swap is left in place and the failure is surfaced so you can retry.
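The ordering and failure handling of a swap can be sketched as follows. `hot_swap` and `persist` are hypothetical callables standing in for the two steps; the point of the sketch is that a persist failure is reported without rolling back the runtime swap.

```python
class PersistError(RuntimeError):
    """The hot-swap succeeded but saving the new default failed."""


def swap(name, model, hot_swap, persist, persist_default=True):
    """Hot-swap first, then optionally persist; never undo a successful hot-swap."""
    hot_swap(name, model)  # step 1: if this raises, persist is never attempted
    if not persist_default:  # the --no-persist path
        return "swapped"
    try:
        persist(name, model)  # step 2: write the new default to the slot config
    except Exception as exc:
        # Leave the runtime swap in place and surface the failure for a retry.
        raise PersistError(
            f"{name} is running {model}, but the default was not saved: {exc}"
        ) from exc
    return "swapped+persisted"
```

Retrying after a `PersistError` only needs the persist step again, since the slot is already serving the new model.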

```shell
hal0 slot edit primary --ctx-size 32768
hal0 slot edit embed --hardware rocm
```

Pass any combination of --model, --port, --ctx-size, --provider, and --hardware; at least one is required. Settings the container bakes in at start (port, backend) take effect only after the next restart.
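A scripted edit has to respect both rules: at least one field, and some fields apply only after a restart. A sketch that builds the argv and reports which changes will wait for a restart; the field names mirror the documented flags, `edit_command` is a hypothetical helper, and treating --hardware as the "backend" change is an assumption.

```python
# Fields the docs say are baked into the container (port, backend).
RESTART_FIELDS = frozenset({"port", "hardware"})
EDITABLE = ("model", "port", "ctx_size", "provider", "hardware")


def edit_command(name, **changes):
    """Return (argv for `hal0 slot edit`, sorted fields that need a restart)."""
    unknown = set(changes) - set(EDITABLE)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    changes = {k: v for k, v in changes.items() if v is not None}
    if not changes:
        raise ValueError("hal0 slot edit needs at least one change")
    argv = ["hal0", "slot", "edit", name]
    for field, value in changes.items():
        argv += ["--" + field.replace("_", "-"), str(value)]
    return argv, sorted(RESTART_FIELDS.intersection(changes))
```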

```shell
hal0 slot delete scratch
```

Stops the unit and removes the config. You’ll be asked to confirm; pass --force / -f to skip the prompt.

```shell
# last 200 lines
hal0 slot logs primary
# follow live (SSE tail)
hal0 slot logs primary --follow
# more history
hal0 slot logs primary --lines 1000
```

Logs come from the slot’s hal0-slot@<name>.service journal. On a host without journalctl the command returns an empty result with a hint rather than an error.
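The --follow mode streams the journal as Server-Sent Events. If you consume such a stream from your own code, the SSE wire format is simple to parse. This minimal parser handles only `data:` fields and comments, and it assumes, without this page confirming it, that each log line arrives as its own event. The streaming endpoint is not documented here, so `sse_events` (a hypothetical helper) takes any iterable of text lines.

```python
def sse_events(lines):
    """Yield each event's data from an iterable of SSE text lines.

    Minimal parser: consecutive "data:" lines are joined by newlines, a blank
    line ends the event, ":" lines are comments, other fields (event, id, retry)
    are ignored, and an unterminated event at end of stream is dropped, as the
    SSE spec requires.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # blank line: dispatch the pending event
            text = "\n".join(data)
            data = []
            if text:
                yield text
            continue
        if line.startswith(":"):  # comment or keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "data":
            data.append(value)
```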