
Send your first chat

The installer brings up OpenWebUI alongside the API, already pointed at your primary slot. No config, no API key. You loaded a model in the FirstRun wizard; this page gets the first message out.

http://localhost:3001

If you set HAL0_OPENWEBUI_PORT, swap the port. OpenWebUI runs as a Docker container under hal0-openwebui.service. The installer wrote /etc/hal0/openwebui.env with OPENAI_API_BASE_URLS=http://127.0.0.1:8080/v1 — that’s the deep-link that makes “drop-in chat” work. See operate / OpenWebUI for day-2 details (swapping the bundled UI, rebinding ports, persistence).
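Putting the pieces from this paragraph together, the env file looks roughly like this (a sketch based on the values quoted above, not a verbatim dump of the installer's output):

```shell
# /etc/hal0/openwebui.env — written by the installer
# Points OpenWebUI at the hal0 API's OpenAI-compatible surface.
OPENAI_API_BASE_URLS=http://127.0.0.1:8080/v1
# If you changed HAL0_PORT, update the URL above, then:
#   systemctl restart hal0-openwebui
```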

On first launch, OpenWebUI asks you to create a local admin account. That account lives in OpenWebUI’s own SQLite database under /var/lib/hal0/openwebui/ — it has nothing to do with hal0’s auth story (which is, for v1, “use a firewall or a reverse proxy”).

  1. Pick the model. The model selector at the top of the chat pulls live from GET /v1/models. Anything assigned to a ready slot shows up here. With only the wizard’s pick loaded, your list has exactly one entry.

  2. Type and hit send. The first message after a cold load takes a little longer — the slot is going through the warming → serving transition. Subsequent messages re-use the warm slot and arrive at the model’s natural tok/s.

  3. Watch the slot. On the hal0 dashboard at http://localhost:8080, the Slots view streams the same state machine over SSE. You’ll see primary flip to serving while the response generates, then back to ready.
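The list behind the model picker is a standard OpenAI-style models response from GET /v1/models. A minimal sketch of pulling model IDs out of one — the `ready_model_ids` helper and the sample payload are illustrative, not part of hal0:

```python
import json

def ready_model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style GET /v1/models response."""
    return [m["id"] for m in models_response.get("data", [])]

# A response shaped like the OpenAI models list (illustrative values):
sample = json.loads("""{
  "object": "list",
  "data": [{"id": "phi-3-mini-4k-instruct-q4", "object": "model"}]
}""")

# With only the wizard's pick loaded, the list has exactly one entry.
print(ready_model_ids(sample))
```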

OpenWebUI sees a normal OpenAI-compatible backend. It POSTs to /v1/chat/completions. The hal0 API authenticates the request, picks the slot that owns the requested model, and proxies the stream back — all through the same dispatcher that handles every external client.

```shell
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "phi-3-mini-4k-instruct-q4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

OpenWebUI is doing exactly that under the hood. The bundled UI is convenience, not magic.
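The same request can be built with nothing but the Python standard library. `chat_request` is a hypothetical helper, not part of hal0; it constructs the identical POST, and actually sending it requires the hal0 API to be running:

```python
import json
from urllib import request

API_BASE = "http://localhost:8080/v1"  # adjust if you changed HAL0_PORT

def chat_request(model: str, content: str) -> request.Request:
    """Build the same POST /v1/chat/completions request the curl example sends."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return request.Request(
        f"{API_BASE}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("phi-3-mini-4k-instruct-q4", "Hello!")
# To actually send it (requires a running hal0 API):
#   with request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```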

Open the Models page in the hal0 dashboard, paste a Hugging Face repo ID, and pick the slot you want it assigned to. The same pull job the wizard ran handles it — streamed progress, same lifecycle.

For multi-slot loadouts (chat + embed + voice all hot at once), primary + embed co-resident is the baseline. The Strix Halo loadouts section maps loadouts to hardware envelopes.

If the model picker is blank, walk back through the chain:

  1. Did the install finish? systemctl status hal0-openwebui — should be active (running).

  2. Is the API up? curl http://localhost:8080/v1/models should return JSON. If it doesn’t, check journalctl -u hal0-api.

  3. Does any slot own a model? Run hal0 slot list; primary should be ready with a model set. If not, re-run the FirstRun wizard.

  4. Did you change HAL0_PORT? OpenWebUI’s env file still points at :8080 unless you updated it. Edit /etc/hal0/openwebui.env and systemctl restart hal0-openwebui.
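Step 2 of the checklist boils down to "is anything listening on the API port?", which you can probe without curl. `port_open` is an illustrative helper, not a hal0 command; a minimal sketch:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Step 2 of the checklist: is the hal0 API reachable on its default port?
if not port_open("127.0.0.1", 8080):
    print("API not reachable — check journalctl -u hal0-api")
```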