Slot as model

In a standard OpenAI request the model field names a specific model. hal0 extends this: the model field may instead name a slot (by its name/alias) or a virtual role name. hal0 rewrites it to the backing model id and ensures the right slot is loaded — all before the dispatcher routes the request.

This lets a client pin a co-resident slot (e.g. always talk to the chat slot) without knowing or hard-coding which model it currently serves. The alias is stable across model swaps.

Addressing modes

`model` value	Resolution
A chat-slot alias (`chat`, `agent`, `utility`; legacy `primary`, `agent-hermes`)	Rewritten to the slot’s configured model id.
A virtual `hal0/*` role name	Live-resolved to whichever model the role currently maps to.
A raw registry model id	Used directly; routed to whichever upstream serves it.

How the rewrite works

The chat route applies two steps before dispatch, both in routes/v1.py:

1. Alias rewrite — `_rewrite_chat_slot_alias`

The request’s model is checked against the chat-slot alias map (slot name → configured model id). On a match, hal0:

replaces body["model"] with the slot’s model id, and
rewrites the request’s cached body bytes, so any downstream consumer that re-reads the raw request forwards the rewritten model name — not the bare alias.

It is a no-op when the value isn’t a known alias, already equals its own model id, the slot manager is absent, or the config read fails (best-effort — it never blocks the request).

2. Backend-aware load — `_ensure_backend_for_model`

After the alias rewrite, hal0 finds the chat slot that owns the resolved model id and calls SlotManager.load(slot) before dispatch. This is idempotent and blocks until the slot is ready, so the model is already loaded under the slot’s declared device / profile whichever routing path dispatch then takes. A model with no backing chat slot is left alone and dispatches as-is.

Slot name vs registry ref

Slot name / alias — addresses the slot and follows whatever model the slot is configured to serve. Best for clients that want a stable handle.
Registry model id — addresses the model. The dispatcher routes it to a serving upstream; if no upstream serves it, the request surfaces the dispatcher’s NoRouteFound envelope (404). There is no catch-all fall-through.

Interaction with the GPU arbiter

When the GPU is in exclusive image mode, the backend-aware load step refuses to lazy-load an LLM back onto the GPU — it raises the structured gpu.image_mode 503 (with a Retry-After) rather than fighting the arbiter and resurrecting a slot the arbiter just drained.

OpenAI-compatible API — the /v1/* endpoints.
Slot lifecycle — what “loaded under its backend” means.
hal0 CLI — hal0 slot to create and configure slots.