Custom slots
The four built-in slots cover the OpenAI-compatible surface. You can define additional slots — for a second chat model, a vision model, the FLM provider on AMD XDNA, ComfyUI, or anything else a provider supports.
When you want a custom slot
Section titled “When you want a custom slot”- Keep two chat models hot at once. A small fast model for
autocomplete, a big slow model for reasoning. Different slot names,
same
/v1/chat/completionsendpoint, addressed by model name. - Use the NPU alongside the iGPU. A custom
npuslot binds the FLM provider to XDNA so the iGPU stays free forprimarychat. - Dedicated vision slot. A multimodal model lives in its own slot so swapping the chat primary doesn’t unload it.
Defining one (dashboard)
Section titled “Defining one (dashboard)”The dashboard’s Slots view has an Add slot form. It takes:
- A name (must match
^[a-z][a-z0-9_-]*$). - A provider (
llama.cpp,flm,moonshine,kokoro). - A model registry ref.
- An idle-timeout (after which the slot transitions
ready → idleand becomes an unload candidate).
The form writes the slot definition to the TOML config atomically (see Config), then stages the systemd unit. You start the slot with the Load button.
Defining one (CLI / config)
Section titled “Defining one (CLI / config)”Slots are entries in hal0.toml under [slots.<name>]. The shape
mirrors the dashboard form one-for-one. After editing:
hal0 config validate # schema checkhal0 config show # confirm the merged viewhal0 slot load my-slot # bring it upSee Config schema for the field list.
Limits
Section titled “Limits”- Slot names must be unique. The four built-in names
(
primary,embed,stt,tts) are reserved. - Total slot count is capped by the port range (
8081–8099, so 19 concurrent slots). - A slot can only host one model at a time. To run two models on the same provider, define two slots.
Coming soon — outline
Section titled “Coming soon — outline”The full custom-slot authoring guide will cover:
- Choosing a provider for the workload you have in mind.
- Sizing notes per provider — what fits in VRAM vs unified memory vs what needs paging.
- Per-slot env overrides and provider-specific knobs.
- Defining a slot that fronts an external upstream (OpenRouter, Anthropic, custom OpenAI-compatible endpoint).
Until then, copy the shape of one of the built-ins in hal0.toml and
adjust the model + name fields.