# OpenAI-compatible API
hal0 exposes an OpenAI-compatible API at `http://localhost:8080/v1`.
Any client written against the OpenAI SDK — Python `openai`,
`openai-node`, LangChain, OpenWebUI, LiteLLM, Aider, Cursor with a
custom base URL — works against hal0 unmodified.

The API binds `0.0.0.0:8080` by default (override with `HAL0_PORT`).
Routes are implemented in `src/hal0/api/routes/v1.py`.
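For example, the Python `openai` client only needs a custom `base_url`. A minimal sketch, assuming no API key is enforced locally (any placeholder string satisfies the SDK constructor):

```python
from openai import OpenAI

# Point the stock OpenAI client at hal0; the key is a placeholder.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.chat.completions.create(
    model="primary",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```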
## Endpoints

| Method | Path | Purpose |
|---|---|---|
| GET | `/v1/models` | List loaded models + slot aliases. |
| GET | `/v1/models/{model_id}` | Detail for one model. |
| POST | `/v1/chat/completions` | Chat with a model. Supports streaming. |
| POST | `/v1/completions` | Plain completion (no chat template). |
| POST | `/v1/embeddings` | Embed text into vectors. |
| POST | `/v1/rerankings` | Rerank candidates against a query. |
| POST | `/v1/audio/transcriptions` | Speech-to-text (Moonshine). |
| POST | `/v1/audio/speech` | Text-to-speech (Kokoro). |
| POST | `/v1/images/generations` | Image generation (ComfyUI on ROCm). |
## curl: list models

```bash
curl http://localhost:8080/v1/models
```

The response includes one entry per registry model plus one entry
per loaded slot name, so you can address the model directly
(`qwen2.5-0.5b-instruct-q4_k_m`) or by slot (`primary`).
See Slot as model.
## curl: chat completion

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "primary",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

For streaming, add `"stream": true` — see Streaming.
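With the Python SDK the flag is `stream=True`. A sketch that prints tokens as they arrive:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="primary",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. the final one) may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```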
## curl: plain completion

```bash
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "primary",
    "prompt": "Once upon a time",
    "max_tokens": 64
  }'
```
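The SDK equivalent goes through the legacy completions method. A sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# No chat template is applied; the prompt is continued as-is.
resp = client.completions.create(
    model="primary",
    prompt="Once upon a time",
    max_tokens=64,
)
print(resp.choices[0].text)
```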
## curl: embeddings

```bash
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embed",
    "input": ["hal0 runs locally", "OpenAI-compatible"]
  }'
```
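Through the SDK, with a quick cosine-similarity check on the returned vectors. A sketch:

```python
import math

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.embeddings.create(
    model="embed",
    input=["hal0 runs locally", "OpenAI-compatible"],
)
a, b = (d.embedding for d in resp.data)

# Cosine similarity between the two embeddings.
dot = sum(x * y for x, y in zip(a, b))
norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
print(dot / norm)
```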
## curl: rerank

```bash
curl http://localhost:8080/v1/rerankings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embed",
    "query": "atomic config writes",
    "documents": [
      "TOML config is written via NamedTemporaryFile + os.replace.",
      "Slots bind 127.0.0.1 in the 8081-8099 range."
    ]
  }'
```

Rerank piggybacks on the embed slot because it uses the same
backend process.
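The OpenAI SDK has no rerank method, so from Python the endpoint is a plain POST. A sketch using `requests`; the response shape is not assumed here, so it just prints the raw JSON:

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/rerankings",
    json={
        "model": "embed",
        "query": "atomic config writes",
        "documents": [
            "TOML config is written via NamedTemporaryFile + os.replace.",
            "Slots bind 127.0.0.1 in the 8081-8099 range.",
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```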
## curl: speech-to-text

```bash
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file=@hello.wav \
  -F model=stt
```

See Audio for the full Moonshine + Kokoro story.
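Through the SDK, assuming a local `hello.wav`. A sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

with open("hello.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="stt", file=audio)
print(transcript.text)
```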
## curl: text-to-speech

```bash
curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts",
    "input": "Hello from hal0.",
    "voice": "af_bella"
  }' --output speech.wav
```
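Through the SDK, streaming the audio straight to disk. A sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Stream the binary audio response to a file as it arrives.
with client.audio.speech.with_streaming_response.create(
    model="tts",
    voice="af_bella",
    input="Hello from hal0.",
) as response:
    response.stream_to_file("speech.wav")
```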
## curl: image generation

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sdxl-turbo",
    "prompt": "a cat in a hat, studio lighting",
    "size": "1024x1024",
    "response_format": "url"
  }'
```

Curated models: `sdxl-turbo` (SAI Non-Commercial Research),
`sd-1.5-pruned-emaonly` (CreativeML Open RAIL-M), `flux-schnell`
(Apache-2.0). See Image generation for the full request shape,
response shape, slot configuration, and hardware requirements.
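The same request through the SDK. A sketch; see Image generation for the full response shape:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

image = client.images.generate(
    model="sdxl-turbo",
    prompt="a cat in a hat, studio lighting",
    size="1024x1024",
    response_format="url",
)
print(image.data[0].url)
```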
## Structured errors

Every failure response carries a structured envelope:

```json
{
  "error": {
    "code": "slot.not_ready",
    "message": "primary is still warming",
    "details": { "slot": "primary", "state": "warming" }
  }
}
```

Codes are namespaced — `slot.*`, `model.*`, `dispatch.*`, `config.*`,
`system.*`. The dashboard surfaces them inline; the CLI prints them
with the same code so error reports between users and developers
stay anchored.
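A client can branch on `error.code` instead of matching message text. A sketch using `requests`, grounded only in the envelope above:

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "primary", "messages": [{"role": "user", "content": "Hello!"}]},
    timeout=30,
)
if not resp.ok:
    err = resp.json()["error"]
    # e.g. "slot.not_ready: primary is still warming"
    print(f"{err['code']}: {err['message']}", err.get("details"))
```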
## External upstreams

The same `/v1/*` surface fronts external OpenAI-compatible providers
when configured — OpenRouter, Anthropic, OpenAI, custom endpoints.
You can mix local + remote per-model in one config; the dispatcher
picks the right backend based on the `model` field.
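From the client's side nothing changes: same SDK instance, different `model` values. A sketch with a hypothetical remote model id; real names come from your hal0 config:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

messages = [{"role": "user", "content": "Hello!"}]

# Dispatched to a local slot.
local = client.chat.completions.create(model="primary", messages=messages)

# Dispatched to a configured remote upstream (hypothetical model id).
remote = client.chat.completions.create(
    model="openrouter/some-remote-model", messages=messages
)
```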