# REST API Reference

The server listens on `localhost:11435` by default.
## Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/tags | List local models |
| GET | /api/ps | List running models |
| POST | /api/show | Show model details |
| POST | /api/pull | Pull a model (streaming) |
| POST | /api/stop | Stop/unload a running model |
| DELETE | /api/delete | Delete a model |
| POST | /api/generate | Text generation (streaming) |
| POST | /api/chat | Chat completions (streaming) |
| POST | /v1/chat/completions | OpenAI-compatible chat completions |
| GET | /v1/models | OpenAI-compatible model listing |
## Examples
### Chat

```shell
curl http://localhost:11435/api/chat -d '{
  "model": "Qwen/Qwen3-0.6B-GGUF",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```
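Since /api/chat streams by default, a client typically reads the response line by line and concatenates the partial messages. A minimal sketch of that accumulation step, assuming each streamed line is a JSON chunk with an Ollama-style `message.content` field and a final `"done": true` marker (the exact chunk schema is an assumption, not specified in this reference):

```python
import json

def accumulate_chat_stream(lines):
    """Join the content fields of streamed chat chunks into one reply."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        if not chunk.get("done"):
            parts.append(chunk["message"]["content"])
    return "".join(parts)

# Example chunks in the assumed NDJSON shape:
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo!"}, "done": false}',
    '{"done": true}',
]
print(accumulate_chat_stream(sample))  # prints "Hello!"
```

In a real client the `lines` iterable would come from the HTTP response body (one JSON object per line) rather than a literal list.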
### Generate (non-streaming)

```shell
curl http://localhost:11435/api/generate -d '{
  "model": "Qwen/Qwen3-0.6B-GGUF",
  "prompt": "Write a haiku about programming",
  "stream": false
}'
```
### List running models

```shell
curl http://localhost:11435/api/ps
```
### Stop a model

```shell
curl -X POST http://localhost:11435/api/stop -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'
```
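The remaining model-management endpoints from the table can be exercised the same way. The sketches below assume they accept the same `{"model": ...}` request body as /api/stop; this reference does not document their exact request fields:

```shell
# Pull a model (streams progress while downloading)
curl -X POST http://localhost:11435/api/pull -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'

# Show model details
curl -X POST http://localhost:11435/api/show -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'

# Delete a local model
curl -X DELETE http://localhost:11435/api/delete -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'
```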
### OpenAI-compatible chat (Python)

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")
response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B-GGUF",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```