REST API Reference

The server listens on localhost:11435 by default.
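
Health check

A quick way to confirm the server is reachable is the /api/health endpoint listed below; assuming it takes no parameters, a plain GET suffices:

```shell
curl http://localhost:11435/api/health
```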

Endpoints

Method  Path                  Description
------  --------------------  ----------------------------------
GET     /api/health           Health check
GET     /api/tags             List local models
GET     /api/ps               List running models
POST    /api/show             Show model details
POST    /api/pull             Pull a model (streaming)
POST    /api/stop             Stop/unload a running model
DELETE  /api/delete           Delete a model
POST    /api/generate         Text generation (streaming)
POST    /api/chat             Chat completions (streaming)
POST    /v1/chat/completions  OpenAI-compatible chat completions
GET     /v1/models            OpenAI-compatible model listing

Examples

Chat

curl http://localhost:11435/api/chat -d '{
  "model": "Qwen/Qwen3-0.6B-GGUF",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
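
Chat (non-streaming)

The table marks /api/chat as streaming by default. Assuming it honors the same "stream" flag as /api/generate, a single JSON response can be requested like this:

```shell
curl http://localhost:11435/api/chat -d '{
  "model": "Qwen/Qwen3-0.6B-GGUF",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'
```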

Generate (non-streaming)

curl http://localhost:11435/api/generate -d '{
  "model": "Qwen/Qwen3-0.6B-GGUF",
  "prompt": "Write a haiku about programming",
  "stream": false
}'

List running models

curl http://localhost:11435/api/ps
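
List local models

The /api/tags endpoint is listed in the table but has no example here; assuming it takes no parameters, like /api/ps, a plain GET works:

```shell
curl http://localhost:11435/api/tags
```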

Stop a model

curl -X POST http://localhost:11435/api/stop -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'
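
Pull, show, and delete a model

These three endpoints have no examples above. Assuming each accepts a JSON body with a "model" field, as /api/stop does, the calls would look like:

```shell
# Pull a model; per the table this streams progress as it downloads
curl -X POST http://localhost:11435/api/pull -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'

# Show details for a local model
curl -X POST http://localhost:11435/api/show -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'

# Delete a local model
curl -X DELETE http://localhost:11435/api/delete -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'
```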

OpenAI-compatible chat (Python)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")
response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B-GGUF",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
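
List models (OpenAI-compatible, Python)

The /v1/models endpoint pairs with the OpenAI client's models API; assuming the listing behaves like a standard OpenAI SDK model page, the available model IDs can be printed with:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")

# Iterate over the model listing returned by GET /v1/models
for model in client.models.list():
    print(model.id)
```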