# REST API Reference

The server listens on `localhost:11435` by default.
## Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/tags | List local models |
| GET | /api/ps | List running models |
| POST | /api/show | Show model details |
| POST | /api/pull | Pull a model (streaming) |
| POST | /api/stop | Stop/unload a running model |
| DELETE | /api/delete | Delete a model |
| POST | /api/generate | Text generation (streaming) |
| POST | /api/chat | Chat completions (streaming) |
| POST | /v1/chat/completions | OpenAI-compatible chat completions |
| GET | /v1/models | OpenAI-compatible model listing |
## Examples
### Chat

```shell
curl http://localhost:11435/api/chat -d '{
  "model": "Qwen/Qwen3-0.6B-GGUF",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```
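Since /api/chat streams by default, a client typically reads the response line by line and concatenates the partial messages. A minimal sketch of that accumulation step, assuming each streamed line is a JSON chunk with an Ollama-style `message.content` field and a final `"done": true` marker (the exact chunk schema is an assumption, not specified in this reference):

```python
import json

def accumulate_chat_stream(lines):
    """Join the content fields of streamed chat chunks into one reply."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        if not chunk.get("done"):
            parts.append(chunk["message"]["content"])
    return "".join(parts)

# Example chunks in the assumed NDJSON shape:
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo!"}, "done": false}',
    '{"done": true}',
]
print(accumulate_chat_stream(sample))  # prints "Hello!"
```

In a real client the `lines` iterable would come from the HTTP response body (one JSON object per line) rather than a literal list.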
### Generate (non-streaming)

```shell
curl http://localhost:11435/api/generate -d '{
  "model": "Qwen/Qwen3-0.6B-GGUF",
  "prompt": "Write a haiku about programming",
  "stream": false
}'
```
### List running models

```shell
curl http://localhost:11435/api/ps
```
### Stop a model

```shell
curl -X POST http://localhost:11435/api/stop -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'
```
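The remaining model-management endpoints from the table can be exercised the same way. The sketches below assume they accept the same `{"model": ...}` request body as /api/stop; this reference does not document their exact request fields:

```shell
# Pull a model (streams progress while downloading)
curl -X POST http://localhost:11435/api/pull -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'

# Show model details
curl -X POST http://localhost:11435/api/show -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'

# Delete a local model
curl -X DELETE http://localhost:11435/api/delete -d '{"model": "Qwen/Qwen3-0.6B-GGUF"}'
```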
### OpenAI-compatible chat (Python)

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")
response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B-GGUF",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```