# CSGHub-Lite Introduction
CSGHub-Lite is a lightweight tool for running large language models locally, powered by models from the CSGHub platform.
Inspired by Ollama, csghub-lite provides model download, local inference, interactive chat, and an OpenAI-compatible REST API — all from a single binary.
## Features
- One command to start — `csghub-lite run` downloads, loads, and chats
- Model keep-alive — models stay loaded after exit (default 5 min), instant reconnect
- Auto-start server — background API server starts automatically, no manual setup
- Model download from CSGHub platform (hub.opencsg.com or private deployments)
- Local inference via llama.cpp (GGUF models, SafeTensors auto-converted)
- Interactive chat with streaming output
- REST API compatible with Ollama's API format
- Cross-platform — macOS, Linux, Windows
- Resume downloads — interrupted downloads resume where they left off
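Because the REST API follows Ollama's format, an Ollama-style client should work against it. The sketch below builds a non-streaming request in Ollama's `/api/chat` shape using only the standard library; the port `11434` (Ollama's default), the model name, and the exact endpoint csghub-lite binds are assumptions, not confirmed by this document.

```python
import json
import urllib.request


def build_chat_request(prompt, model, base_url="http://localhost:11434"):
    """Build one chat turn in Ollama's /api/chat request format.

    base_url defaults to Ollama's conventional port; csghub-lite's
    actual address may differ (an assumption here).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # request a single JSON response, not a stream
    }
    return urllib.request.Request(
        base_url + "/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


# With a server running, sending the request would look like:
# with urllib.request.urlopen(build_chat_request("Hello!", "<model>")) as r:
#     reply = json.loads(r.read())
```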
## Model Formats
| Format | Download | Inference |
|---|---|---|
| GGUF | Yes | Yes (via llama.cpp) |
| SafeTensors | Yes | Yes (auto-converted to GGUF) |
SafeTensors checkpoints are converted once, using the bundled llama.cpp `convert_hf_to_gguf.py` script and the system Python. Install its dependencies once:

```shell
pip3 install torch safetensors gguf transformers
```
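The table above implies a simple dispatch rule: a checkpoint that already contains a GGUF file loads directly via llama.cpp, while one that only ships SafeTensors weights needs the one-time conversion first. An illustrative approximation of that decision (not csghub-lite's actual source code):

```python
from pathlib import Path


def needs_conversion(model_dir):
    """Return True if a downloaded checkpoint must be converted to GGUF.

    Illustrative sketch: a directory containing any .gguf file is loadable
    as-is; one with only .safetensors weights needs convert_hf_to_gguf.py.
    """
    suffixes = {f.suffix for f in Path(model_dir).iterdir()}
    if ".gguf" in suffixes:
        return False  # already GGUF; llama.cpp loads it directly
    return ".safetensors" in suffixes
```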