
CSGHub-Lite Introduction

CSGHub-Lite is a lightweight tool for running large language models locally, powered by models from the CSGHub platform.

Inspired by Ollama, CSGHub-Lite provides model download, local inference, interactive chat, and an OpenAI-compatible REST API — all from a single binary.

Features

  • One command to start — csghub-lite run downloads, loads, and chats
  • Model keep-alive — models stay loaded after exit (default 5 min), instant reconnect
  • Auto-start server — background API server starts automatically, no manual setup
  • Model download from CSGHub platform (hub.opencsg.com or private deployments)
  • Local inference via llama.cpp (GGUF models, SafeTensors auto-converted)
  • Interactive chat with streaming output
  • REST API compatible with Ollama's API format
  • Cross-platform — macOS, Linux, Windows
  • Resume downloads — interrupted downloads resume where they left off
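
Since the API follows Ollama's format, a client can talk to it with an ordinary HTTP request. A minimal sketch of building an Ollama-style /api/chat payload — the model name and the port (11434, Ollama's default) are assumptions, not values confirmed by this page:

```python
import json

# Hypothetical request in Ollama's /api/chat format.
# Model name and server port are placeholders for illustration.
payload = {
    "model": "my-model-gguf",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,           # set True for streaming responses
}

body = json.dumps(payload)
print(body)

# With a server running, this could be sent as, e.g.:
#   curl http://localhost:11434/api/chat -d '<body>'
```

The same payload shape works against any Ollama-compatible endpoint, which is what makes existing Ollama clients reusable here.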

Model Formats

Format        Download   Inference
GGUF          Yes        Yes (via llama.cpp)
SafeTensors   Yes        Yes (auto-converted to GGUF)

SafeTensors checkpoints are converted to GGUF once, using the bundled llama.cpp convert_hf_to_gguf.py script and the system Python. The conversion requires these packages, installed once:

pip3 install torch safetensors gguf transformers
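
Before pulling a SafeTensors model, it can be worth checking that the conversion dependencies are importable. A small sketch (the package list comes from the pip3 command above; everything else is illustrative):

```python
import importlib.util

# Packages needed by llama.cpp's convert_hf_to_gguf.py, per the docs above.
required = ["torch", "safetensors", "gguf", "transformers"]

# find_spec returns None for packages that are not installed.
missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip3 install " + " ".join(missing))
else:
    print("All conversion dependencies are installed.")
```

GGUF models skip this step entirely, since they load into llama.cpp directly.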