
CSGHub-Lite Introduction

CSGHub-Lite is a lightweight tool for running large language models locally, powered by models from the CSGHub platform.

Inspired by Ollama, CSGHub-Lite provides model download, local inference, interactive chat, and an OpenAI-compatible REST API — all from a single binary.

Features

  • One command to start — csghub-lite run downloads, loads, and chats
  • Model keep-alive — models stay loaded after exit (default 5 min), instant reconnect
  • Auto-start server — background API server starts automatically, no manual setup
  • Model download from CSGHub platform (hub.opencsg.com or private deployments)
  • Local inference via llama.cpp (GGUF models, SafeTensors auto-converted)
  • Interactive chat with streaming output
  • REST API compatible with Ollama's API format
  • Cross-platform — macOS, Linux, Windows
  • Resume downloads — interrupted downloads resume where they left off
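
Since the API follows Ollama's format, a client can talk to it with an ordinary HTTP request. A minimal sketch of building an Ollama-style /api/chat payload — the model name and the port (11434, Ollama's default) are assumptions, not values confirmed by this page:

```python
import json

# Hypothetical request in Ollama's /api/chat format.
# Model name and server port are placeholders for illustration.
payload = {
    "model": "my-model-gguf",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,           # set True for streaming responses
}

body = json.dumps(payload)
print(body)

# With a server running, this could be sent as, e.g.:
#   curl http://localhost:11434/api/chat -d '<body>'
```

The same payload shape works against any Ollama-compatible endpoint, which is what makes existing Ollama clients reusable here.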

Model Formats

Format        Download   Inference
GGUF          Yes        Yes (via llama.cpp)
SafeTensors   Yes        Yes (auto-converted to GGUF)

SafeTensors checkpoints are converted to GGUF once, using the bundled llama.cpp convert_hf_to_gguf.py script and the system Python. The conversion requires these packages, installed once:

pip3 install torch safetensors gguf transformers
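
Before pulling a SafeTensors model, it can be worth checking that the conversion dependencies are importable. A small sketch (the package list comes from the pip3 command above; everything else is illustrative):

```python
import importlib.util

# Packages needed by llama.cpp's convert_hf_to_gguf.py, per the docs above.
required = ["torch", "safetensors", "gguf", "transformers"]

# find_spec returns None for packages that are not installed.
missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip3 install " + " ".join(missing))
else:
    print("All conversion dependencies are installed.")
```

GGUF models skip this step entirely, since they load into llama.cpp directly.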