🚀 Exciting Updates Coming Soon!
New UI, Function Calling, and more amazing features are on the way! Stay tuned for updates.
Run GGUF models on your computer with a chat UI.
Your own AI assistant runs locally on your computer.
Inspired by node-llama-cpp and llama.cpp.
Make sure you have Node.js (download current) installed.
```bash
npm install -g catai
catai install qwen3-4b-q4_k_m
catai up
```

- Auto detect programming language 🧑‍💻
- Click on the user icon to show the original message 💬
- Real time text streaming ⏱️
- Fast model downloads 🚀
```
Usage: catai [options] [command]

Options:
  -V, --version                    output the version number
  -h, --help                       display help for command

Commands:
  install|i [options] [models...]  Install any GGUF model
  models|ls [options]              List all available models
  use [model]                      Set model to use
  serve|up [options]               Open the chat website
  update                           Update server to the latest version
  active                           Show active model
  remove|rm [options] [models...]  Remove a model
  uninstall                        Uninstall server and delete all models
  node-llama-cpp|cpp [options]     Node llama.cpp CLI - recompile node-llama-cpp binaries
  help [command]                   display help for command
```

```
Usage: cli install|i [options] [models...]

Install any GGUF model

Arguments:
  models                Model name/url/path

Options:
  -t --tag [tag]        The name of the model in local directory
  -l --latest           Install the latest version of a model (may be unstable)
  -b --bind [bind]      The model binding method
  -bk --bind-key [key]  key/cookie that the binding requires
  -h, --help            display help for command
```

You can use it on Windows, Linux, and Mac.
This package uses node-llama-cpp which supports the following platforms:
- darwin-x64
- darwin-arm64
- linux-x64
- linux-arm64
- linux-armv7l
- linux-ppc64le
- win32-x64-msvc
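
If you're not sure which pair your machine maps to, Node.js can tell you directly. A quick check using only built-in globals (note that `process.arch` reports plain `x64` on Windows, without the `-msvc` suffix used in the binary name above):

```js
// Prints e.g. "darwin-arm64" or "linux-x64"; compare against the supported list above.
console.log(`${process.platform}-${process.arch}`);
```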
- All downloaded data is saved to the `~/catai` folder by default.
- The download is multi-threaded, so it may use a lot of bandwidth, but it will finish faster!
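
If you need the on-disk location of an installed model, for example to pass it to another tool, you can resolve it with CatAI's `getModelPath` (also used in the development API example below). A minimal sketch, assuming the model tag from the install step above:

```js
import {getModelPath} from "catai";

// Resolve the on-disk path of a model installed earlier
// (the "qwen3-4b-q4_k_m" tag is assumed from the install step above).
const modelPath = getModelPath("qwen3-4b-q4_k_m");
console.log(modelPath); // a .gguf file under ~/catai by default
```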
There is also a simple API that you can use to ask the model questions.
```js
const response = await fetch('http://127.0.0.1:3000/api/chat/prompt', {
    method: 'POST',
    body: JSON.stringify({
        prompt: 'Write me a 100-word story'
    }),
    headers: {
        'Content-Type': 'application/json'
    }
});

const data = await response.text();
```

For more information, please read the API guide.
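
Responses are generated token by token (see the real-time streaming feature above), so you don't have to wait for the full text. A minimal sketch that consumes the response body incrementally using the standard Fetch streaming API; whether this endpoint flushes chunks as they are generated is an assumption here, so check the API guide:

```js
const response = await fetch('http://127.0.0.1:3000/api/chat/prompt', {
    method: 'POST',
    body: JSON.stringify({prompt: 'Write me a 100-word story'}),
    headers: {'Content-Type': 'application/json'}
});

// Read the body chunk by chunk instead of buffering it with response.text().
// Works in Node.js 18+, where the fetch body is an async-iterable ReadableStream.
const decoder = new TextDecoder();
for await (const chunk of response.body) {
    process.stdout.write(decoder.decode(chunk, {stream: true}));
}
```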
You can also use the development API to interact with the model.
```js
import {createChat, downloadModel, initCatAILlama, LlamaJsonSchemaGrammar} from "catai";

// Skip downloading the model if you already have it.
await downloadModel("qwen3-4b-q4_k_m");

const llama = await initCatAILlama();
const chat = await createChat({model: "qwen3-4b-q4_k_m"});

const fullResponse = await chat.prompt("Give me an array of random numbers (10 numbers)", {
    grammar: new LlamaJsonSchemaGrammar(llama, {
        type: "array",
        items: {
            type: "number",
            minimum: 0,
            maximum: 100
        }
    }),
    topP: 0.8,
    temperature: 0.8
});
console.log(fullResponse); // [10, 2, 3, 4, 6, 9, 8, 1, 7, 5]
```

(For the full list of models, run `catai models`.)
You can use the model with node-llama-cpp@beta.
CatAI enables you to easily manage the models and chat with them.
```js
import {downloadModel, getModelPath, initCatAILlama, LlamaChatSession} from 'catai';

// Download the model; skip if you already have it.
await downloadModel(
    "https://huggingface.co/giladgd/Qwen3-Reranker-4B-GGUF/resolve/main/Qwen3-Reranker-4B.Q3_K_M.gguf?download=true",
    "qwen3-reranker-4b"
);

// Get the model path with CatAI.
const modelPath = getModelPath("qwen3-reranker-4b");

const llama = await initCatAILlama();
const model = await llama.loadModel({modelPath});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const a1 = await session.prompt("Hi there, how are you?");
console.log("AI: " + a1);
```

You can edit the configuration via the web UI.
More information here
Contributions are welcome!
Please read our contributing guide to get started.
This project uses Llama.cpp to run models on your computer, so any license that applies to Llama.cpp also applies to this project.


