llama-cpp-jna

Java Native Access (JNA) wrapper for llama.cpp, providing Java bindings to run Large Language Models locally with high performance.

Features

  • Direct JNA bindings to llama.cpp native libraries (see the sketch after this list)
  • Multi-module Maven structure with Java 8 compatibility
  • CUDA acceleration support for GPU inference
  • Cross-platform compatibility (Windows, Linux, macOS)
  • High-level and low-level API options for different use cases
  • Example implementations including SimpleChat interactive demo
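
To make the first feature concrete, below is a minimal, hypothetical sketch of a direct JNA binding. The interface and class names are illustrative assumptions, not this project's actual API; the real declarations live under com.quasarbyte.llama.cpp.jna.library.declaration. The three native functions shown are part of llama.cpp's public C API.

import com.sun.jna.Library;
import com.sun.jna.Native;

public class LlamaJnaSketch {

    // Hypothetical minimal interface for illustration only; the project's real
    // declarations live under com.quasarbyte.llama.cpp.jna.library.declaration.
    public interface LlamaLibrary extends Library {
        // These three functions are part of llama.cpp's public C API.
        void llama_backend_init();
        String llama_print_system_info();
        void llama_backend_free();
    }

    public static void main(String[] args) {
        // Resolves llama.dll / libllama.so / libllama.dylib via jna.library.path or PATH.
        LlamaLibrary llama = Native.load("llama", LlamaLibrary.class);
        llama.llama_backend_init();
        System.out.println(llama.llama_print_system_info());
        llama.llama_backend_free();
    }
}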

Quick Start

Prerequisites

  • JDK 8 or newer (the build targets Java 8)
  • Apache Maven
  • llama.cpp prebuilt binaries, build b6527 (see Binary Setup below)
  • A GGUF model file (see Model Setup below)

Installation

  1. Clone the repository:

    git clone https://github.com/QuasarByte/llama-cpp-jna.git
    cd llama-cpp-jna
  2. Download llama.cpp binaries from https://github.com/ggml-org/llama.cpp/releases/tag/b6527

  3. Set up binaries (see the Binary Setup section below)

  4. Download a model (see Model Setup section below)

  5. Run the example:

    run-simple-chat.cmd    # Windows
    ./run-simple-chat.sh   # Linux/macOS (coming soon)

Binary Setup

Basic Setup (CPU Only)

Extract the llama.cpp binaries to C:\opt\llama.cpp-b6527-bin (Windows) or /opt/llama.cpp-b6527-bin (Linux/macOS).

CUDA Setup (GPU Acceleration)

For CUDA acceleration support, you need files from both archives:

  1. Download and extract llama-b6527-bin-win-cuda-12.4-x64.zip to C:\opt\llama.cpp-b6527-bin\
  2. Download and extract cudart-llama-bin-win-cuda-12.4-x64.zip and copy these CUDA runtime files to the same directory:
    • cublas64_12.dll
    • cublasLt64_12.dll
    • cudart64_12.dll

Important: Both archives must be extracted to the same directory for CUDA compatibility.
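
A quick way to confirm that both archives landed in the same directory is to check for the runtime DLLs from code. This is a small sketch assuming the Windows directory above; the class name is made up for illustration.

import java.io.File;

public class CudaSetupCheck {
    public static void main(String[] args) {
        // Directory from the CUDA setup steps above.
        File binDir = new File("C:\\opt\\llama.cpp-b6527-bin");
        String[] required = {"cublas64_12.dll", "cublasLt64_12.dll", "cudart64_12.dll"};
        for (String name : required) {
            File dll = new File(binDir, name);
            System.out.println(name + ": " + (dll.isFile() ? "found" : "MISSING"));
        }
    }
}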

Model Setup

Download Models

Visit the ggml-org model collection on Hugging Face (https://huggingface.co/ggml-org) for available models.

Example - Qwen3 8B Model:

  1. Go to https://huggingface.co/ggml-org/Qwen3-8B-GGUF
  2. Download Qwen3-8B-Q8_0.gguf
  3. Save to C:\opt\models\Qwen3-8B-Q8_0.gguf (Windows) or /opt/models/Qwen3-8B-Q8_0.gguf (Linux/macOS)
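
Every GGUF file begins with the four ASCII bytes "GGUF", so a corrupt or truncated download can be caught before loading it. A minimal sketch, assuming the model path above; the class name is illustrative.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class GgufCheck {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:\\opt\\models\\Qwen3-8B-Q8_0.gguf";
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            byte[] magic = new byte[4];
            in.readFully(magic);
            // Every GGUF model file begins with the ASCII bytes "GGUF".
            String tag = new String(magic, StandardCharsets.US_ASCII);
            System.out.println("GGUF".equals(tag) ? "Valid GGUF header" : "Not a GGUF file: " + tag);
        }
    }
}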

Running Examples

Command Line (Recommended)

Windows:

# Option 1: Direct execution (compiles and runs)
run-simple-chat.cmd

# Option 2: Using Maven
run-simple-chat-with-maven.cmd

Linux/macOS:

# Coming soon - bash scripts in development
./run-simple-chat.sh

IDE Setup (IntelliJ IDEA)

  1. Configure environment variables in llama-cpp-bin.env:

    PATH=%PATH%;C:\opt\llama.cpp-b6527-bin
    GGML_BACKEND_PATH=C:\opt\llama.cpp-b6527-bin
  2. Create run configuration:

    • Name: SimpleChat
    • Main class: com.quasarbyte.llama.cpp.jna.examples.simplechat.SimpleChat
    • Module: examples
    • Program arguments: -m C:\opt\models\Qwen3-8B-Q8_0.gguf -c 32768 -ngl 100
    • Working directory: Project root
    • Environment variables: Import from llama-cpp-bin.env
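
As an alternative to editing PATH, JNA's standard jna.library.path system property can point at the binaries from code, as long as it is set before the first Native.load call. A sketch assuming the directory used throughout this README; note that GGML_BACKEND_PATH is an environment variable consumed by the native side, so it still has to come from the launcher (script, shell, or IDE run configuration).

public class NativePathSetup {
    public static void main(String[] args) {
        // Must run before the first JNA Native.load call in this process.
        System.setProperty("jna.library.path", "C:\\opt\\llama.cpp-b6527-bin");

        // Environment variables such as GGML_BACKEND_PATH cannot be set for the
        // current process from Java; configure them in the launcher instead.
        System.out.println("jna.library.path = " + System.getProperty("jna.library.path"));
    }
}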

Command Line Arguments

Flag   Description                   Example
-m     Path to GGUF model file       -m C:\opt\models\Qwen3-8B-Q8_0.gguf
-c     Context length (tokens)       -c 32768
-ngl   GPU layers (0 for CPU-only)   -ngl 100
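
A hedged sketch of how these three flags could be parsed in Java; it mirrors the table above and is not SimpleChat's actual parser. The default values are illustrative.

public class ArgsSketch {
    public static void main(String[] args) {
        String modelPath = null;
        int contextLength = 4096;  // illustrative default, not SimpleChat's
        int gpuLayers = 0;         // 0 = CPU-only

        // Flags come in "-flag value" pairs, as in the table above.
        for (int i = 0; i + 1 < args.length; i += 2) {
            switch (args[i]) {
                case "-m":   modelPath = args[i + 1]; break;
                case "-c":   contextLength = Integer.parseInt(args[i + 1]); break;
                case "-ngl": gpuLayers = Integer.parseInt(args[i + 1]); break;
                default: throw new IllegalArgumentException("Unknown flag: " + args[i]);
            }
        }
        System.out.printf("model=%s context=%d gpuLayers=%d%n", modelPath, contextLength, gpuLayers);
    }
}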

Project Structure

llama-cpp-jna/
├── core/                              # Main JNA library bindings
│   └── src/main/java/com/quasarbyte/llama/cpp/jna/
│       ├── library/declaration/       # Native library interfaces
│       │   ├── llama/                 # Core llama.cpp bindings
│       │   ├── ggml/                  # GGML backend bindings
│       │   └── cuda/                  # CUDA acceleration bindings
│       ├── bindings/                  # High-level bindings layer
│       └── model/                     # Data models and DTOs
├── examples/                          # Usage examples
│   └── src/main/java/com/quasarbyte/llama/cpp/jna/examples/
│       ├── simple/                    # Basic usage
│       ├── simplechat/                # Interactive chat
│       └── cuda/                      # CUDA utilities
├── run-simple-chat.cmd                # Windows execution script
├── run-simple-chat-with-maven.cmd     # Windows Maven execution
└── llama-cpp-bin.env                  # Environment configuration

Building from Source

# Full build with tests
mvn clean install

# Quick build (skip tests)
mvn clean install -DskipTests

# Build specific module
mvn clean install -pl core

# Copy dependencies for examples
mvn dependency:copy-dependencies -DoutputDirectory=examples/target/lib -pl examples

Windows Compatibility Notes

The prebuilt Windows binaries for llama.cpp (build b6527) are linked against the latest Microsoft Visual C++ Redistributable. When they are loaded through the JVM, the Java distribution may bring along its own copy of the MSVC runtime:

  • JDK 25+: Ships compatible DLLs that work without changes
  • JDK 8–24: Bundle older runtime versions that can cause native loading errors

Troubleshooting Runtime Issues

If you are using JDK 8–24, do one of the following:

  1. Upgrade to JDK 25+ (recommended)
  2. Remove/rename bundled MSVC runtime DLLs from <java.home>/bin
  3. Ensure matching Visual C++ Redistributable is installed globally

Common failure pattern:

llama.dll
├── ggml-cuda.dll
│   ├── cudart64_12.dll, nvcuda.dll, cublas64_12.dll, cublasLt64_12.dll
│   ├── vcruntime140.dll (from JDK bin - causes conflict)
│   └── msvcp140.dll (from JDK bin - causes conflict)
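
To spot this conflict quickly, you can check whether the running JDK ships the suspect DLLs named above. A small diagnostic sketch; the class name is illustrative.

import java.io.File;

public class MsvcRuntimeCheck {
    public static void main(String[] args) {
        // The DLLs from the failure pattern above, as shipped in <java.home>/bin.
        File jdkBin = new File(System.getProperty("java.home"), "bin");
        String[] suspects = {"vcruntime140.dll", "msvcp140.dll"};
        for (String name : suspects) {
            File dll = new File(jdkBin, name);
            if (dll.isFile()) {
                System.out.println("Bundled runtime found (potential conflict): " + dll);
            }
        }
    }
}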

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
