
stable-diffusion.cpp

Inference of Stable Diffusion in pure C/C++

Features

  • Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
  • 16-bit, 32-bit float support
  • 4-bit, 5-bit and 8-bit integer quantization support
  • Accelerated memory-efficient CPU inference
  • AVX, AVX2 and AVX512 support for x86 architectures
  • Original txt2img mode
  • Negative prompt
  • Sampling method
    • Euler A
  • Supported platforms
    • Linux
    • macOS
    • Windows
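The only sampler currently listed is Euler Ancestral ("Euler A"). As a rough illustration of what one Euler-A step does, here is a minimal single-value sketch (the function name and the toy inputs are illustrative, not this project's API): each step splits the target noise level into a deterministic part and a stochastic part, takes a plain Euler step to the deterministic level, then re-injects fresh Gaussian noise.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// One Euler Ancestral step: move x from noise level sigma to sigma_next,
// then re-inject fresh noise so the trajectory stays stochastic.
// `denoised` is the model's prediction of the clean sample at level sigma.
float euler_a_step(float x, float sigma, float sigma_next,
                   float denoised, std::mt19937& rng) {
    // Split sigma_next so that sigma_down^2 + sigma_up^2 = sigma_next^2.
    float sigma_up = std::sqrt(sigma_next * sigma_next *
                               (sigma * sigma - sigma_next * sigma_next) /
                               (sigma * sigma));
    float sigma_down = std::sqrt(sigma_next * sigma_next - sigma_up * sigma_up);

    float d = (x - denoised) / sigma;          // derivative toward the denoised sample
    x = x + d * (sigma_down - sigma);          // deterministic Euler step
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    return x + gauss(rng) * sigma_up;          // ancestral noise injection
}
```

Note that at the final step (sigma_next = 0) both sigma_up and sigma_down vanish, so the step returns exactly the denoised prediction.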

TODO

  • Original img2img mode
  • More sampling methods
  • GPU support
  • Make inference faster
    • The current implementation of ggml_conv_2d is slow and has high memory usage
  • Continue reducing memory usage (e.g. by quantizing the weights of ggml_conv_2d)
  • stable-diffusion-webui style tokenizer (e.g. token weighting, ...)
  • LoRA support
  • k-quants support

Usage

Get the Code

git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp

Convert weights

  • download the original weights (.ckpt or .safetensors). For example:

    curl -L -O https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
    # curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
  • convert the weights to the ggml model format

    cd models
    pip install -r requirements.txt
    python convert.py [path to weights] --out_type [output precision]
    # For example: python convert.py sd-v1-4.ckpt --out_type f16

Quantization

You can specify the output model format using the --out_type parameter:

  • f16 for 16-bit floating-point
  • f32 for 32-bit floating-point
  • q8_0 for 8-bit integer quantization
  • q5_0 or q5_1 for 5-bit integer quantization
  • q4_0 or q4_1 for 4-bit integer quantization
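To give a feel for what these formats trade away, here is a simplified sketch of ggml-style q8_0 quantization: weights are grouped into blocks of 32, each block shares one scale, and every value is stored as an 8-bit integer. (This sketch uses a plain float scale for clarity; ggml's real block layout stores it more compactly, and the other formats pack 4 or 5 bits per value instead.)

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int QK = 32;  // values per quantization block

struct BlockQ8_0 {
    float scale;     // per-block scale
    int8_t q[QK];    // quantized values in [-127, 127]
};

// Quantize one block: scale = max|x| / 127, q_i = round(x_i / scale).
BlockQ8_0 quantize_q8_0(const float* x) {
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) amax = std::max(amax, std::fabs(x[i]));
    BlockQ8_0 b;
    b.scale = amax / 127.0f;
    float inv = b.scale != 0.0f ? 1.0f / b.scale : 0.0f;
    for (int i = 0; i < QK; i++) b.q[i] = (int8_t)std::lround(x[i] * inv);
    return b;
}

// Dequantize a single value: the reconstruction error is at most scale / 2.
float dequantize_q8_0(const BlockQ8_0& b, int i) { return b.q[i] * b.scale; }
```

The coarser q5 and q4 formats shrink the table's disk and memory figures further, at the cost of a larger per-value rounding error.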

Build

mkdir build
cd build
cmake ..
cmake --build . --config Release

Using OpenBLAS

cmake .. -DGGML_OPENBLAS=ON
cmake --build . --config Release

Run

usage: ./sd [arguments]

arguments:
  -h, --help                     show this help message and exit
  -t, --threads N                number of threads to use during computation (default: -1).
                                 If threads <= 0, then threads will be set to the number of CPU cores
  -m, --model [MODEL]            path to model
  -o, --output OUTPUT            path to write result image to (default: .\output.png)
  -p, --prompt [PROMPT]          the prompt to render
  -n, --negative-prompt PROMPT   the negative prompt (default: "")
  --cfg-scale SCALE              unconditional guidance scale (default: 7.0)
  -H, --height H                 image height, in pixel space (default: 512)
  -W, --width W                  image width, in pixel space (default: 512)
  --sample-method SAMPLE_METHOD  sample method (default: "eular a")
  --steps STEPS                  number of sample steps (default: 20)
  -s SEED, --seed SEED           RNG seed (default: 42, use random seed for < 0)
  -v, --verbose                  print extra info

For example

./sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" 
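The --cfg-scale flag controls classifier-free guidance: the model is evaluated on both the prompt and the negative prompt, and the two predictions are blended per element. A minimal sketch of that blend (the function name is illustrative, not this project's API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Classifier-free guidance: push the conditional (prompt) prediction away
// from the unconditional (negative-prompt) one by the guidance scale.
std::vector<float> apply_cfg(const std::vector<float>& cond,
                             const std::vector<float>& uncond,
                             float cfg_scale) {
    std::vector<float> out(cond.size());
    for (size_t i = 0; i < cond.size(); i++)
        out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
    return out;
}
```

At cfg_scale = 1.0 this returns the conditional prediction unchanged; larger values (such as the default 7.0) follow the prompt more strongly at the cost of diversity.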

Using formats of different precisions will yield results of varying quality.

[Sample images generated at each precision: f32, f16, q8_0, q5_0, q5_1, q4_0, q4_1]

Memory/Disk Requirements

precision                    | f32   | f16   | q8_0  | q5_0  | q5_1  | q4_0  | q4_1
Disk                         | 2.8G  | 2.0G  | 1.7G  | 1.6G  | 1.6G  | 1.5G  | 1.5G
Memory (txt2img - 512 x 512) | ~4.9G | ~4.1G | ~3.8G | ~3.7G | ~3.7G | ~3.6G | ~3.6G
