diff --git a/.github/workflows/publish-to-test.yaml b/.github/workflows/publish-to-test.yaml index ae91703..157b33a 100644 --- a/.github/workflows/publish-to-test.yaml +++ b/.github/workflows/publish-to-test.yaml @@ -40,3 +40,4 @@ jobs: with: password: ${{ secrets.TEST_PYPI_API_TOKEN }} repository-url: https://test.pypi.org/legacy/ + verbose: true diff --git a/README.md b/README.md index b826782..4d61e05 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ Simple Python bindings for **@leejet's** [`stable-diffusion.cpp`](https://github This package provides: - Low-level access to C API via `ctypes` interface. -- High-level Python API for Stable Diffusion and FLUX image generation. +- High-level Python API for Stable Diffusion, FLUX and Wan image/video generation. ## Installation @@ -93,11 +93,13 @@ CMAKE_ARGS="-DSD_CUDA=ON" pip install stable-diffusion-cpp-python
Using HIPBLAS (ROCm) -This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure you have the ROCm toolkit installed and that you replace the `-DAMDGPU_TARGETS=` value with that of your GPU architecture. -Windows users refer to [docs/hipBLAS_on_Windows.md](docs%2FhipBLAS_on_Windows.md) for a comprehensive guide and troubleshooting tips. +This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure you have the ROCm toolkit installed and that you replace the `$GFX_NAME` value with that of your GPU architecture (for example, `gfx1030` for consumer RDNA2 cards). Windows users should refer to [docs/hipBLAS_on_Windows.md](docs%2FhipBLAS_on_Windows.md) for a comprehensive guide and troubleshooting tips. ```bash -CMAKE_ARGS="-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1101" pip install stable-diffusion-cpp-python +if command -v rocminfo; then export GFX_NAME=$(rocminfo | awk '/ *Name: +gfx[1-9]/ {print $2; exit}'); else echo "rocminfo missing!"; fi +if [ -z "${GFX_NAME}" ]; then echo "Error: Couldn't detect GPU!"; else echo "Building for GPU: ${GFX_NAME}"; fi + +CMAKE_ARGS="-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DGPU_TARGETS=$GFX_NAME -DAMDGPU_TARGETS=$GFX_NAME -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON" pip install stable-diffusion-cpp-python ```
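If `rocminfo` is unavailable or the detection above fails, you can also set the target manually before running the install command; a minimal sketch, where `gfx1100` is only an example value:

```bash
# Set the GPU architecture by hand instead of auto-detecting it
# (gfx1100 is just an example; substitute the value for your own card)
export GFX_NAME=gfx1100
```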
@@ -106,7 +108,7 @@ CMAKE_ARGS="-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_
Using Metal -Using Metal makes the computation run on the GPU. Currently, there are some issues with Metal when performing operations on very large matrices, making it highly inefficient at the moment. Performance improvements are expected in the near future. +Using Metal runs the computation on Apple Silicon. Currently, there are some issues with Metal when performing operations on very large matrices, making it highly inefficient. Performance improvements are expected in the near future. ```bash CMAKE_ARGS="-DSD_METAL=ON" pip install stable-diffusion-cpp-python @@ -129,7 +131,7 @@ CMAKE_ARGS="-DSD_VULKAN=ON" pip install stable-diffusion-cpp-python
Using SYCL -Using SYCL makes the computation run on the Intel GPU. Please make sure you have installed the related driver and [Intel® oneAPI Base toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) before start. More details and steps can refer to [llama.cpp SYCL backend](https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md#linux). +Using SYCL runs the computation on an Intel GPU. Please make sure you have installed the related driver and [Intel® oneAPI Base toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) before starting. For more details refer to [llama.cpp SYCL backend](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md#linux). ```bash # Export relevant ENV variables @@ -144,18 +146,6 @@ CMAKE_ARGS="-DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML
- -
-Using Flash Attention - -Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUDA (CUBLAS) is enabled because the kernel implementation is missing. - -```bash -CMAKE_ARGS="-DSD_FLASH_ATTN=ON" pip install stable-diffusion-cpp-python -``` - -
-
Using OpenBLAS @@ -167,7 +157,6 @@ CMAKE_ARGS="-DGGML_OPENBLAS=ON" pip install stable-diffusion-cpp-python
-
Using MUSA @@ -179,39 +168,136 @@ CMAKE_ARGS="-DCMAKE_C_COMPILER=/usr/local/musa/bin/clang -DCMAKE_CXX_COMPILER=/u
+ +
+Using OpenCL (Adreno GPU) + +Currently, it only supports Adreno GPUs and is primarily optimized for the Q4_0 type. + +To build for Windows ARM, please refer to [Windows 11 Arm64](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/OPENCL.md#windows-11-arm64). + +Building for Android: + +Android NDK: + +- Download and install the Android NDK from the [official Android developer site](https://developer.android.com/ndk/downloads). + +Setup OpenCL Dependencies for NDK: +You need to provide OpenCL headers and the ICD loader library to your NDK sysroot. + +- OpenCL Headers: + + ```bash + # In a temporary working directory + git clone https://github.com/KhronosGroup/OpenCL-Headers + cd OpenCL-Headers + + # Replace with your actual NDK installation path + # e.g., cp -r CL /path/to/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include + sudo cp -r CL /toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include + cd .. + ``` + +- OpenCL ICD Loader: + + ```bash + # In the same temporary working directory + git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader + cd OpenCL-ICD-Loader + mkdir build_ndk && cd build_ndk + + # Replace in the CMAKE_TOOLCHAIN_FILE and OPENCL_ICD_LOADER_HEADERS_DIR + cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \ + -DCMAKE_TOOLCHAIN_FILE=/build/cmake/android.toolchain.cmake \ + -DOPENCL_ICD_LOADER_HEADERS_DIR=/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \ + -DANDROID_ABI=arm64-v8a \ + -DANDROID_PLATFORM=24 \ + -DANDROID_STL=c++_shared + + ninja + # Replace + # e.g., cp libOpenCL.so /path/to/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android + sudo cp libOpenCL.so /toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android + cd ../.. + ``` + +Build `stable-diffusion-cpp-python` for Android with (untested): + +```bash +# Replace with your actual NDK installation path +# e.g., -DCMAKE_TOOLCHAIN_FILE=/path/to/android-ndk-r26c/build/cmake/android.toolchain.cmake +CMAKE_ARGS="-G Ninja -DCMAKE_TOOLCHAIN_FILE=/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DGGML_OPENMP=OFF -DSD_OPENCL=ON" pip install stable-diffusion-cpp-python +``` + +_(Note: Don't forget to include `LD_LIBRARY_PATH=/vendor/lib64` in your command line before running the binary)_ +
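As a rough, untested companion to the note above, a run on the device might look like the sketch below once the wheel built above is installed; the on-device paths are purely illustrative:

```bash
# On the Android device (adb shell / Termux), assuming the package is installed
# and a model has been pushed to /data/local/tmp (illustrative path only)
LD_LIBRARY_PATH=/vendor/lib64 python3 -c "
from stable_diffusion_cpp import StableDiffusion
sd = StableDiffusion(model_path='/data/local/tmp/v1-5-pruned-emaonly.safetensors')
sd.generate_image(prompt='a lovely cat')[0].save('/data/local/tmp/output.png')
"
```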
+ ### Upgrading and Reinstalling To upgrade and rebuild `stable-diffusion-cpp-python` add `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source. +### Using Flash Attention + +Enabling flash attention for the diffusion model reduces memory usage by a model-dependent amount, e.g.: + +- **flux 768x768** ~600mb +- **SD2 768x768** ~1400mb + +For most backends it slows generation down, but for CUDA it generally speeds it up as well. +At the moment, it is only supported for some models and some backends (like `cpu`, `cuda/rocm` and `metal`). + +Enable it by passing `diffusion_flash_attn=True` to the `StableDiffusion` class and watch for: + +```log +[INFO] stable-diffusion.cpp:312 - Using flash attention in the diffusion model +``` + +and for the compute buffer shrinking in the debug log: + +```log +[DEBUG] ggml_extend.hpp:1004 - flux compute buffer size: 650.00 MB(VRAM) +``` + ## High-level API The high-level API provides a simple managed interface through the `StableDiffusion` class. Below is a short example demonstrating how to use the high-level API to generate a simple image: -### Text to Image +### Text to Image ```python +from PIL import Image from stable_diffusion_cpp import StableDiffusion -def callback(step: int, steps: int, time: float): +def progress_callback(step: int, steps: int, time: float): print("Completed step: {} of {}".format(step, steps)) +def preview_callback(step: int, images: list[Image.Image], is_noisy: bool): + images[0].save(f"preview/{step}.png") + stable_diffusion = StableDiffusion( model_path="../models/v1-5-pruned-emaonly.safetensors", # wtype="default", # Weight type (e.g. "q8_0", "f16", etc) (The "default" setting is automatically applied and determines the weight type of a model file) ) -output = stable_diffusion.txt_to_img( +output = stable_diffusion.generate_image( prompt="a lovely cat", - width=512, # Must be a multiple of 64 - height=512, # Must be a multiple of 64 - progress_callback=callback, + width=512, + height=512, + progress_callback=progress_callback, # seed=1337, # Uncomment to set a specific seed (use -1 for a random seed) + preview_method="proj", + preview_interval=2, # Call every 2 steps + preview_callback=preview_callback, ) output[0].save("output.png") # Output returned as list of PIL Images + +# Model and generation parameters accessible via .info +print(output[0].info) ``` -#### With LoRA (Stable Diffusion) +#### With LoRA (Stable Diffusion) You can specify the directory where the lora weights are stored via `lora_model_dir`. If not specified, the default is the current working directory. @@ -227,16 +313,18 @@ stable_diffusion = StableDiffusion( model_path="../models/v1-5-pruned-emaonly.safetensors", lora_model_dir="../models/", # This should point to folder where LoRA weights are stored (not an individual file) ) -output = stable_diffusion.txt_to_img( +output = stable_diffusion.generate_image( prompt="a lovely cat", ) ``` - The `lora_model_dir` argument is used in the same way for FLUX image generation. -### FLUX Image Generation +--- + +### FLUX Image Generation -FLUX models should be run using the same implementation as the [stable-diffusion.cpp FLUX documentation](https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/flux.md) where the `diffusion_model_path` argument is used in place of the `model_path`. The `clip_l_path`, `t5xxl_path`, and `vae_path` arguments are also required for inference to function. 
+FLUX models should be run using the same implementation as the [stable-diffusion.cpp FLUX documentation](https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/flux.md) where the `diffusion_model_path` argument is used in place of the `model_path`. The `clip_l_path`, `t5xxl_path`, and `vae_path` arguments are also required for inference to function (for most models). Download the weights from the links below: @@ -253,34 +341,129 @@ stable_diffusion = StableDiffusion( clip_l_path="../models/clip_l.safetensors", t5xxl_path="../models/t5xxl_fp16.safetensors", vae_path="../models/ae.safetensors", - vae_decode_only=True, # Can be True if we dont use img_to_img + vae_decode_only=True, # Can be True if not generating image to image + keep_clip_on_cpu=True, # Prevents black images when using some T5 models ) -output = stable_diffusion.txt_to_img( +output = stable_diffusion.generate_image( prompt="a lovely cat holding a sign says 'flux.cpp'", - sample_steps=4, cfg_scale=1.0, # a cfg_scale of 1 is recommended for FLUX - sample_method="euler", # euler is recommended for FLUX + # sample_method="euler", # euler is recommended for FLUX, set automatically if "default" is specified ) ``` -#### With LoRA (FLUX) +#### FLUX.2 + +Download the weights from the links below: + +- Download `FLUX.2-dev` + - gguf: https://huggingface.co/city96/FLUX.2-dev-gguf/tree/main +- Download `vae` + - safetensors: https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/main +- Download `Mistral-Small-3.2-24B-Instruct-2506-GGUF` + - gguf: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF/tree/main + +```python +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/flux2-dev-Q4_K_M.gguf", + llm_path="../models/Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf", + vae_path="../models/ae.safetensors", + offload_params_to_cpu=True, + diffusion_flash_attn=True, +) + +output = stable_diffusion.generate_image( + prompt="the cat has a hat", + ref_images=["input.png"], + sample_steps=4, + cfg_scale=1.0, +) +``` + +#### With LoRA (FLUX) LoRAs can be used with FLUX models in the same way as Stable Diffusion models ([as shown above](#with-lora-stable-diffusion)). Note that: - It is recommended you use LoRAs with naming formats compatible with ComfyUI. -- LoRAs will only work with Flux-dev q8_0. +- LoRAs will only work with `Flux-dev q8_0`. - You can download FLUX LoRA models from https://huggingface.co/XLabs-AI/flux-lora-collection/tree/main (you must use a comfy converted version!!!). 
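For illustration only, a FLUX LoRA call might look like the sketch below. The `flux1-dev-q8_0.gguf` and `lora.safetensors` file names are assumptions, and the `<lora:name:strength>` prompt tag is assumed to work the same way as in the Stable Diffusion LoRA example above:

```python
from stable_diffusion_cpp import StableDiffusion

stable_diffusion = StableDiffusion(
    diffusion_model_path="../models/flux1-dev-q8_0.gguf",  # Assumed q8_0 FLUX-dev file name
    clip_l_path="../models/clip_l.safetensors",
    t5xxl_path="../models/t5xxl_fp16.safetensors",
    vae_path="../models/ae.safetensors",
    lora_model_dir="../models/",  # Folder containing the hypothetical lora.safetensors
)
output = stable_diffusion.generate_image(
    prompt="a lovely cat <lora:lora:1.0>",  # Assumed <lora:name:strength> prompt syntax
    cfg_scale=1.0,
    sample_method="euler",
)
```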
-### SD3.5 Image Generation +#### Kontext (FLUX) Download the weights from the links below: -- Download sd3.5_large from https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors -- Download clip_g from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_g.safetensors -- Download clip_l from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_l.safetensors -- Download t5xxl from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/t5xxl_fp16.safetensors +- Preconverted gguf model from [FLUX.1-Kontext-dev-GGUF](https://huggingface.co/QuantStack/FLUX.1-Kontext-dev-GGUF) +- Otherwise, download FLUX.1-Kontext-dev from [black-forest-labs/FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/flux1-kontext-dev.safetensors) +- The `vae`, `clip_l`, and `t5xxl` models are the same as for FLUX image generation linked above. + +```python +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/flux1-kontext-dev-Q5_K_S.gguf", # In place of model_path + clip_l_path="../models/clip_l.safetensors", + t5xxl_path="../models/t5xxl_fp16.safetensors", + vae_path="../models/ae.safetensors", + vae_decode_only=False, # Must be False for FLUX Kontext + keep_clip_on_cpu=True, # Prevents black images when using some T5 models +) +output = stable_diffusion.generate_image( + prompt="make the cat blue", + ref_images=["input.png"], + cfg_scale=1.0, # a cfg_scale of 1 is recommended for FLUX +) +``` + +#### Chroma (FLUX) + +Download the weights from the links below: + +- Preconverted gguf model from [silveroxides/Chroma1-Flash-GGUF](https://huggingface.co/silveroxides/Chroma1-Flash-GGUF), [silveroxides/Chroma1-Base-GGUF](https://huggingface.co/silveroxides/Chroma1-Base-GGUF) or [silveroxides/Chroma1-HD-GGUF](https://huggingface.co/silveroxides/Chroma1-HD-GGUF) ([silveroxides/Chroma-GGUF](https://huggingface.co/silveroxides/Chroma-GGUF) is DEPRECATED) +- Otherwise, download chroma's safetensors from [lodestones/Chroma1-Flash](https://huggingface.co/lodestones/Chroma1-Flash), [lodestones/Chroma1-Base](https://huggingface.co/lodestones/Chroma1-Base) or [lodestones/Chroma1-HD](https://huggingface.co/lodestones/Chroma1-HD) ([lodestones/Chroma](https://huggingface.co/lodestones/Chroma) is DEPRECATED) +- The `vae` and `t5xxl` models are the same as for FLUX image generation linked above (`clip_l` not required). 
+ +or Chroma Radiance models from: + +- safetensors: https://huggingface.co/lodestones/Chroma1-Radiance/tree/main +- gguf: https://huggingface.co/silveroxides/Chroma1-Radiance-GGUF/tree/main +- t5xxl: https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp16.safetensors + +```python +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/Chroma1-HD-Flash-Q4_0.gguf", # In place of model_path + t5xxl_path="../models/t5xxl_fp16.safetensors", + vae_path="../models/ae.safetensors", + vae_decode_only=True, # Can be True if we are not generating image to image + chroma_use_dit_mask=False, + keep_clip_on_cpu=True, # Prevents black images when using some T5 models +) +output = stable_diffusion.generate_image( + prompt="a lovely cat holding a sign says 'chroma.cpp'", + cfg_scale=4.0, # a cfg_scale of 4 is recommended for Chroma +) +``` + +--- + +### Some SD1.x and SDXL distilled models + +See [docs/distilled_sd.md](./docs/distilled_sd.md) for instructions on using distilled SD models. + +--- + +### SD3.5 Image Generation + +Download the weights from the links below: + +- Download `sd3.5_large` from https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors +- Download `clip_g` from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_g.safetensors +- Download `clip_l` from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_l.safetensors +- Download `t5xxl` from https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/t5xxl_fp16.safetensors ```python from stable_diffusion_cpp import StableDiffusion @@ -290,8 +473,9 @@ stable_diffusion = StableDiffusion( clip_l_path="../models/clip_l.safetensors", clip_g_path="../models/clip_g.safetensors", t5xxl_path="../models/t5xxl_fp16.safetensors", + keep_clip_on_cpu=True, # Prevents black images when using some T5 models ) -output = stable_diffusion.txt_to_img( +output = stable_diffusion.generate_image( prompt="a lovely cat holding a sign says 'Stable diffusion 3.5 Large'", height=1024, width=1024, @@ -300,24 +484,29 @@ output = stable_diffusion.txt_to_img( ) ``` -### Image to Image +--- + +### Image to Image ```python from stable_diffusion_cpp import StableDiffusion +# from PIL import Image INPUT_IMAGE = "../input.png" # INPUT_IMAGE = Image.open("../input.png") # or alternatively, pass as PIL Image stable_diffusion = StableDiffusion(model_path="../models/v1-5-pruned-emaonly.safetensors") -output = stable_diffusion.img_to_img( +output = stable_diffusion.generate_image( prompt="blue eyes", - image=INPUT_IMAGE, # Note: The input image will be automatically resized to the match the width and height arguments (default: 512x512) + init_image=INPUT_IMAGE, # Note: The input image will be automatically resized to match the width and height arguments (default: 512x512) strength=0.4, ) ``` -### Inpainting +--- + +### Inpainting ```python from stable_diffusion_cpp import StableDiffusion @@ -325,81 +514,336 @@ from stable_diffusion_cpp import StableDiffusion # Note: Inpainting with a base model gives poor results. A model fine-tuned for inpainting is recommended. 
stable_diffusion = StableDiffusion(model_path="../models/v1-5-pruned-emaonly.safetensors") -output = stable_diffusion.img_to_img( +output = stable_diffusion.generate_image( prompt="blue eyes", - image="../input.png", + init_image="../input.png", mask_image="../mask.png", # A grayscale image where 0 is masked and 255 is unmasked strength=0.4, ) ``` -### PhotoMaker +--- + +### PhotoMaker You can use [PhotoMaker](https://github.com/TencentARC/PhotoMaker) to personalize generated images with your own ID. **NOTE**, currently PhotoMaker **ONLY** works with **SDXL** (any SDXL model files will work). -The VAE in SDXL encounters NaN issues. You can find a fixed VAE here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors). Download PhotoMaker model file (in safetensor format) [here](https://huggingface.co/bssrdf/PhotoMaker). The official release of the model file (in .bin format) does not work with `stablediffusion.cpp`. In prompt, make sure you have a class word followed by the trigger word `"img"` (hard-coded for now). The class word could be one of `"man, woman, girl, boy"`. If input ID images contain asian faces, add `Asian` before the class word. ```python +import os from stable_diffusion_cpp import StableDiffusion stable_diffusion = StableDiffusion( model_path="../models/sdxl.vae.safetensors", vae_path="../models/sdxl.vae.safetensors", - stacked_id_embed_dir="../models/photomaker-v1.safetensors", + photo_maker_path="../models/photomaker-v1.safetensors", # keep_vae_on_cpu=True, # If on low memory GPUs (<= 8GB), setting this to True is recommended to get artifact free images ) -output = stable_diffusion.txt_to_img( +INPUT_ID_IMAGES_DIR = "../assets/newton_man" + +output = stable_diffusion.generate_image( cfg_scale=5.0, # a cfg_scale of 5.0 is recommended for PhotoMaker height=1024, width=1024, - style_strength=10, # (0-100)% Default is 20 and 10-20 typically gets good results. Lower ratio means more faithfully following input ID (not necessarily better quality). + pm_style_strength=10, # (0-100)% Default is 20 and 10-20 typically gets good results. Lower ratio means more faithfully following input ID (not necessarily better quality). sample_method="euler", prompt="a man img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed", negative_prompt="realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text", - input_id_images_path="../assets/newton_man", + pm_id_images=[ + os.path.join(INPUT_ID_IMAGES_DIR, f) + for f in os.listdir(INPUT_ID_IMAGES_DIR) + if f.lower().endswith((".png", ".jpg", ".jpeg", ".bmp")) + ], ) ``` -### PhotoMaker Version 2 +#### PhotoMaker Version 2 -[PhotoMaker Version 2 (PMV2)](https://github.com/TencentARC/PhotoMaker/blob/main/README_pmv2.md) has some key improvements. Unfortunately it has a very heavy dependency which makes running it a bit involved in `SD.cpp`. +[PhotoMaker Version 2 (PMV2)](https://github.com/TencentARC/PhotoMaker/blob/main/README_pmv2.md) has some key improvements. Unfortunately it has a very heavy dependency which makes running it a bit involved. -Running PMV2 Requires running a python script `face_detect.py` (found [here](https://github.com/leejet/stable-diffusion.cpp/blob/master/face_detect.py)) to obtain **id_embeds** for the given input images. 
+Running PMV2 Requires running a python script `face_detect.py` (found here [stable-diffusion.cpp/face_detect.py](https://github.com/leejet/stable-diffusion.cpp/blob/master/face_detect.py)) to obtain `id_embeds` for the given input images. -``` +```bash python face_detect.py ``` -An `id_embeds.safetensors` file will be generated in `input_images_dir`. +An `id_embeds.bin` file will be generated in `input_images_dir`. + +**Note: This step only needs to be run once — the resulting `id_embeds` can be reused.** + +- Run the same command as in version 1 but replacing `photomaker-v1.safetensors` with `photomaker-v2.safetensors` and pass the `id_embeds.bin` path into the `pm_id_embed_path` parameter. + Download `photomaker-v2.safetensors` from [bssrdf/PhotoMakerV2](https://huggingface.co/bssrdf/PhotoMakerV2). +- All other parameters from Version 1 remain the same for Version 2. + +--- + +### QWEN Image + +Download the weights from the links below: + +- Download `Qwen Image` + - safetensors: https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/diffusion_models + - gguf: https://huggingface.co/QuantStack/Qwen-Image-GGUF/tree/main +- Download `vae` + - safetensors: https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/vae +- Download `qwen_2.5_vl 7b` + - safetensors: https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/text_encoders + - gguf: https://huggingface.co/mradermacher/Qwen2.5-VL-7B-Instruct-GGUF/tree/main + +```python +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/qwen-image-Q8_0.gguf", + llm_path="../models/Qwen2.5-VL-7B-Instruct.Q8_0.gguf", + vae_path="../models/qwen_image_vae.safetensors", + offload_params_to_cpu=True, + flow_shift=3, +) + +output = stable_diffusion.generate_image( + prompt='一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 “一、Qwen-Image的技术路线: 探索视觉生成基础模型的极限,开创理解与生成一体化的未来。二、Qwen-Image的模型特色:1、复杂文字渲染。支持中英渲染、自动布局; 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景:赋能专业内容创作、助力生成式AI发展。”', + cfg_scale=2.5, + sample_method='euler', +) +``` + +#### QWEN Image Edit + +Download the weights from the links below: + +- Download `Qwen Image Edit` + - Qwen Image Edit + - safetensors: https://huggingface.co/Comfy-Org/Qwen-Image-Edit_ComfyUI/tree/main/split_files/diffusion_models + - gguf: https://huggingface.co/QuantStack/Qwen-Image-Edit-GGUF/tree/main + - Qwen Image Edit 2509 + - safetensors: https://huggingface.co/Comfy-Org/Qwen-Image-Edit_ComfyUI/tree/main/split_files/diffusion_models + - gguf: https://huggingface.co/QuantStack/Qwen-Image-Edit-2509-GGUF/tree/main +- Download `vae` + - safetensors: https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/vae +- Download `qwen_2.5_vl 7b` + - safetensors: https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/split_files/text_encoders + - gguf: https://huggingface.co/mradermacher/Qwen2.5-VL-7B-Instruct-GGUF/tree/main + +```python +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/Qwen_Image_Edit-Q8_0.gguf", + llm_path="../models/Qwen2.5-VL-7B-Instruct.Q8_0.gguf", + vae_path="../models/qwen_image_vae.safetensors", + offload_params_to_cpu=True, + flow_shift=3, +) + +output = stable_diffusion.generate_image( + prompt="make the cat blue", + ref_images=["input.png"], + cfg_scale=2.5, + sample_method='euler', +) +``` + +--- -**Note: this step is only needed to run once; the same `id_embeds` can be reused** +### Z-Image 
-- Run the same command as in version 1 but replacing `photomaker-v1.safetensors` with `photomaker-v2.safetensors`. +Download the weights from the links below: - You can download `photomaker-v2.safetensors` from [here](https://huggingface.co/bssrdf/PhotoMakerV2). +- Download `Z-Image-Turbo` + - safetensors: https://huggingface.co/Comfy-Org/z_image_turbo/tree/main/split_files/diffusion_models + - gguf: https://huggingface.co/leejet/Z-Image-Turbo-GGUF/tree/main +- Download `vae` + - safetensors: https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main +- Download `Qwen3 4b` + - safetensors: https://huggingface.co/Comfy-Org/z_image_turbo/tree/main/split_files/text_encoders + - gguf: https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF/tree/main -- All the other parameters from Version 1 remain the same for Version 2. +```python +from stable_diffusion_cpp import StableDiffusion -### Listing GGML model and RNG types, schedulers and sample methods +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/z_image_turbo-Q3_K.gguf", + llm_path="../models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf", + vae_path="../models/ae.safetensors", + offload_params_to_cpu=True, + diffusion_flash_attn=True, +) + +output = stable_diffusion.generate_image( + prompt="A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic", + height=1024, + width=512, + cfg_scale=1.0, +) +``` + +--- -Access the GGML model and RNG types, schedulers, and sample methods via the following maps: +### Ovis + +Download the weights from the links below: + +- Download `Ovis-Image-7B` + - safetensors: https://huggingface.co/Comfy-Org/Ovis-Image/tree/main/split_files/diffusion_models + - gguf: https://huggingface.co/leejet/Ovis-Image-7B-GGUF +- Download `vae` + - safetensors: https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main +- Download `Ovis 2.5` + - safetensors: https://huggingface.co/Comfy-Org/Ovis-Image/tree/main/split_files/text_encoders ```python -from stable_diffusion_cpp import GGML_TYPE_MAP, RNG_TYPE_MAP, SCHEDULE_MAP, SAMPLE_METHOD_MAP +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/ovis_image-Q4_0.gguf", + llm_path="../models/ovis_2.5.safetensors", + vae_path="../models/ae.safetensors", + diffusion_flash_attn=True, +) + +output = stable_diffusion.generate_image( + prompt="a lovely cat", + cfg_scale=5.0, +) +``` + +--- + +### Wan Video Generation + +See [stable-diffusion.cpp Wan download weights](https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/wan.md#download-weights) for a complete list of Wan models. 
+ +```python +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion( + diffusion_model_path="../models/wan2.1_t2v_1.3B_fp16.safetensors", # In place of model_path + t5xxl_path="../models/umt5-xxl-encoder-Q8_0.gguf", + vae_path="../models/wan_2.1_vae.safetensors", + flow_shift=3.0, + keep_clip_on_cpu=True, # Prevents black images when using some T5 models +) + +output = stable_diffusion.generate_video( + prompt="a cute dog jumping", + negative_prompt="色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部, 畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走", + height=832, + width=480, + cfg_scale=6.0, + sample_method="euler", + video_frames=33, +) # Output is a list of PIL Images (video frames) +``` + +As the output is simply a list of images (video frames), you can convert it into a video using any library you prefer. The example below uses `ffmpeg-python`. Alternatively, libraries such as **OpenCV** or **MoviePy** can also be used. + +> **Note** +> +> - You'll require the **Python bindings for FFmpeg**, `ffmpeg-python` (`pip install ffmpeg-python`) in addition to an **FFmpeg installation on your system**, accessible in your PATH. Check with `ffmpeg -version`. + +```python +from typing import List +from PIL import Image +import numpy as np +import ffmpeg + +def save_video_ffmpeg(frames: List[Image.Image], fps: int, out_path: str) -> None: + if not frames: + raise ValueError("No frames provided") + + width, height = frames[0].size + + # Concatenate frames into raw RGB bytes + raw_bytes = b"".join(np.array(frame.convert("RGB"), dtype=np.uint8).tobytes() for frame in frames) + ( + ffmpeg.input( + "pipe:", + format="rawvideo", + pix_fmt="rgb24", + s=f"{width}x{height}", + r=fps, + ) + .output( + out_path, + vcodec="libx264", + pix_fmt="yuv420p", + r=fps, + movflags="+faststart", + ) + .overwrite_output() + .run(input=raw_bytes) + ) + +save_video_ffmpeg(output, fps=16, out_path="output.mp4") +``` + +#### Wan VACE + +Use FFmpeg to extract frames from a video to use as control frames for Wan VACE. + +```bash +mkdir assets/frames +ffmpeg -i assets/test.mp4 -qscale:v 1 -vf fps=8 assets/frames/frame_%04d.jpg +``` + +```python +output = stable_diffusion.generate_video( + ... + # Add control frames for VACE (PIL Images or file paths) + control_frames=[ + os.path.join('assets/frames', f) + for f in os.listdir('assets/frames') + if f.lower().endswith((".png", ".jpg", ".jpeg", ".bmp")) + ], +) +``` + +--- + +### GGUF Model Conversion + +You can convert models to GGUF format using the `convert` method. 
+ +```python +from stable_diffusion_cpp import StableDiffusion + +stable_diffusion = StableDiffusion() + +stable_diffusion.convert( + input_path="../models/v1-5-pruned-emaonly.safetensors", + output_path="new_model.gguf", + output_type="q8_0", +) +``` + +--- + +### Listing LoRA apply modes, GGML model/prediction/RNG types, sample/preview methods and schedulers + +Access the LoRA apply modes, GGML model/prediction/RNG types, sample/preview methods and schedulers via the following maps: + +```python +from stable_diffusion_cpp import GGML_TYPE_MAP, RNG_TYPE_MAP, SCHEDULER_MAP, SAMPLE_METHOD_MAP, PREDICTION_MAP, PREVIEW_MAP, LORA_APPLY_MODE_MAP print("GGML model types:", list(GGML_TYPE_MAP)) print("RNG types:", list(RNG_TYPE_MAP)) -print("Schedulers:", list(SCHEDULE_MAP)) +print("Schedulers:", list(SCHEDULER_MAP)) print("Sample methods:", list(SAMPLE_METHOD_MAP)) +print("Prediction types:", list(PREDICTION_MAP)) +print("Preview methods:", list(PREVIEW_MAP)) +print("LoRA apply modes:", list(LORA_APPLY_MODE_MAP)) ``` -### Other High-level API Examples +--- + +### Other High-level API Examples Other examples for the high-level API (such as upscaling and model conversion) can be found in the [tests](tests) directory. @@ -408,7 +852,7 @@ Other examples for the high-level API (such as upscaling and model conversion) c The low-level API is a direct [`ctypes`](https://docs.python.org/3/library/ctypes.html) binding to the C API provided by `stable-diffusion.cpp`. The entire low-level API can be found in [stable_diffusion_cpp/stable_diffusion_cpp.py](https://github.com/william-murray1204/stable-diffusion-cpp-python/blob/main/stable_diffusion_cpp/stable_diffusion_cpp.py) and directly mirrors the C API in [stable-diffusion.h](https://github.com/leejet/stable-diffusion.cpp/blob/master/stable-diffusion.h). 
-Below is a short example demonstrating how to use the low-level API: +Below is a short example demonstrating low-level API usage: ```python import stable_diffusion_cpp as sd_cpp @@ -427,12 +871,6 @@ c_image = sd_cpp.sd_image_t( ctypes.POINTER(ctypes.c_uint8), ), ) # Create a new C sd_image_t - -img = sd_cpp.upscale( - self.upscaler, - image_bytes, - upscale_factor, -) # Upscale the image ``` ## Development @@ -456,7 +894,7 @@ Now you can make changes to the code within the `stable_diffusion_cpp` directory - [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) -- [llama.cpp](https://github.com/ggerganov/llama.cpp) +- [llama.cpp](https://github.com/ggml-org/llama.cpp) - [whisper-cpp-python](https://github.com/carloscdias/whisper-cpp-python) - [Golang stable-diffusion](https://github.com/seasonjs/stable-diffusion) - [StableDiffusion.NET](https://github.com/DarthAffe/StableDiffusion.NET) diff --git a/assets/box.png b/assets/box.png new file mode 100644 index 0000000..aad12cc Binary files /dev/null and b/assets/box.png differ diff --git a/assets/frames/frame_0001.jpg b/assets/frames/frame_0001.jpg new file mode 100644 index 0000000..92f150b Binary files /dev/null and b/assets/frames/frame_0001.jpg differ diff --git a/assets/frames/frame_0002.jpg b/assets/frames/frame_0002.jpg new file mode 100644 index 0000000..0e844bb Binary files /dev/null and b/assets/frames/frame_0002.jpg differ diff --git a/assets/frames/frame_0003.jpg b/assets/frames/frame_0003.jpg new file mode 100644 index 0000000..0f30f3c Binary files /dev/null and b/assets/frames/frame_0003.jpg differ diff --git a/assets/frames/frame_0004.jpg b/assets/frames/frame_0004.jpg new file mode 100644 index 0000000..c41dd32 Binary files /dev/null and b/assets/frames/frame_0004.jpg differ diff --git a/assets/frames/frame_0005.jpg b/assets/frames/frame_0005.jpg new file mode 100644 index 0000000..bd5a22c Binary files /dev/null and b/assets/frames/frame_0005.jpg differ diff --git a/assets/frames/frame_0006.jpg b/assets/frames/frame_0006.jpg new file mode 100644 index 0000000..f7b01ed Binary files /dev/null and b/assets/frames/frame_0006.jpg differ diff --git a/assets/frames/frame_0007.jpg b/assets/frames/frame_0007.jpg new file mode 100644 index 0000000..115b312 Binary files /dev/null and b/assets/frames/frame_0007.jpg differ diff --git a/assets/frames/frame_0008.jpg b/assets/frames/frame_0008.jpg new file mode 100644 index 0000000..f9a833a Binary files /dev/null and b/assets/frames/frame_0008.jpg differ diff --git a/assets/frames/frame_0009.jpg b/assets/frames/frame_0009.jpg new file mode 100644 index 0000000..08d750b Binary files /dev/null and b/assets/frames/frame_0009.jpg differ diff --git a/assets/frames/frame_0010.jpg b/assets/frames/frame_0010.jpg new file mode 100644 index 0000000..b7f4741 Binary files /dev/null and b/assets/frames/frame_0010.jpg differ diff --git a/assets/frames/frame_0011.jpg b/assets/frames/frame_0011.jpg new file mode 100644 index 0000000..92f853b Binary files /dev/null and b/assets/frames/frame_0011.jpg differ diff --git a/assets/frames/frame_0012.jpg b/assets/frames/frame_0012.jpg new file mode 100644 index 0000000..212ad38 Binary files /dev/null and b/assets/frames/frame_0012.jpg differ diff --git a/assets/frames/frame_0013.jpg b/assets/frames/frame_0013.jpg new file mode 100644 index 0000000..2372339 Binary files /dev/null and b/assets/frames/frame_0013.jpg differ diff --git a/assets/frames/frame_0014.jpg 
b/assets/frames/frame_0014.jpg new file mode 100644 index 0000000..7e79699 Binary files /dev/null and b/assets/frames/frame_0014.jpg differ diff --git a/assets/frames/frame_0015.jpg b/assets/frames/frame_0015.jpg new file mode 100644 index 0000000..0a1f647 Binary files /dev/null and b/assets/frames/frame_0015.jpg differ diff --git a/assets/frames/frame_0016.jpg b/assets/frames/frame_0016.jpg new file mode 100644 index 0000000..cced4bd Binary files /dev/null and b/assets/frames/frame_0016.jpg differ diff --git a/assets/frames/frame_0017.jpg b/assets/frames/frame_0017.jpg new file mode 100644 index 0000000..4593209 Binary files /dev/null and b/assets/frames/frame_0017.jpg differ diff --git a/assets/frames/frame_0018.jpg b/assets/frames/frame_0018.jpg new file mode 100644 index 0000000..dbb64a7 Binary files /dev/null and b/assets/frames/frame_0018.jpg differ diff --git a/assets/frames/frame_0019.jpg b/assets/frames/frame_0019.jpg new file mode 100644 index 0000000..905f0ee Binary files /dev/null and b/assets/frames/frame_0019.jpg differ diff --git a/assets/frames/frame_0020.jpg b/assets/frames/frame_0020.jpg new file mode 100644 index 0000000..c96b3ee Binary files /dev/null and b/assets/frames/frame_0020.jpg differ diff --git a/assets/frames/frame_0021.jpg b/assets/frames/frame_0021.jpg new file mode 100644 index 0000000..254d93a Binary files /dev/null and b/assets/frames/frame_0021.jpg differ diff --git a/assets/frames/frame_0022.jpg b/assets/frames/frame_0022.jpg new file mode 100644 index 0000000..1a2c90f Binary files /dev/null and b/assets/frames/frame_0022.jpg differ diff --git a/assets/frames/frame_0023.jpg b/assets/frames/frame_0023.jpg new file mode 100644 index 0000000..5da1a04 Binary files /dev/null and b/assets/frames/frame_0023.jpg differ diff --git a/assets/frames/frame_0024.jpg b/assets/frames/frame_0024.jpg new file mode 100644 index 0000000..81e010a Binary files /dev/null and b/assets/frames/frame_0024.jpg differ diff --git a/assets/frames/frame_0025.jpg b/assets/frames/frame_0025.jpg new file mode 100644 index 0000000..e1562df Binary files /dev/null and b/assets/frames/frame_0025.jpg differ diff --git a/assets/frames/frame_0026.jpg b/assets/frames/frame_0026.jpg new file mode 100644 index 0000000..a198321 Binary files /dev/null and b/assets/frames/frame_0026.jpg differ diff --git a/assets/frames/frame_0027.jpg b/assets/frames/frame_0027.jpg new file mode 100644 index 0000000..edbd75b Binary files /dev/null and b/assets/frames/frame_0027.jpg differ diff --git a/assets/frames/frame_0028.jpg b/assets/frames/frame_0028.jpg new file mode 100644 index 0000000..a203964 Binary files /dev/null and b/assets/frames/frame_0028.jpg differ diff --git a/assets/frames/frame_0029.jpg b/assets/frames/frame_0029.jpg new file mode 100644 index 0000000..031ade8 Binary files /dev/null and b/assets/frames/frame_0029.jpg differ diff --git a/assets/frames/frame_0030.jpg b/assets/frames/frame_0030.jpg new file mode 100644 index 0000000..d346636 Binary files /dev/null and b/assets/frames/frame_0030.jpg differ diff --git a/assets/frames/frame_0031.jpg b/assets/frames/frame_0031.jpg new file mode 100644 index 0000000..87d03e9 Binary files /dev/null and b/assets/frames/frame_0031.jpg differ diff --git a/assets/frames/frame_0032.jpg b/assets/frames/frame_0032.jpg new file mode 100644 index 0000000..dd0a12e Binary files /dev/null and b/assets/frames/frame_0032.jpg differ diff --git a/assets/frames/frame_0033.jpg b/assets/frames/frame_0033.jpg new file mode 100644 index 0000000..026b6d7 Binary files 
/dev/null and b/assets/frames/frame_0033.jpg differ diff --git a/assets/frames/frame_0034.jpg b/assets/frames/frame_0034.jpg new file mode 100644 index 0000000..8d80934 Binary files /dev/null and b/assets/frames/frame_0034.jpg differ diff --git a/assets/frames/frame_0035.jpg b/assets/frames/frame_0035.jpg new file mode 100644 index 0000000..e7d2539 Binary files /dev/null and b/assets/frames/frame_0035.jpg differ diff --git a/assets/frames/frame_0036.jpg b/assets/frames/frame_0036.jpg new file mode 100644 index 0000000..c5223ef Binary files /dev/null and b/assets/frames/frame_0036.jpg differ diff --git a/assets/frames/frame_0037.jpg b/assets/frames/frame_0037.jpg new file mode 100644 index 0000000..c1e32d4 Binary files /dev/null and b/assets/frames/frame_0037.jpg differ diff --git a/assets/frames/frame_0038.jpg b/assets/frames/frame_0038.jpg new file mode 100644 index 0000000..3494c3a Binary files /dev/null and b/assets/frames/frame_0038.jpg differ diff --git a/assets/frames/frame_0039.jpg b/assets/frames/frame_0039.jpg new file mode 100644 index 0000000..d71a3ab Binary files /dev/null and b/assets/frames/frame_0039.jpg differ diff --git a/assets/frames/frame_0040.jpg b/assets/frames/frame_0040.jpg new file mode 100644 index 0000000..1803798 Binary files /dev/null and b/assets/frames/frame_0040.jpg differ diff --git a/assets/frames/frame_0041.jpg b/assets/frames/frame_0041.jpg new file mode 100644 index 0000000..3c7bfe2 Binary files /dev/null and b/assets/frames/frame_0041.jpg differ diff --git a/assets/newton_man/id_embeds.bin b/assets/newton_man/id_embeds.bin new file mode 100644 index 0000000..a202dc7 Binary files /dev/null and b/assets/newton_man/id_embeds.bin differ diff --git a/assets/test.mp4 b/assets/test.mp4 new file mode 100644 index 0000000..808188e Binary files /dev/null and b/assets/test.mp4 differ diff --git a/docs/distilled_sd.md b/docs/distilled_sd.md new file mode 100644 index 0000000..478305f --- /dev/null +++ b/docs/distilled_sd.md @@ -0,0 +1,99 @@ +# Running distilled models: SSD1B and SDx.x with tiny U-Nets + +## Preface + +These models feature a reduced U-Net architecture. Unlike standard SDXL models, the SSD-1B U-Net contains only one middle block and fewer attention layers in its up- and down-blocks, resulting in significantly smaller file sizes. Using these models can reduce inference time by more than 33%. For more details, refer to Segmind's paper: https://arxiv.org/abs/2401.02677v1. +Similarly, SD1.x- and SD2.x-style models with a tiny U-Net consist of only 6 U-Net blocks, leading to very small files and time savings of up to 50%. For more information, see the paper: https://arxiv.org/pdf/2305.15798.pdf. + +## SSD1B + +Note that not all of these models follow the standard parameter naming conventions. However, several useful SSD-1B models are available online, such as: + + * https://huggingface.co/segmind/SSD-1B/resolve/main/SSD-1B-A1111.safetensors + * https://huggingface.co/hassenhamdi/SSD-1B-fp8_e4m3fn/resolve/main/SSD-1B_fp8_e4m3fn.safetensors + +Useful LoRAs are also available: + + * https://huggingface.co/seungminh/lora-swarovski-SSD-1B/resolve/main/pytorch_lora_weights.safetensors + * https://huggingface.co/kylielee505/mylcmlorassd/resolve/main/pytorch_lora_weights.safetensors + +These files can be used out-of-the-box, unlike the models described in the next section. + + +## SD1.x, SD2.x with tiny U-Nets + +These models require conversion before use. 
You will need a Python script provided by the diffusers team, available on GitHub: + + * https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/scripts/convert_diffusers_to_original_stable_diffusion.py + +### SD2.x + +NotaAI provides the following model online: + +* https://huggingface.co/nota-ai/bk-sdm-v2-tiny + +Creating a .safetensors file involves two steps. First, run this short Python script to download the model from Hugging Face: + +```python +from diffusers import StableDiffusionPipeline +pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-v2-tiny", cache_dir="./") +``` + +Second, create the .safetensors file by running: + +```bash +python convert_diffusers_to_original_stable_diffusion.py \ + --model_path models--nota-ai--bk-sdm-v2-tiny/snapshots/68277af553777858cd47e133f92e4db47321bc74 \ + --checkpoint_path bk-sdm-v2-tiny.safetensors --half --use_safetensors +``` + +This will generate the file **bk-sdm-v2-tiny.safetensors**, which is now ready for use with sd.cpp. + +### SD1.x + +Several Tiny SD 1.x models are available online, such as: + + * https://huggingface.co/segmind/tiny-sd + * https://huggingface.co/segmind/portrait-finetuned + * https://huggingface.co/nota-ai/bk-sdm-tiny + +These models also require conversion, partly because some tensors are stored in a non-contiguous manner. To create a usable checkpoint file, follow these simple steps: + +##### Download the model using Python on your computer, for example this way: + +```python +import torch +from diffusers import StableDiffusionPipeline +pipe = StableDiffusionPipeline.from_pretrained("segmind/tiny-sd") +unet = pipe.unet +for param in unet.parameters(): + param.data = param.data.contiguous() # <- important here +pipe.save_pretrained("segmindtiny-sd", safe_serialization=True) +``` + +##### Run the conversion script: + +```bash +python convert_diffusers_to_original_stable_diffusion.py \ + --model_path ./segmindtiny-sd \ + --checkpoint_path ./segmind_tiny-sd.ckpt --half +``` + +The file `segmind_tiny-sd.ckpt` will be generated and is now ready for use with sd.cpp. You can follow a similar process for the other models mentioned above. 
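Once converted, the checkpoint should load like any other SD1.x model in `stable-diffusion-cpp-python`; a minimal, untested sketch (the file name comes from the conversion step above, the prompt is arbitrary):

```python
from stable_diffusion_cpp import StableDiffusion

# Load the tiny U-Net checkpoint produced by the conversion step above
stable_diffusion = StableDiffusion(model_path="./segmind_tiny-sd.ckpt")

output = stable_diffusion.generate_image(prompt="a lovely cat")
output[0].save("output.png")
```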
+ + +### Another available .ckpt file: + + * https://huggingface.co/ClashSAN/small-sd/resolve/main/tinySDdistilled.ckpt + +To use this file, you must first adjust its non-contiguous tensors: + +```python +import torch +ckpt = torch.load("tinySDdistilled.ckpt", map_location=torch.device('cpu')) +for key, value in ckpt['state_dict'].items(): + if isinstance(value, torch.Tensor): + ckpt['state_dict'][key] = value.contiguous() +torch.save(ckpt, "tinySDdistilled_fixed.ckpt") +``` diff --git a/docs/hipBLAS_on_Windows.md b/docs/hipBLAS_on_Windows.md index 48b85a0..eec08f3 100644 --- a/docs/hipBLAS_on_Windows.md +++ b/docs/hipBLAS_on_Windows.md @@ -47,8 +47,7 @@ set ninja=C:\Program Files\ninja\ninja.exe ## Building stable-diffusion.cpp -The thing different from the regular CPU build is `-DSD_HIPBLAS=ON` , -`-G "Ninja"`, `-DCMAKE_C_COMPILER=clang`, `-DCMAKE_CXX_COMPILER=clang++`, `-DAMDGPU_TARGETS=gfx1100` +The thing different from the regular CPU build is `-G "Ninja"`, `-DCMAKE_C_COMPILER=clang`, `-DCMAKE_CXX_COMPILER=clang++`, `-DSD_HIPBLAS=ON`, `-DGPU_TARGETS=gfx1100`, `-DAMDGPU_TARGETS=gfx1100`, `-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON`, `-DCMAKE_POSITION_INDEPENDENT_CODE=ON` Note: If you encounter an error such as the following: diff --git a/pyproject.toml b/pyproject.toml index 1b2275c..71dbeb4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,5 +1,5 @@ [build-system] -requires = ["scikit-build-core[pyproject]>=0.5.1"] +requires = ["scikit-build-core[pyproject]>=0.9.2"] build-backend = "scikit_build_core.build" [project] @@ -27,6 +27,15 @@ classifiers = [ "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", +] + +[project.optional-dependencies] +dev = [ + "black>=24.8.0", + "isort>=5.13.2", + "pytest>=7.4.4", + "ffmpeg-python>=0.2.0", ] [tool.scikit-build] @@ -34,7 +43,22 @@ wheel.packages = ["stable_diffusion_cpp"] cmake.verbose = true cmake.minimum-version = "3.21" minimum-version = "0.5.1" -sdist.include = [".git", "vendor/stable-diffusion.cpp/.git"] +sdist.include = [ + ".git/HEAD", + ".git/config", + ".git/refs/**", + "vendor/stable-diffusion.cpp/.git/HEAD", + "vendor/stable-diffusion.cpp/.git/config", + "vendor/stable-diffusion.cpp/.git/refs/**", +] +sdist.exclude = [ + "assets", + "vendor/stable-diffusion.cpp/assets", + ".git/objects/**", + "vendor/stable-diffusion.cpp/.git/objects/**", + ".git/logs/**", + "vendor/stable-diffusion.cpp/.git/logs/**", +] [tool.scikit-build.metadata.version] provider = "scikit_build_core.metadata.regex" @@ -50,6 +74,9 @@ line-length = 130 [tool.isort] profile = "black" -known_local_folder = ["stable_diffusion_cpp"] +known_local_folder = ["stable_diffusion_cpp", "tests"] remove_redundant_aliases = true length_sort = true + +[tool.pytest.ini_options] +testpaths = "tests" diff --git a/stable_diffusion_cpp/__init__.py b/stable_diffusion_cpp/__init__.py index 396315c..54fdf9a 100644 --- a/stable_diffusion_cpp/__init__.py +++ b/stable_diffusion_cpp/__init__.py @@ -4,4 +4,4 @@ # isort: on -__version__ = "0.2.7" +__version__ = "0.4.2" diff --git a/stable_diffusion_cpp/_internals.py b/stable_diffusion_cpp/_internals.py index 010d7ee..7fb1a13 100644 --- a/stable_diffusion_cpp/_internals.py +++ b/stable_diffusion_cpp/_internals.py @@ -1,13 +1,13 @@ import os +import ctypes from contextlib import ExitStack import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp - from ._utils import suppress_stdout_stderr -# 
============================================ +# =========================================== # Stable Diffusion Model -# ============================================ +# =========================================== class _StableDiffusionModel: @@ -21,52 +21,84 @@ def __init__( model_path: str, clip_l_path: str, clip_g_path: str, + clip_vision_path: str, t5xxl_path: str, + llm_path: str, + llm_vision_path: str, diffusion_model_path: str, + high_noise_diffusion_model_path: str, vae_path: str, taesd_path: str, control_net_path: str, lora_model_dir: str, - embed_dir: str, - stacked_id_embed_dir: str, + embeddings: ctypes.Array[sd_cpp.sd_embedding_t], + embedding_count: int, + photo_maker_path: str, + tensor_type_rules: str, vae_decode_only: bool, - vae_tiling: bool, n_threads: int, wtype: int, rng_type: int, - schedule: int, + sampler_rng_type: int, + prediction: int, + lora_apply_mode: int, + offload_params_to_cpu: bool, keep_clip_on_cpu: bool, - keep_control_net_cpu: bool, + keep_control_net_on_cpu: bool, keep_vae_on_cpu: bool, diffusion_flash_attn: bool, + tae_preview_only: bool, + diffusion_conv_direct: bool, + vae_conv_direct: bool, + force_sdxl_vae_conv_scale: bool, + chroma_use_dit_mask: bool, + chroma_use_t5_mask: bool, + chroma_t5_mask_pad: int, + flow_shift: int, verbose: bool, ): - self.model_path = model_path - self.clip_l_path = clip_l_path - self.clip_g_path = clip_g_path - self.t5xxl_path = t5xxl_path - self.diffusion_model_path = diffusion_model_path - self.vae_path = vae_path - self.taesd_path = taesd_path - self.control_net_path = control_net_path - self.lora_model_dir = lora_model_dir - self.embed_dir = embed_dir - self.stacked_id_embed_dir = stacked_id_embed_dir - self.vae_decode_only = vae_decode_only - self.vae_tiling = vae_tiling - self.n_threads = n_threads - self.wtype = wtype - self.rng_type = rng_type - self.schedule = schedule - self.keep_clip_on_cpu = keep_clip_on_cpu - self.keep_control_net_cpu = keep_control_net_cpu - self.keep_vae_on_cpu = keep_vae_on_cpu - self.diffusion_flash_attn = diffusion_flash_attn - self.verbose = verbose - self._exit_stack = ExitStack() - self.model = None + self.params = sd_cpp.sd_ctx_params_t( + model_path=model_path.encode("utf-8"), + clip_l_path=clip_l_path.encode("utf-8"), + clip_g_path=clip_g_path.encode("utf-8"), + clip_vision_path=clip_vision_path.encode("utf-8"), + t5xxl_path=t5xxl_path.encode("utf-8"), + llm_path=llm_path.encode("utf-8"), + llm_vision_path=llm_vision_path.encode("utf-8"), + diffusion_model_path=diffusion_model_path.encode("utf-8"), + high_noise_diffusion_model_path=high_noise_diffusion_model_path.encode("utf-8"), + vae_path=vae_path.encode("utf-8"), + taesd_path=taesd_path.encode("utf-8"), + control_net_path=control_net_path.encode("utf-8"), + lora_model_dir=lora_model_dir.encode("utf-8"), + embeddings=embeddings, + embedding_count=embedding_count, + photo_maker_path=photo_maker_path.encode("utf-8"), + tensor_type_rules=tensor_type_rules.encode("utf-8"), + vae_decode_only=vae_decode_only, + free_params_immediately=False, # Don't unload model + n_threads=n_threads, + wtype=wtype, + rng_type=rng_type, + sampler_rng_type=sampler_rng_type, + prediction=prediction, + lora_apply_mode=lora_apply_mode, + offload_params_to_cpu=offload_params_to_cpu, + keep_clip_on_cpu=keep_clip_on_cpu, + keep_control_net_on_cpu=keep_control_net_on_cpu, + keep_vae_on_cpu=keep_vae_on_cpu, + diffusion_flash_attn=diffusion_flash_attn, + tae_preview_only=tae_preview_only, + diffusion_conv_direct=diffusion_conv_direct, + 
vae_conv_direct=vae_conv_direct, + force_sdxl_vae_conv_scale=force_sdxl_vae_conv_scale, + chroma_use_dit_mask=chroma_use_dit_mask, + chroma_use_t5_mask=chroma_use_t5_mask, + chroma_t5_mask_pad=chroma_t5_mask_pad, + flow_shift=flow_shift, + ) # Load the free_sd_ctx function self._free_sd_ctx = sd_cpp._lib.free_sd_ctx @@ -74,43 +106,20 @@ def __init__( # Load the model from the file if the path is provided if model_path: if not os.path.exists(model_path): - raise ValueError(f"Model path does not exist: {model_path}") + raise ValueError(f"Model path does not exist: '{model_path}'") if diffusion_model_path: if not os.path.exists(diffusion_model_path): - raise ValueError(f"Diffusion model path does not exist: {diffusion_model_path}") + raise ValueError(f"Diffusion model path does not exist: '{diffusion_model_path}'") if model_path or diffusion_model_path: with suppress_stdout_stderr(disable=verbose): - # Load the Stable Diffusion model ctx - self.model = sd_cpp.new_sd_ctx( - self.model_path.encode("utf-8"), - self.clip_l_path.encode("utf-8"), - self.clip_g_path.encode("utf-8"), - self.t5xxl_path.encode("utf-8"), - self.diffusion_model_path.encode("utf-8"), - self.vae_path.encode("utf-8"), - self.taesd_path.encode("utf-8"), - self.control_net_path.encode("utf-8"), - self.lora_model_dir.encode("utf-8"), - self.embed_dir.encode("utf-8"), - self.stacked_id_embed_dir.encode("utf-8"), - self.vae_decode_only, - self.vae_tiling, - False, # Free params immediately (unload model) - self.n_threads, - self.wtype, - self.rng_type, - self.schedule, - self.keep_clip_on_cpu, - self.keep_control_net_cpu, - self.diffusion_flash_attn, - self.keep_vae_on_cpu, - ) + # Call function with a pointer to params + self.model = sd_cpp.new_sd_ctx(ctypes.pointer(self.params)) # Check if the model was loaded successfully if self.model is None: - raise ValueError(f"Failed to load model from file: {model_path}") + raise ValueError(f"Failed to load model from file: '{model_path}'") def free_ctx(): """Free the model from memory.""" @@ -129,9 +138,9 @@ def __del__(self): self.close() -# ============================================ +# =========================================== # Upscaler Model -# ============================================ +# =========================================== class _UpscalerModel: @@ -143,11 +152,17 @@ class _UpscalerModel: def __init__( self, upscaler_path: str, + offload_params_to_cpu: bool, + direct: bool, n_threads: int, + tile_size: int, verbose: bool, ): self.upscaler_path = upscaler_path + self.offload_params_to_cpu = offload_params_to_cpu + self.direct = direct self.n_threads = n_threads + self.tile_size = tile_size self.verbose = verbose self._exit_stack = ExitStack() @@ -160,14 +175,20 @@ def __init__( self._free_upscaler_ctx = sd_cpp._lib.free_upscaler_ctx if not os.path.exists(upscaler_path): - raise ValueError(f"Upscaler model path does not exist: {upscaler_path}") + raise ValueError(f"Upscaler model path does not exist: '{upscaler_path}'") # Load the image upscaling model ctx - self.upscaler = sd_cpp.new_upscaler_ctx(upscaler_path.encode("utf-8"), self.n_threads) + self.upscaler = sd_cpp.new_upscaler_ctx( + upscaler_path.encode("utf-8"), + self.offload_params_to_cpu, + self.direct, + self.n_threads, + self.tile_size, + ) # Check if the model was loaded successfully if self.upscaler is None: - raise ValueError(f"Failed to load upscaler model from file: {upscaler_path}") + raise ValueError(f"Failed to load upscaler model from file: '{upscaler_path}'") def free_ctx(): """Free the model from 
memory.""" diff --git a/stable_diffusion_cpp/_logger.py b/stable_diffusion_cpp/_logger.py index 5182b4b..cb84422 100644 --- a/stable_diffusion_cpp/_logger.py +++ b/stable_diffusion_cpp/_logger.py @@ -27,7 +27,7 @@ def sd_log_callback( data: ctypes.c_void_p, ): if logger.level <= SD_LOG_LEVEL_TO_LOGGING_LEVEL[level]: - print(text.decode("utf-8"), end="", flush=True, file=sys.stderr) + print(text.decode("utf-8", errors="replace"), end="", flush=True, file=sys.stderr) stable_diffusion_cpp.sd_set_log_callback(sd_log_callback, ctypes.c_void_p(0)) diff --git a/stable_diffusion_cpp/stable_diffusion.py b/stable_diffusion_cpp/stable_diffusion.py index cae701f..f4a4425 100644 --- a/stable_diffusion_cpp/stable_diffusion.py +++ b/stable_diffusion_cpp/stable_diffusion.py @@ -1,8 +1,12 @@ +import os +import re import ctypes import random import contextlib import multiprocessing +from ctypes import c_uint32 from typing import Dict, List, Union, Callable, Optional +from pathlib import Path from PIL import Image @@ -10,7 +14,15 @@ from ._utils import suppress_stdout_stderr from ._logger import log_event, set_verbose from ._internals import _UpscalerModel, _StableDiffusionModel -from stable_diffusion_cpp import RNGType, GGMLType, Schedule, SampleMethod +from stable_diffusion_cpp import ( + Preview, + RNGType, + GGMLType, + Scheduler, + Prediction, + SampleMethod, + LoraApplyMode, +) class StableDiffusion: @@ -21,28 +33,45 @@ def __init__( model_path: str = "", clip_l_path: str = "", clip_g_path: str = "", + clip_vision_path: str = "", t5xxl_path: str = "", + llm_path: str = "", + llm_vision_path: str = "", diffusion_model_path: str = "", + high_noise_diffusion_model_path: str = "", vae_path: str = "", taesd_path: str = "", control_net_path: str = "", upscaler_path: str = "", + upscale_tile_size: int = 128, lora_model_dir: str = "", - embed_dir: str = "", - stacked_id_embed_dir: str = "", + embedding_paths: List[str] = [], + photo_maker_path: str = "", + tensor_type_rules: str = "", vae_decode_only: bool = False, - vae_tiling: bool = False, n_threads: int = -1, - wtype: Optional[Union[str, GGMLType, int, float]] = "default", - rng_type: Optional[Union[str, RNGType, int, float]] = "cuda", - schedule: Optional[Union[str, Schedule, int, float]] = "default", + wtype: Union[str, GGMLType, int, float] = "default", + rng_type: Union[str, RNGType, int, float] = "cuda", + sampler_rng_type: Union[str, RNGType, int, float] = "cuda", + prediction: Union[str, Prediction, int, float] = "default", + lora_apply_mode: Union[str, LoraApplyMode, int, float] = "auto", + offload_params_to_cpu: bool = False, keep_clip_on_cpu: bool = False, - keep_control_net_cpu: bool = False, + keep_control_net_on_cpu: bool = False, keep_vae_on_cpu: bool = False, diffusion_flash_attn: bool = False, + tae_preview_only: bool = False, + diffusion_conv_direct: bool = False, + vae_conv_direct: bool = False, + force_sdxl_vae_conv_scale: bool = False, + chroma_use_dit_mask: bool = True, + chroma_use_t5_mask: bool = False, + chroma_t5_mask_pad: int = 1, + flow_shift: float = float("inf"), + image_resize_method: str = "crop", verbose: bool = True, ): - """Load a stable-diffusion.cpp model from `model_path`. + """Load a stable-diffusion.cpp model from `model_path` or `diffusion_model_path`. Examples: Basic usage @@ -51,117 +80,209 @@ def __init__( >>> model = stable_diffusion_cpp.StableDiffusion( ... model_path="path/to/model", ... 
) - >>> images = stable_diffusion.txt_to_img(prompt="a lovely cat") + >>> images = stable_diffusion.generate_image(prompt="a lovely cat") >>> images[0].save("output.png") Args: - model_path: Path to the model. - clip_l_path: Path to the clip_l. - t5xxl_path: Path to the t5xxl. - diffusion_model_path: Path to the diffusion model. - vae_path: Path to the vae. - taesd_path: Path to the taesd. - control_net_path: Path to the control net. - upscaler_path: Path to esrgan model (Upscale images after generation). + model_path: Path to the full model. + clip_l_path: Path to the clip-l text encoder. + clip_g_path: Path to the clip-g text encoder. + clip_vision_path: Path to the clip-vision encoder. + t5xxl_path: Path to the t5xxl text encoder. + llm_path: Path to the llm text encoder (example: qwenvl2.5 for qwen-image, mistral-small3.2 for flux2). + llm_vision_path: Path to the llm vit. + diffusion_model_path: Path to the standalone diffusion model. + high_noise_diffusion_model_path: Path to the standalone high noise diffusion model. + vae_path: Path to the standalone vae model. + taesd_path: Path to the taesd. Using Tiny AutoEncoder for fast decoding (low quality). + control_net_path: Path to the Control Net model. + upscaler_path: Path to ESRGAN model (upscale images separately or after generation). + upscale_tile_size: Tile size for upscaler model. lora_model_dir: Lora model directory. - embed_dir: Path to embeddings. - stacked_id_embed_dir: Path to PHOTOMAKER stacked id embeddings. + embedding_paths: List of paths to embedding files. + photo_maker_path: Path to PhotoMaker model. + tensor_type_rules: Weight type per tensor pattern (example: "^vae\\.=f16,model\\.=q8_0") vae_decode_only: Process vae in decode only mode. - vae_tiling: Process vae in tiles to reduce memory usage. n_threads: Number of threads to use for generation (default: half the number of CPUs). wtype: The weight type (default: automatically determines the weight type of the model file). rng_type: Random number generator. - schedule: Denoiser sigma schedule. + sampler_rng_type: Random number generator for sampler. + prediction: Prediction type override. + lora_apply_mode: The way to apply LoRA, (default: "auto"). In auto mode, if the model weights contain any quantized parameters, the "at_runtime" mode will be used; otherwise, "immediately" will be used. The "immediately" mode may have precision and compatibility issues with quantized parameters, but it usually offers faster inference speed and, in some cases, lower memory usage. The "at_runtime" mode, on the other hand, is exactly the opposite. + offload_params_to_cpu: Place the weights in RAM to save VRAM, and automatically load them into VRAM when needed. keep_clip_on_cpu: Keep clip in CPU (for low vram). - keep_control_net_cpu: Keep controlnet in CPU (for low vram). + keep_control_net_on_cpu: Keep Control Net in CPU (for low vram). keep_vae_on_cpu: Keep vae in CPU (for low vram). - diffusion_flash_attn: Use flash attention in diffusion model (can reduce memory usage significantly). - verbose: Print verbose output to stderr. + diffusion_flash_attn: Use flash attention in diffusion model (can reduce memory usage significantly). May lower quality or crash if backend not supported. + tae_preview_only: Prevents usage of taesd for decoding the final image (for use with preview="tae"). + diffusion_conv_direct: Use Conv2d direct in the diffusion model. May crash if backend not supported. + vae_conv_direct: Use Conv2d direct in the vae model (should improve performance). 
May crash if backend not supported. + force_sdxl_vae_conv_scale: Force use of conv scale on SDXL vae. + chroma_use_dit_mask: Use DiT mask for Chroma. + chroma_use_t5_mask: Use T5 mask for Chroma. + chroma_t5_mask_pad: T5 mask padding size of Chroma. + flow_shift: Shift value for Flow models like SD3.x or WAN (default: auto). + image_resize_method: Method to resize images for init, mask, control and reference images ("crop" or "resize"). + verbose: Print verbose output. Raises: - ValueError: If a model path does not exist. + ValueError: If arguments are invalid or mutually incompatible. + RuntimeError: If the model is not loaded when required. + NotImplementedError: If a feature is not implemented. Returns: A Stable Diffusion instance. """ # Params - self.model_path = model_path - self.clip_l_path = clip_l_path - self.clip_g_path = clip_g_path - self.t5xxl_path = t5xxl_path - self.diffusion_model_path = diffusion_model_path - self.vae_path = vae_path - self.taesd_path = taesd_path - self.control_net_path = control_net_path - self.lora_model_dir = lora_model_dir - self.embed_dir = embed_dir - self.stacked_id_embed_dir = stacked_id_embed_dir + self.model_path = self._clean_path(model_path) + self.clip_l_path = self._clean_path(clip_l_path) + self.clip_g_path = self._clean_path(clip_g_path) + self.clip_vision_path = self._clean_path(clip_vision_path) + self.t5xxl_path = self._clean_path(t5xxl_path) + self.llm_path = self._clean_path(llm_path) + self.llm_vision_path = self._clean_path(llm_vision_path) + self.diffusion_model_path = self._clean_path(diffusion_model_path) + self.high_noise_diffusion_model_path = self._clean_path(high_noise_diffusion_model_path) + self.vae_path = self._clean_path(vae_path) + self.taesd_path = self._clean_path(taesd_path) + self.control_net_path = self._clean_path(control_net_path) + self.upscaler_path = self._clean_path(upscaler_path) + self.upscale_tile_size = upscale_tile_size + self.lora_model_dir = self._clean_path(lora_model_dir) + self.embedding_paths = [self._clean_path(p) for p in embedding_paths] + self.photo_maker_path = self._clean_path(photo_maker_path) + self.tensor_type_rules = tensor_type_rules self.vae_decode_only = vae_decode_only - self.vae_tiling = vae_tiling self.n_threads = n_threads self.wtype = wtype self.rng_type = rng_type - self.schedule = schedule + self.sampler_rng_type = sampler_rng_type + self.prediction = prediction + self.lora_apply_mode = lora_apply_mode + self.offload_params_to_cpu = offload_params_to_cpu self.keep_clip_on_cpu = keep_clip_on_cpu - self.keep_control_net_cpu = keep_control_net_cpu + self.keep_control_net_on_cpu = keep_control_net_on_cpu self.keep_vae_on_cpu = keep_vae_on_cpu self.diffusion_flash_attn = diffusion_flash_attn + self.tae_preview_only = tae_preview_only + self.diffusion_conv_direct = diffusion_conv_direct + self.vae_conv_direct = vae_conv_direct + self.force_sdxl_vae_conv_scale = force_sdxl_vae_conv_scale + self.chroma_use_dit_mask = chroma_use_dit_mask + self.chroma_use_t5_mask = chroma_use_t5_mask + self.chroma_t5_mask_pad = chroma_t5_mask_pad + self.flow_shift = flow_shift + self.image_resize_method = image_resize_method self._stack = contextlib.ExitStack() # Default to half the number of CPUs if n_threads <= 0: self.n_threads = max(multiprocessing.cpu_count() // 2, 1) - # =========== Logging =========== + # ------------------------------------------- + # Logging + # ------------------------------------------- self.verbose = verbose set_verbose(verbose) - # =========== Validate Inputs =========== + # 
------------------------------------------- + # Validate Inputs + # ------------------------------------------- + + self.wtype = self._validate_and_set_input(self.wtype, GGML_TYPE_MAP, "wtype") + self.rng_type = self._validate_and_set_input(self.rng_type, RNG_TYPE_MAP, "rng_type") + self.sampler_rng_type = self._validate_and_set_input(self.sampler_rng_type, RNG_TYPE_MAP, "sampler_rng_type") + self.lora_apply_mode = self._validate_and_set_input(self.lora_apply_mode, LORA_APPLY_MODE_MAP, "lora_apply_mode") + self.prediction = self._validate_and_set_input(self.prediction, PREDICTION_MAP, "prediction") + + # ------------------------------------------- + # Embeddings + # ------------------------------------------- + + _embedding_items = [] + for p in self.embedding_paths: + path = Path(p) + if not path.is_file(): + raise ValueError(f"Embedding not found: {p}") + + _embedding_items.append( + sd_cpp.sd_embedding_t( + name=path.stem.encode("utf-8"), # Filename minus extension + path=str(path).encode("utf-8"), + ) + ) - self.wtype = validate_and_set_input(self.wtype, GGML_TYPE_MAP, "wtype") - self.rng_type = validate_and_set_input(self.rng_type, RNG_TYPE_MAP, "rng_type") - self.schedule = validate_and_set_input(self.schedule, SCHEDULE_MAP, "schedule") + if _embedding_items: + EmbeddingArrayType = sd_cpp.sd_embedding_t * len(_embedding_items) + _embedding_array = EmbeddingArrayType(*_embedding_items) + _embedding_count = c_uint32(len(_embedding_items)) + else: + _embedding_array = None + _embedding_count = c_uint32(0) - # =========== SD Model loading =========== + # ------------------------------------------- + # SD Model Loading + # ------------------------------------------- self._model = self._stack.enter_context( contextlib.closing( _StableDiffusionModel( - self.model_path, - self.clip_l_path, - self.clip_g_path, - self.t5xxl_path, - self.diffusion_model_path, - self.vae_path, - self.taesd_path, - self.control_net_path, - self.lora_model_dir, - self.embed_dir, - self.stacked_id_embed_dir, - self.vae_decode_only, - self.vae_tiling, - self.n_threads, - self.wtype, - self.rng_type, - self.schedule, - self.keep_clip_on_cpu, - self.keep_control_net_cpu, - self.keep_vae_on_cpu, - self.diffusion_flash_attn, - self.verbose, + model_path=self.model_path, + clip_l_path=self.clip_l_path, + clip_g_path=self.clip_g_path, + clip_vision_path=self.clip_vision_path, + t5xxl_path=self.t5xxl_path, + llm_path=self.llm_path, + llm_vision_path=self.llm_vision_path, + diffusion_model_path=self.diffusion_model_path, + high_noise_diffusion_model_path=self.high_noise_diffusion_model_path, + vae_path=self.vae_path, + taesd_path=self.taesd_path, + control_net_path=self.control_net_path, + lora_model_dir=self.lora_model_dir, + embeddings=_embedding_array, + embedding_count=_embedding_count, + photo_maker_path=self.photo_maker_path, + tensor_type_rules=self.tensor_type_rules, + vae_decode_only=self.vae_decode_only, + n_threads=self.n_threads, + wtype=self.wtype, + rng_type=self.rng_type, + sampler_rng_type=self.sampler_rng_type, + prediction=self.prediction, + lora_apply_mode=self.lora_apply_mode, + offload_params_to_cpu=self.offload_params_to_cpu, + keep_clip_on_cpu=self.keep_clip_on_cpu, + keep_control_net_on_cpu=self.keep_control_net_on_cpu, + keep_vae_on_cpu=self.keep_vae_on_cpu, + diffusion_flash_attn=self.diffusion_flash_attn, + tae_preview_only=self.tae_preview_only, + diffusion_conv_direct=self.diffusion_conv_direct, + vae_conv_direct=self.vae_conv_direct, +
force_sdxl_vae_conv_scale=self.force_sdxl_vae_conv_scale, + chroma_use_dit_mask=self.chroma_use_dit_mask, + chroma_use_t5_mask=self.chroma_use_t5_mask, + chroma_t5_mask_pad=self.chroma_t5_mask_pad, + flow_shift=self.flow_shift, + verbose=self.verbose, ) ) ) - # =========== Upscaling Model loading =========== + # ------------------------------------------- + # Upscaler Model Loading + # ------------------------------------------- self._upscaler = self._stack.enter_context( contextlib.closing( _UpscalerModel( - upscaler_path, - self.n_threads, - self.verbose, + upscaler_path=upscaler_path, + offload_params_to_cpu=self.offload_params_to_cpu, + direct=self.diffusion_conv_direct, # Use diffusion_conv_direct + n_threads=self.n_threads, + tile_size=self.upscale_tile_size, + verbose=self.verbose, ) ) ) @@ -173,89 +294,159 @@ def model(self) -> sd_cpp.sd_ctx_t_p: @property def upscaler(self) -> sd_cpp.upscaler_ctx_t_p: - assert self._upscaler.upscaler is not None + if self._upscaler is None or self._upscaler.upscaler is None: + raise RuntimeError("Upscaler not initialized, did you pass `upscaler_path`") return self._upscaler.upscaler - # ============================================ - # Text to Image - # ============================================ + # =========================================== + # Generate Image + # =========================================== - def txt_to_img( + def generate_image( self, prompt: str, negative_prompt: str = "", clip_skip: int = -1, - cfg_scale: float = 7.0, - guidance: float = 3.5, - eta: float = 0.0, + init_image: Optional[Union[Image.Image, str]] = None, + ref_images: Optional[List[Union[Image.Image, str]]] = None, + auto_resize_ref_image: bool = True, + increase_ref_index: bool = False, + mask_image: Optional[Union[Image.Image, str]] = None, width: int = 512, height: int = 512, - sample_method: Optional[Union[str, SampleMethod, int, float]] = "euler_a", + # --- + # guidance_params + cfg_scale: float = 7.0, + image_cfg_scale: Optional[float] = None, + guidance: float = 3.5, + # sample_params + scheduler: Union[str, Scheduler, int, float, None] = "default", + sample_method: Union[str, SampleMethod, int, float, None] = "default", sample_steps: int = 20, - seed: int = 42, - batch_count: int = 1, - control_cond: Optional[Union[Image.Image, str]] = None, - control_strength: float = 0.9, - style_strength: float = 20.0, - normalize_input: bool = False, - input_id_images_path: str = "", + eta: float = 0.0, + timestep_shift: int = 0, + sigmas: Optional[str] = None, + # slg_params skip_layers: List[int] = [7, 8, 9], - slg_scale: float = 0.0, skip_layer_start: float = 0.01, skip_layer_end: float = 0.2, + slg_scale: float = 0.0, + # --- + strength: float = 0.75, + seed: int = 42, + batch_count: int = 1, + control_image: Optional[Union[Image.Image, str]] = None, + control_strength: float = 0.9, + pm_id_embed_path: str = "", + pm_id_images: Optional[List[Union[Image.Image, str]]] = None, + pm_style_strength: float = 20.0, + vae_tiling: bool = False, + vae_tile_overlap: float = 0.5, + vae_tile_size: Optional[Union[int, str]] = "0x0", + vae_relative_tile_size: Optional[Union[float, str]] = "0x0", + easycache: bool = False, + easycache_options: str = "0.2,0.15,0.95", canny: bool = False, upscale_factor: int = 1, + preview_method: Union[str, Preview, int, float] = "none", + preview_noisy: bool = False, + preview_interval: int = 1, + preview_callback: Optional[Callable] = None, progress_callback: Optional[Callable] = None, ) -> List[Image.Image]: - """Generate images from a text 
prompt. + """Generate images from a text prompt and or input images. Args: prompt: The prompt to render. negative_prompt: The negative prompt. - clip_skip: Ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer. + clip_skip: Ignore last layers of CLIP network (1 ignores none, 2 ignores one layer, <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x). + init_image: An input image path or Pillow Image to direct the generation. + ref_images: A list of input image paths or Pillow Images for Flux Kontext models (can be used multiple times). + auto_resize_ref_image: Automatically resize reference images. + increase_ref_index: Automatically increase the indices of reference images based on the order they are listed (starting with 1). + mask_image: The inpainting mask image path or Pillow Image. + width: Image width, in pixel space. + height: Image height, in pixel space. cfg_scale: Unconditional guidance scale. - guidance: Guidance scale. - eta: Eta in DDIM, only for DDIM and TCD. - width: Image height, in pixel space. - height: Image width, in pixel space. - sample_method: Sampling method. + image_cfg_scale: Image guidance scale for inpaint or instruct-pix2pix models. + guidance: Distilled guidance scale for models with guidance input. + scheduler: Denoiser sigma scheduler (default: discrete). + sample_method: Sampling method (default: euler for Flux/SD3/Wan, euler_a otherwise). sample_steps: Number of sample steps. - seed: RNG seed (default: 42, use random seed for < 0). + eta: Eta in DDIM, only for DDIM and TCD. + timestep_shift: Shift timestep for NitroFusion models, default: 0, recommended N for NitroSD-Realism around 250 and 500 for NitroSD-Vibrant. + sigmas: Custom sigma values for the sampler, comma-separated (e.g. "14.61,7.8,3.5,0.0"). + skip_layers: Layers to skip for SLG steps (SLG will be enabled at step int([STEPS]x[START]) and disabled at int([STEPS]x[END])). + skip_layer_start: SLG enabling point. + skip_layer_end: SLG disabling point. + slg_scale: Skip layer guidance (SLG) scale, only for DiT models. + strength: Strength for noising/unnoising. + seed: RNG seed (uses random seed for < 0). batch_count: Number of images to generate. - control_cond: A control condition image path or Pillow Image. + control_image: A control condition image path or Pillow Image (Control Net). control_strength: Strength to apply Control Net. - style_strength: Strength for keeping input identity (default: 20%). - normalize_input: Normalize PHOTOMAKER input id images. - input_id_images_path: Path to PHOTOMAKER input id images dir. - skip_layers: Layers to skip for SLG steps (default: [7,8,9]). - slg_scale: Skip layer guidance (SLG) scale, only for DiT models (default: 0). - skip_layer_start: SLG enabling point (default: 0.01). - skip_layer_end: SLG disabling point (default: 0.2). - canny: Apply canny edge detection preprocessor to the control_cond image. - upscale_factor: The image upscaling factor. + pm_id_embed_path: Path to PhotoMaker v2 id embed. + pm_id_images: A list of input image paths or Pillow Images for PhotoMaker input identity. + pm_style_strength: Strength for keeping PhotoMaker input identity. + vae_tiling: Process vae in tiles to reduce memory usage. + vae_tile_overlap: Tile overlap for vae tiling, in fraction of tile size. + vae_tile_size: Tile size for vae tiling ([X]x[Y] format). + vae_relative_tile_size: Relative tile size for vae tiling, in fraction of image size if < 1, in number of tiles per dim if >=1 ([X]x[Y] format) (overrides `vae_tile_size`). 
+ easycache: Enable EasyCache for DiT models. + easycache_options: EasyCache options for DiT models with format "threshold,start_percent,end_percent". + canny: Apply canny edge detection preprocessor to the `control_image`. + upscale_factor: Run the ESRGAN upscaler this many times. + preview_method: The preview method to use (default: none). + preview_noisy: Enables previewing noisy inputs of the models rather than the denoised outputs. + preview_interval: Interval in denoising steps between consecutive updates of the image preview (default: 1, meaning update at every step) + preview_callback: Callback function to call on each preview frame. progress_callback: Callback function to call on each step end. Returns: - A list of Pillow Images.""" + A list of Pillow Images. + """ if self.model is None: - raise Exception("Stable diffusion model not loaded.") + raise RuntimeError("Stable Diffusion model not loaded") + + if self.vae_decode_only == True and (init_image or ref_images): + raise ValueError("`vae_decode_only` cannot be True when an `init_image` or `ref_images` are provided") - # =========== Validate string and int inputs =========== + # ------------------------------------------- + # Validation + # ------------------------------------------- - sample_method = validate_and_set_input(sample_method, SAMPLE_METHOD_MAP, "sample_method") + width = self._validate_dimensions(width, "width") + height = self._validate_dimensions(height, "height") - # Ensure dimensions are multiples of 64 - width = validate_dimensions(width, "width") - height = validate_dimensions(height, "height") + if batch_count < 1: + raise ValueError("`batch_count` must be at least 1") + if upscale_factor < 1: + raise ValueError("`upscale_factor` must at least 1") + if sample_steps < 1: + raise ValueError("`sample_steps` must be at least 1") + if strength < 0.0 or strength > 1.0: + raise ValueError("`strength` must be in the range [0.0, 1.0]") + if timestep_shift < 0 or timestep_shift > 1000: + raise ValueError("`timestep_shift` must be in the range [0, 1000]") - # =========== Set seed =========== + # ------------------------------------------- + # Set CFG Scale + # ------------------------------------------- + + image_cfg_scale = cfg_scale if image_cfg_scale is None else image_cfg_scale + + # ------------------------------------------- + # Set Seed + # ------------------------------------------- # Set a random seed if seed is negative if seed < 0: seed = random.randint(0, 10000) - # ==================== Set the callback function ==================== + # ------------------------------------------- + # Set the Progress Callback Function + # ------------------------------------------- if progress_callback is not None: @@ -270,135 +461,353 @@ def sd_progress_callback( sd_cpp.sd_set_progress_callback(sd_progress_callback, ctypes.c_void_p(0)) - # ==================== Format Inputs ==================== + # ------------------------------------------- + # Set the Preview Callback Function + # ------------------------------------------- - # Convert the control condition to a C sd_image_t - control_cond = self._format_control_cond(control_cond, canny, self.control_net_path) + preview_method = self._validate_and_set_input(preview_method, PREVIEW_MAP, "preview_method") - # Convert skip_layers to a ctypes array - skip_layers_array = (ctypes.c_int * len(skip_layers))(*skip_layers) - skip_layers_count = len(skip_layers) + if preview_callback is not None: + + @sd_cpp.sd_preview_callback + def sd_preview_callback( + step: int, + frame_count: 
int, + frames: sd_cpp.sd_image_t, + is_noisy: ctypes.c_bool, + data: ctypes.c_void_p, + ): + pil_frames = self._sd_image_t_p_to_images(frames, frame_count, 1) + preview_callback(step, pil_frames, is_noisy) + + sd_cpp.sd_set_preview_callback( + sd_preview_callback, + preview_method, + preview_interval, + not preview_noisy, + preview_noisy, + ctypes.c_void_p(0), + ) + + # ------------------------------------------- + # Extract Loras + # ------------------------------------------- + + _prompt_without_loras, _lora_array, _lora_count, _lora_string_buffers = self._extract_and_build_loras( + prompt, + self.lora_model_dir, + ) + + # ------------------------------------------- + # Reference Images + # ------------------------------------------- + + _ref_images_pointer, ref_images_count = self._create_image_array( + ref_images, resize=False + ) # Disable resize, sd.cpp handles it + _id_images_pointer, id_images_count = self._create_image_array(pm_id_images) + + # ------------------------------------------- + # Vae Tiling + # ------------------------------------------- + + tile_size_x, tile_size_y = self._parse_tile_size(vae_tile_size, as_float=False) + rel_size_x, rel_size_y = self._parse_tile_size(vae_relative_tile_size, as_float=True) + + # ------------------------------------------- + # Scheduler/Sample Method + # ------------------------------------------- + + scheduler = self._validate_and_set_input(scheduler, SCHEDULER_MAP, "scheduler", allow_none=True) + if scheduler is None: + scheduler = sd_cpp.sd_get_default_scheduler(self.model) + + sample_method = self._validate_and_set_input(sample_method, SAMPLE_METHOD_MAP, "sample_method", allow_none=True) + if sample_method is None: + sample_method = sd_cpp.sd_get_default_sample_method(self.model) + + # ------------------------------------------- + # Sigmas + # ------------------------------------------- + + _custom_sigmas = self._parse_sigmas(sigmas) + _custom_sigmas_count = len(_custom_sigmas) + + SigmasArrayType = ctypes.c_float * _custom_sigmas_count + _custom_sigmas = ctypes.cast(SigmasArrayType(*_custom_sigmas), ctypes.POINTER(ctypes.c_float)) + + # ------------------------------------------- + # Parameters + # ------------------------------------------- + + _easycache_params = sd_cpp.sd_easycache_params_t( + **self._parse_easycache( + enabled=easycache, + option_value=easycache_options, + ) + ) + + _pm_params = sd_cpp.sd_pm_params_t( + id_images=_id_images_pointer, + id_images_count=id_images_count, + id_embed_path=pm_id_embed_path.encode("utf-8"), + style_strength=pm_style_strength, + ) + + _vae_tiling_params = sd_cpp.sd_tiling_params_t( + enabled=vae_tiling, + tile_size_x=tile_size_x, + tile_size_y=tile_size_y, + target_overlap=vae_tile_overlap, + rel_size_x=rel_size_x, + rel_size_y=rel_size_y, + ) + + _guidance_params = sd_cpp.sd_guidance_params_t( + txt_cfg=cfg_scale, + img_cfg=image_cfg_scale, + distilled_guidance=guidance, + slg=sd_cpp.sd_slg_params_t( + layers=(ctypes.c_int * len(skip_layers))(*skip_layers), # Convert to ctypes array + layer_count=len(skip_layers), + layer_start=skip_layer_start, + layer_end=skip_layer_end, + scale=slg_scale, + ), + ) + + _sample_params = sd_cpp.sd_sample_params_t( + guidance=_guidance_params, + scheduler=scheduler, + sample_method=sample_method, + sample_steps=sample_steps, + eta=eta, + shifted_timestep=timestep_shift, + custom_sigmas=_custom_sigmas, + custom_sigmas_count=_custom_sigmas_count, + ) + + _params = sd_cpp.sd_img_gen_params_t( + loras=_lora_array, + lora_count=_lora_count, + 
prompt=_prompt_without_loras.encode("utf-8"), + negative_prompt=negative_prompt.encode("utf-8"), + clip_skip=clip_skip, + init_image=self._format_init_image(init_image, width, height), + ref_images=_ref_images_pointer, + auto_resize_ref_image=auto_resize_ref_image, + ref_images_count=ref_images_count, + increase_ref_index=increase_ref_index, + mask_image=self._format_mask_image(mask_image, width, height), + width=width, + height=height, + sample_params=_sample_params, + strength=strength, + seed=seed, + batch_count=batch_count, + control_image=self._format_control_image(control_image, canny, width, height), + control_strength=control_strength, + pm_params=_pm_params, + vae_tiling_params=_vae_tiling_params, + easycache=_easycache_params, + ) + + # Log system info + log_event(level=2, message=sd_cpp.sd_get_system_info().decode("utf-8")) with suppress_stdout_stderr(disable=self.verbose): # Generate images - c_images = sd_cpp.txt2img( + _c_images = sd_cpp.generate_image( self.model, - prompt.encode("utf-8"), - negative_prompt.encode("utf-8"), - clip_skip, - cfg_scale, - guidance, - eta, - width, - height, - sample_method, - sample_steps, - seed, - batch_count, - control_cond, - control_strength, - style_strength, - normalize_input, - input_id_images_path.encode("utf-8"), - skip_layers_array, - skip_layers_count, - slg_scale, - skip_layer_start, - skip_layer_end, + ctypes.byref(_params), ) - # Convert the C array of images to a Python list of images - return self._sd_image_t_p_to_images(c_images, batch_count, upscale_factor) + # Convert C array to Python list of images + images = self._sd_image_t_p_to_images(_c_images, batch_count, upscale_factor) + + # ------------------------------------------- + # Attach Image Metadata + # ------------------------------------------- + + func_args = locals() + gen_args = { + k: v + for k, v in func_args.items() + if k + not in { + "self", + "images", + "progress_callback", + "sd_progress_callback", + "preview_callback", + "sd_preview_callback", + } + and not k.startswith("_") # Skip internals + } + model_args = {k: v for k, v in self.__dict__.items() if not k.startswith("_")} # Skip internals - # ============================================ - # Image to Image - # ============================================ + for i, image in enumerate(images): + image.info.update({**model_args, **gen_args, "seed": seed + i if batch_count > 1 else seed}) - def img_to_img( + return images + + # =========================================== + # Generate Video + # =========================================== + + def generate_video( self, - image: Union[Image.Image, str], - prompt: str, - mask_image: Optional[Union[Image.Image, str]] = None, + prompt: str = "", negative_prompt: str = "", clip_skip: int = -1, - cfg_scale: float = 7.0, - guidance: float = 3.5, - eta: float = 0.0, + init_image: Optional[Union[Image.Image, str]] = None, + end_image: Optional[Union[Image.Image, str]] = None, + control_frames: Optional[List[Union[Image.Image, str]]] = None, width: int = 512, height: int = 512, - sample_method: Optional[Union[str, SampleMethod, int, float]] = "euler_a", + # --- + # guidance_params + cfg_scale: float = 7.0, + image_cfg_scale: Optional[float] = None, + guidance: float = 3.5, + # sample_params + scheduler: Union[str, Scheduler, int, float, None] = "default", + sample_method: Optional[Union[str, SampleMethod, int, float, None]] = "default", sample_steps: int = 20, - strength: float = 0.75, - seed: int = 42, - batch_count: int = 1, - control_cond: Optional[Union[Image.Image, 
str]] = None, - control_strength: float = 0.9, - style_strength: float = 20.0, - normalize_input: bool = False, - input_id_images_path: str = "", + eta: float = 0.0, + timestep_shift: int = 0, + sigmas: Optional[str] = None, + # slg_params skip_layers: List[int] = [7, 8, 9], - slg_scale: float = 0.0, skip_layer_start: float = 0.01, skip_layer_end: float = 0.2, - canny: bool = False, + slg_scale: float = 0.0, + # --- + # high_noise_guidance_params + high_noise_cfg_scale: float = 7.0, + high_noise_image_cfg_scale: Optional[float] = None, + high_noise_guidance: float = 3.5, + # high_noise_sample_params + high_noise_scheduler: Union[str, Scheduler, int, float, None] = "default", + high_noise_sample_method: Union[str, SampleMethod, int, float, None] = "default", + high_noise_sample_steps: int = -1, + high_noise_eta: float = 0.0, + # high_noise_slg_params + high_noise_skip_layers: List[int] = [7, 8, 9], + high_noise_skip_layer_start: float = 0.01, + high_noise_skip_layer_end: float = 0.2, + high_noise_slg_scale: float = 0.0, + # --- + moe_boundary: float = 0.875, + strength: float = 0.75, + seed: int = 42, + video_frames: int = 1, + vace_strength: int = 1, + easycache: bool = False, + easycache_options: str = "0.2,0.15,0.95", upscale_factor: int = 1, + preview_method: Union[str, Preview, int, float] = "none", + preview_noisy: bool = False, + preview_interval: int = 1, + preview_callback: Optional[Callable] = None, progress_callback: Optional[Callable] = None, ) -> List[Image.Image]: - """Generate images from an image input and text prompt. + """Generate a video from input images and or a text prompt. Args: - image: The input image path or Pillow Image to direct the generation. prompt: The prompt to render. - mask_image: The inpainting mask image path or Pillow Image. negative_prompt: The negative prompt. - clip_skip: Ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer. + clip_skip: Ignore last layers of CLIP network (1 ignores none, 2 ignores one layer, <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x). + init_image: An input image path or Pillow Image to start the generation. + end_image: An input image path or Pillow Image to end the generation (required by flf2v). + control_frames: A list of control video frame image paths or Pillow Images in the correct order for the video. + width: Video width, in pixel space. + height: Video height, in pixel space. cfg_scale: Unconditional guidance scale. - guidance: Guidance scale. - eta: Eta in DDIM, only for DDIM and TCD. - width: Image height, in pixel space. - height: Image width, in pixel space. - sample_method: Sampling method. + image_cfg_scale: Image guidance scale for inpaint or instruct-pix2pix models (default: same as `cfg_scale`). + guidance: Distilled guidance scale for models with guidance input. + scheduler: Denoiser sigma scheduler (default: discrete). + sample_method: Sampling method (default: euler for Flux/SD3/Wan, euler_a otherwise). sample_steps: Number of sample steps. + eta: Eta in DDIM, only for DDIM and TCD. + timestep_shift: Shift timestep for NitroFusion models, default: 0, recommended N for NitroSD-Realism around 250 and 500 for NitroSD-Vibrant. + sigmas: Custom sigma values for the sampler, comma-separated (e.g. "14.61,7.8,3.5,0.0"). + skip_layers: Layers to skip for SLG steps (SLG will be enabled at step int([STEPS]x[START]) and disabled at int([STEPS]x[END])). + skip_layer_start: SLG enabling point. + skip_layer_end: SLG disabling point. 
+ slg_scale: Skip layer guidance (SLG) scale, only for DiT models. + high_noise_cfg_scale: High noise diffusion model equivalent of `cfg_scale`. + high_noise_image_cfg_scale: High noise diffusion model equivalent of `image_cfg_scale`. + high_noise_guidance: High noise diffusion model equivalent of `guidance`. + high_noise_scheduler: High noise diffusion model equivalent of `scheduler`. + high_noise_sample_method: High noise diffusion model equivalent of `sample_method`. + high_noise_sample_steps: High noise diffusion model equivalent of `sample_steps` (default: -1 = auto). + high_noise_eta: High noise diffusion model equivalent of `eta`. + high_noise_skip_layers: High noise diffusion model equivalent of `skip_layers`. + high_noise_skip_layer_start: High noise diffusion model equivalent of `skip_layer_start`. + high_noise_skip_layer_end: High noise diffusion model equivalent of `skip_layer_end`. + high_noise_slg_scale: High noise diffusion model equivalent of `slg_scale`. + moe_boundary: Timestep boundary for Wan2.2 MoE model. Only enabled if `high_noise_sample_steps` is set to -1. strength: Strength for noising/unnoising. - seed: RNG seed (default: 42, use random seed for < 0). - batch_count: Number of images to generate. - control_cond: A control condition image path or Pillow Image. - control_strength: Strength to apply Control Net. - style_strength: Strength for keeping input identity (default: 20%). - normalize_input: Normalize PHOTOMAKER input id images. - input_id_images_path: Path to PHOTOMAKER input id images dir. - skip_layers: Layers to skip for SLG steps (default: [7,8,9]). - slg_scale: Skip layer guidance (SLG) scale, only for DiT models (default: 0). - skip_layer_start: SLG enabling point (default: 0.01). - skip_layer_end: SLG disabling point (default: 0.2). - canny: Apply canny edge detection preprocessor to the control_cond image. - upscale_factor: The image upscaling factor. + seed: RNG seed (uses random seed for < 0). + video_frames: Number of video frames to generate. + vace_strength: Wan VACE strength. + easycache: Enable EasyCache for DiT models with optional "threshold,start_percent,end_percent". + upscale_factor: Run the ESRGAN upscaler this many times. + preview_method: The preview method to use (default: none). + preview_noisy: Enables previewing noisy inputs of the models rather than the denoised outputs. + preview_interval: Interval in denoising steps between consecutive updates of the image preview (default: 1, meaning update at every step) + preview_callback: Callback function to call on each preview frame. progress_callback: Callback function to call on each step end. 
Returns: - A list of Pillow Images.""" + A list of Pillow Images (video frames).""" if self.model is None: - raise Exception("Stable diffusion model not loaded.") + raise RuntimeError("Stable Diffusion model not loaded") if self.vae_decode_only == True: - raise Exception("Cannot run img_to_img with vae_decode_only set to True.") + raise ValueError("`vae_decode_only` cannot be True when generating videos") + + # ------------------------------------------- + # Validation + # ------------------------------------------- + + width = self._validate_dimensions(width, "width") + height = self._validate_dimensions(height, "height") + + if upscale_factor < 1: + raise ValueError("`upscale_factor` must at least 1") + if sample_steps < 1: + raise ValueError("`sample_steps` must be at least 1") + if strength < 0.0 or strength > 1.0: + raise ValueError("`strength` must be in the range [0.0, 1.0]") + if video_frames < 1: + raise ValueError("`video_frames` must be at least 1") + if timestep_shift < 0 or timestep_shift > 1000: + raise ValueError("`timestep_shift` must be in the range [0, 1000]") - # =========== Validate string and int inputs =========== + if high_noise_sample_steps <= 0: + high_noise_sample_steps = -1 # Auto - sample_method = validate_and_set_input(sample_method, SAMPLE_METHOD_MAP, "sample_method") + # ------------------------------------------- + # CFG Scale + # ------------------------------------------- - # Ensure dimensions are multiples of 64 - width = validate_dimensions(width, "width") - height = validate_dimensions(height, "height") + image_cfg_scale = cfg_scale if image_cfg_scale is None else image_cfg_scale + high_noise_image_cfg_scale = high_noise_cfg_scale if high_noise_image_cfg_scale is None else high_noise_image_cfg_scale - # =========== Set seed =========== + # ------------------------------------------- + # Set Seed + # ------------------------------------------- # Set a random seed if seed is negative if seed < 0: seed = random.randint(0, 10000) - # ==================== Set the callback function ==================== + # ------------------------------------------- + # Set the Progress Callback Function + # ------------------------------------------- if progress_callback is not None: @@ -413,184 +822,224 @@ def sd_progress_callback( sd_cpp.sd_set_progress_callback(sd_progress_callback, ctypes.c_void_p(0)) - # ==================== Format Inputs ==================== + # ------------------------------------------- + # Set the Preview Callback Function + # ------------------------------------------- - # Convert the control condition to a C sd_image_t - control_cond = self._format_control_cond(control_cond, canny, self.control_net_path) - - # Resize the input image - image = self._resize_image(image, width, height) # Input image and generated image must have the same size - - def _create_blank_mask_image(width: int, height: int): - """Create a blank white mask image in c_unit8 format.""" - mask_image_buffer = (ctypes.c_uint8 * (width * height))(*[255] * (width * height)) - return mask_image_buffer - - # Convert the image and mask image to a byte array - image_pointer = self._image_to_sd_image_t_p(image) - if mask_image: - # Resize the mask image (however the mask should ideally already be the same size as the input image) - mask_image = self._resize_image(mask_image, width, height) - mask_image_pointer = self._image_to_sd_image_t_p(mask_image, channel=1) - else: - # Create a blank white mask image - mask_image_pointer = self._c_uint8_to_sd_image_t_p( - 
image=_create_blank_mask_image(width, height), - width=width, - height=height, - channel=1, - ) + preview_method = self._validate_and_set_input(preview_method, PREVIEW_MAP, "preview_method") - # Convert skip_layers to a ctypes array - skip_layers_array = (ctypes.c_int * len(skip_layers))(*skip_layers) - skip_layers_count = len(skip_layers) + if preview_callback is not None: - with suppress_stdout_stderr(disable=self.verbose): - # Generate images - c_images = sd_cpp.img2img( - self.model, - image_pointer, - mask_image_pointer, - prompt.encode("utf-8"), - negative_prompt.encode("utf-8"), - clip_skip, - cfg_scale, - guidance, - eta, - width, - height, - sample_method, - sample_steps, - strength, - seed, - batch_count, - control_cond, - control_strength, - style_strength, - normalize_input, - input_id_images_path.encode("utf-8"), - skip_layers_array, - skip_layers_count, - slg_scale, - skip_layer_start, - skip_layer_end, + @sd_cpp.sd_preview_callback + def sd_preview_callback( + step: int, + frame_count: int, + frames: sd_cpp.sd_image_t, + is_noisy: ctypes.c_bool, + data: ctypes.c_void_p, + ): + pil_frames = self._sd_image_t_p_to_images(frames, frame_count, 1) + preview_callback(step, pil_frames, is_noisy) + + sd_cpp.sd_set_preview_callback( + sd_preview_callback, + preview_method, + preview_interval, + not preview_noisy, + preview_noisy, + ctypes.c_void_p(0), ) - return self._sd_image_t_p_to_images(c_images, batch_count, upscale_factor) - # ============================================ - # Image to Video - # ============================================ + # ------------------------------------------- + # Extract Loras + # ------------------------------------------- - def img_to_vid( - self, - image: Union[Image.Image, str], - width: int = 512, - height: int = 512, - video_frames: int = 6, - motion_bucket_id: int = 127, - fps: int = 6, - augmentation_level: float = 0.0, - min_cfg: float = 1.0, - cfg_scale: float = 7.0, - sample_method: Optional[Union[str, SampleMethod, int, float]] = "euler_a", - sample_steps: int = 20, - strength: float = 0.75, - seed: int = 42, - progress_callback: Optional[Callable] = None, - ) -> List[Image.Image]: - """Generate a video from an image input. - - Args: - image: The input image path or Pillow Image to direct the generation. - width: Video height, in pixel space. - height: Video width, in pixel space. - video_frames: Number of frames in the video. - motion_bucket_id: Motion bucket id. - fps: Frames per second. - augmentation_level: The augmentation level. - min_cfg: The minimum cfg. - cfg_scale: Unconditional guidance scale. - sample_method: Sampling method. - sample_steps: Number of sample steps. - strength: Strength for noising/unnoising. - seed: RNG seed (default: 42, use random seed for < 0). - progress_callback: Callback function to call on each step end. + _prompt_without_loras, _lora_array, _lora_count, _lora_string_buffers = self._extract_and_build_loras( + prompt, + self.lora_model_dir, + ) - Returns: - A list of Pillow Images.""" + # ------------------------------------------- + # Control Frames + # ------------------------------------------- - # WARNING - Image to Video functionality does not work and must first be implemented in the C++ code. 
- raise NotImplementedError("SVD support is broken, do not use it.") + _control_frames_pointer, control_frames_size = self._create_image_array( + control_frames, + width=width, + height=height, + max_images=video_frames, + ) - # if self.model is None: - # raise Exception("Stable diffusion model not loaded.") + # ------------------------------------------- + # Scheduler/Sample Method + # ------------------------------------------- - # if self.vae_decode_only == True: - # raise Exception("Cannot run img_to_vid with vae_decode_only set to True.") + scheduler = self._validate_and_set_input(scheduler, SCHEDULER_MAP, "scheduler", allow_none=True) + if scheduler is None: + scheduler = sd_cpp.sd_get_default_scheduler(self.model) - # # =========== Validate string and int inputs =========== + # "sample_method_count" is not valid here (it will crash) + sample_method = self._validate_and_set_input( + sample_method, + {k: v for k, v in SAMPLE_METHOD_MAP.items() if k not in ["sample_method_count"]}, + "sample_method", + allow_none=True, + ) + if sample_method is None: + sample_method = sd_cpp.sd_get_default_sample_method(self.model) - # sample_method = validate_and_set_input(sample_method, SAMPLE_METHOD_MAP, "sample_method") + # High Noise + high_noise_scheduler = self._validate_and_set_input( + high_noise_scheduler, SCHEDULER_MAP, "high_noise_scheduler", allow_none=True + ) + if high_noise_scheduler is None: + high_noise_scheduler = scheduler - # # Ensure dimensions are multiples of 64 - # width = validate_dimensions(width, "width") - # height = validate_dimensions(height, "height") + high_noise_sample_method = self._validate_and_set_input( + high_noise_sample_method, SAMPLE_METHOD_MAP, "high_noise_sample_method", allow_none=True + ) + if high_noise_sample_method is None: + high_noise_sample_method = sample_method + + # ------------------------------------------- + # Sigmas + # ------------------------------------------- + + _custom_sigmas = self._parse_sigmas(sigmas) + _custom_sigmas_count = len(_custom_sigmas) + + SigmasArrayType = ctypes.c_float * _custom_sigmas_count + _custom_sigmas = ctypes.cast(SigmasArrayType(*_custom_sigmas), ctypes.POINTER(ctypes.c_float)) + + # ------------------------------------------- + # High Noise Parameters + # ------------------------------------------- + + _high_noise_guidance_params = sd_cpp.sd_guidance_params_t( + txt_cfg=high_noise_cfg_scale, + img_cfg=high_noise_image_cfg_scale, + distilled_guidance=high_noise_guidance, + slg=sd_cpp.sd_slg_params_t( + layers=(ctypes.c_int * len(high_noise_skip_layers))(*high_noise_skip_layers), # Convert to ctypes array + layer_count=len(high_noise_skip_layers), + layer_start=high_noise_skip_layer_start, + layer_end=high_noise_skip_layer_end, + scale=high_noise_slg_scale, + ), + ) - # # =========== Set seed =========== + _high_noise_sample_params = sd_cpp.sd_sample_params_t( + guidance=_high_noise_guidance_params, + scheduler=high_noise_scheduler, + sample_method=high_noise_sample_method, + sample_steps=high_noise_sample_steps, + eta=high_noise_eta, + shifted_timestep=timestep_shift, + custom_sigmas=_custom_sigmas, + custom_sigmas_count=_custom_sigmas_count, + ) - # # Set a random seed if seed is negative - # if seed < 0: - # seed = random.randint(0, 10000) + # ------------------------------------------- + # Parameters + # ------------------------------------------- - # # ==================== Set the callback function ==================== + _easycache_params = sd_cpp.sd_easycache_params_t( + **self._parse_easycache( + 
enabled=easycache, + option_value=easycache_options, + ) + ) - # if progress_callback is not None: + _guidance_params = sd_cpp.sd_guidance_params_t( + txt_cfg=cfg_scale, + img_cfg=image_cfg_scale, + distilled_guidance=guidance, + slg=sd_cpp.sd_slg_params_t( + layers=(ctypes.c_int * len(skip_layers))(*skip_layers), # Convert to ctypes array + layer_count=len(skip_layers), + layer_start=skip_layer_start, + layer_end=skip_layer_end, + scale=slg_scale, + ), + ) - # @sd_cpp.sd_progress_callback - # def sd_progress_callback( - # step: int, - # steps: int, - # time: float, - # data: ctypes.c_void_p, - # ): - # progress_callback(step, steps, time) + _sample_params = sd_cpp.sd_sample_params_t( + guidance=_guidance_params, + scheduler=scheduler, + sample_method=sample_method, + sample_steps=sample_steps, + eta=eta, + shifted_timestep=timestep_shift, + custom_sigmas=_custom_sigmas, + custom_sigmas_count=_custom_sigmas_count, + ) - # sd_cpp.sd_set_progress_callback(sd_progress_callback, ctypes.c_void_p(0)) + _params = sd_cpp.sd_vid_gen_params_t( + loras=_lora_array, + lora_count=_lora_count, + prompt=_prompt_without_loras.encode("utf-8"), + negative_prompt=negative_prompt.encode("utf-8"), + clip_skip=clip_skip, + init_image=self._format_init_image(init_image, width, height), + end_image=self._format_init_image(end_image, width, height), + control_frames=_control_frames_pointer, + control_frames_size=control_frames_size, + width=width, + height=height, + sample_params=_sample_params, + high_noise_sample_params=_high_noise_sample_params, + moe_boundary=moe_boundary, + strength=strength, + seed=seed, + video_frames=video_frames, + vace_strength=vace_strength, + easycache=_easycache_params, + ) - # # ==================== Format Inputs ==================== + # Log system info + log_event(level=2, message=sd_cpp.sd_get_system_info().decode("utf-8")) - # # Resize the input image - # image = self._resize_image( - # image, width, height - # ) # Input image and generated image must have the same size + _num_results = ctypes.c_int() + with suppress_stdout_stderr(disable=self.verbose): + # Generate the video + _c_images = sd_cpp.generate_video( + self.model, + ctypes.byref(_params), + ctypes.byref(_num_results), + ) - # # Convert the image to a byte array - # image_pointer = self._image_to_sd_image_t_p(image) + # Convert C array to Python list of images + images = self._sd_image_t_p_to_images(_c_images, int(_num_results.value), upscale_factor) + + # ------------------------------------------- + # Attach Image Metadata + # ------------------------------------------- + + func_args = locals() + gen_args = { + k: v + for k, v in func_args.items() + if k + not in { + "self", + "images", + "progress_callback", + "sd_progress_callback", + "preview_callback", + "sd_preview_callback", + } + and not k.startswith("_") # Skip internals + } + model_args = {k: v for k, v in self.__dict__.items() if not k.startswith("_")} # Skip internals - # with suppress_stdout_stderr(disable=self.verbose): - # # Generate the video - # c_video = sd_cpp.img2vid( - # self.model, - # image_pointer, - # width, - # height, - # video_frames, - # motion_bucket_id, - # fps, - # augmentation_level, - # min_cfg, - # cfg_scale, - # sample_method, - # sample_steps, - # strength, - # seed, - # ) + for image in images: + image.info.update({**model_args, **gen_args}) - # return self._sd_image_t_p_to_images(c_video, video_frames, 1) + return images - # ============================================ + # =========================================== # Preprocess 
Canny - # ============================================ + # =========================================== def preprocess_canny( self, @@ -600,31 +1049,30 @@ def preprocess_canny( weak: float = 0.8, strong: float = 1.0, inverse: bool = False, - output_as_c_uint8: bool = False, - ) -> Image.Image: - """Apply canny edge detection to an input image. Width and height determined automatically. + output_as_sd_image_t: bool = False, + ) -> Union[Image.Image, sd_cpp.sd_image_t]: + """Apply canny edge detection to an input image. + Width and height determined automatically. Args: - image: The input image path or Pillow Image. + image: An input image path or Pillow Image. high_threshold: High edge detection threshold. low_threshold: Low edge detection threshold. weak: Weak edge thickness. strong: Strong edge thickness. inverse: Invert the edge detection. - output_as_c_uint8: Return the output as a c_types uint8 pointer. + output_as_sd_image_t: Return the output as a c_types sd_image_t pointer. Returns: A Pillow Image.""" - # Convert the image to a C uint8 pointer - data, width, height = self._cast_image(image) + # Convert the image to a byte array + image_bytes = self._image_to_sd_image_t_p(image) with suppress_stdout_stderr(disable=self.verbose): - # Run the preprocess canny - c_image = sd_cpp.preprocess_canny( - data, - int(width), - int(height), + # Apply the preprocess canny + sd_cpp.preprocess_canny( + image_bytes, high_threshold, low_threshold, weak, @@ -632,21 +1080,18 @@ def preprocess_canny( inverse, ) - # Return the c_image if output_as_c_uint8 (for running inside txt2img/img2img pipeline) - if output_as_c_uint8: - return c_image + # Return the sd_image_t if output_as_sd_image_t (for running inside txt2img/img2img pipeline) + if output_as_sd_image_t: + return image_bytes - # Calculate the size of the data buffer (channels * width * height) - buffer_size = 3 * width * height - - # Convert c_image to a Pillow Image - image = self._c_array_to_bytes(c_image, buffer_size) - image = self._bytes_to_image(image, width, height) + # Load the image from the C sd_image_t and convert it to a PIL Image + image = self._dereference_sd_image_t_p(image_bytes) + image = self._bytes_to_image(image["data"], image["width"], image["height"]) return image - # ============================================ - # Image Upscaling - # ============================================ + # =========================================== + # Upscale + # =========================================== def upscale( self, @@ -659,14 +1104,17 @@ def upscale( Args: images: A list of image paths or Pillow Images to upscale. upscale_factor: The image upscaling factor. + progress_callback: Callback function to call on each step end. 
Returns: A list of Pillow Images.""" if self.upscaler is None: - raise Exception("Upscaling model not loaded.") + raise RuntimeError("Upscaling model not loaded") - # ==================== Set the callback function ==================== + # ------------------------------------------- + # Set the Callback Function + # ------------------------------------------- if progress_callback is not None: @@ -681,10 +1129,21 @@ def sd_progress_callback( sd_cpp.sd_set_progress_callback(sd_progress_callback, ctypes.c_void_p(0)) + # NOTE: Preview callback not supported for upscaling (nothing is called back from sd.cpp) + + # ------------------------------------------- + # Ensure List of Images + # ------------------------------------------- + if not isinstance(images, list): images = [images] # Wrap single image in a list - # ==================== Upscale images ==================== + # ------------------------------------------- + # Upscale Images + # ------------------------------------------- + + # Log system info + log_event(level=2, message=sd_cpp.sd_get_system_info().decode("utf-8")) upscaled_images = [] for image in images: @@ -707,19 +1166,413 @@ def sd_progress_callback( return upscaled_images - # ============================================ - # Utility functions - # ============================================ + # =========================================== + # Convert + # =========================================== - def _resize_image(self, image: Union[Image.Image, str], width: int, height: int) -> Image.Image: - """Resize an image to a new width and height.""" - image, _, _ = self._format_image(image) + def convert( + self, + input_path: str, + vae_path: str = "", + output_path: str = "output.gguf", + output_type: Union[str, GGMLType, int, float] = "default", + tensor_type_rules: str = "", + ) -> bool: + """Convert a model to gguf format. + + Args: + input_path: Path to the input model. + vae_path: Path to the vae. + output_path: Path to save the converted model. + output_type: The weight type (default: auto). 
+ tensor_type_rules: Weight type per tensor pattern (example: "^vae\\.=f16,model\\.=q8_0") + + Returns: + A boolean indicating success.""" + + # ------------------------------------------- + # Validation + # ------------------------------------------- - # Resize the image if the width and height are different - if image.width != width or image.height != height: - image = image.resize((width, height), Image.Resampling.BILINEAR) + output_type = self._validate_and_set_input(output_type, GGML_TYPE_MAP, "output_type") + + # ------------------------------------------- + # Convert the Model + # ------------------------------------------- + + # Log system info + log_event(level=2, message=sd_cpp.sd_get_system_info().decode("utf-8")) + + with suppress_stdout_stderr(disable=self.verbose): + model_converted = sd_cpp.convert( + self._clean_path(input_path).encode("utf-8"), + self._clean_path(vae_path).encode("utf-8"), + self._clean_path(output_path).encode("utf-8"), + output_type, + tensor_type_rules.encode("utf-8"), + ) + + return model_converted + + # =========================================== + # Input Formatting and Validation + # =========================================== + + # ------------------------------------------- + # Extract and Remove Lora + # ------------------------------------------- + + def _extract_and_build_loras(self, prompt: str, lora_model_dir: str): + re_lora = re.compile(r"<lora:([^>]+):([^>]+)>") + valid_ext = [".pt", ".safetensors", ".gguf"] + + lora_map = {} + high_noise_lora_map = {} + + tmp = prompt + + while True: + m = re_lora.search(tmp) + if not m: + break + + raw_path = m.group(1) + raw_mul = m.group(2) + + try: + mul = float(raw_mul) + except ValueError: + prompt = re_lora.sub("", prompt, count=1) + tmp = tmp[m.end() :] + continue + + is_high_noise = False + prefix = "|high_noise|" + + if raw_path.startswith(prefix): + raw_path = raw_path[len(prefix) :] + is_high_noise = True + + path = Path(raw_path) + final_path = path if path.is_absolute() else Path(lora_model_dir) / path + + if not final_path.exists(): + found = False + for ext in valid_ext: + try_path = final_path.with_suffix(final_path.suffix + ext) + if try_path.exists(): + final_path = try_path + found = True + break + if not found: + log_event(level=1, message=f"Cannot find lora {final_path}") + prompt = re_lora.sub("", prompt, count=1) + tmp = tmp[m.end() :] + continue + + key = str(final_path.resolve()) + target = high_noise_lora_map if is_high_noise else lora_map + target[key] = target.get(key, 0.0) + mul + + prompt = re_lora.sub("", prompt, count=1) + tmp = tmp[m.end() :] + + # Build ctypes array + all_items = [] + for path, mul in lora_map.items(): + all_items.append((False, mul, path)) + + for path, mul in high_noise_lora_map.items(): + all_items.append((True, mul, path)) + + count = len(all_items) + LoraArray = sd_cpp.sd_lora_t * count + lora_array = LoraArray() + + # IMPORTANT: keep string buffers alive + string_buffers = [] + + for i, (is_high_noise, mul, path) in enumerate(all_items): + buf = ctypes.create_string_buffer(path.encode("utf-8")) + string_buffers.append(buf) + + lora_array[i].is_high_noise = is_high_noise + lora_array[i].multiplier = mul + lora_array[i].path = ctypes.cast(buf, ctypes.c_char_p) + + return prompt, lora_array, count, string_buffers + + # ------------------------------------------- + # Parse Sigmas + # ------------------------------------------- + + def _parse_sigmas(self, sigmas: str) -> list[float]: + if not sigmas: + return [] + + # Strip surrounding brackets + sigmas =
sigmas.strip() + if sigmas.startswith("["): + sigmas = sigmas[1:] + if sigmas.endswith("]"): + sigmas = sigmas[:-1] + + custom_sigmas: list[float] = [] + + for item in sigmas.split(","): + item = item.strip() + if not item: + continue + + try: + custom_sigmas.append(float(item)) + except ValueError as e: + raise ValueError(f"Invalid float value '{item}' in sigmas") from e + + if not custom_sigmas and sigmas: + raise ValueError(f"Could not parse any sigma values from '{sigmas}'") + + return custom_sigmas + + # ------------------------------------------- + # Parse EasyCache + # ------------------------------------------- + + def _parse_easycache(self, enabled: bool, option_value: str) -> dict: + parts = [p.strip() for p in str(option_value).split(",")] + if len(parts) != 3: + raise ValueError("easycache expects exactly 3 comma-separated values (threshold,start,end)") + + try: + threshold, start, end = map(float, parts) + except ValueError: + raise ValueError(f"Invalid easycache value '{option_value}'") + + if threshold < 0.0: + raise ValueError("easycache threshold must be non-negative") + + if not (0.0 <= start < end <= 1.0): + raise ValueError("easycache start/end must satisfy 0.0 <= start < end <= 1.0") + + return { + "enabled": enabled, + "reuse_threshold": threshold, + "start_percent": start, + "end_percent": end, + } + + # ------------------------------------------- + # Parse Tile Size + # ------------------------------------------- + + def _parse_tile_size(self, value: Optional[Union[str, float, int]], as_float: bool = False) -> tuple: + if not value: + return (0.0, 0.0) if as_float else (0, 0) + + try: + if isinstance(value, str) and "x" in value: + x_str, y_str = value.split("x", 1) + x = float(x_str) if as_float else int(x_str) + y = float(y_str) if as_float else int(y_str) + else: + v = float(value) if as_float else int(value) + x = y = v + except (ValueError, OverflowError): + raise ValueError(f"Invalid tile size value: {value}") + + return (x, y) + + # ------------------------------------------- + # Format Control Image + # ------------------------------------------- + + def _format_control_image( + self, control_image: Optional[Union[Image.Image, str]], canny: bool, width: int, height: int + ) -> sd_cpp.sd_image_t: + """Convert an image path or Pillow Image to a C sd_image_t image.""" + + if not isinstance(control_image, (str, Image.Image)): + # Return an empty sd_image_t + return self._c_uint8_to_sd_image_t_p( + image=None, + width=width, + height=height, + channel=3, + ) + + if canny: + # Apply canny edge detection preprocessor to Pillow Image + image, width, height = self._format_image(control_image) + image = self.preprocess_canny(image, output_as_sd_image_t=True) + else: + # Convert Pillow Image to C sd_image_t + image = self._image_to_sd_image_t_p(control_image) return image + # ------------------------------------------- + # Format Init Image + # ------------------------------------------- + + def _format_init_image(self, init_image: Optional[Union[Image.Image, str]], width: int, height: int) -> sd_cpp.sd_image_t: + if isinstance(init_image, (str, Image.Image)): + # Input image and generated image must have the same size + init_image = self._resize_image(init_image, width, height) + return self._image_to_sd_image_t_p(init_image) # Convert to byte array + else: + # Return an empty sd_image_t + return self._c_uint8_to_sd_image_t_p( + image=None, + width=width, + height=height, + channel=3, + ) + + # ------------------------------------------- + # Format Mask Image + #
------------------------------------------- + + def _format_mask_image(self, mask_image: Optional[Union[Image.Image, str]], width: int, height: int) -> sd_cpp.sd_image_t: + if isinstance(mask_image, (str, Image.Image)): + # Resize the mask image (ideally it should already match the input image size) + mask_image = self._resize_image(mask_image, width, height) + return self._image_to_sd_image_t_p(mask_image, channel=1) # Convert to byte array + else: + # Return a blank white mask image in c_uint8 format + return self._c_uint8_to_sd_image_t_p( + image=(ctypes.c_uint8 * (width * height))(*[255] * (width * height)), + width=width, + height=height, + channel=1, + ) + + # ------------------------------------------- + # Create Image Array + # ------------------------------------------- + + def _create_image_array( + self, + images: List[Union[Image.Image, str]], + width: Optional[int] = None, + height: Optional[int] = None, + max_images: Optional[int] = None, + resize: bool = True, + ) -> List[sd_cpp.sd_image_t]: + if not isinstance(images, list): + images = [images] + + # Enforce max_images + if max_images is not None and max_images > 0: + images = images[:max_images] + + reference_images = [] + for img in images: + if not isinstance(img, (str, Image.Image)): + # Skip invalid images + continue + + if width and height and resize == True: + # Resize if width and height are provided + img = self._resize_image(img, width=width, height=height) + + # Convert the image to a byte array + img_ptr = self._image_to_sd_image_t_p(img) + reference_images.append(img_ptr) + + # Create a contiguous array of sd_image_t + ImageArrayType = sd_cpp.sd_image_t * len(reference_images) + return ImageArrayType(*reference_images), len(reference_images) + + # ------------------------------------------- + # Validate Dimensions + # ------------------------------------------- + + def _validate_dimensions(self, dimension: Union[int, float], attribute_name: str) -> int: + dimension = int(dimension) + if dimension <= 0: + raise ValueError(f"`{attribute_name}` must be greater than 0") + return dimension + + # ------------------------------------------- + # Validate and Set Input + # ------------------------------------------- + + def _validate_and_set_input( + self, user_input: Union[str, int, float, None], type_map: Dict, attribute_name: str, allow_none: bool = False + ) -> Optional[int]: + """Validate an input string or int from a map of strings to integers.""" + if user_input is None and allow_none == True: + return None + + if isinstance(user_input, float): + user_input = int(user_input) # Convert float to int + + # Handle string input + if isinstance(user_input, str): + user_input = user_input.strip().lower() + if user_input in type_map: + map_result = type_map[user_input] + if map_result is None: + return None + + return int(type_map[user_input]) + else: + raise ValueError(f"Invalid `{attribute_name}` type '{user_input}'. 
Must be one of {list(type_map.keys())}.") + elif isinstance(user_input, int) and user_input in type_map.values(): + return int(user_input) + else: + raise ValueError(f"`{attribute_name}` must be a string or an integer matching one of {list(type_map.keys())}") + + # =========================================== + # Utility Functions + # =========================================== + + # ------------------------------------------- + # Resize Image + # ------------------------------------------- + + def _resize_image( + self, + image: Union[Image.Image, str], + width: int, + height: int, + ) -> Image.Image: + image, _, _ = self._format_image(image) + + if image.width == width and image.height == height: + return image + + if self.image_resize_method == "resize": + return image.resize((width, height), Image.Resampling.BILINEAR) + + elif self.image_resize_method == "crop": + src_w, src_h = image.width, image.height + src_aspect = src_w / src_h + dst_aspect = width / height + + # Default crop box is full image + crop_x, crop_y = 0, 0 + crop_w, crop_h = src_w, src_h + + if src_aspect > dst_aspect: + # Source is wider than destination -> crop width + crop_w = int(src_h * dst_aspect) + crop_x = (src_w - crop_w) // 2 + elif src_aspect < dst_aspect: + # Source is taller than destination -> crop height + crop_h = int(src_w / dst_aspect) + crop_y = (src_h - crop_h) // 2 + + # Crop first, then resize + image = image.crop((crop_x, crop_y, crop_x + crop_w, crop_y + crop_h)) + return image.resize((width, height), Image.Resampling.BILINEAR) + + else: + raise ValueError(f"Invalid `image_resize_method` '{self.image_resize_method}', must be 'resize' or 'crop'") + + # ------------------------------------------- + # Format Image + # ------------------------------------------- + def _format_image( self, image: Union[Image.Image, str], @@ -728,7 +1581,7 @@ def _format_image( """Convert an image path or Pillow Image to a Pillow Image of RGBA or grayscale (inpainting masks) format.""" # Convert image path to image if str if isinstance(image, str): - image = Image.open(image) + image = Image.open(self._clean_path(image)) if channel == 1: # Grayscale the image if channel is 1 @@ -744,32 +1597,9 @@ def _format_image( return image, image.width, image.height - def _format_control_cond( - self, - control_cond: Optional[Union[Image.Image, str]], - canny: bool, - control_net_path: str, - ) -> Optional[Image.Image]: - """Convert an image path or Pillow Image to an C sd_image_t image.""" - - if not control_cond: - return None - - if not control_net_path: - log_event(1, "'control_net_path' not set. 
Skipping control condition.") - return None - - if canny: - # Convert Pillow Image to canny edge detection image then format into C sd_image_t - image, width, height = self._format_image(control_cond) - image = self.preprocess_canny(image, output_as_c_uint8=True) - image = self._c_uint8_to_sd_image_t_p(image, width, height) - else: - # Convert Pillow Image to C sd_image_t - image = self._image_to_sd_image_t_p(control_cond) - return image - - # ============= Image to C uint8 pointer ============= + # ------------------------------------------- + # Image to C uint8 pointer + # ------------------------------------------- def _cast_image(self, image: Union[Image.Image, str], channel: int = 3): """Cast a PIL Image to a C uint8 pointer.""" @@ -783,9 +1613,13 @@ def _cast_image(self, image: Union[Image.Image, str], channel: int = 3): ) return data, width, height - # ============= Image to C sd_image_t ============= + # ------------------------------------------- + # C uint8 pointer to C sd_image_t + # ------------------------------------------- - def _c_uint8_to_sd_image_t_p(self, image: ctypes.c_uint8, width: int, height: int, channel: int = 3) -> sd_cpp.sd_image_t: + def _c_uint8_to_sd_image_t_p( + self, image: Union[ctypes.c_uint8, None], width: int, height: int, channel: int = 3 + ) -> sd_cpp.sd_image_t: """Convert a C uint8 pointer to a C sd_image_t.""" c_image = sd_cpp.sd_image_t( width=width, @@ -795,17 +1629,27 @@ def _c_uint8_to_sd_image_t_p(self, image: ctypes.c_uint8, width: int, height: in ) return c_image + # ------------------------------------------- + # Image to C sd_image_t + # ------------------------------------------- + def _image_to_sd_image_t_p(self, image: Union[Image.Image, str], channel: int = 3) -> sd_cpp.sd_image_t: """Convert a PIL Image or image path to a C sd_image_t.""" data, width, height = self._cast_image(image, channel) c_image = self._c_uint8_to_sd_image_t_p(data, width, height, channel) return c_image - # ============= C sd_image_t to Image ============= + # ------------------------------------------- + # C sd_image_t to Image + # ------------------------------------------- def _c_array_to_bytes(self, c_array, buffer_size: int) -> bytes: return bytearray(ctypes.cast(c_array, ctypes.POINTER(ctypes.c_byte * buffer_size)).contents) + # ------------------------------------------- + # Dereference C sd_image_t pointer + # ------------------------------------------- + def _dereference_sd_image_t_p(self, c_image: sd_cpp.sd_image_t) -> Dict: """Dereference a C sd_image_t pointer to a Python dictionary with height, width, channel and data (bytes).""" @@ -820,6 +1664,10 @@ def _dereference_sd_image_t_p(self, c_image: sd_cpp.sd_image_t) -> Dict: } return image + # ------------------------------------------- + # Image Slice + # ------------------------------------------- + def _image_slice(self, c_images: sd_cpp.sd_image_t, count: int, upscale_factor: int) -> List[Dict]: """Slice a C array of images.""" image_array = ctypes.cast(c_images, ctypes.POINTER(sd_cpp.sd_image_t * count)).contents @@ -831,14 +1679,11 @@ def _image_slice(self, c_images: sd_cpp.sd_image_t, count: int, upscale_factor: # Upscale the image if upscale_factor > 1: - if self.upscaler is None: - raise Exception("Upscaling model not loaded.") - else: - c_image = sd_cpp.upscale( - self.upscaler, - c_image, - upscale_factor, - ) + c_image = sd_cpp.upscale( + self.upscaler, + c_image, + upscale_factor, + ) image = self._dereference_sd_image_t_p(c_image) images.append(image) @@ -846,6 +1691,10 @@ def 
_image_slice(self, c_images: sd_cpp.sd_image_t, count: int, upscale_factor: # Return the list of images return images + # ------------------------------------------- + # sd_image_t_p to Images + # ------------------------------------------- + def _sd_image_t_p_to_images(self, c_images: sd_cpp.sd_image_t, count: int, upscale_factor: int) -> List[Image.Image]: """Convert C sd_image_t_p images to a Python list of images.""" @@ -859,7 +1708,9 @@ def _sd_image_t_p_to_images(self, c_images: sd_cpp.sd_image_t, count: int, upsca return images - # ============= Bytes to Image ============= + # ------------------------------------------- + # Bytes to Image + # ------------------------------------------- def _bytes_to_image(self, byte_data: bytes, width: int, height: int, channel: int = 3) -> Image.Image: """Convert a byte array to a PIL Image.""" @@ -878,12 +1729,23 @@ def _bytes_to_image(self, byte_data: bytes, width: int, height: int, channel: in elif channel == 4: # RGBA pass # Use color as is else: - raise ValueError(f"Unsupported channel value: {channel}") + raise ValueError(f"Unsupported channel value: '{channel}'") # Set the pixel image.putpixel((x, y), color) return image + # ------------------------------------------- + # Clean Path + # ------------------------------------------- + + def _clean_path(self, path: str) -> str: + return os.path.normpath(path.strip()) if path else "" + + # ------------------------------------------- + # State Management + # ------------------------------------------- + def __setstate__(self, state) -> None: self.__init__(**state) @@ -895,71 +1757,52 @@ def __del__(self) -> None: self.close() -# ============================================ -# Validate dimension parameters -# ============================================ - - -def validate_dimensions(dimension: Union[int, float], attribute_name: str) -> int: - """Dimensions must be a multiple of 64 otherwise a GGML_ASSERT error is encountered.""" - dimension = int(dimension) - if dimension <= 0 or dimension % 64 != 0: - raise ValueError(f"The '{attribute_name}' must be a multiple of 64.") - return dimension - - -# ============================================ -# Mapping from strings to constants -# ============================================ - - -def validate_and_set_input(user_input: Union[str, int, float], type_map: Dict, attribute_name: str): - """Validate an input strinbg or int from a map of strings to integers.""" - if isinstance(user_input, float): - user_input = int(user_input) # Convert float to int - - # Handle string input - if isinstance(user_input, str): - user_input = user_input.strip().lower() - if user_input in type_map: - return int(type_map[user_input]) - else: - raise ValueError(f"Invalid {attribute_name} type '{user_input}'. 
Must be one of {list(type_map.keys())}.") - elif isinstance(user_input, int) and user_input in type_map.values(): - return int(user_input) - else: - raise ValueError(f"{attribute_name} must be a string or an integer and must be a valid type.") - - RNG_TYPE_MAP = { "default": RNGType.STD_DEFAULT_RNG, - "cuda": RNGType.CUDA_RNG, + "cuda": RNGType.CUDA_RNG, # Default + "cpu": RNGType.CPU_RNG, + "type_count": RNGType.RNG_TYPE_COUNT, } SAMPLE_METHOD_MAP = { - "euler_a": SampleMethod.EULER_A, - "euler": SampleMethod.EULER, - "heun": SampleMethod.HEUN, - "dpm2": SampleMethod.DPM2, - "dpmpp2s_a": SampleMethod.DPMPP2S_A, - "dpmpp2m": SampleMethod.DPMPP2M, - "dpmpp2mv2": SampleMethod.DPMPP2Mv2, - "ipndm": SampleMethod.IPNDM, - "ipndm_v": SampleMethod.IPNDM_V, - "lcm": SampleMethod.LCM, - "ddim_trailing": SampleMethod.DDIM_TRAILING, - "tcd": SampleMethod.TCD, - "n_sample_methods": SampleMethod.N_SAMPLE_METHODS, + "default": None, # Default + "euler": SampleMethod.EULER_SAMPLE_METHOD, + "euler_a": SampleMethod.EULER_A_SAMPLE_METHOD, + "heun": SampleMethod.HEUN_SAMPLE_METHOD, + "dpm2": SampleMethod.DPM2_SAMPLE_METHOD, + "dpm++2s_a": SampleMethod.DPMPP2S_A_SAMPLE_METHOD, + "dpm++2m": SampleMethod.DPMPP2M_SAMPLE_METHOD, + "dpm++2mv2": SampleMethod.DPMPP2Mv2_SAMPLE_METHOD, + "ipndm": SampleMethod.IPNDM_SAMPLE_METHOD, + "ipndm_v": SampleMethod.IPNDM_V_SAMPLE_METHOD, + "lcm": SampleMethod.LCM_SAMPLE_METHOD, + "ddim_trailing": SampleMethod.DDIM_TRAILING_SAMPLE_METHOD, + "tcd": SampleMethod.TCD_SAMPLE_METHOD, + "sample_method_count": SampleMethod.SAMPLE_METHOD_COUNT, +} + +SCHEDULER_MAP = { + "default": None, # Default + "discrete": Scheduler.DISCRETE_SCHEDULER, + "karras": Scheduler.KARRAS_SCHEDULER, + "exponential": Scheduler.EXPONENTIAL_SCHEDULER, + "ays": Scheduler.AYS_SCHEDULER, + "gits": Scheduler.GITS_SCHEDULER, + "sgm_uniform": Scheduler.SGM_UNIFORM_SCHEDULER, + "simple": Scheduler.SIMPLE_SCHEDULER, + "smoothstep": Scheduler.SMOOTHSTEP_SCHEDULER, + "lcm": Scheduler.LCM_SCHEDULER, + "scheduler_count": Scheduler.SCHEDULER_COUNT, } -SCHEDULE_MAP = { - "default": Schedule.DEFAULT, - "discrete": Schedule.DISCRETE, - "karras": Schedule.KARRAS, - "exponential": Schedule.EXPONENTIAL, - "ays": Schedule.AYS, - "gits": Schedule.GITS, - "n_schedules": Schedule.N_SCHEDULES, +PREDICTION_MAP = { + "eps": Prediction.EPS_PRED, + "v": Prediction.V_PRED, + "edm_v": Prediction.EDM_V_PRED, + "flow": Prediction.FLOW_PRED, + "flux_flow": Prediction.FLUX_FLOW_PRED, + "flux2_flow": Prediction.FLUX2_FLOW_PRED, + "default": Prediction.PREDICTION_COUNT, # Default } GGML_TYPE_MAP = { @@ -1001,6 +1844,21 @@ def validate_and_set_input(user_input: Union[str, int, float], type_map: Dict, a # "iq4_nl_4_4": GGMLType.SD_TYPE_IQ4_NL_4_4, # "iq4_nl_4_8": GGMLType.SD_TYPE_IQ4_NL_4_8, # "iq4_nl_8_8": GGMLType.SD_TYPE_IQ4_NL_8_8, - # Default - "default": GGMLType.SD_TYPE_COUNT, + "mxfp4": GGMLType.SD_TYPE_MXFP4, + "default": GGMLType.SD_TYPE_COUNT, # Default +} + +PREVIEW_MAP = { + "none": Preview.PREVIEW_NONE, # Default + "proj": Preview.PREVIEW_PROJ, + "tae": Preview.PREVIEW_TAE, + "vae": Preview.PREVIEW_VAE, + "preview_count": Preview.PREVIEW_COUNT, +} + +LORA_APPLY_MODE_MAP = { + "auto": LoraApplyMode.LORA_APPLY_AUTO, # Default + "immediately": LoraApplyMode.LORA_APPLY_IMMEDIATELY, + "at_runtime": LoraApplyMode.LORA_APPLY_AT_RUNTIME, + "lora_apply_mode_count": LoraApplyMode.LORA_APPLY_MODE_COUNT, } diff --git a/stable_diffusion_cpp/stable_diffusion_cpp.py b/stable_diffusion_cpp/stable_diffusion_cpp.py index eda2a86..cc83417 100644 --- 
a/stable_diffusion_cpp/stable_diffusion_cpp.py +++ b/stable_diffusion_cpp/stable_diffusion_cpp.py @@ -137,10 +137,6 @@ def byref(obj: CtypesCData, offset: Optional[int] = None) -> CtypesRef[CtypesCDa byref = ctypes.byref # type: ignore -# from ggml-backend.h -# typedef bool (*ggml_backend_sched_eval_callback)(struct ggml_tensor * t, bool ask, void * user_data); -ggml_backend_sched_eval_callback = ctypes.CFUNCTYPE(ctypes.c_bool, ctypes.c_void_p, ctypes.c_bool, ctypes.c_void_p) - # // Abort callback # // If not NULL, called before ggml computation # // If it returns true, the computation is aborted @@ -155,69 +151,98 @@ def byref(obj: CtypesCData, offset: Optional[int] = None) -> CtypesRef[CtypesCDa # enum rng_type_t { # STD_DEFAULT_RNG, -# CUDA_RNG +# CUDA_RNG, +# CPU_RNG, +# RNG_TYPE_COUNT # }; class RNGType(IntEnum): STD_DEFAULT_RNG = 0 CUDA_RNG = 1 + CPU_RNG = 2 + RNG_TYPE_COUNT = 3 # enum sample_method_t { -# EULER_A, -# EULER, -# HEUN, -# DPM2, -# DPMPP2S_A, -# DPMPP2M, -# DPMPP2Mv2, -# IPNDM, -# IPNDM_V, -# LCM, -# DDIM_TRAILING, -# TCD, -# N_SAMPLE_METHODS +# EULER_SAMPLE_METHOD, +# EULER_A_SAMPLE_METHOD, +# HEUN_SAMPLE_METHOD, +# DPM2_SAMPLE_METHOD, +# DPMPP2S_A_SAMPLE_METHOD, +# DPMPP2M_SAMPLE_METHOD, +# DPMPP2Mv2_SAMPLE_METHOD, +# IPNDM_SAMPLE_METHOD, +# IPNDM_V_SAMPLE_METHOD, +# LCM_SAMPLE_METHOD, +# DDIM_TRAILING_SAMPLE_METHOD, +# TCD_SAMPLE_METHOD, +# SAMPLE_METHOD_COUNT # }; class SampleMethod(IntEnum): - EULER_A = 0 - EULER = 1 - HEUN = 2 - DPM2 = 3 - DPMPP2S_A = 4 - DPMPP2M = 5 - DPMPP2Mv2 = 6 - IPNDM = 7 - IPNDM_V = 8 - LCM = 9 - DDIM_TRAILING = 10 - TCD = 11 - N_SAMPLE_METHODS = 12 - - -# enum schedule_t { -# DEFAULT, -# DISCRETE, -# KARRAS, -# EXPONENTIAL, -# AYS, -# GITS, -# N_SCHEDULES + EULER_SAMPLE_METHOD = 0 + EULER_A_SAMPLE_METHOD = 1 + HEUN_SAMPLE_METHOD = 2 + DPM2_SAMPLE_METHOD = 3 + DPMPP2S_A_SAMPLE_METHOD = 4 + DPMPP2M_SAMPLE_METHOD = 5 + DPMPP2Mv2_SAMPLE_METHOD = 6 + IPNDM_SAMPLE_METHOD = 7 + IPNDM_V_SAMPLE_METHOD = 8 + LCM_SAMPLE_METHOD = 9 + DDIM_TRAILING_SAMPLE_METHOD = 10 + TCD_SAMPLE_METHOD = 11 + SAMPLE_METHOD_COUNT = 12 + + +# enum scheduler_t { +# DISCRETE_SCHEDULER, +# KARRAS_SCHEDULER, +# EXPONENTIAL_SCHEDULER, +# AYS_SCHEDULER, +# GITS_SCHEDULER, +# SGM_UNIFORM_SCHEDULER, +# SIMPLE_SCHEDULER, +# SMOOTHSTEP_SCHEDULER, +# LCM_SCHEDULER, +# SCHEDULER_COUNT +# }; +class Scheduler(IntEnum): + DISCRETE_SCHEDULER = 0 + KARRAS_SCHEDULER = 1 + EXPONENTIAL_SCHEDULER = 2 + AYS_SCHEDULER = 3 + GITS_SCHEDULER = 4 + SGM_UNIFORM_SCHEDULER = 5 + SIMPLE_SCHEDULER = 6 + SMOOTHSTEP_SCHEDULER = 7 + LCM_SCHEDULER = 8 + SCHEDULER_COUNT = 9 + + +# enum prediction_t { +# EPS_PRED, +# V_PRED, +# EDM_V_PRED, +# FLOW_PRED, +# FLUX_FLOW_PRED, +# FLUX2_FLOW_PRED, +# PREDICTION_COUNT # }; -class Schedule(IntEnum): - DEFAULT = 0 - DISCRETE = 1 - KARRAS = 2 - EXPONENTIAL = 3 - AYS = 4 - GITS = 5 - N_SCHEDULES = 6 +class Prediction(IntEnum): + EPS_PRED = 0 + V_PRED = 1 + EDM_V_PRED = 2 + FLOW_PRED = 3 + FLUX_FLOW_PRED = 4 + FLUX2_FLOW_PRED = 5 + PREDICTION_COUNT = 6 # // same as enum ggml_type # enum sd_type_t { -# SD_TYPE_F32 = 0, -# SD_TYPE_F16 = 1, -# SD_TYPE_Q4_0 = 2, -# SD_TYPE_Q4_1 = 3, +# SD_TYPE_F32 = 0, +# SD_TYPE_F16 = 1, +# SD_TYPE_Q4_0 = 2, +# SD_TYPE_Q4_1 = 3, # // SD_TYPE_Q4_2 = 4, support has been removed # // SD_TYPE_Q4_3 = 5, support has been removed # SD_TYPE_Q5_0 = 6, @@ -248,12 +273,13 @@ class Schedule(IntEnum): # // SD_TYPE_Q4_0_4_4 = 31, support has been removed from gguf files # // SD_TYPE_Q4_0_4_8 = 32, # // SD_TYPE_Q4_0_8_8 = 33, -# SD_TYPE_TQ1_0 = 34, -# 
SD_TYPE_TQ2_0 = 35, +# SD_TYPE_TQ1_0 = 34, +# SD_TYPE_TQ2_0 = 35, # // SD_TYPE_IQ4_NL_4_4 = 36, # // SD_TYPE_IQ4_NL_4_8 = 37, # // SD_TYPE_IQ4_NL_8_8 = 38, -# SD_TYPE_COUNT = 39, +# SD_TYPE_MXFP4 = 39, // MXFP4 (1 block) +# SD_TYPE_COUNT = 40, # }; class GGMLType(IntEnum): SD_TYPE_F32 = 0 @@ -296,81 +322,150 @@ class GGMLType(IntEnum): # SD_TYPE_IQ4_NL_4_4 = 36, # SD_TYPE_IQ4_NL_4_8 = 37, # SD_TYPE_IQ4_NL_8_8 = 38, - SD_TYPE_COUNT = 39 + SD_TYPE_MXFP4 = 39 # MXFP4 (1 block) + SD_TYPE_COUNT = 40 + + +# enum preview_t { +# PREVIEW_NONE, +# PREVIEW_PROJ, +# PREVIEW_TAE, +# PREVIEW_VAE, +# PREVIEW_COUNT +# }; +class Preview(IntEnum): + PREVIEW_NONE = 0 + PREVIEW_PROJ = 1 + PREVIEW_TAE = 2 + PREVIEW_VAE = 3 + PREVIEW_COUNT = 4 + + +# enum lora_apply_mode_t { +# LORA_APPLY_AUTO, +# LORA_APPLY_IMMEDIATELY, +# LORA_APPLY_AT_RUNTIME, +# LORA_APPLY_MODE_COUNT, +# }; +class LoraApplyMode(IntEnum): + LORA_APPLY_AUTO = 0 + LORA_APPLY_IMMEDIATELY = 1 + LORA_APPLY_AT_RUNTIME = 2 + LORA_APPLY_MODE_COUNT = 3 -# ================================== +# =========================================== # Inference -# ================================== +# =========================================== + + +# ------------------------------------------- +# sd_embedding_t +# ------------------------------------------- + + +# typedef struct { const char* name; const char* path; } sd_embedding_t; +class sd_embedding_t(ctypes.Structure): + _fields_ = [ + ("name", ctypes.c_char_p), + ("path", ctypes.c_char_p), + ] + + +# ------------------------------------------- +# sd_ctx_params_t +# ------------------------------------------- + + +# typedef struct { const char* model_path; const char* clip_l_path; const char* clip_g_path; const char* clip_vision_path; const char* t5xxl_path; const char* llm_path; const char* llm_vision_path; const char* diffusion_model_path; const char* high_noise_diffusion_model_path; const char* vae_path; const char* taesd_path; const char* control_net_path; const char* lora_model_dir; const sd_embedding_t* embeddings; uint32_t embedding_count; const char* photo_maker_path; const char* tensor_type_rules; bool vae_decode_only; bool free_params_immediately; int n_threads; enum sd_type_t wtype; enum rng_type_t rng_type; enum rng_type_t sampler_rng_type; enum prediction_t prediction; enum lora_apply_mode_t lora_apply_mode; bool offload_params_to_cpu; bool keep_clip_on_cpu; bool keep_control_net_on_cpu; bool keep_vae_on_cpu; bool diffusion_flash_attn; bool tae_preview_only; bool diffusion_conv_direct; bool vae_conv_direct; bool force_sdxl_vae_conv_scale; bool chroma_use_dit_mask; bool chroma_use_t5_mask; int chroma_t5_mask_pad; float flow_shift; } sd_ctx_params_t; +class sd_ctx_params_t(ctypes.Structure): + _fields_ = [ + ("model_path", ctypes.c_char_p), + ("clip_l_path", ctypes.c_char_p), + ("clip_g_path", ctypes.c_char_p), + ("clip_vision_path", ctypes.c_char_p), + ("t5xxl_path", ctypes.c_char_p), + ("llm_path", ctypes.c_char_p), + ("llm_vision_path", ctypes.c_char_p), + ("diffusion_model_path", ctypes.c_char_p), + ("high_noise_diffusion_model_path", ctypes.c_char_p), + ("vae_path", ctypes.c_char_p), + ("taesd_path", ctypes.c_char_p), + ("control_net_path", ctypes.c_char_p), + ("lora_model_dir", ctypes.c_char_p), + ("embeddings", ctypes.POINTER(sd_embedding_t)), + ("embedding_count", ctypes.c_uint32), + ("photo_maker_path", ctypes.c_char_p), + ("tensor_type_rules", ctypes.c_char_p), + ("vae_decode_only", ctypes.c_bool), + ("free_params_immediately", ctypes.c_bool), + ("n_threads", ctypes.c_int), + ("wtype", 
ctypes.c_int), # GGMLType + ("rng_type", ctypes.c_int), # RNGType + ("sampler_rng_type", ctypes.c_int), # RNGType + ("prediction", ctypes.c_int), # Prediction + ("lora_apply_mode", ctypes.c_int), # LoraApplyMode + ("offload_params_to_cpu", ctypes.c_bool), + ("keep_clip_on_cpu", ctypes.c_bool), + ("keep_control_net_on_cpu", ctypes.c_bool), + ("keep_vae_on_cpu", ctypes.c_bool), + ("diffusion_flash_attn", ctypes.c_bool), + ("tae_preview_only", ctypes.c_bool), + ("diffusion_conv_direct", ctypes.c_bool), + ("vae_conv_direct", ctypes.c_bool), + ("force_sdxl_vae_conv_scale", ctypes.c_bool), + ("chroma_use_dit_mask", ctypes.c_bool), + ("chroma_use_t5_mask", ctypes.c_bool), + ("chroma_t5_mask_pad", ctypes.c_int), + ("flow_shift", ctypes.c_float), + ] + + +# ------------------------------------------- +# sd_ctx_t +# ------------------------------------------- -# ------------ new_sd_ctx ------------ -# struct sd_context; +# typedef struct sd_ctx_t sd_ctx_t; +class sd_ctx_t(ctypes.Structure): + pass + + +# struct sd_ctx; sd_ctx_t_p = NewType("sd_ctx_t_p", int) -sd_ctx_t_p_ctypes = ctypes.c_void_p +sd_ctx_t_p_ctypes = ctypes.POINTER(sd_ctx_t) + +# ------------------------------------------- +# new_sd_ctx +# ------------------------------------------- + +# SD_API sd_ctx_t* new_sd_ctx(const sd_ctx_params_t* sd_ctx_params); @ctypes_function( "new_sd_ctx", [ - ctypes.c_char_p, # model_path - ctypes.c_char_p, # clip_l_path - ctypes.c_char_p, # clip_g_path - ctypes.c_char_p, # t5xxl_path - ctypes.c_char_p, # diffusion_model_path - ctypes.c_char_p, # vae_path - ctypes.c_char_p, # taesd_path - ctypes.c_char_p, # control_net_path - ctypes.c_char_p, # lora_model_dir - ctypes.c_char_p, # embed_dir - ctypes.c_char_p, # stacked_id_embed_dir - ctypes.c_bool, # vae_decode_only - ctypes.c_bool, # vae_tiling - ctypes.c_bool, # free_params_immediately - ctypes.c_int, # n_threads - ctypes.c_int, # wtype (GGMLType) - ctypes.c_int, # rng_type (RNGType) - ctypes.c_int, # s (Schedule) - ctypes.c_bool, # keep_clip_on_cpu - ctypes.c_bool, # keep_control_net_cpu - ctypes.c_bool, # keep_vae_on_cpu - ctypes.c_bool, # diffusion_flash_attn + ctypes.POINTER(sd_ctx_params_t), # sd_ctx_params ], sd_ctx_t_p_ctypes, ) def new_sd_ctx( - model_path: bytes, - clip_l_path: bytes, - clip_g_path: bytes, - t5xxl_path: bytes, - diffusion_model_path: bytes, - vae_path: bytes, - taesd_path: bytes, - control_net_path: bytes, - lora_model_dir: bytes, - embed_dir: bytes, - stacked_id_embed_dir: bytes, - vae_decode_only: bool, - vae_tiling: bool, - free_params_immediately: bool, - n_threads: int, - wtype: int, # GGMLType - rng_type: int, # RNGType - s: int, # Schedule - keep_clip_on_cpu: bool, - keep_control_net_cpu: bool, - keep_vae_on_cpu: bool, - diffusion_flash_attn: bool, + sd_ctx_params: sd_ctx_params_t, /, ) -> Optional[sd_ctx_t_p]: ... -# ------------ free_sd_ctx ------------ +# ------------------------------------------- +# free_sd_ctx +# ------------------------------------------- +# SD_API void free_sd_ctx(sd_ctx_t* sd_ctx); @ctypes_function( "free_sd_ctx", - [sd_ctx_t_p_ctypes], # sd_ctx + [ + sd_ctx_t_p_ctypes, # sd_ctx + ], None, ) def free_sd_ctx( @@ -379,9 +474,12 @@ def free_sd_ctx( ): ... 
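For reference, here is a minimal sketch of driving the reworked context API directly from the ctypes layer (the high-level `StableDiffusion` class normally builds `sd_ctx_params_t` for you). This block is an editor's illustration, not part of the patch: the model path and field choices are assumptions, and whether the C library tolerates NULL for the unused path fields is likewise assumed.

```python
# Sketch only: low-level context creation with the new sd_ctx_params_t struct.
import ctypes
import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp

params = sd_cpp.sd_ctx_params_t()
params.model_path = b"models/v1-5-pruned-emaonly.safetensors"  # assumed path
params.vae_decode_only = True
params.free_params_immediately = True
params.n_threads = sd_cpp.sd_get_num_physical_cores()
params.wtype = sd_cpp.GGMLType.SD_TYPE_COUNT          # "default" weight type
params.rng_type = sd_cpp.RNGType.CUDA_RNG             # default RNG
params.prediction = sd_cpp.Prediction.PREDICTION_COUNT  # "default" prediction

ctx = sd_cpp.new_sd_ctx(ctypes.byref(params))  # returns NULL pointer on failure
if ctx:
    # ... generate_image / generate_video would be called here ...
    sd_cpp.free_sd_ctx(ctx)
```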
-# ------------ sd_image_t ------------ +# ------------------------------------------- +# sd_image_t +# ------------------------------------------- +# typedef struct { uint32_t width; uint32_t height; uint32_t channel; uint8_t* data; } sd_image_t; class sd_image_t(ctypes.Structure): _fields_ = [ ("width", ctypes.c_uint32), @@ -391,203 +489,310 @@ class sd_image_t(ctypes.Structure): ] -sd_image_t_p = ctypes.POINTER(sd_image_t) +# ------------------------------------------- +# sd_pm_params_t +# ------------------------------------------- + + +# typedef struct { sd_image_t* id_images; int id_images_count; const char* id_embed_path; float style_strength; } sd_pm_params_t; // photo maker +class sd_pm_params_t(ctypes.Structure): + _fields_ = [ + ("id_images", ctypes.POINTER(sd_image_t)), + ("id_images_count", ctypes.c_int), + ("id_embed_path", ctypes.c_char_p), + ("style_strength", ctypes.c_float), + ] # photo maker + +# ------------------------------------------- +# sd_tiling_params_t +# ------------------------------------------- + + +# typedef struct { bool enabled; int tile_size_x; int tile_size_y; float target_overlap; float rel_size_x; float rel_size_y; } sd_tiling_params_t; +class sd_tiling_params_t(ctypes.Structure): + _fields_ = [ + ("enabled", ctypes.c_bool), + ("tile_size_x", ctypes.c_int), + ("tile_size_y", ctypes.c_int), + ("target_overlap", ctypes.c_float), + ("rel_size_x", ctypes.c_float), + ("rel_size_y", ctypes.c_float), + ] -# ------------ txt2img ------------ +# ------------------------------------------- +# sd_slg_params_t +# ------------------------------------------- -# SD_API sd_image_t* txt2img(sd_ctx_t* sd_ctx, const char* prompt, const char* negative_prompt, int clip_skip, float cfg_scale, float guidance, float eta, int width, int height, enum sample_method_t sample_method, int sample_steps, int64_t seed, int batch_count, const sd_image_t* control_cond, float control_strength, float style_strength, bool normalize_input, const char* input_id_images_path, int* skip_layers, size_t skip_layers_count, float slg_scale, float skip_layer_start, float skip_layer_end); + +# typedef struct { int* layers; size_t layer_count; float layer_start; float layer_end; float scale; } sd_slg_params_t; +class sd_slg_params_t(ctypes.Structure): + _fields_ = [ + ("layers", ctypes.POINTER(ctypes.c_int)), + ("layer_count", ctypes.c_size_t), + ("layer_start", ctypes.c_float), + ("layer_end", ctypes.c_float), + ("scale", ctypes.c_float), + ] + + +# ------------------------------------------- +# sd_guidance_params_t +# ------------------------------------------- + + +# typedef struct { float txt_cfg; float img_cfg; float distilled_guidance; sd_slg_params_t slg; } sd_guidance_params_t; +class sd_guidance_params_t(ctypes.Structure): + _fields_ = [ + ("txt_cfg", ctypes.c_float), + ("img_cfg", ctypes.c_float), + ("distilled_guidance", ctypes.c_float), + ("slg", sd_slg_params_t), + ] + + +# ------------------------------------------- +# sd_sample_params_t +# ------------------------------------------- + + +# typedef struct { sd_guidance_params_t guidance; enum scheduler_t scheduler; enum sample_method_t sample_method; int sample_steps; float eta; int shifted_timestep; float* custom_sigmas; int custom_sigmas_count; } sd_sample_params_t; +class sd_sample_params_t(ctypes.Structure): + _fields_ = [ + ("guidance", sd_guidance_params_t), + ("scheduler", ctypes.c_int), # Scheduler + ("sample_method", ctypes.c_int), # SampleMethod + ("sample_steps", ctypes.c_int), + ("eta", ctypes.c_float), + 
("shifted_timestep", ctypes.c_int), + ("custom_sigmas", ctypes.POINTER(ctypes.c_float)), + ("custom_sigmas_count", ctypes.c_int), + ] + + +# ------------------------------------------- +# sd_easycache_params_t +# ------------------------------------------- + + +# typedef struct { bool enabled; float reuse_threshold; float start_percent; float end_percent; } sd_easycache_params_t; +class sd_easycache_params_t(ctypes.Structure): + _fields_ = [ + ("enabled", ctypes.c_bool), + ("reuse_threshold", ctypes.c_float), + ("start_percent", ctypes.c_float), + ("end_percent", ctypes.c_float), + ] + + +# ------------------------------------------- +# sd_lora_t +# ------------------------------------------- + + +# typedef struct { bool is_high_noise; float multiplier; const char* path; } sd_lora_t; +class sd_lora_t(ctypes.Structure): + _fields_ = [ + ("is_high_noise", ctypes.c_bool), + ("multiplier", ctypes.c_float), + ("path", ctypes.c_char_p), + ] + + +# ------------------------------------------- +# sd_img_gen_params_t +# ------------------------------------------- + + +# typedef struct { const sd_lora_t* loras; uint32_t lora_count; const char* prompt; const char* negative_prompt; int clip_skip; sd_image_t init_image; sd_image_t* ref_images; int ref_images_count; bool auto_resize_ref_image; bool increase_ref_index; sd_image_t mask_image; int width; int height; sd_sample_params_t sample_params; float strength; int64_t seed; int batch_count; sd_image_t control_image; float control_strength; sd_pm_params_t pm_params; sd_tiling_params_t vae_tiling_params; sd_easycache_params_t easycache; } sd_img_gen_params_t; +class sd_img_gen_params_t(ctypes.Structure): + _fields_ = [ + ("loras", ctypes.POINTER(sd_lora_t)), + ("lora_count", ctypes.c_uint32), + ("prompt", ctypes.c_char_p), + ("negative_prompt", ctypes.c_char_p), + ("clip_skip", ctypes.c_int), + ("init_image", sd_image_t), + ("ref_images", ctypes.POINTER(sd_image_t)), + ("ref_images_count", ctypes.c_int), + ("auto_resize_ref_image", ctypes.c_bool), + ("increase_ref_index", ctypes.c_bool), + ("mask_image", sd_image_t), + ("width", ctypes.c_int), + ("height", ctypes.c_int), + ("sample_params", sd_sample_params_t), + ("strength", ctypes.c_float), + ("seed", ctypes.c_int64), + ("batch_count", ctypes.c_int), + ("control_image", sd_image_t), + ("control_strength", ctypes.c_float), + ("pm_params", sd_pm_params_t), + ("vae_tiling_params", sd_tiling_params_t), + ("easycache", sd_easycache_params_t), + ] + + +# ------------------------------------------- +# generate_image +# ------------------------------------------- + + +# SD_API sd_image_t* generate_image(sd_ctx_t* sd_ctx, const sd_img_gen_params_t* sd_img_gen_params); @ctypes_function( - "txt2img", + "generate_image", [ sd_ctx_t_p_ctypes, # sd_ctx - ctypes.c_char_p, # prompt - ctypes.c_char_p, # negative_prompt - ctypes.c_int, # clip_skip - ctypes.c_float, # cfg_scale - ctypes.c_float, # guidance - ctypes.c_float, # eta - ctypes.c_int, # width - ctypes.c_int, # height - ctypes.c_int, # sample_method - ctypes.c_int, # sample_steps - ctypes.c_int64, # seed - ctypes.c_int, # batch_count - sd_image_t_p, # control_cond - ctypes.c_float, # control_strength - ctypes.c_float, # style_strength - ctypes.c_bool, # normalize_input - ctypes.c_char_p, # input_id_images_path - ctypes.POINTER(ctypes.c_int), # skip_layers - ctypes.c_size_t, # skip_layers_count - ctypes.c_float, # slg_scale - ctypes.c_float, # skip_layer_start - ctypes.c_float, # skip_layer_end + ctypes.POINTER(sd_img_gen_params_t), # sd_img_gen_params ], - 
sd_image_t_p, + ctypes.POINTER(sd_image_t), ) -def txt2img( +def generate_image( sd_ctx: sd_ctx_t_p, - prompt: bytes, - negative_prompt: bytes, - clip_skip: int, - cfg_scale: float, - guidance: float, - eta: float, - width: int, - height: int, - sample_method: int, # SampleMethod - sample_steps: int, - seed: int, - batch_count: int, - control_cond: sd_image_t, - control_strength: float, - style_strength: float, - normalize_input: bool, - input_id_images_path: bytes, - skip_layers: List[int], - skip_layers_count: int, - slg_scale: float, - skip_layer_start: float, - skip_layer_end: float, + sd_img_gen_params: sd_img_gen_params_t, /, ) -> CtypesArray[sd_image_t]: ... -# ------------ img2img ------------ +# ------------------------------------------- +# sd_vid_gen_params_t +# ------------------------------------------- + + +# typedef struct { const sd_lora_t* loras; uint32_t lora_count; const char* prompt; const char* negative_prompt; int clip_skip; sd_image_t init_image; sd_image_t end_image; sd_image_t* control_frames; int control_frames_size; int width; int height; sd_sample_params_t sample_params; sd_sample_params_t high_noise_sample_params; float moe_boundary; float strength; int64_t seed; int video_frames; float vace_strength; sd_easycache_params_t easycache; } sd_vid_gen_params_t; +class sd_vid_gen_params_t(ctypes.Structure): + _fields_ = [ + ("loras", ctypes.POINTER(sd_lora_t)), + ("lora_count", ctypes.c_uint32), + ("prompt", ctypes.c_char_p), + ("negative_prompt", ctypes.c_char_p), + ("clip_skip", ctypes.c_int), + ("init_image", sd_image_t), + ("end_image", sd_image_t), + ("control_frames", ctypes.POINTER(sd_image_t)), + ("control_frames_size", ctypes.c_int), + ("width", ctypes.c_int), + ("height", ctypes.c_int), + ("sample_params", sd_sample_params_t), + ("high_noise_sample_params", sd_sample_params_t), + ("moe_boundary", ctypes.c_float), + ("strength", ctypes.c_float), + ("seed", ctypes.c_int64), + ("video_frames", ctypes.c_int), + ("vace_strength", ctypes.c_float), + ("easycache", sd_easycache_params_t), + ] + + +# ------------------------------------------- +# generate_video +# ------------------------------------------- -# SD_API sd_image_t* img2img(sd_ctx_t* sd_ctx, sd_image_t init_image, sd_image_t mask_image, const char* prompt, const char* negative_prompt, int clip_skip, float cfg_scale, float guidance, float eta, int width, int height, enum sample_method_t sample_method, int sample_steps, float strength, int64_t seed, int batch_count, const sd_image_t* control_cond, float control_strength, float style_strength, bool normalize_input, const char* input_id_images_path, int* skip_layers, size_t skip_layers_count, float slg_scale, float skip_layer_start, float skip_layer_end); +num_frames_out_p = NewType("num_frames_out_p", int) + + +# SD_API sd_image_t* generate_video(sd_ctx_t* sd_ctx, const sd_vid_gen_params_t* sd_vid_gen_params, int* num_frames_out); @ctypes_function( - "img2img", + "generate_video", [ sd_ctx_t_p_ctypes, # sd_ctx - sd_image_t, # init_image - sd_image_t, # mask_image - ctypes.c_char_p, # prompt - ctypes.c_char_p, # negative_prompt - ctypes.c_int, # clip_skip - ctypes.c_float, # cfg_scale - ctypes.c_float, # guidance - ctypes.c_float, # eta - ctypes.c_int, # width - ctypes.c_int, # height - ctypes.c_int, # sample_method - ctypes.c_int, # sample_steps - ctypes.c_float, # strength - ctypes.c_int64, # seed - ctypes.c_int, # batch_count - sd_image_t_p, # control_cond - ctypes.c_float, # control_strength - ctypes.c_float, # style_strength - ctypes.c_bool, # 
normalize_input - ctypes.c_char_p, # input_id_images_path - ctypes.POINTER(ctypes.c_int), # skip_layers - ctypes.c_size_t, # skip_layers_count - ctypes.c_float, # slg_scale - ctypes.c_float, # skip_layer_start - ctypes.c_float, # skip_layer_end + ctypes.POINTER(sd_vid_gen_params_t), # sd_vid_gen_params + ctypes.POINTER(ctypes.c_int), # num_frames_out ], - sd_image_t_p, + ctypes.POINTER(sd_image_t), ) -def img2img( +def generate_video( sd_ctx: sd_ctx_t_p, - init_image: sd_image_t, - mask_image: sd_image_t, - prompt: bytes, - negative_prompt: bytes, - clip_skip: int, - cfg_scale: float, - guidance: float, - eta: float, - width: int, - height: int, - sample_method: int, # SampleMethod - sample_steps: int, - strength: float, - seed: int, - batch_count: int, - control_cond: sd_image_t, - control_strength: float, - style_strength: float, - normalize_input: bool, - input_id_images_path: bytes, - skip_layers: List[int], - skip_layers_count: int, - slg_scale: float, - skip_layer_start: float, - skip_layer_end: float, + sd_vid_gen_params: sd_vid_gen_params_t, + num_frames_out: num_frames_out_p, /, ) -> CtypesArray[sd_image_t]: ... -# ------------ img2vid ------------ +# ------------------------------------------- +# sd_get_default_sample_method +# ------------------------------------------- -# SD_API sd_image_t* img2vid(sd_ctx_t* sd_ctx, sd_image_t init_image, int width, int height, int video_frames, int motion_bucket_id, int fps, float augmentation_level, float min_cfg, float cfg_scale, enum sample_method_t sample_method, int sample_steps, float strength, int64_t seed); +# SD_API enum sample_method_t sd_get_default_sample_method(const sd_ctx_t* sd_ctx); @ctypes_function( - "img2vid", + "sd_get_default_sample_method", [ sd_ctx_t_p_ctypes, # sd_ctx - sd_image_t, # init_image - ctypes.c_int, # width - ctypes.c_int, # height - ctypes.c_int, # video_frames - ctypes.c_int, # motion_bucket_id - ctypes.c_int, # fps - ctypes.c_float, # augmentation_level - ctypes.c_float, # min_cfg - ctypes.c_float, # cfg_scale - ctypes.c_int, # sample_method - ctypes.c_int, # sample_steps - ctypes.c_float, # strength - ctypes.c_int64, # seed ], - sd_image_t_p, + ctypes.c_int, # SampleMethod ) -def img2vid( +def sd_get_default_sample_method( sd_ctx: sd_ctx_t_p, - init_image: sd_image_t, - width: int, - height: int, - video_frames: int, - motion_bucket_id: int, - fps: int, - augmentation_level: float, - min_cfg: float, - cfg_scale: float, - sample_method: int, # SampleMethod - sample_steps: int, - strength: float, - seed: int, /, -) -> CtypesArray[sd_image_t]: ... +) -> Optional[SampleMethod]: ... + + +# ------------------------------------------- +# sd_get_default_scheduler +# ------------------------------------------- + + +# SD_API enum scheduler_t sd_get_default_scheduler(const sd_ctx_t* sd_ctx); +@ctypes_function( + "sd_get_default_scheduler", + [ + sd_ctx_t_p_ctypes, # sd_ctx + ], + ctypes.c_int, # Scheduler +) +def sd_get_default_scheduler( + sd_ctx: sd_ctx_t_p, + /, +) -> Optional[Scheduler]: ... 
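The two new getters make it possible to resolve the `"default"` (None) entries of `SAMPLE_METHOD_MAP` and `SCHEDULER_MAP` once a context exists. The helper below is a hypothetical sketch of that fallback, not code from this patch; the function name and policy are assumptions.

```python
# Sketch only: fall back to the model's defaults when no explicit choice is given.
import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp

def resolve_sampling(ctx, sample_method=None, scheduler=None):
    if sample_method is None:
        sample_method = sd_cpp.SampleMethod(sd_cpp.sd_get_default_sample_method(ctx))
    if scheduler is None:
        scheduler = sd_cpp.Scheduler(sd_cpp.sd_get_default_scheduler(ctx))
    return sample_method, scheduler
```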
+ +# ------------------------------------------- +# upscaler_ctx_t +# ------------------------------------------- -# ------------ new_upscaler_ctx ------------ +# typedef struct upscaler_ctx_t upscaler_ctx_t; +class upscaler_ctx_t(ctypes.Structure): + pass + + +# struct upscaler_ctx; upscaler_ctx_t_p = NewType("upscaler_ctx_t_p", int) -upscaler_ctx_t_p_ctypes = ctypes.c_void_p +upscaler_ctx_t_p_ctypes = ctypes.POINTER(upscaler_ctx_t) + +# ------------------------------------------- +# new_upscaler_ctx +# ------------------------------------------- -# SD_API upscaler_ctx_t* new_upscaler_ctx(const char* esrgan_path, int n_threads); + +# SD_API upscaler_ctx_t* new_upscaler_ctx(const char* esrgan_path, bool offload_params_to_cpu, bool direct, int n_threads, int tile_size); @ctypes_function( "new_upscaler_ctx", [ ctypes.c_char_p, # esrgan_path + ctypes.c_bool, # offload_params_to_cpu + ctypes.c_bool, # direct ctypes.c_int, # n_threads + ctypes.c_int, # tile_size ], upscaler_ctx_t_p_ctypes, ) def new_upscaler_ctx( esrgan_path: bytes, + offload_params_to_cpu: bool, + direct: bool, n_threads: int, + tile_size: int, /, ) -> upscaler_ctx_t_p: ... -# ------------ free_upscaler_ctx ------------ +# ------------------------------------------- +# free_upscaler_ctx +# ------------------------------------------- # SD_API void free_upscaler_ctx(upscaler_ctx_t* upscaler_ctx); @@ -604,7 +809,9 @@ def free_upscaler_ctx( ) -> None: ... -# ------------ upscale ------------ +# ------------------------------------------- +# upscale +# ------------------------------------------- # SD_API sd_image_t upscale(upscaler_ctx_t* upscaler_ctx, sd_image_t input_image, uint32_t upscale_factor); @@ -625,10 +832,31 @@ def upscale( ) -> sd_image_t: ... -# ------------ convert ------------ +# ------------------------------------------- +# get_upscale_factor +# ------------------------------------------- -# SD_API bool convert(const char* input_path, const char* vae_path, const char* output_path, sd_type_t output_type); +# SD_API int get_upscale_factor(upscaler_ctx_t* upscaler_ctx); +@ctypes_function( + "get_upscale_factor", + [ + upscaler_ctx_t_p_ctypes, # upscaler_ctx + ], + ctypes.c_int, +) +def get_upscale_factor( + upscaler_ctx: upscaler_ctx_t_p, + /, +) -> int: ... + + +# ------------------------------------------- +# convert +# ------------------------------------------- + + +# SD_API bool convert(const char* input_path, const char* vae_path, const char* output_path, enum sd_type_t output_type, const char* tensor_type_rules); @ctypes_function( "convert", [ @@ -636,6 +864,7 @@ def upscale( ctypes.c_char_p, # vae_path ctypes.c_char_p, # output_path ctypes.c_int, # output_type + ctypes.c_char_p, # tensor_type_rules ], ctypes.c_bool, ) @@ -644,56 +873,66 @@ def convert( vae_path: bytes, output_path: bytes, output_type: int, + tensor_type_rules: bytes, /, ) -> bool: ... 
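A minimal sketch of calling the updated `convert` binding directly with the new `tensor_type_rules` argument, mirroring the docstring example in the high-level wrapper. It is an editor's illustration rather than part of the patch; the file paths are placeholders.

```python
# Sketch only: direct low-level model conversion with per-tensor type rules.
import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp

ok = sd_cpp.convert(
    b"models/model.safetensors",    # input_path (placeholder)
    b"",                            # vae_path (none)
    b"models/model-q8_0.gguf",      # output_path (placeholder)
    sd_cpp.GGMLType.SD_TYPE_Q8_0,   # output_type
    b"^vae\\.=f16",                 # tensor_type_rules: keep VAE tensors at f16
)
print("Model converted:", ok)
```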
-# ------------ preprocess_canny ------------ +# ------------------------------------------- +# preprocess_canny +# ------------------------------------------- -# SD_API uint8_t* preprocess_canny(uint8_t* img, int width, int height, float high_threshold, float low_threshold, float weak, float strong, bool inverse); +# SD_API bool preprocess_canny(sd_image_t image, float high_threshold, float low_threshold, float weak, float strong, bool inverse); @ctypes_function( "preprocess_canny", [ - ctypes.POINTER(ctypes.c_uint8), # img - ctypes.c_int, # width - ctypes.c_int, # height + sd_image_t, # image ctypes.c_float, # high_threshold ctypes.c_float, # low_threshold ctypes.c_float, # weak ctypes.c_float, # strong ctypes.c_bool, # inverse ], - ctypes.POINTER(ctypes.c_uint8), + ctypes.c_bool, ) def preprocess_canny( - img: CtypesArray[ctypes.c_uint8], - width: int, - height: int, + image: sd_image_t, high_threshold: float, low_threshold: float, weak: float, strong: float, inverse: bool, /, -) -> CtypesArray[ctypes.c_uint8]: ... +) -> bool: ... -# ================================== +# =========================================== # System Information -# ================================== +# =========================================== +# ------------------------------------------- +# sd_get_num_physical_cores +# ------------------------------------------- + +# SD_API int32_t sd_get_num_physical_cores(); @ctypes_function( - "get_num_physical_cores", + "sd_get_num_physical_cores", [], ctypes.c_int32, ) -def get_num_physical_cores() -> int: +def sd_get_num_physical_cores() -> int: """Get the number of physical cores""" ... +# ------------------------------------------- +# sd_get_system_info +# ------------------------------------------- + + +# SD_API const char* sd_get_system_info(); @ctypes_function( "sd_get_system_info", [], @@ -704,16 +943,53 @@ def sd_get_system_info() -> bytes: ... -# ================================== +# ------------------------------------------- +# sd_commit +# ------------------------------------------- + + +# SD_API const char* sd_commit(void); +@ctypes_function( + "sd_commit", + [], + ctypes.c_char_p, +) +def sd_commit() -> bytes: + """Get the Stable diffusion commit hash""" + ... + + +# ------------------------------------------- +# sd_version +# ------------------------------------------- + + +# SD_API const char* sd_version(void); +@ctypes_function( + "sd_version", + [], + ctypes.c_char_p, +) +def sd_version() -> bytes: + """Get the Stable diffusion version string""" + ... + + +# =========================================== # Progression -# ================================== +# =========================================== +# typedef void (*sd_progress_cb_t)(int step, int steps, float time, void* data); sd_progress_callback = ctypes.CFUNCTYPE(None, ctypes.c_int, ctypes.c_int, ctypes.c_float, ctypes.c_void_p) +# SD_API void sd_set_progress_callback(sd_progress_cb_t cb, void* data); @ctypes_function( "sd_set_progress_callback", - [ctypes.c_void_p, ctypes.c_void_p], + [ + ctypes.c_void_p, # sd_progress_cb_t + ctypes.c_void_p, # data + ], None, ) def sd_set_progress_callback( @@ -725,13 +1001,50 @@ def sd_set_progress_callback( ... 
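A minimal sketch of registering a progress callback through the low-level binding. This is not part of the patch; note that the `CFUNCTYPE` object must be kept referenced (here at module level) for as long as it is registered, or it may be garbage-collected while the C library still holds the pointer.

```python
# Sketch only: hooking the sampling progress callback from Python.
import ctypes
import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp

@sd_cpp.sd_progress_callback
def _on_progress(step, steps, time, data):
    print(f"step {step}/{steps} ({time:.2f}s)")

# argtypes are c_void_p, so cast the callback before passing it
sd_cpp.sd_set_progress_callback(ctypes.cast(_on_progress, ctypes.c_void_p), None)
```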
-# ================================== +# =========================================== +# Preview +# =========================================== + +# typedef void (*sd_preview_cb_t)(int step, int frame_count, sd_image_t* frames, bool is_noisy, void* data); +sd_preview_callback = ctypes.CFUNCTYPE( + None, ctypes.c_int, ctypes.c_int, ctypes.POINTER(sd_image_t), ctypes.c_bool, ctypes.c_void_p +) + + +# SD_API void sd_set_preview_callback(sd_preview_cb_t cb, enum preview_t mode, int interval, bool denoised, bool noisy, void* data); +@ctypes_function( + "sd_set_preview_callback", + [ + ctypes.c_void_p, # sd_preview_cb_t + ctypes.c_int, # mode + ctypes.c_int, # interval + ctypes.c_bool, # denoised + ctypes.c_bool, # noisy + ctypes.c_void_p, # data + ], + None, +) +def sd_set_preview_callback( + callback: Optional[CtypesFuncPointer], + mode: int, + interval: int, + denoised: bool, + noisy: bool, + data: ctypes.c_void_p, + /, +): + """Set callback for preview images during generation.""" + ... + + +# =========================================== # Logging -# ================================== +# =========================================== sd_log_callback = ctypes.CFUNCTYPE(None, ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p) +# SD_API void sd_set_log_callback(sd_log_cb_t sd_log_cb, void* data); @ctypes_function( "sd_set_log_callback", [ctypes.c_void_p, ctypes.c_void_p], diff --git a/tests/conftest.py b/tests/conftest.py new file mode 100644 index 0000000..563baa3 --- /dev/null +++ b/tests/conftest.py @@ -0,0 +1,45 @@ +import os +from typing import List + +import numpy as np +import ffmpeg +from PIL import Image + +OUTPUT_DIR = "tests/outputs" +if not os.path.exists(OUTPUT_DIR): + os.makedirs(OUTPUT_DIR) + +SD_CPP_CLI = "C:\\Users\\Willi\\Documents\\GitHub\\stable-diffusion.cpp\\build\\bin\\sd" + + +# =========================================== +# Video Saving +# =========================================== + + +def save_video_ffmpeg(frames: List[Image.Image], fps: int, out_path: str) -> None: + if not frames: + raise ValueError("No frames provided") + + width, height = frames[0].size + + # Concatenate frames into raw RGB bytes + raw_bytes = b"".join(np.array(frame.convert("RGB"), dtype=np.uint8).tobytes() for frame in frames) + ( + ffmpeg.input( + "pipe:", + format="rawvideo", + pix_fmt="rgb24", + s=f"{width}x{height}", + r=fps, + ) + .output( + out_path, + vcodec="libx264", + pix_fmt="yuv420p", + r=fps, + movflags="+faststart", + ) + .overwrite_output() + .run(input=raw_bytes) + ) diff --git a/tests/test_chroma.py b/tests/test_chroma.py new file mode 100644 index 0000000..50acd93 --- /dev/null +++ b/tests/test_chroma.py @@ -0,0 +1,75 @@ +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\flux-chroma\\Chroma1-HD-Flash-Q4_0.gguf" +T5XXL_PATH = "F:\\stable-diffusion\\flux\\t5xxl_q8_0.gguf" +VAE_PATH = "F:\\stable-diffusion\\flux\\ae-f16.gguf" + + +PROMPT = "a lovely cat holding a sign says 'chroma.cpp'" +STEPS = 4 +CFG_SCALE = 4.0 + + +def test_chroma(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + t5xxl_path=T5XXL_PATH, + vae_path=VAE_PATH, + keep_clip_on_cpu=True, + vae_decode_only=True, + chroma_use_dit_mask=False, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + sample_steps=STEPS, + 
cfg_scale=CFG_SCALE, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/chroma.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--vae", +# VAE_PATH, +# "--t5xxl", +# T5XXL_PATH, +# "--prompt", +# PROMPT, +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--clip-on-cpu", +# "--chroma-disable-dit-mask", +# "--output", +# f"{OUTPUT_DIR}/chroma_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_controlnet.py b/tests/test_controlnet.py index b9374b8..51bc863 100644 --- a/tests/test_controlnet.py +++ b/tests/test_controlnet.py @@ -1,43 +1,73 @@ -import os -import traceback +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -MODEL_PATH = "C:\\stable-diffusion\\v1-5-pruned-emaonly.safetensors" -CONTROLNET_MODEL_PATH = "C:\\stable-diffusion\\control_nets\\control_openpose-fp16.safetensors" +MODEL_PATH = "F:\\stable-diffusion\\v1-5-pruned-emaonly.safetensors" +CONTROLNET_MODEL_PATH = "F:\\stable-diffusion\\control_nets\\control_openpose-fp16.safetensors" -INPUT_IMAGE_PATH = "assets\\input.png" -stable_diffusion = StableDiffusion(model_path=MODEL_PATH, control_net_path=CONTROLNET_MODEL_PATH) +INPUT_IMAGE_PATH = "assets\\input.png" +PROMPTS = [ + {"add": "", "prompt": "a lovely cat", "canny": False}, + {"add": "_canny", "prompt": "a lovely cat", "canny": True}, +] -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) +def test_controlnet(): + stable_diffusion = StableDiffusion( + model_path=MODEL_PATH, + control_net_path=CONTROLNET_MODEL_PATH, + ) -try: - prompts = [ - {"add": "", "prompt": "a lovely cat", "canny": False}, - {"add": "_canny", "prompt": "a lovely cat", "canny": True}, - ] + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) - for prompt in prompts: - # Generate images - images = stable_diffusion.txt_to_img( + for prompt in PROMPTS: + # Generate image + image = stable_diffusion.generate_image( prompt=prompt["prompt"], - control_cond=INPUT_IMAGE_PATH, + control_image=INPUT_IMAGE_PATH, canny=prompt["canny"], - progress_callback=callback, - ) + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/controlnet{prompt['add']}.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) +# for prompt in PROMPTS: +# cli_cmd = [ +# SD_CPP_CLI, +# "--model", +# MODEL_PATH, +# "--control-net", +# CONTROLNET_MODEL_PATH, +# "--prompt", +# prompt["prompt"], +# "--control-image", +# INPUT_IMAGE_PATH, +# "--canny" if prompt["canny"] else 
None, +# "--output", +# f"{OUTPUT_DIR}/controlnet{prompt['add']}_cli.png", +# "-v", +# ] - # Save images - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/controlnet{prompt['add']}_{i}.png") +# # Remove None values +# cli_cmd = [arg for arg in cli_cmd if arg is not None] -except Exception as e: - traceback.print_exc() - print("Test - controlnet failed: ", e) +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_convert_model.py b/tests/test_convert_model.py index a6c7db6..406f26e 100644 --- a/tests/test_convert_model.py +++ b/tests/test_convert_model.py @@ -1,25 +1,17 @@ -import os -import traceback +from conftest import OUTPUT_DIR -import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp +from stable_diffusion_cpp import StableDiffusion -MODEL_PATH = "C:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" +MODEL_PATH = "F:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" -OUTPUT_MODEL_DIR = "tests\\outputs" -OUTPUT_MODEL_PATH = f"{OUTPUT_MODEL_DIR}\\new_model.gguf" -try: - if not os.path.exists(OUTPUT_MODEL_DIR): - os.makedirs(OUTPUT_MODEL_DIR) +def test_convert_model(): - model_converted = sd_cpp.convert( - MODEL_PATH.encode("utf-8"), - "".encode("utf-8"), - OUTPUT_MODEL_PATH.encode("utf-8"), - sd_cpp.GGMLType.SD_TYPE_Q8_0, + stable_diffusion = StableDiffusion() + + model_converted = stable_diffusion.convert( + input_path=MODEL_PATH, + output_path=f"{OUTPUT_DIR}/convert_model.gguf", + output_type="q8_0", ) print("Model converted: ", model_converted) - -except Exception as e: - traceback.print_exc() - print("Test - convert_model failed: ", e) diff --git a/tests/test_edit.py b/tests/test_edit.py new file mode 100644 index 0000000..f37c726 --- /dev/null +++ b/tests/test_edit.py @@ -0,0 +1,92 @@ +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\flux-kontext\\flux1-kontext-dev-Q3_K_S.gguf" +T5XXL_PATH = "F:\\stable-diffusion\\flux\\t5xxl_q8_0.gguf" +CLIP_L_PATH = "F:\\stable-diffusion\\flux\\clip_l-q8_0.gguf" +VAE_PATH = "F:\\stable-diffusion\\flux\\ae-f16.gguf" + + +INPUT_IMAGE_PATHS = [ + "assets\\input.png", + # "assets\\box.png", +] + +PROMPT = "make the cat blue" +STEPS = 4 +CFG_SCALE = 1.0 +IMAGE_CFG_SCALE = 1.0 +SAMPLE_METHOD = "euler" + + +def test_edit(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + clip_l_path=CLIP_L_PATH, + t5xxl_path=T5XXL_PATH, + vae_path=VAE_PATH, + keep_clip_on_cpu=True, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Edit image + image = stable_diffusion.generate_image( + prompt=PROMPT, + ref_images=INPUT_IMAGE_PATHS, + sample_steps=STEPS, + cfg_scale=CFG_SCALE, + image_cfg_scale=IMAGE_CFG_SCALE, + sample_method=SAMPLE_METHOD, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/edit.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--vae", +# VAE_PATH, 
+# "--t5xxl", +# T5XXL_PATH, +# "--clip_l", +# CLIP_L_PATH, +# "--prompt", +# PROMPT, +# "--ref-image", +# ",".join(INPUT_IMAGE_PATHS), +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--img-cfg-scale", +# str(IMAGE_CFG_SCALE), +# "--sampling-method", +# SAMPLE_METHOD, +# "--clip-on-cpu", +# "--output", +# f"{OUTPUT_DIR}/edit_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_flex2.py b/tests/test_flex2.py new file mode 100644 index 0000000..07758f7 --- /dev/null +++ b/tests/test_flex2.py @@ -0,0 +1,77 @@ +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\flex\\Flex.2-preview-Q8_0.gguf" +T5XXL_PATH = "F:\\stable-diffusion\\flux\\t5xxl_q8_0.gguf" +CLIP_L_PATH = "F:\\stable-diffusion\\flux\\clip_l-q8_0.gguf" +VAE_PATH = "F:\\stable-diffusion\\flux\\ae-f16.gguf" + +INPUT_IMAGE_PATH = "assets\\input.png" +PROMPT = "the cat has a hat" +STEPS = 20 + + +def test_flex2(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + clip_l_path=CLIP_L_PATH, + t5xxl_path=T5XXL_PATH, + vae_path=VAE_PATH, + keep_clip_on_cpu=True, + vae_decode_only=True, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + control_image=INPUT_IMAGE_PATH, + sample_steps=STEPS, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/flex2.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--control-image", +# INPUT_IMAGE_PATH, +# "--vae", +# VAE_PATH, +# "--t5xxl", +# T5XXL_PATH, +# "--clip_l", +# CLIP_L_PATH, +# "--prompt", +# PROMPT, +# "--steps", +# str(STEPS), +# "--clip-on-cpu", +# "--output", +# f"{OUTPUT_DIR}/flex2_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_flux.py b/tests/test_flux.py index 7811eef..2baa01c 100644 --- a/tests/test_flux.py +++ b/tests/test_flux.py @@ -1,57 +1,88 @@ -import os -import traceback +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -DIFFUSION_MODEL_PATH = "C:\\stable-diffusion\\flux\\flux1-schnell-q3_k.gguf" -# DIFFUSION_MODEL_PATH = "C:\\stable-diffusion\\flux\\flux1-dev-q8_0.gguf" -T5XXL_PATH = "C:\\stable-diffusion\\flux\\t5xxl_q8_0.gguf" -CLIP_L_PATH = "C:\\stable-diffusion\\flux\\clip_l-q8_0.gguf" -VAE_PATH = "C:\\stable-diffusion\\flux\\ae-f16.gguf" +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\flux\\flux1-schnell-q3_k.gguf" +# DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\flux\\flux1-dev-q8_0.gguf" +T5XXL_PATH = "F:\\stable-diffusion\\flux\\t5xxl_q8_0.gguf" +CLIP_L_PATH = "F:\\stable-diffusion\\flux\\clip_l-q8_0.gguf" +VAE_PATH = "F:\\stable-diffusion\\flux\\ae-f16.gguf" +LORA_DIR = "F:\\stable-diffusion\\loras" -LORA_DIR = "C:\\stable-diffusion\\loras" -stable_diffusion = StableDiffusion( - 
diffusion_model_path=DIFFUSION_MODEL_PATH, - clip_l_path=CLIP_L_PATH, - t5xxl_path=T5XXL_PATH, - vae_path=VAE_PATH, - lora_model_dir=LORA_DIR, - vae_decode_only=True, -) +PROMPTS = [ + # { + # "add": "_lora", + # "prompt": "a lovely cat holding a sign says 'flux.cpp' ", + # }, # With LORA + {"add": "", "prompt": "a lovely cat holding a sign says 'flux.cpp'"}, # Without LORA +] +STEPS = 4 +CFG_SCALE = 1.0 -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) +def test_flux(): + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + clip_l_path=CLIP_L_PATH, + t5xxl_path=T5XXL_PATH, + vae_path=VAE_PATH, + lora_model_dir=LORA_DIR, + keep_clip_on_cpu=True, + vae_decode_only=True, + ) -try: - prompts = [ - # { - # "add": "_lora", - # "prompt": "a lovely cat holding a sign says 'flux.cpp' ", - # }, # With LORA - {"add": "", "prompt": "a lovely cat holding a sign says 'flux.cpp'"}, # Without LORA - ] + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) - - for prompt in prompts: - # Generate images - images = stable_diffusion.txt_to_img( + for prompt in PROMPTS: + # Generate image + image = stable_diffusion.generate_image( prompt=prompt["prompt"], - sample_steps=4, - cfg_scale=1.0, - sample_method="euler", - progress_callback=callback, - ) - - # Save images - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/flux{prompt['add']}_{i}.png") - -except Exception as e: - traceback.print_exc() - print("Test - flux failed: ", e) + sample_steps=STEPS, + cfg_scale=CFG_SCALE, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/flux{prompt['add']}.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + +# for prompt in PROMPTS: +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--vae", +# VAE_PATH, +# "--t5xxl", +# T5XXL_PATH, +# "--clip_l", +# CLIP_L_PATH, +# "--prompt", +# prompt["prompt"], +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--clip-on-cpu", +# "--output", +# f"{OUTPUT_DIR}/flux{prompt['add']}_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_flux2.py b/tests/test_flux2.py new file mode 100644 index 0000000..ce9e0a7 --- /dev/null +++ b/tests/test_flux2.py @@ -0,0 +1,80 @@ +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\flux2\\flux2-dev-Q2_K.gguf" +LLM_PATH = "F:\\stable-diffusion\\flux2\\Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf" +VAE_PATH = "F:\\stable-diffusion\\flux2\\ae.safetensors" + +INPUT_IMAGE_PATHS = [ + "assets\\input.png", +] + +PROMPT = "the cat has a hat" +STEPS = 4 +CFG_SCALE = 1.0 + + +def test_flux2(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + llm_path=LLM_PATH, + vae_path=VAE_PATH, + # vae_decode_only=True, + offload_params_to_cpu=True, + # diffusion_flash_attn=True, + ) + + def 
progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + ref_images=INPUT_IMAGE_PATHS, + sample_steps=STEPS, + cfg_scale=CFG_SCALE, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/flux2.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--llm", +# LLM_PATH, +# "--vae", +# VAE_PATH, +# "--prompt", +# PROMPT, +# "--ref-image", +# ",".join(INPUT_IMAGE_PATHS), +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--offload-to-cpu", +# "--output", +# f"{OUTPUT_DIR}/flux2_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_img2img.py b/tests/test_img2img.py index 0d2f26d..0ab4298 100644 --- a/tests/test_img2img.py +++ b/tests/test_img2img.py @@ -1,36 +1,61 @@ -import os -import traceback +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -MODEL_PATH = "C:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" +MODEL_PATH = "F:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" + INPUT_IMAGE_PATH = "assets\\input.png" -stable_diffusion = StableDiffusion(model_path=MODEL_PATH) +PROMPT = "blue eyes" +STRENGTH = 0.4 + + +def test_img2img(): + + stable_diffusion = StableDiffusion(model_path=MODEL_PATH) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + init_image=INPUT_IMAGE_PATH, + strength=STRENGTH, + progress_callback=progress_callback, + )[0] + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/img2img.png", pnginfo=pnginfo) -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) +# =========================================== +# C++ CLI +# =========================================== -try: - # Generate images - images = stable_diffusion.img_to_img( - prompt="blue eyes", - image=INPUT_IMAGE_PATH, - strength=0.4, - progress_callback=callback, - ) +# import subprocess - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) +# from conftest import SD_CPP_CLI - # Save images - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/img2img_{i}.png") +# stable_diffusion = None # Clear model -except Exception as e: - traceback.print_exc() - print("Test - img2img failed: ", e) +# cli_cmd = [ +# SD_CPP_CLI, +# "--model", +# MODEL_PATH, +# "--prompt", +# PROMPT, +# "--init-img", +# INPUT_IMAGE_PATH, +# "--strength", +# str(STRENGTH), +# "--output", +# f"{OUTPUT_DIR}/img2img_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_img2vid.py b/tests/test_img2vid.py deleted file 
mode 100644 index 236083c..0000000 --- a/tests/test_img2vid.py +++ /dev/null @@ -1,31 +0,0 @@ -import os -import traceback - -from stable_diffusion_cpp import StableDiffusion - -MODEL_PATH = "C:\\stable-diffusion\\svd_xt.safetensors" - -input_image = "assets\\input.png" - -stable_diffusion = StableDiffusion(model_path=MODEL_PATH) - - -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) - -try: - images = stable_diffusion.img_to_vid(image=input_image, progress_callback=callback) - - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) - - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/img2vid_{i}.png") - -except Exception as e: - traceback.print_exc() - print("Test - img2vid failed: ", e) - - -# !!!!!!!!!!!!!!!!!! NOT WORKING (waiting for support in https://github.com/leejet/stable-diffusion.cpp ) !!!!!!!!!!!!!!!!!! diff --git a/tests/test_inpainting.py b/tests/test_inpainting.py index ae77e44..b7dcbbe 100644 --- a/tests/test_inpainting.py +++ b/tests/test_inpainting.py @@ -1,38 +1,65 @@ -import os -import traceback +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -MODEL_PATH = "C:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" +MODEL_PATH = "F:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" + INPUT_IMAGE_PATH = "assets\\input.png" MASK_IMAGE_PATH = "assets\\mask.png" -stable_diffusion = StableDiffusion(model_path=MODEL_PATH) +PROMPT = "blue eyes" +STRENGTH = 0.4 + +def test_inpainting(): -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) + stable_diffusion = StableDiffusion(model_path=MODEL_PATH) + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) -try: - # Generate images - images = stable_diffusion.img_to_img( - prompt="blue eyes", - image=INPUT_IMAGE_PATH, + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + init_image=INPUT_IMAGE_PATH, mask_image=MASK_IMAGE_PATH, - strength=0.4, - progress_callback=callback, - ) + strength=STRENGTH, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/inpainting.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) +# from conftest import SD_CPP_CLI - # Save images - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/inpainting_{i}.png") +# stable_diffusion = None # Clear model -except Exception as e: - traceback.print_exc() - print("Test - inpainting failed: ", e) +# cli_cmd = [ +# SD_CPP_CLI, +# "--model", +# MODEL_PATH, +# "--prompt", +# PROMPT, +# "--init-img", +# INPUT_IMAGE_PATH, +# "--strength", +# str(STRENGTH), +# "--mask", +# MASK_IMAGE_PATH, +# "--output", +# f"{OUTPUT_DIR}/inpainting_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_list_maps.py b/tests/test_list_maps.py index c1e5e04..9ab0ecd 100644 --- a/tests/test_list_maps.py +++ b/tests/test_list_maps.py @@ -1,6 +1,29 @@ -from 
stable_diffusion_cpp import GGML_TYPE_MAP, RNG_TYPE_MAP, SCHEDULE_MAP, SAMPLE_METHOD_MAP +from conftest import OUTPUT_DIR -print("GGML model types:", list(GGML_TYPE_MAP)) -print("RNG types:", list(RNG_TYPE_MAP)) -print("Schedulers:", list(SCHEDULE_MAP)) -print("Sample methods:", list(SAMPLE_METHOD_MAP)) +from stable_diffusion_cpp import ( + PREVIEW_MAP, + RNG_TYPE_MAP, + GGML_TYPE_MAP, + SCHEDULER_MAP, + PREDICTION_MAP, + SAMPLE_METHOD_MAP, + LORA_APPLY_MODE_MAP +) + + +def test_list_maps(): + maps = { + "GGML model types": GGML_TYPE_MAP, + "RNG types": RNG_TYPE_MAP, + "Schedulers": SCHEDULER_MAP, + "Sample methods": SAMPLE_METHOD_MAP, + "Prediction types": PREDICTION_MAP, + "Preview methods": PREVIEW_MAP, + "LoRA apply modes": LORA_APPLY_MODE_MAP, + } + + with open(f"{OUTPUT_DIR}/list_maps.txt", "w") as f: + for name, mapping in maps.items(): + items = list(mapping) + print(f"{name}: {items}") + f.write(f"{name}: {items}\n") diff --git a/tests/test_memory_leak.py b/tests/test_memory_leak.py new file mode 100644 index 0000000..a2afbc6 --- /dev/null +++ b/tests/test_memory_leak.py @@ -0,0 +1,83 @@ +import os +import time +import subprocess + +from stable_diffusion_cpp import StableDiffusion + + +def get_amd_vram_ps() -> float: + """Return current VRAM usage (MB) for this process using PowerShell.""" + pid = os.getpid() + ps_script = f""" + $p = Get-Process -Id {pid} + $mem = (Get-Counter "\\GPU Process Memory(pid_$($p.Id)*)\\Local Usage").CounterSamples | + Where-Object {{ $_.CookedValue -gt 0 }} | + Select-Object -ExpandProperty CookedValue + if ($mem) {{ [math]::Round($mem / 1MB, 2) }} else {{ 0 }} + """ + try: + result = subprocess.check_output(["powershell", "-Command", ps_script], stderr=subprocess.STDOUT, text=True).strip() + return float(result) + except Exception as e: + print("VRAM read error:", e) + return -1.0 + + +def log_vram(label: str, baseline: float = None) -> float: + """Log VRAM at a label and optionally show difference from baseline.""" + vram = get_amd_vram_ps() + if baseline is not None and vram >= 0: + diff = round(vram - baseline, 2) + print(f"[{label}] VRAM: {vram} MB ({diff} MB from start)") + else: + print(f"[{label}] VRAM: {vram} MB") + return vram + + +MODEL_PATH = "F:\\stable-diffusion\\juggernautXL_V8+RDiffusion.safetensors" +VAE_PATH = "F:\\stable-diffusion\\vaes\\sdxl_vae.safetensors" + + +def generate_cat(): + sd = StableDiffusion( + model_path=MODEL_PATH, + vae_path=VAE_PATH, + keep_vae_on_cpu=True, + ) + img = sd.generate_image( + prompt="a lovely cat", + sample_steps=4, + )[0] + return sd, img + + +# =========================================== +# Start Test +# =========================================== + + +def test_memory_leak(): + start_vram = log_vram("Start") + + # First load & generate + sd, img = generate_cat() + + # Unload + sd = None + time.sleep(3) + after_first_unload = log_vram("After First Unload", baseline=start_vram) + + # Second load & generate + sd, img = generate_cat() + + # Final unload + sd = None + time.sleep(3) + after_final_unload = log_vram("After Final Unload", baseline=start_vram) + + # Leak detection + if after_final_unload != after_first_unload: + leak = round(after_final_unload - after_first_unload, 2) + raise Exception(f"Possible VRAM leak detected ({leak} MB)") + else: + print("No VRAM leak detected") diff --git a/tests/test_multi_gpu.py b/tests/test_multi_gpu.py new file mode 100644 index 0000000..4dcdbe8 --- /dev/null +++ b/tests/test_multi_gpu.py @@ -0,0 +1,45 @@ +import os +import multiprocessing + +from PIL import 
PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +MODEL_PATH = "F:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" + +PROMPT = "a cute cat" +STEPS = 20 + + +def generate_on_gfx(hip_visible_devices: int, hsa_override_gfx_version: str): + os.environ["HIP_VISIBLE_DEVICES"] = str(hip_visible_devices) + os.environ["HSA_OVERRIDE_GFX_VERSION"] = hsa_override_gfx_version + + stable_diffusion = StableDiffusion(model_path=MODEL_PATH) + + def progress_callback(step: int, steps: int, time: float): + print("{} - Completed step: {} of {}".format(hsa_override_gfx_version, step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + sample_steps=STEPS, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/multi_gpu_{hsa_override_gfx_version}.png", pnginfo=pnginfo) + + +def test_multi_gpu(): + + jobs = [ + (0, "10.3.0"), # 6800XT + (1, "11.0.1"), # 7800XT + ] + + with multiprocessing.Pool(len(jobs)) as pool: + pool.starmap(generate_on_gfx, jobs) diff --git a/tests/test_ovis.py b/tests/test_ovis.py new file mode 100644 index 0000000..c6ddd49 --- /dev/null +++ b/tests/test_ovis.py @@ -0,0 +1,71 @@ +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\ovis\\ovis_image-Q4_0.gguf" +LLM_PATH = "F:\\stable-diffusion\\ovis\\ovis_2.5.safetensors" +VAE_PATH = "F:\\stable-diffusion\\ovis\\ae.safetensors" + +PROMPT = "a lovely cat" +STEPS = 20 +CFG_SCALE = 5.0 + + +def test_ovis(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + llm_path=LLM_PATH, + vae_path=VAE_PATH, + diffusion_flash_attn=True, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + sample_steps=STEPS, + cfg_scale=CFG_SCALE, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/ovis.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--llm", +# LLM_PATH, +# "--vae", +# VAE_PATH, +# "--prompt", +# PROMPT, +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--diffusion-fa", +# "--output", +# f"{OUTPUT_DIR}/ovis_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_photomaker.py b/tests/test_photomaker.py index 50757f5..859d881 100644 --- a/tests/test_photomaker.py +++ b/tests/test_photomaker.py @@ -1,58 +1,99 @@ import os -import traceback + +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -MODEL_PATH = "C:\\stable-diffusion\\photomaker\\sdxlUnstableDiffusers_v11.safetensors" -STACKED_ID_EMBED_DIR = 
"C:\\stable-diffusion\\photomaker\\photomaker-v1.safetensors" -VAE_PATH = "C:\\stable-diffusion\\photomaker\\sdxl.vae.safetensors" - -INPUT_ID_IMAGES_PATH = ".\\assets\\newton_man" - -stable_diffusion = StableDiffusion(stacked_id_embed_dir=STACKED_ID_EMBED_DIR, model_path=MODEL_PATH, vae_path=VAE_PATH) - - -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) - - -try: - # Generate images - # images = stable_diffusion.txt_to_img( - # cfg_scale=5.0, - # height=1024, - # width=1024, - # style_strength=10, # style_ratio (%) - # sample_method="euler", - # prompt="a man img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed", - # negative_prompt="realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text", - # progress_callback=callback, - # ) - - # Generate images - photomaker_images = stable_diffusion.txt_to_img( - cfg_scale=5.0, - height=1024, - width=1024, - style_strength=10, # style_ratio (%) - sample_method="euler", - prompt="a man img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed", - negative_prompt="realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text", - input_id_images_path=INPUT_ID_IMAGES_PATH, - progress_callback=callback, +MODEL_PATH = "F:\\stable-diffusion\\photomaker\\sdxlUnstableDiffusers_v11.safetensors" +PHOTO_MAKER_PATH = "F:\\stable-diffusion\\photomaker\\photomaker-v1.safetensors" +VAE_PATH = "F:\\stable-diffusion\\photomaker\\sdxl.vae.safetensors" + + +INPUT_ID_IMAGES_DIR = ".\\assets\\newton_man" + + +PROMPT = "a man img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" +NEGATIVE_PROMPT = "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" +STYLE_STRENGTH = 10 +HEIGHT = 1024 +WIDTH = 1024 +CFG_SCALE = 5.0 +SAMPLE_METHOD = "euler" + + +def test_photomaker(): + + stable_diffusion = StableDiffusion( + photo_maker_path=PHOTO_MAKER_PATH, + model_path=MODEL_PATH, + vae_path=VAE_PATH, ) - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + cfg_scale=CFG_SCALE, + height=HEIGHT, + width=WIDTH, + sample_method=SAMPLE_METHOD, + prompt=PROMPT, + negative_prompt=NEGATIVE_PROMPT, + pm_style_strength=STYLE_STRENGTH, + pm_id_images=[ + os.path.join(INPUT_ID_IMAGES_DIR, f) + for f in os.listdir(INPUT_ID_IMAGES_DIR) + if f.lower().endswith((".png", ".jpg", ".jpeg", ".bmp")) + ], + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/photomaker.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI - # # Save images - 
# for i, image in enumerate(images): - # image.save(f"{OUTPUT_DIR}/non_photomaker_{i}.png") +# stable_diffusion = None # Clear model - for i, image in enumerate(photomaker_images): - image.save(f"{OUTPUT_DIR}/photomaker_{i}.png") -except Exception as e: - traceback.print_exc() - print("Test - photomaker failed: ", e) +# cli_cmd = [ +# SD_CPP_CLI, +# "--photo-maker", +# PHOTO_MAKER_PATH, +# "--model", +# MODEL_PATH, +# "--vae", +# VAE_PATH, +# "--prompt", +# PROMPT, +# "--negative-prompt", +# NEGATIVE_PROMPT, +# "--pm-style-strength", +# str(STYLE_STRENGTH), +# "--height", +# str(HEIGHT), +# "--width", +# str(WIDTH), +# "--cfg-scale", +# str(CFG_SCALE), +# "--sampling-method", +# SAMPLE_METHOD, +# "--pm-id-images-dir", +# INPUT_ID_IMAGES_DIR, +# "--output", +# f"{OUTPUT_DIR}/photomaker_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_photomaker_2.py b/tests/test_photomaker_2.py new file mode 100644 index 0000000..c7a3bd6 --- /dev/null +++ b/tests/test_photomaker_2.py @@ -0,0 +1,103 @@ +import os + +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +MODEL_PATH = "F:\\stable-diffusion\\photomaker\\sdxlUnstableDiffusers_v11.safetensors" +PHOTO_MAKER_PATH = "F:\\stable-diffusion\\photomaker\\photomaker-v2.safetensors" +VAE_PATH = "F:\\stable-diffusion\\photomaker\\sdxl.vae.safetensors" + + +INPUT_ID_IMAGES_DIR = ".\\assets\\newton_man" +INPUT_ID_EMBED_PATH = ".\\assets\\newton_man\\id_embeds.bin" + + +PROMPT = "a man img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" +NEGATIVE_PROMPT = "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" +STYLE_STRENGTH = 10 +HEIGHT = 1024 +WIDTH = 1024 +CFG_SCALE = 5.0 +SAMPLE_METHOD = "euler" + + +def test_photomaker_2(): + + stable_diffusion = StableDiffusion( + photo_maker_path=PHOTO_MAKER_PATH, + model_path=MODEL_PATH, + vae_path=VAE_PATH, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + cfg_scale=CFG_SCALE, + height=HEIGHT, + width=WIDTH, + sample_method=SAMPLE_METHOD, + prompt=PROMPT, + negative_prompt=NEGATIVE_PROMPT, + pm_style_strength=STYLE_STRENGTH, + pm_id_images=[ + os.path.join(INPUT_ID_IMAGES_DIR, f) + for f in os.listdir(INPUT_ID_IMAGES_DIR) + if f.lower().endswith((".png", ".jpg", ".jpeg", ".bmp")) + ], + pm_id_embed_path=INPUT_ID_EMBED_PATH, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/photomaker_2.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + + +# cli_cmd = [ +# SD_CPP_CLI, +# "--photo-maker", +# PHOTO_MAKER_PATH, +# "--model", +# MODEL_PATH, +# "--vae", +# VAE_PATH, +# "--prompt", +# PROMPT, +# "--negative-prompt", +# NEGATIVE_PROMPT, +# "--pm-style-strength", +# str(STYLE_STRENGTH), +# "--height", +# str(HEIGHT), +# "--width", +# str(WIDTH), +# "--cfg-scale", +# str(CFG_SCALE), +# 
"--sampling-method", +# SAMPLE_METHOD, +# "--pm-id-images-dir", +# INPUT_ID_IMAGES_PATH, +# "--pm-id-embed-path", +# INPUT_ID_EMBED_PATH, +# "--output", +# f"{OUTPUT_DIR}/photomaker_2_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_preprocess_canny.py b/tests/test_preprocess_canny.py index 3163c6f..d008547 100644 --- a/tests/test_preprocess_canny.py +++ b/tests/test_preprocess_canny.py @@ -1,23 +1,16 @@ -import os -import traceback +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion INPUT_IMAGE_PATH = "assets\\input.png" -stable_diffusion = StableDiffusion() -try: +def test_preprocess_canny(): + + stable_diffusion = StableDiffusion() + # Apply canny edge detection image = stable_diffusion.preprocess_canny(image=INPUT_IMAGE_PATH) - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) - # Save image image.save(f"{OUTPUT_DIR}/preprocess_canny.png") - -except Exception as e: - traceback.print_exc() - print("Test - preprocess_canny failed: ", e) diff --git a/tests/test_qwen_image.py b/tests/test_qwen_image.py new file mode 100644 index 0000000..f29440c --- /dev/null +++ b/tests/test_qwen_image.py @@ -0,0 +1,82 @@ +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\qwen\\Qwen_Image-Q4_K_M.gguf" +VAE_PATH = "F:\\stable-diffusion\\qwen\\qwen_image_vae.safetensors" +LLM_PATH = "F:\\stable-diffusion\\qwen\\Qwen2.5-VL-7B-Instruct.Q8_0.gguf" + + +PROMPT = '一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 “一、Qwen-Image的技术路线: 探索视觉生成基础模型的极限,开创理解与生成一体化的未来。二、Qwen-Image的模型特色:1、复杂文字渲染。支持中英渲染、自动布局; 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景:赋能专业内容创作、助力生成式AI发展。”' + +STEPS = 10 +CFG_SCALE = 2.5 +SAMPLE_METHOD = "euler" +FLOW_SHIFT = 3 + + +def test_qwen_image(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + llm_path=LLM_PATH, + vae_path=VAE_PATH, + offload_params_to_cpu=True, + flow_shift=FLOW_SHIFT, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + sample_steps=STEPS, + cfg_scale=CFG_SCALE, + sample_method=SAMPLE_METHOD, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/qwen_image.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--vae", +# VAE_PATH, +# "--llm", +# LLM_PATH, +# "--prompt", +# PROMPT, +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--sampling-method", +# SAMPLE_METHOD, +# "--flow-shift", +# str(FLOW_SHIFT), +# "--offload-to-cpu", +# "--output", +# f"{OUTPUT_DIR}/qwen_image_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_qwen_image_edit.py b/tests/test_qwen_image_edit.py new file mode 100644 index 0000000..07d0e5d --- /dev/null +++ b/tests/test_qwen_image_edit.py @@ -0,0 +1,86 @@ +from PIL import PngImagePlugin 
+from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\qwen\\Qwen_Image_Edit-Q4_0.gguf" +VAE_PATH = "F:\\stable-diffusion\\qwen\\qwen_image_vae.safetensors" +LLM_PATH = "F:\\stable-diffusion\\qwen\\Qwen2.5-VL-7B-Instruct.Q8_0.gguf" + + +PROMPT = "put a party hat on the cat" +INPUT_IMAGE_PATHS = ["assets\\input.png"] + +STEPS = 10 +CFG_SCALE = 2.5 +SAMPLE_METHOD = "euler" +FLOW_SHIFT = 3 + + +def test_qwen_image_edit(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + llm_path=LLM_PATH, + vae_path=VAE_PATH, + offload_params_to_cpu=True, + flow_shift=FLOW_SHIFT, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + sample_steps=STEPS, + cfg_scale=CFG_SCALE, + ref_images=INPUT_IMAGE_PATHS, + sample_method=SAMPLE_METHOD, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/qwen_image_edit.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--vae", +# VAE_PATH, +# "--llm", +# LLM_PATH, +# "--ref-image", +# ",".join(INPUT_IMAGE_PATHS), +# "--prompt", +# PROMPT, +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--sampling-method", +# SAMPLE_METHOD, +# "--flow-shift", +# str(FLOW_SHIFT), +# "--offload-to-cpu", +# "--output", +# f"{OUTPUT_DIR}/qwen_image_edit_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_sd3.py b/tests/test_sd3.py index 334eff8..b71bccb 100644 --- a/tests/test_sd3.py +++ b/tests/test_sd3.py @@ -1,43 +1,89 @@ -import os -import traceback +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -MODEL_PATH = "C:\\stable-diffusion\\sd3.5\\sd3.5_large-q4_k_5_0.gguf" -CLIP_L_PATH = "C:\\stable-diffusion\\sd3.5\\clip_l.safetensors" -CLIP_G_PATH = "C:\\stable-diffusion\\sd3.5\\clip_g.safetensors" -T5XXL_PATH = "C:\\stable-diffusion\\sd3.5\\t5xxl_fp16.safetensors" +MODEL_PATH = "F:\\stable-diffusion\\sd3.5\\sd3.5_large-q4_k_5_0.gguf" +CLIP_L_PATH = "F:\\stable-diffusion\\sd3.5\\clip_l.safetensors" +CLIP_G_PATH = "F:\\stable-diffusion\\sd3.5\\clip_g.safetensors" +T5XXL_PATH = "F:\\stable-diffusion\\sd3.5\\t5xxl_fp8_e4m3fn.safetensors" -stable_diffusion = StableDiffusion( - model_path=MODEL_PATH, - clip_l_path=CLIP_L_PATH, - clip_g_path=CLIP_G_PATH, - t5xxl_path=T5XXL_PATH, -) +PROMPT = "a lovely cat holding a sign says 'Stable diffusion 3.5 Large'" +HEIGHT = 512 +WIDTH = 512 +CFG_SCALE = 4.5 +SAMPLE_METHOD = "euler" +SAMPLE_STEPS = 4 -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) +def test_sd3(): -try: - # Generate images - images = stable_diffusion.txt_to_img( - prompt="a lovely cat holding a sign says 'Stable diffusion 3.5 Large'", - height=832, - width=832, - cfg_scale=4.5, - sample_method="euler", + stable_diffusion = StableDiffusion( + model_path=MODEL_PATH, + 
clip_l_path=CLIP_L_PATH, + clip_g_path=CLIP_G_PATH, + t5xxl_path=T5XXL_PATH, + keep_clip_on_cpu=True, ) - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) - # Save images - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/sd3_{i}.png") + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + height=HEIGHT, + width=WIDTH, + cfg_scale=CFG_SCALE, + sample_method=SAMPLE_METHOD, + sample_steps=SAMPLE_STEPS, + progress_callback=progress_callback, + )[0] -except Exception as e: - traceback.print_exc() - print("Test - sd3 failed: ", e) + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/sd3.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + + +# cli_cmd = [ +# SD_CPP_CLI, +# "--model", +# MODEL_PATH, +# "--t5xxl", +# T5XXL_PATH, +# "--clip_l", +# CLIP_L_PATH, +# "--clip_g", +# CLIP_G_PATH, +# "--prompt", +# PROMPT, +# "--height", +# str(HEIGHT), +# "--width", +# str(WIDTH), +# "--cfg-scale", +# str(CFG_SCALE), +# "--sampling-method", +# SAMPLE_METHOD, +# "--steps", +# str(SAMPLE_STEPS), +# "--clip-on-cpu", +# "--output", +# f"{OUTPUT_DIR}/sd3_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_system_info.py b/tests/test_system_info.py index 7b39e4d..e28ae22 100644 --- a/tests/test_system_info.py +++ b/tests/test_system_info.py @@ -1,26 +1,24 @@ -import os -import traceback +from conftest import OUTPUT_DIR import stable_diffusion_cpp.stable_diffusion_cpp as sd_cpp -try: + +def test_system_info(): # Get system info system_info = sd_cpp.sd_get_system_info() - num_physical_cores = sd_cpp.get_num_physical_cores() + num_physical_cores = sd_cpp.sd_get_num_physical_cores() + sd_version = sd_cpp.sd_version().decode("utf-8") + sd_commit = sd_cpp.sd_commit().decode("utf-8") # Print system info print("System info: ", system_info) print("Number of physical cores: ", num_physical_cores) - - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) + print("SD Version: ", sd_version) + print("SD Commit: ", sd_commit) # Write system info to file txt with open(f"{OUTPUT_DIR}/system_info.txt", "w") as f: f.write(f"System info: {str(system_info)}\n") - f.write(f"Number of physical cores: {str(num_physical_cores)}") - -except Exception as e: - traceback.print_exc() - print("Test - system_info failed: ", e) + f.write(f"Number of physical cores: {str(num_physical_cores)}\n") + f.write(f"SD Version: {sd_version}\n") + f.write(f"SD Commit: {sd_commit}\n") diff --git a/tests/test_txt2img.py b/tests/test_txt2img.py index a3c183a..bf93d65 100644 --- a/tests/test_txt2img.py +++ b/tests/test_txt2img.py @@ -1,48 +1,79 @@ import os -import traceback + +from PIL import Image, PngImagePlugin +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -# MODEL_PATH = "C:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.q8_0.gguf" # GGUF model wont work for LORAs (GGML_ASSERT error) -# MODEL_PATH = 
"C:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" -MODEL_PATH = "C:\\stable-diffusion\\catCitronAnimeTreasure_v10.safetensors" +MODEL_PATH = "F:\\stable-diffusion\\catCitronAnimeTreasure_v10.safetensors" +LORA_DIR = "F:\\stable-diffusion\\loras" + -LORA_DIR = "C:\\stable-diffusion\\loras" +PROMPTS = [ + {"add": "_lora", "prompt": "a cute cat glass statue "}, # With LORA + {"add": "", "prompt": "a cute cat glass statue"}, # Without LORA +] +STEPS = 20 -stable_diffusion = StableDiffusion( - model_path=MODEL_PATH, - lora_model_dir=LORA_DIR, -) +PREVIEW_OUTPUT_DIR = f"{OUTPUT_DIR}/preview" +if not os.path.exists(PREVIEW_OUTPUT_DIR): + os.makedirs(PREVIEW_OUTPUT_DIR) -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) +def test_txt2img(): + stable_diffusion = StableDiffusion( + model_path=MODEL_PATH, + lora_model_dir=LORA_DIR, + ) -try: - prompts = [ - # {"add": "_lora", "prompt": "a lovely cat "}, # With LORA - # {"add": "", "prompt": "a lovely cat"}, # Without LORA - {"add": "_lora", "prompt": "a cute cat glass statue "}, # With LORA - {"add": "", "prompt": "a cute cat glass statue"}, # Without LORA - ] + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) + def preview_callback(step: int, images: list[Image.Image], is_noisy: bool): + images[0].save(f"{PREVIEW_OUTPUT_DIR}/{step}.png") - for prompt in prompts: - # Generate images - images = stable_diffusion.txt_to_img( + for prompt in PROMPTS: + # Generate image + image = stable_diffusion.generate_image( prompt=prompt["prompt"], - sample_steps=4, - progress_callback=callback, - ) + sample_steps=STEPS, + progress_callback=progress_callback, + preview_method="proj", + preview_interval=2, + preview_callback=preview_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/txt2img{prompt['add']}.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI - # Save images - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/txt2img{prompt['add']}_{i}.png") +# stable_diffusion = None # Clear model -except Exception as e: - traceback.print_exc() - print("Test - txt2img failed: ", e) +# for prompt in PROMPTS: +# cli_cmd = [ +# SD_CPP_CLI, +# "--model", +# MODEL_PATH, +# "--lora-model-dir", +# LORA_DIR, +# "--prompt", +# prompt["prompt"], +# "--steps", +# str(STEPS), +# "--output", +# f"{OUTPUT_DIR}/txt2img{prompt['add']}_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_upscale.py b/tests/test_upscale.py index c1eafac..b35b4ca 100644 --- a/tests/test_upscale.py +++ b/tests/test_upscale.py @@ -1,31 +1,29 @@ -import os -import traceback +from conftest import OUTPUT_DIR from stable_diffusion_cpp import StableDiffusion -UPSCALER_MODEL_PATH = "C:\\stable-diffusion\\RealESRGAN_x4plus.pth" +UPSCALER_MODEL_PATH = "F:\\stable-diffusion\\RealESRGAN_x4plus.pth" -input_images = ["assets\\input.png"] -stable_diffusion = StableDiffusion(upscaler_path=UPSCALER_MODEL_PATH) +INPUT_IMAGES = ["assets\\input.png"] +UPSCALE_FACTOR = 2 -def callback(step: int, steps: 
int, time: float): - print("Completed step: {} of {}".format(step, steps)) +def test_upscale(): -try: - # Upscale images - images = stable_diffusion.upscale(images=input_images, upscale_factor=2, progress_callback=callback) + stable_diffusion = StableDiffusion(upscaler_path=UPSCALER_MODEL_PATH) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) + # Upscale images + images = stable_diffusion.upscale( + images=INPUT_IMAGES, + upscale_factor=UPSCALE_FACTOR, + progress_callback=progress_callback, + ) # Save images for i, image in enumerate(images): image.save(f"{OUTPUT_DIR}/upscale_{i}.png") - -except Exception as e: - traceback.print_exc() - print("Test - upscale failed: ", e) diff --git a/tests/test_vae_memory_leak.py b/tests/test_vae_memory_leak.py deleted file mode 100644 index 47aa002..0000000 --- a/tests/test_vae_memory_leak.py +++ /dev/null @@ -1,53 +0,0 @@ -import os -import time -import traceback - -from stable_diffusion_cpp import StableDiffusion - -MODEL_PATH = "C:\\stable-diffusion\\juggernautXL_V8+RDiffusion.safetensors" -# MODEL_PATH = "C:\\stable-diffusion\\turbovisionxlSuperFastXLBasedOnNew_tvxlV431Bakedvae.safetensors" -VAE_PATH = "C:\\stable-diffusion\\vaes\\sdxl_vae.safetensors" - - -stable_diffusion = None -stable_diffusion = StableDiffusion( - model_path=MODEL_PATH, - # upscaler_path="C:\\stable-diffusion\\RealESRGAN_x4plus.pth", - # vae_path=VAE_PATH, - # vae_tiling=True, - # keep_vae_on_cpu=True, - # keep_clip_on_cpu=True, - # keep_control_net_cpu=True, -) - - -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) - - -try: - # # Generate images - # images = stable_diffusion.txt_to_img( - # prompt="a lovely cat", - # sample_steps=2, - # seed=42, - # progress_callback=callback, - # ) - - # OUTPUT_DIR = "tests/outputs" - # if not os.path.exists(OUTPUT_DIR): - # os.makedirs(OUTPUT_DIR) - - # images[0].save(f"{OUTPUT_DIR}/vaetxt2img.png") - time.sleep(10) - - # Free memory - print("Free memory") - stable_diffusion = None - - time.sleep(10) - print("Finishing") - -except Exception as e: - traceback.print_exc() - print("Test - txt2img failed: ", e) diff --git a/tests/test_vid.py b/tests/test_vid.py new file mode 100644 index 0000000..eaf0dbe --- /dev/null +++ b/tests/test_vid.py @@ -0,0 +1,106 @@ +from conftest import OUTPUT_DIR, save_video_ffmpeg + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\wan\\wan2.1_t2v_1.3B_fp16.safetensors" +T5XXL_PATH = "F:\\stable-diffusion\\wan\\umt5-xxl-encoder-Q8_0.gguf" +VAE_PATH = "F:\\stable-diffusion\\wan\\wan_2.1_vae.safetensors" + + +PROMPT = "a cute dog jumping" +NEGATIVE_PROMPT = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部, 畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" +STEPS = 4 +CFG_SCALE = 6.0 +SAMPLE_METHOD = "euler" +WIDTH = 512 +HEIGHT = 512 +VIDEO_FRAMES = 10 +FLOW_SHIFT = 3.0 +FPS = 16 + + +def test_vid(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + t5xxl_path=T5XXL_PATH, + vae_path=VAE_PATH, + flow_shift=FLOW_SHIFT, + keep_clip_on_cpu=True, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate video frames + images = stable_diffusion.generate_video( + prompt=PROMPT, + 
negative_prompt=NEGATIVE_PROMPT, + cfg_scale=CFG_SCALE, + sample_steps=STEPS, + sample_method=SAMPLE_METHOD, + width=WIDTH, + height=HEIGHT, + video_frames=VIDEO_FRAMES, + progress_callback=progress_callback, + ) + + # Save video + save_video_ffmpeg(images, fps=FPS, out_path=f"{OUTPUT_DIR}/vid.mp4") + + # VID_DIR = f"{OUTPUT_DIR}/vid" + # if not os.path.exists(VID_DIR): + # os.makedirs(VID_DIR) + # for i, image in enumerate(images): + # image.save(f"{VID_DIR}/vid_{i}.png") + + +# =========================================== +# C++ CLI +# =========================================== + + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + + +# cli_cmd = [ +# SD_CPP_CLI, +# "--mode", +# "vid_gen", +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--vae", +# VAE_PATH, +# "--t5xxl", +# T5XXL_PATH, +# "--prompt", +# PROMPT, +# "--negative-prompt", +# NEGATIVE_PROMPT, +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--sampling-method", +# SAMPLE_METHOD, +# "--width", +# str(WIDTH), +# "--height", +# str(HEIGHT), +# "--video-frames", +# str(VIDEO_FRAMES), +# "--fps", +# str(FPS), +# "--flow-shift", +# str(FLOW_SHIFT), +# "--clip-on-cpu", +# "--output", +# f"{OUTPUT_DIR}/vid_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_vid_vace.py b/tests/test_vid_vace.py new file mode 100644 index 0000000..744ce7e --- /dev/null +++ b/tests/test_vid_vace.py @@ -0,0 +1,113 @@ +import os + +from conftest import OUTPUT_DIR, save_video_ffmpeg + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\wan\\wan2.1-v3-vace-1.3b-q8_0.gguf" +T5XXL_PATH = "F:\\stable-diffusion\\wan\\umt5-xxl-encoder-Q8_0.gguf" +VAE_PATH = "F:\\stable-diffusion\\wan\\wan_2.1_vae.safetensors" + +VIDEO_FRAMES_DIR = "assets\\frames" + + +PROMPT = "dogs boxing" +NEGATIVE_PROMPT = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部, 畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" +STEPS = 4 +CFG_SCALE = 6.0 +SAMPLE_METHOD = "euler" +WIDTH = 512 +HEIGHT = 512 +VIDEO_FRAMES = 41 +FPS = 16 + + +def test_vid_vace(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + t5xxl_path=T5XXL_PATH, + vae_path=VAE_PATH, + keep_clip_on_cpu=True, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate video frames + images = stable_diffusion.generate_video( + prompt=PROMPT, + negative_prompt=NEGATIVE_PROMPT, + cfg_scale=CFG_SCALE, + sample_steps=STEPS, + sample_method=SAMPLE_METHOD, + width=WIDTH, + height=HEIGHT, + video_frames=VIDEO_FRAMES, + control_frames=[ + os.path.join(VIDEO_FRAMES_DIR, f) + for f in os.listdir(VIDEO_FRAMES_DIR) + if f.lower().endswith((".png", ".jpg", ".jpeg", ".bmp")) + ], + progress_callback=progress_callback, + ) + + # Save video + save_video_ffmpeg(images, fps=FPS, out_path=f"{OUTPUT_DIR}/vid_vace.mp4") + + # VID_DIR = f"{OUTPUT_DIR}/vid" + # if not os.path.exists(VID_DIR): + # os.makedirs(VID_DIR) + # for i, image in enumerate(images): + # image.save(f"{VID_DIR}/vid_{i}.png") + + +# =========================================== +# C++ CLI +# =========================================== + + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + + +# cli_cmd = [ +# SD_CPP_CLI, +# "--mode", +# "vid_gen", +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# 
"--vae", +# VAE_PATH, +# "--t5xxl", +# T5XXL_PATH, +# "--prompt", +# PROMPT, +# "--negative-prompt", +# NEGATIVE_PROMPT, +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--sampling-method", +# SAMPLE_METHOD, +# "--width", +# str(WIDTH), +# "--height", +# str(HEIGHT), +# "--video-frames", +# str(VIDEO_FRAMES), +# "--fps", +# str(FPS), +# "--control-video", +# VIDEO_FRAMES_DIR, +# "--clip-on-cpu", +# "--output", +# f"{OUTPUT_DIR}/vid_vace_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/test_vulkan.py b/tests/test_vulkan.py deleted file mode 100644 index 774c61b..0000000 --- a/tests/test_vulkan.py +++ /dev/null @@ -1,37 +0,0 @@ -import os -import traceback - -from stable_diffusion_cpp import StableDiffusion - -MODEL_PATH = "C:\\stable-diffusion\\v1-5-pruned-emaonly.safetensors" - -stable_diffusion = StableDiffusion(model_path=MODEL_PATH, wtype="f16", vae_tiling=True) - - -def callback(step: int, steps: int, time: float): - print("Completed step: {} of {}".format(step, steps)) - - -try: - OUTPUT_DIR = "tests/outputs" - if not os.path.exists(OUTPUT_DIR): - os.makedirs(OUTPUT_DIR) - - # Generate images - images = stable_diffusion.txt_to_img( - prompt="a fantasy character, detailed background, colorful", - sample_steps=20, - seed=42, - width=512, - height=512, - sample_method="euler", - progress_callback=callback, - ) - - # Save images - for i, image in enumerate(images): - image.save(f"{OUTPUT_DIR}/vulkan_{i}.png") - -except Exception as e: - traceback.print_exc() - print("Test - txt2img failed: ", e) diff --git a/tests/test_z_image.py b/tests/test_z_image.py new file mode 100644 index 0000000..999af49 --- /dev/null +++ b/tests/test_z_image.py @@ -0,0 +1,79 @@ +from PIL import PngImagePlugin +from conftest import OUTPUT_DIR + +from stable_diffusion_cpp import StableDiffusion + +DIFFUSION_MODEL_PATH = "F:\\stable-diffusion\\z-image\\z_image_turbo-Q3_K.gguf" +LLM_PATH = "F:\\stable-diffusion\\z-image\\Qwen3-4B-Instruct-2507-Q4_K_M.gguf" +VAE_PATH = "F:\\stable-diffusion\\z-image\\ae.safetensors" + +PROMPT = "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' 
-- moody, atmospheric, profound, dark academic" +STEPS = 20 +CFG_SCALE = 1.0 +HEIGHT = 1024 +WIDTH = 512 + + +def test_z_image(): + + stable_diffusion = StableDiffusion( + diffusion_model_path=DIFFUSION_MODEL_PATH, + llm_path=LLM_PATH, + vae_path=VAE_PATH, + diffusion_flash_attn=True, + ) + + def progress_callback(step: int, steps: int, time: float): + print("Completed step: {} of {}".format(step, steps)) + + # Generate image + image = stable_diffusion.generate_image( + prompt=PROMPT, + height=HEIGHT, + width=WIDTH, + sample_steps=STEPS, + cfg_scale=CFG_SCALE, + progress_callback=progress_callback, + )[0] + + # Save image + pnginfo = PngImagePlugin.PngInfo() + pnginfo.add_text("Parameters", ", ".join([f"{k.replace('_', ' ').title()}: {v}" for k, v in image.info.items()])) + image.save(f"{OUTPUT_DIR}/z_image.png", pnginfo=pnginfo) + + +# =========================================== +# C++ CLI +# =========================================== + +# import subprocess + +# from conftest import SD_CPP_CLI + +# stable_diffusion = None # Clear model + +# cli_cmd = [ +# SD_CPP_CLI, +# "--diffusion-model", +# DIFFUSION_MODEL_PATH, +# "--llm", +# LLM_PATH, +# "--vae", +# VAE_PATH, +# "--prompt", +# PROMPT, +# "--height", +# str(HEIGHT), +# "--width", +# str(WIDTH), +# "--steps", +# str(STEPS), +# "--cfg-scale", +# str(CFG_SCALE), +# "--diffusion-fa", +# "--output", +# f"{OUTPUT_DIR}/z_image_cli.png", +# "-v", +# ] +# print(" ".join(cli_cmd)) +# subprocess.run(cli_cmd, check=True) diff --git a/tests/tests.md b/tests/tests.md index c54783b..0ebf071 100644 --- a/tests/tests.md +++ b/tests/tests.md @@ -1,11 +1,7 @@ ```bash -python tests\test_txt2img.py; python tests\test_controlnet.py; python tests\test_convert_model.py; python tests\test_flux.py; python tests\test_img2img.py; python tests\test_preprocess_canny.py; python tests\test_system_info.py; python tests\test_upscale.py; python tests\test_photomaker.py; python tests\test_inpainting.py +pytest -s ``` ```bash -python tests\test_vulkan.py +pytest tests\test_txt2img.py -s; pytest tests\test_controlnet.py -s; pytest tests\test_convert_model.py -s; pytest tests\test_flux.py -s; pytest tests\test_img2img.py -s; pytest tests\test_preprocess_canny.py -s; pytest tests\test_system_info.py -s; pytest tests\test_list_maps.py -s; pytest tests\test_upscale.py -s; pytest tests\test_photomaker.py -s; pytest tests\test_photomaker_2.py -s; pytest tests\test_inpainting.py -s; pytest tests\test_chroma.py -s; pytest tests\test_edit.py -s; pytest tests/test_sd3.py -s; pytest tests\test_vid.py -s; pytest tests\test_vid_vace.py -s; pytest tests\test_multi_gpu.py -s; pytest tests\test_flex2.py -s; pytest tests\test_memory_leak.py -s; pytest tests\test_qwen_image.py -s; pytest tests\test_qwen_image_edit.py -s; pytest tests\test_z_image.py -s; pytest tests\test_ovis.py -s; pytest tests\test_flux2.py -s; ``` - -```bash -python tests\test_img2vid.py; -``` \ No newline at end of file diff --git a/vendor/stable-diffusion.cpp b/vendor/stable-diffusion.cpp index 10c6501..43a70e8 160000 --- a/vendor/stable-diffusion.cpp +++ b/vendor/stable-diffusion.cpp @@ -1 +1 @@ -Subproject commit 10c6501bd05a697e014f1bee3a84e5664290c489 +Subproject commit 43a70e819b9254dee0d017305d6992f6bb27f850