Tags · jeffbolznv/llama.cpp

b7403

convert : refactor rope scaling handling (ggml-org#18013)
* refactor rope scaling handling
* ws--
* missed a couple
* use find_hparam

Dec 14, 2025
5c8a717
zip
tar.gz

b7387

llama_context: synchronize before reallocating output buffer (ggml-org#17974)

Dec 13, 2025
5266379
zip
tar.gz

b7359

ggml-alloc : fix reuse-parent logic for misaligned sizes (ggml-org#17884)

Dec 11, 2025
c6f6e4f
zip
tar.gz

b7349

cuda : add missing support check for xielu (ggml-org#17895)

Dec 10, 2025
4df6e85
zip
tar.gz

b7340

metal: SSM kernel improvements (ggml-org#17876)
* feat: Add a batched version of ssm_conv. This was done using Claude Code. It found a number of optimizations around how the threads were organized, resulting in a huge performance boost! Branch: Mamba2SSD
* feat: Optimized SSM_SCAN kernel for metal. This used Claude Code and resulted in a modest performance improvement while maintaining correctness. Branch: Mamba2SSD
* test: Add test-backend-ops perf tests for SSM_CONV. Branch: SSMKernelImprovements
* test: Real representative tests for SSM_CONV. Branch: SSMKernelImprovements
* refactor: Use function constant for ssm_conv batch size. Branch: SSMKernelImprovements
* test: backend op tests for ssm_scan from granite4 1b-h. Branch: SSMKernelImprovements
* style: remove commented out templates. Branch: SSMKernelImprovements
* feat: float4 version of ssm_conv_batched. Branch: SSMKernelImprovements
* fix: Add missing ggml_metal_cv_free
Signed-off-by: Gabe Goodhart
Co-authored-by: Georgi Gerganov

Dec 9, 2025
086a63e
zip
tar.gz

b7326

model-conversion : add token ids to prompt token output [no ci] (ggml-org#17863)
This commit adds the token ids to the printed prompt outputs. The motivation is that it can be useful to see the actual token ids alongside the token strings when debugging.

Dec 8, 2025
2fa51c1
zip
tar.gz

b7312

common : change --color to accept on/off/auto, default to auto (ggml-org#17827)

Dec 7, 2025
2257758
zip
tar.gz
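
The b7312 entry above describes a three-valued `--color` option (on/off/auto, defaulting to auto). As a rough illustration of how such a tri-state flag is commonly resolved, here is a minimal Python sketch. It is not the llama.cpp implementation: the function name `resolve_color` is hypothetical, and the assumption that "auto" means "use color only when the output is a terminal" is a common convention rather than something the commit title states.

```python
import sys


def resolve_color(mode: str = "auto", stream=None) -> bool:
    """Return True if colored output should be used for the given mode.

    Hypothetical helper for illustration; not part of llama.cpp.
    """
    if stream is None:
        stream = sys.stdout
    mode = mode.lower()
    if mode == "on":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        # Assumption: "auto" enables color only when writing to a terminal.
        return hasattr(stream, "isatty") and stream.isatty()
    raise ValueError(f"invalid --color value: {mode!r} (expected on, off, or auto)")
```

With this shape, redirecting output to a file or pipe under the default `auto` mode yields plain text, while `on` and `off` override the detection explicitly.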

b7278

ci : transform release binary root dir in tar to llama-bXXXX (ggml-org#17773)
* transform release binary root dir in tar to llama-bXXXX
* bsdtar supports -s instead of --transform

Dec 5, 2025
03d9a77
zip
tar.gz
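
The b7278 entry above renames the root directory inside the release tarball using tar's path-rewriting options (`--transform` in GNU tar, `-s` in bsdtar). The CI script itself is not shown here, so the following is only a sketch of the same idea using Python's standard `tarfile` module. The function name `rename_root` and the example directory names are hypothetical.

```python
import io
import tarfile


def rename_root(src: bytes, new_root: str) -> bytes:
    """Return a copy of a .tar.gz archive whose top-level directory is new_root.

    Hypothetical helper for illustration; the actual CI change uses tar flags.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src), mode="r:gz") as tin, \
         tarfile.open(fileobj=out, mode="w:gz") as tout:
        for member in tin.getmembers():
            # Replace only the first path component, keep the rest unchanged.
            parts = member.name.split("/", 1)
            parts[0] = new_root
            member.name = "/".join(parts)
            # Drop any stored pax "path" header so the new name takes effect.
            member.pax_headers.pop("path", None)
            data = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, data)
    return out.getvalue()
```

For example, an archive containing `build/bin/tool` passed through `rename_root(data, "llama-b7278")` would contain `llama-b7278/bin/tool` instead, so extracting it produces a directory named after the tag.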

b7261

ggml-cpu: remove duplicate conditional check 'iid' (ggml-org#17650)

Dec 3, 2025
dea9ba2
zip
tar.gz

b7240

vulkan: Reduce temporary memory usage for TOP_K (ggml-org#17623)
- Compute row size for the temp buffer based on the output of the first pass.
- Update shader addressing math to use the output row size.
- Pass the output row size as "ncols_output"; what used to be "ncols_output" is now "k".
For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer from about 3.2MB to 500KB.

Dec 2, 2025
61bde8e
zip
tar.gz
