Tags: jeffbolznv/llama.cpp

b7403

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
convert : refactor rope scaling handling (ggml-org#18013)
* refactor rope scaling handling
* ws--
* missed a couple
* use find_hparam
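The last item refers to the conversion script's `find_hparam` helper, assumed here to return the value of the first candidate key present in the model config. The sketch below is a standalone illustration of that lookup pattern applied to a Hugging Face-style `rope_scaling` entry. It is not the script's actual code: the free functions `find_hparam` and `rope_scaling_type` are hypothetical, and the assumption that newer configs use `"rope_type"` while older ones use `"type"` reflects Hugging Face config conventions rather than anything stated in this commit.

```python
from typing import Any, Iterable, Mapping, Optional


def find_hparam(hparams: Mapping[str, Any], keys: Iterable[str], optional: bool = False) -> Any:
    """Return the value of the first key in `keys` that exists in `hparams`.

    Hypothetical standalone version of the lookup pattern: raises KeyError
    if none of the keys is present, unless `optional` is True.
    """
    keys = list(keys)
    for key in keys:
        if key in hparams:
            return hparams[key]
    if optional:
        return None
    raise KeyError(f"could not find any of: {keys}")


def rope_scaling_type(hparams: Mapping[str, Any]) -> Optional[str]:
    """Read the scaling type from a Hugging Face-style `rope_scaling` dict.

    Assumption: newer configs spell the key "rope_type", older ones "type".
    A missing or null `rope_scaling` entry yields None.
    """
    rope_scaling = find_hparam(hparams, ["rope_scaling"], optional=True) or {}
    return find_hparam(rope_scaling, ["rope_type", "type"], optional=True)
```

Checking several spellings in one place is the point of the pattern: callers ask for a concept ("the scaling type") instead of repeating `if "x" in hparams` chains for every key variant.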

b7387

llama_context: synchronize before reallocating output buffer (ggml-org#17974)

b7359

ggml-alloc : fix reuse-parent logic for misaligned sizes (ggml-org#17884)

b7349

cuda : add missing support check for xielu (ggml-org#17895) 

b7340

metal: SSM kernel improvements (ggml-org#17876)
* feat: Add a batched version of ssm_conv. This was done using Claude Code. It found a number of optimizations in how the threads were organized, resulting in a huge performance boost. (Branch: Mamba2SSD)
* feat: Optimized SSM_SCAN kernel for Metal. This used Claude Code and resulted in a modest performance improvement while maintaining correctness. (Branch: Mamba2SSD)
* test: Add test-backend-ops perf tests for SSM_CONV (Branch: SSMKernelImprovements)
* test: Real representative tests for SSM_CONV (Branch: SSMKernelImprovements)
* refactor: Use function constant for ssm_conv batch size (Branch: SSMKernelImprovements)
* test: backend op tests for ssm_scan from granite4 1b-h (Branch: SSMKernelImprovements)
* style: remove commented-out templates (Branch: SSMKernelImprovements)
* feat: float4 version of ssm_conv_batched (Branch: SSMKernelImprovements)
* fix: Add missing ggml_metal_cv_free
---------
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b7326

model-conversion : add token ids to prompt token output [no ci] (ggml-org#17863)
This commit adds the token ids to the printed prompt outputs. The motivation is that seeing the actual token ids alongside the token strings can be useful for debugging.

b7312

common : change --color to accept on/off/auto, default to auto (ggml-org#17827)
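A tri-state flag like this is usually interpreted as: `on` and `off` force the behavior, and `auto` decides at runtime. The sketch below assumes the common convention that `auto` means "color only when the output stream is a terminal"; the exact rule used in llama.cpp's `common` code (which is C++) is not shown in this commit and may check more conditions. `ColorMode`, `parse_color`, and `should_use_color` are hypothetical names.

```python
import sys
from enum import Enum
from typing import TextIO


class ColorMode(Enum):
    ON = "on"
    OFF = "off"
    AUTO = "auto"


def parse_color(value: str = "auto") -> ColorMode:
    """Parse a --color argument (case-insensitive); the default is auto."""
    try:
        return ColorMode(value.strip().lower())
    except ValueError:
        raise ValueError(f"invalid --color value {value!r}; expected on, off or auto") from None


def should_use_color(mode: ColorMode, stream: TextIO = sys.stdout) -> bool:
    """on/off are explicit; auto (assumed) enables color only if `stream` is a TTY."""
    if mode is ColorMode.AUTO:
        return stream.isatty()
    return mode is ColorMode.ON
```

Defaulting to `auto` keeps colored output in interactive terminals while avoiding escape codes when output is piped to a file or another program.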

b7278

ci : transform release binary root dir in tar to llama-bXXXX (ggml-org#17773)
* transform release binary root dir in tar to llama-bXXXX
* bsdtar supports -s instead of --transform
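The change renames the top-level directory stored inside the release tarball. GNU tar does this with a sed-style `--transform` expression on member names, while bsdtar uses `-s` for the same kind of substitution. For illustration only, the sketch below gets the same effect with Python's `tarfile` by passing `arcname` when adding the directory. This is a swapped-in technique, not the project's CI script. The function name `make_release_tarball`, the source directory layout, and the assumption that `XXXX` is the build number (as in tag names like `b7278`) are all hypothetical.

```python
import tarfile


def make_release_tarball(src_dir: str, build_number: int, out_path: str) -> None:
    """Pack `src_dir` so that every member sits under a `llama-b<N>/` root.

    `arcname` replaces the on-disk path of `src_dir` in the archive, so the
    extracted tree starts with `llama-b<N>/` regardless of where the build
    output lived (the rename the commit does with --transform / -s).
    """
    root = f"llama-b{build_number}"
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=root)  # recursive by default
```

With a stable, versioned root directory, unpacking several releases side by side does not mix their files.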

b7261

ggml-cpu: remove duplicate conditional check 'iid' (ggml-org#17650) 

b7240

vulkan: Reduce temporary memory usage for TOP_K (ggml-org#17623)
- Compute the row size for the temp buffer based on the output of the first pass.
- Update the shader addressing math to use the output row size.
- Pass the output row size as "ncols_output"; what used to be "ncols_output" is now "k".
For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer from about 3.2 MB to 500 KB.