server : allow using LoRA adapters per-request #10994
Merged
Fix #10377
Example request for `POST /completions`:
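A minimal sketch of such a request, assuming the per-request `lora` field is an array of `{id, scale}` objects; the adapter id, scale, prompt, and port used here are illustrative:

```sh
# Apply adapter 0 at scale 0.5 for this request only
curl http://localhost:8080/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello, my name is",
    "n_predict": 64,
    "lora": [
      {"id": 0, "scale": 0.5}
    ]
  }'
```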
Example for `POST /v1/chat/completions`:
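A similar sketch for the OpenAI-compatible chat endpoint (again, the `lora` field and the payload values are assumptions):

```sh
# Disable adapter 0 and enable adapter 1 for this request only
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "lora": [
      {"id": 0, "scale": 0.0},
      {"id": 1, "scale": 1.0}
    ]
  }'
```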
Please note that the `/lora-adapters` endpoint now reflects the global value of the LoRA adapter scales. If `lora` is not specified per-request, we will use this global value.
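For reference, a hedged sketch of checking those global values via `GET /lora-adapters` (the response shape shown in the comment is an assumption):

```sh
# List loaded adapters with their server-wide default scales
curl http://localhost:8080/lora-adapters
# Assumed response shape:
# [
#   {"id": 0, "path": "lora-a.gguf", "scale": 0.5},
#   {"id": 1, "path": "lora-b.gguf", "scale": 1.0}
# ]
```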
TODO:

`SLOW_TESTS=1 ./examples/server/tests/tests.sh unit/test_lora.py -x -s -v`