With the recent refactoring of LoRA support in llama.cpp, you can now convert any PEFT LoRA adapter into GGUF and load it alongside the GGUF base model. To facilitate the process, we added a brand-new space called GGUF-my-LoRA.
What is LoRA?
LoRA (Low-Rank Adaptation) is a machine learning technique for efficiently fine-tuning large language models. Think of LoRA as adding a small set of specialized instructions to a large, general-purpose model. Instead of retraining the entire model (which is expensive and time-consuming), LoRA teaches it new skills by training only a small number of additional parameters. For example, you could take a standard chatbot and quickly adapt it for customer service, legal work, or healthcare, each with its own small set of additional instructions rather than an entirely new model.
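In slightly more concrete terms, here is the standard low-rank update from the original LoRA paper (general background, not anything specific to llama.cpp or this space): the base weight matrix $W$ stays frozen, and only the two small factors $B$ and $A$ are trained, which is why adapter files are tiny compared with the base model:

$$
W' = W + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k)
$$

Here $r$ is the adapter rank and $\alpha$ is a scaling constant. Roughly speaking, the `--lora-scaled` option shown later multiplies this adapter contribution by an extra user-chosen factor, which is how a negative scale can invert the adapter's effect.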
PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library that implements techniques like LoRA for efficient model fine-tuning, available at https://github.com/huggingface/peft.
How to Convert PEFT LoRA to GGUF
In this example, I will take bartowski/Meta-Llama-3.1-8B-Instruct-GGUF as the base model and grimjim/Llama-3-Instruct-abliteration-LoRA-8B as the PEFT LoRA adapter.
To begin, go to GGUF-my-LoRA and sign in with your Hugging Face account. Then, select the PEFT LoRA adapter you want to convert.
Once the conversion is complete, you will find a new repository created under your personal account.
Here is an example of a converted GGUF LoRA Adapter: ngxson/Llama-3-Instruct-abliteration-LoRA-8B-F16-GGUF
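If you prefer working from the terminal, one way to fetch the base model and the converted adapter is with `huggingface-cli` (the repo names are from the example above; the exact GGUF file name inside the base repo is an assumption, so check the repo's file list):

```sh
# Download one quantization of the base model (file name assumed; verify on the repo page)
huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF \
    Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --local-dir .

# Download the converted LoRA adapter repo
huggingface-cli download ngxson/Llama-3-Instruct-abliteration-LoRA-8B-F16-GGUF \
    --local-dir .
```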
How to Use the Adapter
With llama-cli
You can load the base model using `-m` and add the adapter using `--lora` or `--lora-scaled`. Here are some examples:
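As a sketch, the three invocations behind the responses below could look like this (the GGUF file names are assumptions based on the repos above; substitute the files you actually downloaded):

```sh
# Baseline: base model only, no adapter
./llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p "How to make a bomb"

# Adapter applied with the default scale (1.0)
./llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora Llama-3-Instruct-abliteration-LoRA-8B-f16.gguf \
    -p "How to make a bomb"

# Adapter applied with a custom (here negative) scale of -5.0
./llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora-scaled Llama-3-Instruct-abliteration-LoRA-8B-f16.gguf -5.0 \
    -p "How to make a bomb"
```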
Example responses:
Without the adapter (baseline):

> How to make a bomb

I can't support that request. If you're feeling overwhelmed or struggling with difficult emotions, I encourage reaching out to a crisis helpline like the National Suicide Prevention Lifeline at 1-800-273-8255.

With a scale = 1.0:

> How to make a bomb

I'm assuming you're referring to a homemade bomb in the context of a DIY project or a creative endeavor, rather than an actual explosive device!

With a scale = -5.0:

> How to make a bomb

I can't assist with that. Is there anything else I can help you with?

With llama-server
`llama-server` supports multiple adapters and the ability to hot reload them.

You can add one or multiple adapters by repeating `--lora` multiple times:
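For example (the adapter file names are placeholders):

```sh
# Load the base model with two adapters at once; each --lora flag adds one adapter
./llama-server -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora adapter_one.gguf \
    --lora adapter_two.gguf
```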
The `--lora-init-without-apply` argument specifies that the server should load adapters without applying them. You can then apply (hot reload) an adapter using the `POST /lora-adapters` endpoint.

To learn more about LoRA usage with the llama.cpp server, refer to the llama.cpp server documentation.
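A minimal sketch of the hot-reload flow, assuming the server runs on the default port 8080 (the request body, a list of `{id, scale}` objects where `id` is the adapter's position on the command line, follows the server documentation):

```sh
# Start the server with the adapter loaded but not yet applied
./llama-server -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora Llama-3-Instruct-abliteration-LoRA-8B-f16.gguf \
    --lora-init-without-apply

# Hot reload: apply adapter 0 at scale 1.0
curl -X POST http://localhost:8080/lora-adapters \
    -H "Content-Type: application/json" \
    -d '[{"id": 0, "scale": 1.0}]'
```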