GitHub - unslothai/unsloth: Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Train gpt-oss, DeepSeek, Gemma, Qwen & Llama 2x faster with 70% less VRAM!

✨ Train for Free

Notebooks are beginner friendly. Read our guide. Add dataset, run, then export your trained model to GGUF, llama.cpp, Ollama, vLLM, SGLang or Hugging Face.

Model	Free Notebooks	Performance	Memory use
gpt-oss (20B)	▶️ Start for free	1.5x faster	70% less
Mistral Ministral 3 (3B)	▶️ Start for free	1.5x faster	60% less
gpt-oss (20B): GRPO	▶️ Start for free	2x faster	80% less
Qwen3: Advanced GRPO	▶️ Start for free	2x faster	50% less
Qwen3-VL (8B): GSPO	▶️ Start for free	1.5x faster	80% less
Gemma 3 (270M)	▶️ Start for free	1.7x faster	60% less
Gemma 3n (4B)	▶️ Start for free	1.5x faster	50% less
DeepSeek-OCR (3B)	▶️ Start for free	1.5x faster	30% less
Llama 3.1 (8B) Alpaca	▶️ Start for free	2x faster	70% less
Llama 3.2 Conversational	▶️ Start for free	2x faster	70% less
Orpheus-TTS (3B)	▶️ Start for free	1.5x faster	50% less

See all our notebooks for: Kaggle, GRPO, TTS & Vision
See all our models and all our notebooks
See detailed documentation for Unsloth here

⚡ Quickstart

Linux or WSL

pip install unsloth

Windows

For Windows, pip install unsloth works only if you have PyTorch installed. Read our Windows Guide.

Docker

Use our official Unsloth Docker image, the unsloth/unsloth container. Read our Docker Guide.

Blackwell & DGX Spark

For RTX 50x, B200, 6000 GPUs: pip install unsloth. Read our Blackwell Guide and DGX Spark Guide for more details.

🦥 Unsloth News

New RoPE & MLP Triton Kernels & Auto Packing: 3x faster training & 30% less VRAM. Blog
Ministral 3 by Mistral: Run Ministral 3 or fine-tune with our vision or RL sudoku notebook. Guide • Notebooks
500K Context: Training a 20B model with >500K context is now possible on an 80GB GPU. Blog
FP8 Reinforcement Learning: You can now do FP8 GRPO on consumer GPUs. Blog • Notebook
DeepSeek-OCR: Fine-tune to improve language understanding by 89%. Guide • Notebook
Docker: Use Unsloth with no setup & environment issues with our new image. Guide • Docker image
gpt-oss RL: Introducing the fastest possible inference for gpt-oss RL! Read blog
Vision RL: You can now train VLMs with GRPO or GSPO in Unsloth! Read guide
gpt-oss by OpenAI: Read our Unsloth Flex Attention blog and gpt-oss Guide. 20B works on 14GB VRAM. 120B on 65GB.

Quantization-Aware Training: We collaborated with PyTorch, recovering ~70% accuracy. Read blog
Memory-efficient RL: We're introducing even better RL. Our new kernels & algorithms allow faster RL with 50% less VRAM & 10× more context. Read blog
Gemma 3n by Google: Read Blog. We uploaded GGUFs, 4-bit models.
Text-to-Speech (TTS) is now supported, including sesame/csm-1b and STT openai/whisper-large-v3.
Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & Aider Polyglot.
EVERYTHING is now supported - all models (TTS, BERT, Mamba), FFT, etc. MultiGPU coming soon. Enable FFT with full_finetuning = True, 8-bit with load_in_8bit = True.
📣 DeepSeek-R1 - run or fine-tune them with our guide. All model uploads: here.
📣 Introducing Long-context Reasoning (GRPO) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
📣 Llama 4 by Meta, including Scout & Maverick are now supported.
📣 Phi-4 by Microsoft: We also fixed bugs in Phi-4 and uploaded GGUFs, 4-bit.
📣 Vision models now supported! Llama 3.2 Vision (11B), Qwen 2.5 VL (7B) and Pixtral (12B) 2409
📣 Llama 3.3 (70B), Meta's latest model is supported.
📣 We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
📣 We found and helped fix a gradient accumulation bug! Please update Unsloth and transformers.
📣 We cut memory usage by a further 30% and now support 4x longer context windows!

🔗 Links and Resources

Type	Links
r/unsloth Reddit	Join Reddit community
📚 Documentation & Wiki	Read Our Docs
Twitter (aka X)	Follow us on X
💾 Installation	Pip & Docker Install
🔮 Our Models	Unsloth Catalog
✍️ Blog	Read our Blogs

⭐ Key Features

Supports full-finetuning, pretraining, 4-bit, 16-bit and FP8 training
Supports all models including TTS, multimodal, BERT and more! Any model that works in transformers, works in Unsloth.
The most efficient library for Reinforcement Learning (RL), using 80% less VRAM. Supports GRPO, GSPO, DrGRPO, DAPO etc.
0% loss in accuracy - no approximation methods - all exact.
Supports NVIDIA (since 2018), AMD and Intel GPUs. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc)
Works on Linux, WSL and Windows
All kernels written in OpenAI's Triton language. Manual backprop engine.
If you trained a model with 🦥Unsloth, you can use this cool sticker!
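
The minimum CUDA Capability requirement above can be checked with a simple tuple comparison. This is a minimal sketch, not part of Unsloth: the helper name `meets_min_capability` is hypothetical, and the example capabilities are the published values for those GPUs.

```python
# Minimum CUDA compute capability stated in the feature list above.
MIN_CAPABILITY = (7, 0)


def meets_min_capability(capability):
    """Return True if a (major, minor) compute capability is at least 7.0."""
    return tuple(capability) >= MIN_CAPABILITY


# Published compute capabilities for a few NVIDIA GPUs (illustrative values).
examples = {"V100": (7, 0), "T4": (7, 5), "A100": (8, 0), "GTX 1080": (6, 1)}
for name, cap in examples.items():
    status = "supported" if meets_min_capability(cap) else "below minimum"
    print(f"{name}: {cap} -> {status}")
```

On a machine with PyTorch and a CUDA device, `torch.cuda.get_device_capability()` returns a tuple of this shape that can be passed to the helper.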

💾 Install Unsloth

You can also see our docs for more detailed installation and updating instructions here.

Unsloth supports Python 3.13 or lower.

Pip Installation

Install with pip (recommended) for Linux devices:

pip install unsloth

To update Unsloth:

pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo

See here for advanced pip install instructions.

Windows Installation

Install NVIDIA Video Driver: You should install the latest driver for your GPU. Download drivers here: NVIDIA GPU Driver.
Install Visual Studio C++: You will need Visual Studio, with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for Windows 10/11 SDK. For detailed instructions with options, see here.
Install CUDA Toolkit: Follow the instructions to install CUDA Toolkit.
Install PyTorch: You will need the correct version of PyTorch that is compatible with your CUDA drivers, so make sure to select them carefully. Install PyTorch.
Install Unsloth:

pip install unsloth

Notes

To run Unsloth directly on Windows:

Install Triton from this Windows fork and follow the instructions here (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12)
In the SFTConfig, set dataset_num_proc=1 to avoid a crashing issue:

SFTConfig( dataset_num_proc=1, ... )
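
If the same training script runs on both Windows and Linux, the setting above can be chosen at runtime. This is a minimal sketch under that assumption; `pick_dataset_num_proc` is a hypothetical helper, not an Unsloth or TRL function.

```python
import platform


def pick_dataset_num_proc(system=None, default=None):
    """Hypothetical helper: return 1 on Windows, otherwise `default`.

    `default=None` leaves the choice to the trainer on other platforms.
    """
    system = system or platform.system()
    return 1 if system == "Windows" else default


print(pick_dataset_num_proc())
```

The result can then be passed as `SFTConfig(dataset_num_proc=pick_dataset_num_proc(), ...)`.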

Advanced/Troubleshooting

For advanced installation instructions or if you see weird errors during installations:

First try using an isolated environment, then pip install unsloth:

python -m venv unsloth
source unsloth/bin/activate
pip install unsloth

Install torch and triton. Go to https://pytorch.org to install it. For example pip install torch torchvision torchaudio triton
Confirm if CUDA is installed correctly. Try nvcc. If that fails, you need to install cudatoolkit or CUDA drivers.
Install xformers manually via:

pip install ninja
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

Check if `xformers` succeeded with `python -m xformers.info`. See https://github.com/facebookresearch/xformers for details. Another option is to install `flash-attn` for Ampere GPUs and ignore `xformers`.

For GRPO runs, you can try installing vllm and seeing if pip install vllm succeeds.
Double check that your versions of Python, CUDA, CUDNN, torch, triton, and xformers are compatible with one another. The PyTorch Compatibility Matrix may be useful.
Finally, install bitsandbytes and check it with python -m bitsandbytes
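
The checks in this list can be gathered into one quick report before digging into individual errors. This is a rough sketch using only the Python standard library. It only tests whether each package is importable and whether `nvcc` is on the PATH. It does not verify that the versions are compatible with one another.

```python
import importlib.util
import shutil
import sys

# Packages named in the troubleshooting steps above.
PACKAGES = ["torch", "triton", "xformers", "bitsandbytes", "vllm"]


def check_environment(packages=PACKAGES):
    """Report which packages are importable and whether nvcc is on PATH."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    report["nvcc"] = shutil.which("nvcc") is not None
    report["python"] = "{}.{}".format(*sys.version_info[:2])
    return report


for key, value in check_environment().items():
    print(f"{key:>12}: {value}")
```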

Conda Installation (Optional)

⚠️Only use Conda if you have it. If not, use Pip. Select either pytorch-cuda=11.8,12.1 for CUDA 11.8 or CUDA 12.1. We support python=3.10,3.11,3.12.

conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env

pip install unsloth

If you're looking to install Conda in a Linux environment, read here, or run the below 🔽

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Advanced Pip Installation

⚠️Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command is different for torch 2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9 and CUDA versions.

For other torch versions, we support torch211, torch212, torch220, torch230, torch240, torch250, torch260, torch270, torch280, torch290 and for CUDA versions, we support cu118 and cu121 and cu124. For Ampere devices (A100, H100, RTX3090) and above, use cu118-ampere or cu121-ampere or cu124-ampere.

For example, if you have torch 2.4 and CUDA 12.1, use:

pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"

Another example, if you have torch 2.9 and CUDA 13.0, use:

pip install --upgrade pip
pip install "unsloth[cu130-torch290] @ git+https://github.com/unslothai/unsloth.git"

And other examples:

pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
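
The naming pattern in the commands above (CUDA tag, optional `-ampere`, torch tag) can be sketched as a small string builder. The helper names `unsloth_extra` and `pip_command` are hypothetical and only illustrate the pattern. The sketch handles `.0` torch tags only (for example torch240, not torch251) and does not check that a given combination is actually published.

```python
REPO = "git+https://github.com/unslothai/unsloth.git"


def unsloth_extra(torch_version, cuda_version, ampere=False):
    """Build an extras tag such as 'cu121-ampere-torch240' (hypothetical helper)."""
    major, minor = torch_version.split(".")[:2]
    cuda_tag = "cu" + cuda_version.replace(".", "")
    ampere_tag = "-ampere" if ampere else ""
    return f"{cuda_tag}{ampere_tag}-torch{major}{minor}0"


def pip_command(torch_version, cuda_version, ampere=False):
    """Compose the full pip command for a given torch/CUDA pair."""
    extra = unsloth_extra(torch_version, cuda_version, ampere)
    return f'pip install "unsloth[{extra}] @ {REPO}"'


print(pip_command("2.4", "12.1"))
print(pip_command("2.5", "12.4", ampere=True))
```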

Or, run the below in a terminal to get the optimal pip installation command:

wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -

Or, run the below manually in a Python REPL:

try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
import re
v = V(re.match(r"[0-9\.]{3,}", torch.__version__).group(0))
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
USE_ABI = torch._C._GLIBCXX_USE_CXX11_ABI
if cuda not in ("11.8", "12.1", "12.4", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if   v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v  < V('2.3.0'): x = 'cu{}{}-torch220'
elif v  < V('2.4.0'): x = 'cu{}{}-torch230'
elif v  < V('2.5.0'): x = 'cu{}{}-torch240'
elif v  < V('2.5.1'): x = 'cu{}{}-torch250'
elif v <= V('2.5.1'): x = 'cu{}{}-torch251'
elif v  < V('2.7.0'): x = 'cu{}{}-torch260'
elif v  < V('2.7.9'): x = 'cu{}{}-torch270'
elif v  < V('2.8.0'): x = 'cu{}{}-torch271'
elif v  < V('2.8.9'): x = 'cu{}{}-torch280'
elif v  < V('2.9.1'): x = 'cu{}{}-torch290'
elif v  < V('2.9.2'): x = 'cu{}{}-torch291'
else: raise RuntimeError(f"Torch = {v} too new!")
if v > V('2.6.9') and cuda not in ("11.8", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
x = x.format(cuda.replace(".", ""), "-ampere" if False else "") # is_ampere is broken due to flash-attn
print(f'pip install --upgrade pip && pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git" --no-build-isolation')

Docker Installation

You can use our pre-built Docker container with all dependencies to use Unsloth instantly with no setup required. Read our guide.

This container requires installing NVIDIA's Container Toolkit.

docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth

Access Jupyter Lab at http://localhost:8888 and start fine-tuning!
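
If you launch the container from a Python script rather than a shell, the same command can be built as an argument list. This is a minimal sketch: `docker_run_args` is a hypothetical helper, and the ports, mount path and environment variable are copied from the command above.

```python
import shlex
from pathlib import Path


def docker_run_args(password, work_dir, jupyter_port=8888, ssh_port=2222):
    """Build the `docker run` argument list shown above (hypothetical helper)."""
    return [
        "docker", "run", "-d",
        "-e", f"JUPYTER_PASSWORD={password}",
        "-p", f"{jupyter_port}:8888",
        "-p", f"{ssh_port}:22",
        "-v", f"{work_dir}:/workspace/work",
        "--gpus", "all",
        "unsloth/unsloth",
    ]


args = docker_run_args("mypassword", Path.cwd() / "work")
print(shlex.join(args))  # shell-quoted form, useful for logging
```

The list can be handed to `subprocess.run(args, check=True)`, which avoids shell quoting problems with the password.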

📜 Documentation

Go to our official Documentation for running models, saving to GGUF, checkpointing, evaluation and more!
Read our Guides for: Fine-tuning, Reinforcement Learning, Text-to-Speech (TTS), Vision and any model.
We support Hugging Face's transformers, TRL, Trainer, Seq2SeqTrainer and PyTorch code.

Unsloth example code to fine-tune gpt-oss-20b:

from unsloth import FastLanguageModel, FastModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/gpt-oss-20b-unsloth-bnb-4bit", #or choose any model
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    max_seq_length = 2048, # Choose any for long context!
    load_in_4bit = True,  # 4-bit quantization. False = 16-bit LoRA.
    load_in_8bit = False, # 8-bit quantization
    load_in_16bit = False, # [NEW!] 16-bit LoRA
    full_finetuning = False, # Use for full fine-tuning.
    # token = "hf_...", # use one if using gated models
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://docs.unsloth.ai for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM or SGLang
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
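
To get a feel for how much data the example run above consumes, the batch settings in its SFTConfig can be multiplied out. This is a small arithmetic sketch. It assumes a single GPU, which the example does not state explicitly.

```python
# Values copied from the SFTConfig in the example above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60
num_devices = 1  # assumption: single-GPU run

# Sequences contributing to each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Total sequences processed over the whole run.
sequences_seen = effective_batch * max_steps

print(effective_batch, sequences_seen)  # prints: 8 480
```

With these settings the run touches 480 training sequences, so it is a short smoke test rather than a full pass over the dataset.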

💡 Reinforcement Learning

RL including GRPO, GSPO, FP8 training, DrGRPO, DAPO, PPO, Reward Modelling, Online DPO all work with Unsloth. Read our Reinforcement Learning Guide or our advanced RL docs for batching, generation & training parameters.

List of RL notebooks:

gpt-oss GSPO notebook: Link
Qwen2.5-VL GSPO notebook: Link
Advanced Qwen3 GRPO notebook: Link
FP8 Qwen3-8B GRPO notebook (L4): Link
ORPO notebook: Link
DPO Zephyr notebook: Link
KTO notebook: Link
SimPO notebook: Link

🥇 Performance Benchmarking

For our most detailed benchmarks, read our Llama 3.3 Blog.
Benchmarking of Unsloth was also conducted by 🤗Hugging Face.

We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):

Model	VRAM	🦥 Unsloth speed	🦥 VRAM reduction	🦥 Longer context	😊 Hugging Face + FA2
Llama 3.3 (70B)	80GB	2x	>75%	13x longer	1x
Llama 3.1 (8B)	80GB	2x	>70%	12x longer	1x
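
For a sense of scale, the number of trainable LoRA parameters at rank = 32 on all seven linear layers can be worked out by hand. This is a sketch under stated assumptions: the layer shapes below are the published Llama 3.1 (8B) configuration values (hidden size 4096, intermediate size 14336, 8 key/value heads of dimension 128, 32 layers) and are not taken from this README. Each adapted layer adds rank × (in_features + out_features) parameters.

```python
# Assumed Llama 3.1 (8B) shapes, from the published model config (not this README).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x head dimension 128
NUM_LAYERS = 32

# (in_features, out_features) of each adapted linear layer in one decoder block.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes=SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA parameters: rank * (fan_in + fan_out) per adapted layer."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


print(f"{lora_param_count(32):,}")  # prints: 83,886,080
```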

Context length benchmarks

Llama 3.1 (8B) max. context length

We tested Llama 3.1 (8B) Instruct and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.

GPU VRAM	🦥Unsloth context length	Hugging Face + FA2
8 GB	2,972	OOM
12 GB	21,848	932
16 GB	40,724	2,551
24 GB	78,475	5,789
40 GB	153,977	12,264
48 GB	191,728	15,502
80 GB	342,733	28,454

Llama 3.3 (70B) max. context length

We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.

GPU VRAM	🦥Unsloth context length	Hugging Face + FA2
48 GB	12,106	OOM
80 GB	89,389	6,916
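
The "longer context" multipliers quoted in the benchmark summary can be reproduced from the two context-length tables. The sketch below copies the table values and divides the Unsloth context length by the Hugging Face + FA2 one. Rows where the baseline ran out of memory (OOM) have no ratio. The dictionary layout and helper name are illustrative choices.

```python
# VRAM (GB) -> (Unsloth context length, Hugging Face + FA2 context length or None for OOM)
LLAMA_3_1_8B = {
    8: (2972, None), 12: (21848, 932), 16: (40724, 2551), 24: (78475, 5789),
    40: (153977, 12264), 48: (191728, 15502), 80: (342733, 28454),
}
LLAMA_3_3_70B = {48: (12106, None), 80: (89389, 6916)}


def context_ratio(unsloth_tokens, baseline_tokens):
    """Return how many times longer the Unsloth context is, or None on baseline OOM."""
    if baseline_tokens is None:
        return None
    return unsloth_tokens / baseline_tokens


for name, table in [("Llama 3.1 (8B)", LLAMA_3_1_8B), ("Llama 3.3 (70B)", LLAMA_3_3_70B)]:
    for vram, (ours, baseline) in table.items():
        ratio = context_ratio(ours, baseline)
        label = "baseline OOM" if ratio is None else f"{ratio:.1f}x longer"
        print(f"{name} @ {vram} GB: {label}")
```

At 80 GB the ratios round to 12x for Llama 3.1 (8B) and 13x for Llama 3.3 (70B), matching the summary table.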

Citation

You can cite the Unsloth repo as follows:

@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {http://github.com/unslothai/unsloth},
  year = {2023}
}

Thank You to

The llama.cpp library that lets users save models with Unsloth
The Hugging Face team and their libraries: transformers and TRL
The PyTorch and Torch AO team for their contributions
Erik for his help adding Apple's ML Cross Entropy in Unsloth
Etherl for adding support for TTS, diffusion and BERT models
And of course for every single person who has contributed or has used Unsloth!
