⚠️ EXPERIMENTAL FEATURES - This is the dev branch with experimental features. For releases and comprehensive documentation, visit the main branch.
```bash
# Clone the dev branch
git clone -b dev https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM

# Install from source with dev dependencies (includes transformer_engine)
pip install -e .[mlm,dev]
```

- Streamlined Review: 1 code owner + 1 dev approver (can delegate review) + CI/CD
- 6-Month Timeline: Experimental features must graduate to stable or be deprecated
- Migration Support: Assistance provided for feature transitions
- Experimental Nature: Features may change or be removed as development progresses
- Testing: All features will pass convergence and performance validation before inclusion
- Support: Dev branch issues should include the `[DEV]` prefix
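After installing from the dev branch as shown above, a quick smoke test can confirm the install is importable. This is a minimal sketch, assuming `megatron.core` and `transformer_engine` are the importable package names for the editable install and the dev extra; the exact modules exported may differ between the dev and main branches.

```bash
# Minimal smoke test (assumes the editable install above succeeded).
# megatron.core is the core library package shipped with Megatron-LM;
# transformer_engine is expected to be pulled in via the dev extra.
python -c "import megatron.core; print('megatron.core imported OK')"
python -c "import transformer_engine; print('transformer_engine imported OK')"
```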
- 🚀 [2025/11] Optimizing DeepSeek-V3 Training Performance on NVIDIA GB200 NVL72.
- ⚡ [2025/11] A Guide to Reproduce DeepSeek-V3 Pre-training Performance on GB200.
- 📖 Documentation - Official documentation
- 🐛 Issues - Bug reports and feature requests
We ❤️ contributions! Ways to contribute:
- 🐛 Report bugs - Help us improve reliability
- 💡 Suggest features - Shape the future of Megatron Core
- 📝 Improve docs - Make Megatron Core more accessible
- 🔧 Submit PRs - Contribute code improvements
```bibtex
@article{megatron-lm,
  title={Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism},
  author={Shoeybi, Mohammad and Patwary, Mostofa and Puri, Raul and LeGresley, Patrick and Casper, Jared and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:1909.08053},
  year={2019}
}
```