Conversation

@navmarri14 commented Dec 14, 2025

Purpose

This PR adds a tuned fused MoE kernel configuration for the GLM-4.6 MoE architecture on NVIDIA B300 GPUs using FP8 quantization.

Specifically, it targets the configuration:

Experts (E): 160
Sharded size (N): 192 for TP=8, 384 for TP=4, 768 for TP=2
Device: NVIDIA B300
Dtype: fp8_w8a8
Previously, vLLM lacked a static configuration for these shapes on B300, causing it to fall back to heuristics or require JIT tuning during startup. This config improves startup time and ensures the tuned kernel parameters are used for GLM-4 variants when running with tensor_parallel_size=2, 4, or 8.
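
For reference, the shapes above map onto one tuned config file per TP size. A minimal sketch of the filename convention these files typically follow (the helper function below is hypothetical, for illustration only; the device_name string is an assumption):

```python
# Illustrative sketch of the tuned-config filename convention; the helper name
# is hypothetical, only the E/N/device_name/dtype layout reflects this PR.
def moe_config_file_name(E: int, N: int, device_name: str, dtype: str) -> str:
    return f"E={E},N={N},device_name={device_name},dtype={dtype}.json"

# N is the sharded size, so it shrinks as the tensor-parallel degree grows.
for tp, n in {2: 768, 4: 384, 8: 192}.items():
    print(f"TP={tp}: {moe_config_file_name(160, n, 'NVIDIA_B300', 'fp8_w8a8')}")
```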

Test Plan
Generation:
The configuration was generated using the official benchmark script on an 8x B300 node:

TP=2

```bash
python benchmarks/kernels/benchmark_moe.py \
    --model /path/to/ZhipuAI/GLM-4.6-FP8 \
    --dtype fp8_w8a8 \
    --tp-size 2 \
    --tune \
    --trust-remote-code \
    --save-dir ./configs
```

TP=4

```bash
python benchmarks/kernels/benchmark_moe.py \
    --model /path/to/ZhipuAI/GLM-4.6-FP8 \
    --dtype fp8_w8a8 \
    --tp-size 4 \
    --tune \
    --trust-remote-code \
    --save-dir ./configs
```

TP=8

```bash
python benchmarks/kernels/benchmark_moe.py \
    --model /path/to/ZhipuAI/GLM-4.6-FP8 \
    --dtype fp8_w8a8 \
    --tp-size 8 \
    --tune \
    --trust-remote-code \
    --save-dir ./configs
```
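
Each run writes a JSON file into --save-dir keyed by batch size M, holding the best Triton launch parameters found for that M. A minimal sketch of the expected layout, shown as a Python dict with placeholder values (parameter names follow the usual fused-MoE tuning output; the real tuned values are in the files added by this PR):

```python
# Placeholder illustration of one generated config file's layout; the values
# below are examples only, not the tuned results from this PR.
example_tuned_config = {
    "triton_version": "3.5.1",  # metadata recorded by the tuning run
    "1": {                      # batch size M = 1
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 3,
    },
    # ... one entry per benchmarked batch size ...
}
```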

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces tuned fused MoE kernel configurations for the GLM-4.6 MoE architecture on NVIDIA B300 GPUs with FP8 quantization. These configurations for different tensor parallelism sizes (2, 4, 8) will help improve startup times by avoiding the need for JIT tuning. The changes are well-described and the method for generating the configurations is clearly documented.

My main feedback is a minor issue across all three new configuration files: the specified Triton version 3.5.1 seems to be incorrect, as it does not correspond to a public Triton release. I've left comments with suggestions to correct or remove this for clarity and maintainability.

@@ -0,0 +1,147 @@
{
"triton_version": "3.5.1",

Severity: high

The specified Triton version 3.5.1 appears to be incorrect, as there is no public release with this version number. The latest public release of Triton is 3.0.0. This could be misleading for future developers. Please either correct it to the version you used for tuning or remove this line. Since this field is popped from the config before use, removing it would have no functional impact.

The same comment was left on the other two new configuration files.
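
For context, a minimal sketch of the pattern the review refers to ("this field is popped from the config before use"): metadata such as triton_version is dropped from the loaded JSON before the per-batch-size entries are consumed. The function name is illustrative and assumed, not vLLM's actual loader:

```python
import json

# Hedged sketch: strip metadata before using the per-batch-size kernel params.
def load_tuned_moe_config(path: str) -> dict[int, dict]:
    with open(path) as f:
        raw = json.load(f)
    raw.pop("triton_version", None)  # metadata only, not a kernel launch parameter
    return {int(m): params for m, params in raw.items()}
```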

@jeejeelee (Collaborator)

cc @mgoin

@ApostaC (Collaborator)

cc @mgoin
