
Conversation

colesbury (Contributor) commented Oct 25, 2024

These consist of a number of short snippets that help identify scaling bottlenecks in the free threaded interpreter. The current bottlenecks are in benchmarks that call functions (due to `LOAD_ATTR` not yet using deferred reference counting) and in benchmarks that access thread-local data.
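
For readers unfamiliar with the harness, here is a minimal sketch of the measurement pattern. It is illustrative only (the names, thread count, and iteration count are assumptions, not taken from this PR): time a tiny workload in a single thread, time it again with N threads running it concurrently, and report the throughput ratio as "Nx faster" or "Nx slower".

```python
# Minimal sketch of a free-threading scaling micro-benchmark.
# All names and constants here are illustrative, not the PR's actual code.
import threading
import time

WORK_ITERS = 200_000
NUM_THREADS = 8  # assumption: roughly one thread per core


def pyfunction_workload():
    """Tiny snippet whose scaling we want to measure: plain Python calls."""
    def f(x):
        return x + 1

    for i in range(WORK_ITERS):
        f(i)


def throughput(num_threads, workload):
    """Run `workload` once in each of `num_threads` threads; return work per second."""
    threads = [threading.Thread(target=workload) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return num_threads * WORK_ITERS / elapsed


if __name__ == "__main__":
    single = throughput(1, pyfunction_workload)
    multi = throughput(NUM_THREADS, pyfunction_workload)
    ratio = multi / single
    label = f"{ratio:.1f}x faster" if ratio >= 1.0 else f"{1 / ratio:.1f}x slower"
    print(f"pyfunction {label}")
```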
colesbury (Contributor, Author) commented Oct 25, 2024

Some results below:

| Benchmark | CPython 3.14t | CPython 3.13t | nogil fork (3.9) |
| --- | --- | --- | --- |
| object_cfunction | 1.1x faster | 9.8x faster | 10.4x faster |
| cmodule_function | 1.1x slower | 9.6x faster | 9.2x faster |
| mult_constant | 9.7x faster | 8.8x faster | 10.0x faster |
| generator | 9.5x faster | 9.5x faster | 9.0x faster |
| pymethod | 1.1x faster | 9.7x faster | 9.5x faster |
| pyfunction | 9.7x faster | 9.7x faster | 10.0x faster |
| module_function | 1.2x slower | 10.0x faster | 10.0x faster |
| load_string_const | 9.5x faster | 1.8x slower | 9.8x faster |
| load_tuple_const | 9.8x faster | 9.5x faster | 9.8x faster |
| create_pyobject | 9.5x faster | 9.5x faster | 9.6x faster |
| create_closure | 9.9x faster | 9.8x faster | 9.2x faster |
| create_dict | 9.8x faster | 8.8x faster | 9.6x faster |
| thread_local_read | 2.2x slower | 2.0x slower | 9.0x faster |
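
As a rough guide to reading the table (the thread count used for these runs is not stated here, so the value below is purely an assumption for illustration): a speedup close to the number of worker threads means near-linear scaling, while "slower" means the multi-threaded run produced less total throughput than a single thread.

```python
# Hypothetical back-of-the-envelope reading of the numbers above.
# NUM_THREADS is an assumed value for illustration, not taken from the PR.
NUM_THREADS = 10

results = {"pyfunction": 9.7, "pymethod": 1.1, "thread_local_read": 1 / 2.2}
for name, throughput_ratio in results.items():
    efficiency = throughput_ratio / NUM_THREADS
    print(f"{name}: {efficiency:.0%} parallel efficiency")
```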

As mentioned in the PR description, we have known scaling issues related to `LOAD_ATTR` not yet using deferred reference counting. We also have a scaling issue when reading thread-local data -- we should probably enable deferred reference counting on `_thread._local` objects.
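
For reference, a `thread_local_read`-style snippet looks roughly like the sketch below (illustrative, not the PR's code): every attribute read goes through the shared `threading.local` instance, so without deferred reference counting its reference count presumably becomes a contended cache line even though the per-thread values themselves are independent.

```python
# Illustrative thread_local_read-style workload (not the PR's actual code).
# The hot loop only reads an attribute of a shared threading.local object;
# the contention presumably comes from reference counting on that shared
# object, not from the per-thread values themselves.
import threading

ITERS = 100_000
local = threading.local()


def thread_local_read():
    local.value = 0  # each thread sets and sees only its own copy
    for _ in range(ITERS):
        local.value  # hot path: repeated reads of thread-local data


threads = [threading.Thread(target=thread_local_read) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```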

The 3.13 release avoids the `LOAD_ATTR` scaling issues due to immortalization. However, we apparently have a bug related to string immortalization (`load_string_const` is slow), and the thread-local bottleneck is also present.
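
A `load_string_const`-style snippet is roughly the sketch below (again illustrative, not the PR's code): the function body does little more than load a string constant, so if that constant is not immortal, every call presumably ends up contending on its reference count.

```python
# Illustrative load_string_const-style workload (not the PR's actual code).
# If the string constant below is not immortalized, each load presumably
# triggers a reference-count update on an object shared by all threads.
ITERS = 100_000


def load_string_const():
    return "this is a long string constant used only to be loaded repeatedly"


def workload():
    for _ in range(ITERS):
        load_string_const()


workload()
```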

Note that small variations (e.g. 8.8x vs. 10.4x) are not meaningful.

colesbury marked this pull request as ready for review October 25, 2024 17:20
colesbury requested review from Yhg1s and mpage October 25, 2024 17:20
colesbury merged commit 00ea179 into python:main Oct 28, 2024
colesbury deleted the gh-125985-ftscalingbench branch October 28, 2024 21:47
picnixz pushed a commit to picnixz/cpython that referenced this pull request Dec 8, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025