Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpteendebakpt commented Jun 16, 2024

We make enum_iter thread-safe using a critical section. We use the same approach as in #119438 to allow for exits in the middle of the critical section. The method enum_iter_long is guarded by the critical section from enum_long.

Without making enum_iter thread safe problems can occur:

  • For enumerate(range(10)) pairs (n, m) with n unqual to m can be generated
  • When iterating over sys.maxsize, e.g. enumerate(range(sys.maxsize - 10, sys.maxsize + 10)) an overflow can occur.
  • As an optimization enum_iter keeps track of the returned tuple. In the free-threaded build multiple threads can operate on the same tuple.

To determine the overhead of making enumerate thread-safe here are some benchmarking results for a single-threaded application. Performance when actually using multiple-threads to readout the enumerate is considered less important.

Benchmark script:

import pyperf runner = pyperf.Runner() setup = """ def weighted_sum(k): value = 0 for n, m in enumerate(k): value += n * m return value range10 = range(10) range1000 = range(1000) x = list(range(10)) """ runner.timeit(name="enumerate(range10)", stmt="enumerate(range10)", setup=setup) runner.timeit(name="list(enumerate(range10))", stmt="list(enumerate(range10))", setup=setup) runner.timeit(name="list(enumerate(range1000))", stmt="list(enumerate(range1000))", setup=setup) runner.timeit(name="weighted_sum", stmt="weighted_sum(x)", setup=setup) 

Results of main versus free-threading:

enumerate(range10): Mean +- std dev: [enumerate_main] 73.2 ns +- 3.5 ns -> [enumerate_ft] 94.1 ns +- 3.5 ns: 1.29x slower list(enumerate(range10)): Mean +- std dev: [enumerate_main] 292 ns +- 10 ns -> [enumerate_ft] 351 ns +- 19 ns: 1.20x slower list(enumerate(range1000)): Mean +- std dev: [enumerate_main] 31.1 us +- 1.6 us -> [enumerate_ft] 32.5 us +- 1.4 us: 1.04x slower weighted_sum: Mean +- std dev: [enumerate_main] 433 ns +- 17 ns -> [enumerate_ft] 938 ns +- 49 ns: 2.16x slower Geometric mean: 1.37x slower 

Results of free-threading vs. free-threading with this PR:

list(enumerate(range10)): Mean +- std dev: [enumerate_ft] 351 ns +- 19 ns -> [enumerate_ft_pr] 410 ns +- 18 ns: 1.17x slower list(enumerate(range1000)): Mean +- std dev: [enumerate_ft] 32.5 us +- 1.4 us -> [enumerate_ft_pr] 40.9 us +- 1.7 us: 1.26x slower Benchmark hidden because not significant (2): enumerate(range10), weighted_sum Geometric mean: 1.10x slower 

The case enumerate(range10) is not affected by the PR (used to check the benchmarking is stable). The cases list(enumerate(range(x))) slow down a bit. More representative is perhaps the weighted_sum benchmark where the enumerate is used a for loop with a minimal amount of work. There the overhead of the locking not significant.

@eendebakpteendebakpt changed the title gh-120496: make enum_iter thread safegh-120496: Make enum_iter thread safeJun 16, 2024
}
Py_END_CRITICAL_SECTION();

if (reuse_result){
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we hold the only references to result this can be outside the critical section.

@eendebakpt
Copy link
ContributorAuthor

Closing in favor of #125734

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

@eendebakpt