
@donbarbos (Contributor) commented on Feb 9, 2025

Another attempt to improve import time of stdlib modules.
Importing dataclasses takes a long time and affects many other modules, so making dataclasses import faster has a broad impact.

I use lazy importing for the 4 largest imports (re, copy, inspect, annotationlib); each of them is also referenced only once or twice.
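
For context, the change follows the standard lazy-import pattern: drop the module-level import and perform it inside the helper that needs it. A minimal sketch of the idea (illustrative only, not the PR's actual diff; the helper name and regex below are assumptions):

_MODULE_IDENTIFIER_RE = None

def _module_identifier_re():
    # Deferred import: `import dataclasses` no longer pays for importing and
    # compiling `re`; the cost moves to the first time this helper is called.
    global _MODULE_IDENTIFIER_RE
    if _MODULE_IDENTIFIER_RE is None:
        import re
        _MODULE_IDENTIFIER_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")
    return _MODULE_IDENTIFIER_RE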

CPython configure flags:

./configure --enable-optimizations --with-lto --enable-loadable-sqlite-extensions

Benchmarks:

Running: pipx install tuna && ./python -X importtime -c 'import dataclasses' 2> import.log && tuna import.log

Total import time: 0.022s -> 0.008s (~2.75x faster)

[Screenshots: tuna import-time breakdown, main branch vs PR branch]

dataclasses import time: 0.015s -> 0.001s (~15x faster)

[Screenshots: tuna breakdown of the dataclasses subtree, main branch vs PR branch]

hyperfine: 24.5 ms -> 10.2 ms (~2.4x faster)

Main branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import dataclasses'"
Benchmark 1: ./python -c 'import dataclasses'
  Time (mean ± σ):     24.5 ms ±  1.2 ms    [User: 21.2 ms, System: 3.3 ms]
  Range (min … max):   22.9 ms … 38.1 ms    3000 runs

PR branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import dataclasses'"
Benchmark 1: ./python -c 'import dataclasses'
  Time (mean ± σ):     10.2 ms ±  0.4 ms    [User: 8.3 ms, System: 1.8 ms]
  Range (min … max):    9.8 ms … 19.9 ms    3000 runs

@donbarbos changed the title from "gh-11876: Improve import time of dataclasses" to "gh-118761: Improve import time of dataclasses" on Feb 10, 2025
@donbarbos (Contributor, Author) commented:

On Trade-offs

Here are the new call chains for the deferred imports (a sketch of the copy deferral on the asdict() path is shown after the lists):

  1. import re;re.compile in _is_type()
  2. _is_type() in _process_class() and _get_field()
  3. _get_field() in _process_class()
  4. _process_class() in dataclass.wrap()

  1. import annotationlib in _process_class()
  2. _process_class() in dataclass.wrap()

  1. import inspect in _process_class() and _add_slots()
  2. _add_slots() in _process_class()
  3. _process_class() in dataclass.wrap()

  1. import copy in _asdict_inner() and _astuple_inner()
  2. _asdict_inner() in asdict()
  3. _astuple_inner() in astuple()
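
A hedged sketch of the copy deferral on the asdict()/astuple() path (assumed shape only, not the PR's exact diff; the public is_dataclass() stands in here for the module's internal check):

import dataclasses

def _asdict_inner(obj, dict_factory):
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # recurse over the fields as before (unchanged by the PR)
        ...
    else:
        # Deferred import: copy.deepcopy is only needed for values the function
        # does not recurse into, so the cost is paid on the first asdict() or
        # astuple() call rather than at `import dataclasses` time.
        import copy
        return copy.deepcopy(obj)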

I also wrote benchmarks that exercise the affected public entry points (the dataclass decorator, plus the asdict() and astuple() functions):

bench_dataclass.py starts like this:

import time
import dataclasses
import os

code = """
@dataclasses.dataclass
class Address:
    city: str
    zip_code: str

@dataclasses.dataclass
class Person:
    name: str
    age: int
    addresses: list[Address] = dataclasses.field(default_factory=list)
    metadata: dict[str, str] = dataclasses.field(default_factory=dict)
"""

result_times = []
for _ in range(100):
    os.system("sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null")
    start_time = time.time()
    exec(code, {"dataclasses": dataclasses})
    end_time = time.time()
    result_times.append(end_time - start_time)

bench_asdict.py starts like this:

import time
import dataclasses
import os

@dataclasses.dataclass
class Address:
    city: str
    zip_code: str

@dataclasses.dataclass
class Person:
    name: str
    age: int
    addresses: list[Address] = dataclasses.field(default_factory=list)
    metadata: dict[str, str] = dataclasses.field(default_factory=dict)

person = Person(
    name="John Doe",
    age=30,
    addresses=[Address("New York", "10001"), Address("Los Angeles", "90001")],
    metadata={"key1": "value1", "key2": "value2"},
)

code = """dataclasses.asdict(person)"""

result_times = []
for _ in range(100):
    os.system("sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null")
    start_time = time.time()
    exec(code, {"dataclasses": dataclasses, "person": person})
    end_time = time.time()
    result_times.append(end_time - start_time)

Both scripts report statistics at the end like this:

import statistics

first = result_times[0]
mean = statistics.mean(result_times)
median = statistics.median(result_times)
stdev = statistics.stdev(result_times)
variance = statistics.variance(result_times)
print(f"First time: {first * 1_000_000:.2f}μs")
print(f"Mean: {mean * 1_000_000:.2f}μs")
print(f"Median: {median * 1_000_000:.2f}μs")
print(f"Standard deviation: {stdev * 1_000_000:.2f}μs")
print(f"Variance: {variance * 1_000_000:.2f}μs")

Results on main branch:

$ ./python -B bench_dataclass.py
First time: 1586.20μs
Mean: 1267.27μs
Median: 1243.83μs
Standard deviation: 106.15μs
Variance: 0.01μs

$ ./python -B bench_asdict.py
First time: 99.66μs
Mean: 112.33μs
Median: 109.91μs
Standard deviation: 12.26μs
Variance: 0.00μs

Results on PR branch:

$ ./python -B bench_dataclass.py
First time: 20320.18μs  # ~20 ms (0.02 s); this result is consistent for the first call
Mean: 1454.47μs
Median: 1243.11μs
Standard deviation: 1911.43μs
Variance: 3.65μs

$ ./python -B bench_asdict.py
First time: 100.37μs
Mean: 112.21μs
Median: 108.96μs
Standard deviation: 13.01μs
Variance: 0.00μs

In summary, about 20 ms is added to the first use of @dataclass in a program (when the deferred imports actually run); the median per-call time is essentially unchanged.

@eli-schwartz (Contributor) left a comment:

> I use lazy importing for the 4 largest imports (re, copy, inspect, annotationlib); each of them is also referenced only once or twice.

I don't fully understand what you mean by "rarely". It looks like much of this PR is delaying imports that are used unconditionally by any code that actually uses a dataclass. The timings measure importing the module itself, but since the primary use of this module is as a decorator, consumers will exercise it at their own import time, unlike modules that you import at the top of a file and then only use inside an if block.

So I don't think you can consider the import time of dataclasses in isolation from how it's actually used. There's no benefit to delaying imports that are used unconditionally when decorating a class (and doing so incurs the use-time cost of re-running the import machinery, which isn't major but does exist).
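
To make that concrete, a minimal illustration (hypothetical consumer module, not taken from the PR):

# consumer.py: the decorator runs while this module is being imported, so any
# imports that dataclasses deferred into its decorator machinery still execute
# at import time; the cost just moves from `import dataclasses` to the first
# module that defines a dataclass.
import dataclasses

@dataclasses.dataclass
class Point:
    x: int
    y: int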

@donbarbos (Contributor, Author) replied:

@eli-schwartz yes, but most of the PRs for this issue are also adding lazy imports.

@eli-schwartz (Contributor) replied:

I'm not sure what point you're trying to make.

Most of the PRs for this issue are adding lazy imports. Lazy imports are a useful tool for making Python programs faster, following the principle of "only pay for what you use" -- and stdlib modules often don't know what an application will in fact use.

The issue here is that, as far as I can tell, you're adding lazy imports for things that the consumer will always use, which means there won't be a benefit to making them lazy...
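
For contrast, a sketch of the case where a lazy import genuinely pays off (hypothetical example, not from this PR): the dependency sits behind a code path that many programs never take.

def dump_debug_report(obj, path):
    # Deferred import: json is only imported if a debug report is actually
    # requested, so most runs never pay for it ("only pay for what you use").
    import json
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, default=str)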

@gpshead (Member) commented:

I'm closing this one based on Eli's analysis.

@gpshead closed this on Feb 17, 2025
