Skip to content

Conversation

@AlexWaygood
Copy link
Member

@AlexWaygoodAlexWaygood commented Apr 2, 2023

Micro-optimising inspect._static_getmro and inspect._shadowed_dict leads to a significant improvement in the time taken to call isinstance() on a runtime-checkable protocol. (This is a useful benchmark, as it's a real-world use of inspect.getattr_static in a tight loop, that's found in the stdlib.)

Benchmark results on a0305c5:

Time taken for objects with a property: 3.34 Time taken for objects with a classvar: 3.08 Time taken for objects with an instance var: 12.15 Time taken for objects with no var: 15.38 Time taken for nominal subclass instances: 25.13 Time taken for registered subclass instances: 21.65 

Benchmark results with this PR:

Time taken for objects with a property: 2.82 Time taken for objects with a classvar: 2.85 Time taken for objects with an instance var: 8.62 Time taken for objects with no var: 14.01 Time taken for nominal subclass instances: 21.80 Time taken for registered subclass instances: 20.23 
Benchmark script:
importtimefromtypingimportProtocol, runtime_checkable@runtime_checkableclassHasX(Protocol): x: intclassFoo: @propertydefx(self) ->int: return42classBar: x=42classBaz: def__init__(self): self.x=42classEgg: ... classNominal(HasX): def__init__(self): self.x=42classRegistered: ... HasX.register(Registered) num_instances=500_000foos= [Foo() for_inrange(num_instances)] bars= [Bar() for_inrange(num_instances)] bazzes= [Baz() for_inrange(num_instances)] basket= [Egg() for_inrange(num_instances)] nominals= [Nominal() for_inrange(num_instances)] registereds= [Registered() for_inrange(num_instances)] defbench(objs, title): start_time=time.perf_counter() forobjinobjs: isinstance(obj, HasX) elapsed=time.perf_counter() -start_timeprint(f"{title}: {elapsed:.2f}") bench(foos, "Time taken for objects with a property") bench(bars, "Time taken for objects with a classvar") bench(bazzes, "Time taken for objects with an instance var") bench(basket, "Time taken for objects with no var") bench(nominals, "Time taken for nominal subclass instances") bench(registereds, "Time taken for registered subclass instances")

@AlexWaygoodAlexWaygood added type-feature A feature request or enhancement performance Performance or resource usage stdlib Standard Library Python modules in the Lib/ directory 3.12 only security fixes labels Apr 2, 2023
@AlexWaygoodAlexWaygood requested a review from carljmApril 2, 2023 17:05
@AlexWaygood
Copy link
MemberAuthor

I can find one significant bug report open regarding inspect.getattr_static:

Having studied the bug report, I don't think this patch impacts the bug (or any possible solutions to the bug) either way.

@Eclips4
Copy link
Member

Hello, Alex!
Just out of curiosity, why do you use a x.__dict__ instead of vars(x), it's related to a performance issues?

@AlexWaygood
Copy link
MemberAuthor

Just out of curiosity, why do you use a x.__dict__ instead of vars(x), it's related to a performance issues?

There's not really any difference between the two; it's just a matter of taste. Some people consider vars(x) to be "more Pythonic" than accessing the __dict__ variable directly, but I don't really have a strong preference either way.

In this case, the main reason I'm accessing it via .__dict__ is just because that's what the existing code already does, and I don't see any reason to change the style here. It's also arguably more explicit in this case -- but again, that's just a subjective opinion about code readability, really.

Copy link
Member

@sobolevnsobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several things I can think of to speed up getattr_static.

  1. Change _check_instance to be
def_check_instance(obj, attr): try: instance_dict=object.__getattribute__(obj, "__dict__") exceptAttributeError: return_sentinelreturndict.get(instance_dict, attr, _sentinel)
  1. Here type is called twice:
if (_check_class(type(klass_result), '__get__') isnot_sentineland_check_class(type(klass_result), '__set__') isnot_sentinel):

Ideally we can also call _check_class once with several attributes 🤔
Because in this case it will allow us to decrease the amount of _static_mro calls.

I think that a variable might be a bit faster

@AlexWaygood
Copy link
MemberAuthor

There are several things I can think of to speed up getattr_static.

Those sound great! My instinct is to tackle them in a separate PR, so that each change can be evaluated and benchmarked independently. Would you like to submit a PR?

@sobolevn
Copy link
Member

Yes, I will do it tomorrow 👍

@picnixz
Copy link
Member

picnixz commented Apr 3, 2023

Would it be ok to have _dunder_dict_descriptor_get = type.__dict__["__dict__"].__get__ instead of _dunder_dict_descriptor = type.__dict__["__dict__"] in order to minimize the number of accesses to __get__ when iterating over the classes ?

@AlexWaygood
Copy link
MemberAuthor

Would it be ok to have _dunder_dict_descriptor_get = type.__dict__["__dict__"].__get__ instead of _dunder_dict_descriptor = type.__dict__["__dict__"] in order to minimize the number of accesses to __get__ when iterating over the classes ?

I considered it, but felt like it would make it significantly less readable (_dunder_dict_descriptor already isn't a great name, and _dunder_dict_descriptor_get feels even worse). Happy to reconsider if others agree that we should go that way, though.

@picnixz
Copy link
Member

_dunder_dict_descriptor already isn't a great name, and _dunder_dict_descriptor_get feels even worse

IMO, since we are already micro-optimizing an internal helper of an internal function, we may not necessarily need to be exact so perhaps we can name it _dunder_dict_of ?

@AlexWaygood
Copy link
MemberAuthor

AlexWaygood commented Apr 3, 2023

IMO, since we are already micro-optimizing an internal helper of an internal function, we may not necessarily need to be exact so perhaps we can name it _dunder_dict_of ?

That's a decent name. Or maybe _get_dunder_dict_of_class.

I'll make the change — thanks!

@AlexWaygood
Copy link
MemberAuthor

AlexWaygood commented Apr 3, 2023

I pushed the change and updated the benchmark results in the PR description (it's maybe a teeny tiny bit faster than it was, but not by much)

@picnixz
Copy link
Member

it's maybe a teeny tiny bit faster than it was, but not by much

I think the impact will be clearer when we have a complicated inheritance diagram (e.g., large projects with abstract interfaces and/or mixins here and there).

Copy link
Member

@carljmcarljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@AlexWaygoodAlexWaygood merged commit 264c00a into python:mainApr 5, 2023
@AlexWaygoodAlexWaygood deleted the microoptimise-getattr_static-helpers branch April 5, 2023 07:27
@AlexWaygood
Copy link
MemberAuthor

There are several things I can think of to speed up getattr_static.

  1. Change _check_instance to be
def_check_instance(obj, attr): try: instance_dict=object.__getattribute__(obj, "__dict__") exceptAttributeError: return_sentinelreturndict.get(instance_dict, attr, _sentinel)
  1. Here type is called twice:
if (_check_class(type(klass_result), '__get__') isnot_sentineland_check_class(type(klass_result), '__set__') isnot_sentinel):

Ideally we can also call _check_class once with several attributes 🤔 Because in this case it will allow us to decrease the amount of _static_mro calls.

I think that a variable might be a bit faster

I tried these out locally, but unfortunately I can't measure any speedup from them :/

It's possible I'm not using the right benchmark to show a speedup, however -- happy to be proven wrong if you can get a speedup somewhere!

gaogaotiantian pushed a commit to gaogaotiantian/cpython that referenced this pull request Apr 8, 2023
warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.12only security fixesperformancePerformance or resource usagestdlibStandard Library Python modules in the Lib/ directorytype-featureA feature request or enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

@AlexWaygood@Eclips4@sobolevn@picnixz@carljm@bedevere-bot