Skip to content

Conversation

@webknjaz
Copy link
Member

@webknjazwebknjaz commented Sep 10, 2023

This is a refactoring change that aims to simplify ensurepip. Before it, this module had legacy infrastructure that made an assumption that ensurepip would be provisioning more then just a single wheel. That assumption is no longer true since [1][2][3].

In this change, the improvement is done around removing unnecessary loops and supporting structures to change the assumptions to expect only the bundled or replacement pip wheel.

This is a spin-off of #12791.

@webknjazwebknjazforce-pushed the maintenance/ensurepip-single-wheel branch 2 times, most recently from b9fe227 to 6b2ce22CompareSeptember 10, 2023 23:33
@webknjazwebknjaz changed the title Get rid of the ensurepip infra for many wheelsGH-109245: Get rid of the ensurepip infra for many wheelsSep 10, 2023
@webknjazwebknjazforce-pushed the maintenance/ensurepip-single-wheel branch from 71911de to bf909c0CompareSeptember 10, 2023 23:36
@webknjaz
Copy link
MemberAuthor

@AA-TurnerAA-Turner changed the title GH-109245: Get rid of the ensurepip infra for many wheelsGH-80789: Get rid of the ensurepip infra for many wheelsSep 16, 2023
@AA-Turner
Copy link
Member

I've taken the liberty of pushing two changes to your branch; feel free to revert in whole or part.

Both look to simplify what ensurepip is doing internally -- e.g. with only pip, having a _Package type doesn't really make sense any-more -- similarly with caching the output, which is fast to compute.

I've also removed type annotations, as these are ignored and we shouldn't introduce more (tomllib and importlib.resources are special cases).

I think there are further simplifications we could make, but I thought this was a reasonable middle-ground -- let me know your thoughts!

A

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But I would prefer to get a second review on this one.

Copy link
Member

@pfmoorepfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, but it doesn't block this - feel free to leave it if you think your version is better.

# of the same package, but don't attempt to implement correct version
# comparison since this case should not happen.
filenames=sorted(filenames)
filenames=sorted(filenames, reverse=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why reverse? I guess it gives better odds of getting the correct version, but as the comment above says, we're not doing version comparison because it should never be needed.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted this in the commit message but not a PR comment; sorry. The implementation of this function now short-circuits and returns the first match, but that means we need to reverse the order as previously the final match would be returned, and we want to preserve that behaviour. We retain no guarantees about finding the correct version.

@webknjaz
Copy link
MemberAuthor

I've taken the liberty of pushing two changes to your branch; feel free to revert in whole or part.

Both look to simplify what ensurepip is doing internally -- e.g. with only pip, having a _Package type doesn't really make sense any-more -- similarly with caching the output, which is fast to compute.

I've also removed type annotations, as these are ignored and we shouldn't introduce more (tomllib and importlib.resources are special cases).

I think annotations contribute to maintainability.

I think there are further simplifications we could make, but I thought this was a reasonable middle-ground -- let me know your thoughts!

I was specifically avoiding any refactoring in this PR, trying to make it as small as possible and keep the potential conflicts with the rest of related PRs at minimum. I don't like some of the changes but if this gets it merged faster so be it.

@webknjaz
Copy link
MemberAuthor

FreeBSD failing check reason:

⚠️ Failed to start an instance: FAILED_PRECONDITION: Monthly free compute limit exceeded!

@pfmoore
Copy link
Member

I think annotations contribute to maintainability.

I'm pretty sure there's still a policy that we don't use type annotations in the stdlib.

@webknjaz
Copy link
MemberAuthor

I'm pretty sure there's still a policy that we don't use type annotations in the stdlib.

Interesting.. https://devguide.python.org/search/?q=type+annotations&check_keywords=yes&area=default doesn't return anything related (nor does using Google search against that site).

@AA-Turner
Copy link
Member

doesn't return anything related

See https://discuss.python.org/t/21487; #108125 (comment). Quoting Brett: "It's codified based on discussions among the core developers that we have all agreed to over the years"

A

# of the same package, but don't attempt to implement correct version
# comparison since this case should not happen.
filenames=sorted(filenames)
filenames=sorted(filenames, reverse=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted this in the commit message but not a PR comment; sorry. The implementation of this function now short-circuits and returns the first match, but that means we need to reverse the order as previously the final match would be returned, and we want to preserve that behaviour. We retain no guarantees about finding the correct version.

@webknjaz
Copy link
MemberAuthor

@AA-Turner so I tried something which resulted in some simplification. I used pathlib but also made functions to return paths instead of complex structures with data that is never used together by the callers. I made sure that the coverage of the changed parts is at 100%, while making it less branchy...

@webknjaz
Copy link
MemberAuthor

I have made the requested changes; please review again

webknjazand others added 14 commits January 25, 2024 14:33
This is a refactoring change that aims to simplify ``ensurepip``. Before it, this module had legacy infrastructure that made an assumption that ``ensurepip`` would be provisioning more then just a single wheel. That assumption is no longer true since [[1]][[2]][[3]]. In this change, the improvement is done around removing unnecessary loops and supporting structures to change the assumptions to expect only the bundled or replacement ``pip`` wheel. [1]: python@ece20db [2]: python#101039 [3]: python#95299
This change is intended to make it clear that the helper now only returns one package and no more. Co-Authored-By: vstinner@python.org
- Exit early if there are no files in the directory - Return a match early by iterating in reverse order - Merge filename checks - Inline the path variable - Remove type annotations
- Return a dict with consistent fields - Remove caching - Remove type annotations - Leverage the known wheel package dir value to calculate full paths
It provides us with the ability to write simpler high-level logic that is easier to understand. As a side effect, it became possible to make the code less branchy.
It was returning bytes on FreeBSD which was incorrectly injected into the Python code snippet executed in a subprocess and had a b-preffix.
Some code comments and test function names were still referring to the removed function name. Not anymore!
This script is separate and is only used in CI as opposed to runtime.
Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>
Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>
It was tripping the linters and became unnecessary after hardcoding the pip wheel filename prefix in the string.
@webknjazwebknjazforce-pushed the maintenance/ensurepip-single-wheel branch from 45f9944 to 2472d87CompareJanuary 25, 2024 13:33
@webknjaz
Copy link
MemberAuthor

@pradyunsg there was a merge conflict that I solved with rebase. No other changes made.

Copy link
Member

@pradyunsgpradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last change, but LGTM otherwise!

@webknjaz
Copy link
MemberAuthor

@pradyunsg 🛎 the CI remains green after your branch sync FYI.

@pradyunsgpradyunsg merged commit 9639043 into python:mainJan 30, 2024
@webknjaz
Copy link
MemberAuthor

@AA-Turner@pradyunsg@vstinner apply the backport labels?

@vstinner
Copy link
Member

I don't think that such refactoring change should be backported to stable branches. Only bugfixes.

aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
…ython#109245) Co-authored-by: vstinner@python.org Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com> Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com>
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

@webknjaz@AA-Turner@pfmoore@pradyunsg@vstinner@bedevere-bot