Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
GH-118036: Fix a bug with CALL_STAT_INC#117933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
gvanrossum commented Apr 16, 2024 • edited by bedevere-app bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-app bot
Uh oh!
There was an error while loading. Please reload this page.
We were under-counting calls in `_PyEvalFramePushAndInit` because the `CALL_STAT_INC` macro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted. This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).
mdboom left a comment • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand how this is broken and why this change fixes it, so I'm marking as approved, but I agree it's "weird" / more convoluted than it needs to be. Moving the order of functions in ceval.c would result in less weirdness (probably just moving _PyEval_EvalFrameDefault to the bottom since that's where all the macro updates happen) but that would create a lot of churn.
Alternatively, what if you rename REAL_CALL_STAT_INC to CALL_STAT_INC_ALWAYS and then just use CALL_STAT_INC_ALWAYS directly from _PyEvalFramePushAndInit (which I think is the only call site impacted by this change). That would get rid of the weird dance of #undef and restoring just CALL_STAT_INC (but admittedly that would just replace it with another subtlety in _PyEvalFramePushAndInit).
Let's try a different fix instead.
This should fix the issue with CALL_STAT_INC in a cleaner way (even if the diff is much larger).
gvanrossum commented Apr 16, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Here's another version, where I moved the interpreter loop (and a small assortment of related stuff) to the end of the file. I ran Pystats on a single loop of the Richards benchmark, and found that most stats are approximately the same (not completely) except that "Frames pushed" is about 10% larger, which indicates that the fix works. (The diff is much more annoying to review, but I promise I just moved the stuff from the original lines 606 - 1120 to the bottom of the file, except I had to move |
gvanrossum commented Apr 17, 2024
Benchmark says speed and memory unchanged. |
markshannon commented Apr 17, 2024
I think the correct fix for In executor_cases.c.h |
gvanrossum commented Apr 17, 2024
Okay, I'll try that next. |
I'm going to try yet another approach.
gvanrossum commented Apr 17, 2024
The proof will be in the pudding. I'll fire off two benchmark runs, with pystats, one plain, one with Tier 2. (The JIT pystats ought to be similar but I don't want to wait.) |
gvanrossum commented Apr 17, 2024
Benchmark using Tier 1 only shows 36.8% more frames pushed, which is very close to what I measured with the first version of this PR, so I think that suggests this fixes that issue. Everything else I looked at is basically unchanged, suggesting I'm not breaking anything. Still waiting for the Tier 2 benchmark, will update when I see those numbers. |
gvanrossum commented Apr 17, 2024
Benchmark with Tier 2 poses a bit of a mystery, at least the pystats diff. Go to Call stats and open the details box.
@markshannon, @mdboom, @brandtbucher -- could there be some kind of double counting going on? Or is this an expected result? The only change from main is now that we don't undefine |
markshannon commented Apr 18, 2024 • edited by mdboom
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by mdboom
Uh oh!
There was an error while loading. Please reload this page.
I think the change is correct, even though the stats are probably still wrong. |
We were under-counting calls in
_PyEvalFramePushAndInitbecause theCALL_STAT_INCmacro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted (I had wanted to move the code around, but that would require moving something else around, and in the end I figured it was easier to tweak the macros @markshannon might disagree though?). This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).@mdboom can you review? This is one commit from my experiment about removing Tier 2 entirely (gh-117908).
To see the effect, look at these pystat diffs.