Skip to content

Conversation

@markshannon
Copy link
Member

@markshannonmarkshannon commented Jan 3, 2025

This PR:

  • Changes the POP_TOP and the end of a for loop to POP_ITER (it pops the iterator)
  • Removes the NOT_TAKEN following FOR_ITER and instruments the FOR_ITER and POP_ITER instead
  • Fixes handling of EXTENDED_ARGS in co_branches().

The last one isn't strictly related to the issue, but came up while fixing the main issue.

Changing the instrumentation pattern has two effects:

  • For uninstrumented code, there is one less instruction dispatch (the NOT_TAKEN) per iteration, at the cost of one additional dispatch when the loop is exited.
  • For instrumented code, once the non-exit branch is DISABLEd, the FOR_ITER can be specialized and jitted.

Performance is neutral on the standard benchmarks for tier1, as expected.

Comment on lines 908 to 910
code_offset_to_line(code, src)-base,
code_offset_to_line(code, left)-base,
code_offset_to_line(code, right)-base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
code_offset_to_line(code, src)-base,
code_offset_to_line(code, left)-base,
code_offset_to_line(code, right)-base
code_offset_to_line(code, src)-base,
code_offset_to_line(code, left)-base,
code_offset_to_line(code, right)-base,

tier1 inst(INSTRUMENTED_END_FOR, (receiver, value -- receiver)){
/* Need to create a fake StopIteration error here,
* to conform to PEP 380 */
//* to conform to PEP 380 ED_NO*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

* This has the benign side effect that if value is
* finalized it will see the location as the FOR_ITER's.
*/
frame->instr_ptr = prev_instr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a precedent for this? Seems like the kind of thing that can cause confusion/problems later on.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tier 2 code generator doesn't like it either. I think I'll need to add a new annotation "no_saved_ip" to tell the code generator not to update frame->instr_ptr.

Any suggestions for a better name than "no_saved_ip"?

@iritkatriel
Copy link
Member

I was wondering if we couldn't, instead of putting in NOT_TAKEN instructions, have a static table on the code object with the fallthrough locations. They are only needed for instrumentation, so could be processed only when instrumentation is enabled.

@nedbat
Copy link
Member

Trying this PR compared to fa985be, more of my tests fail (on a work-in-progress branch to account for BRANCH_LEFT etc). I'll debug my code, but I wish there were a way to get a concrete plan to work from.

@markshannon
Copy link
MemberAuthor

I was wondering if we couldn't, instead of putting in NOT_TAKEN instructions, have a static table on the code object with the fallthrough locations. They are only needed for instrumentation, so could be processed only when instrumentation is enabled.

I think that would be worse overall, for a few reasons:

  • We need to be able to disable the left and right events separately. Without a NOT_TAKEN instruction, each POP_JUMP... would need POP_JUMP...INSTRUMENTED, POP_JUMP...INSTRUMENTED_LEFT_ONLY, and POP_JUMP...INSTRUMENTED_RIGHT_ONLY.
    With the current 4 POP_JUMP variants that would be 12 instructions instead of the 6 we need with NOT_TAKEN.
  • Such a table would be a lot bigger than extra space taken by the NOT_TAKEN instructions
  • It wouldn't be significantly faster when not instrumented. The cost of NOT_TAKEN is very low as it normally skipped and is eliminated entirely in the jit.
  • It would be quite a lot slower when instrumented thanks to the table lookups.

@nedbat
Copy link
Member

nedbat commented Jan 3, 2025

Is this intended?

continue.py:

foriinrange(10): a=icontinue# 3a=99asserta==9# 5

Disassembled with this branch:

% python3 -m dis -O continue.py 0 0 RESUME 0 1 2 LOAD_NAME 0 (range) 4 PUSH_NULL 6 LOAD_SMALL_INT 10 8 CALL 1 16 GET_ITER L1: 18 FOR_ITER 5 (to L2) 22 STORE_NAME 1 (i) 2 24 LOAD_NAME 1 (i) 26 STORE_NAME 2 (a) 3 28 JUMP_BACKWARD 7 (to L1) 1 L2: 32 END_FOR 34 POP_ITER 5 36 LOAD_NAME 2 (a) 38 LOAD_SMALL_INT 9 40 COMPARE_OP 88 (bool(==)) 44 POP_JUMP_IF_TRUE 3 (to L3) 48 NOT_TAKEN 50 LOAD_COMMON_CONSTANT 0 (AssertionError) 52 RAISE_VARARGS 1 L3: 54 LOAD_CONST 0 (None) 56 RETURN_VALUE 

sys.monitoring events using run_sysmon.py:

3.14.0a3+ (heads/pr/128445:511733c91b3, Jan 3 2025, 10:10:48) [Clang 16.0.0 (clang-1600.0.26.6)] PY_START: continue.py@0 #0 LINE: continue.py #1 BRANCH_LEFT: continue.py@18->22 #1->1 LINE: continue.py #2 LINE: continue.py #3 JUMP: continue.py@28->18 #3->1 LINE: continue.py #1 BRANCH_RIGHT: continue.py@34->36 #1->5 ****** LINE: continue.py #5 BRANCH_RIGHT: continue.py@44->54 #5->5 PY_RETURN: continue.py@56 #5 

Is it right that the starred line has a BRANCH_RIGHT event from a POP_ITER instruction?

@markshannon
Copy link
MemberAuthor

Is it right that the starred line has a BRANCH_RIGHT event from a POP_ITER instruction?

No. It should be from the FOR_ITER. The problem was that the POP_ITER was being instrumented with INSTRUMENT_LINE which lost the previous instruction offset. I think I have that fixed now.

@nedbat
Copy link
Member

Two more examples of branch event surprises. run_sysmon.py is used to see the sys.monitoring events. These are using commit cbe6c7c from PR #128445.

ifdebug.py
forvaluein [True, False]: ifvalue: if__debug__: x=4else: x=6

Disassembled:

% python3 -m dis -O ifdebug.py 0 0 RESUME 0 1 2 LOAD_CONST 0 ((True, False)) 4 GET_ITER L1: 6 FOR_ITER 18 (to L3) 10 STORE_NAME 0 (value) 2 12 LOAD_NAME 0 (value) 14 TO_BOOL 22 POP_JUMP_IF_FALSE 6 (to L2) 26 NOT_TAKEN 3 28 NOT_TAKEN 4 30 LOAD_SMALL_INT 4 32 STORE_NAME 1 (x) 34 JUMP_BACKWARD 16 (to L1) 6 L2: 38 LOAD_SMALL_INT 6 40 STORE_NAME 1 (x) 42 JUMP_BACKWARD 20 (to L1) 1 L3: 46 END_FOR 48 POP_ITER 50 LOAD_CONST 1 (None) 52 RETURN_VALUE 

Events:

3.14.0a3+ (heads/pr/128445:cbe6c7c085c, Jan 4 2025, 05:39:38) [Clang 16.0.0 (clang-1600.0.26.6)] PY_START: ifdebug.py@0 #0 LINE: ifdebug.py #1 BRANCH_LEFT: ifdebug.py@6->10 #1->1 LINE: ifdebug.py #2 BRANCH_LEFT: ifdebug.py@22->28 #2->3 LINE: ifdebug.py #3 BRANCH_LEFT: ifdebug.py@28->30 #3->4 **** LINE: ifdebug.py #4 JUMP: ifdebug.py@34->6 #4->1 LINE: ifdebug.py #1 BRANCH_RIGHT: ifdebug.py@22->38 #2->6 LINE: ifdebug.py #6 JUMP: ifdebug.py@42->6 #6->1 BRANCH_RIGHT: ifdebug.py@6->50 #1->1 PY_RETURN: ifdebug.py@52 #1 

The starred line is a BRANCH event from a NOT_TAKEN instruction.

1176_async.py
importasyncioasyncdefasync_gen(): yield4asyncdefasync_test(): asyncforiinasync_gen(): print(i+8) asyncio.run(async_test())

Disassembled:

% python3 -m dis -O 1176_async.py 0 0 RESUME 0 1 2 LOAD_SMALL_INT 0 4 LOAD_CONST 0 (None) 6 IMPORT_NAME 0 (asyncio) 8 STORE_NAME 0 (asyncio) 3 10 LOAD_CONST 1 (<code object async_gen at 0x103781c50, file "1176_async.py", line 3>) 12 MAKE_FUNCTION 14 STORE_NAME 1 (async_gen) 6 16 LOAD_CONST 2 (<code object async_test at 0x1037c5b80, file "1176_async.py", line 6>) 18 MAKE_FUNCTION 20 STORE_NAME 2 (async_test) 10 22 LOAD_NAME 0 (asyncio) 24 LOAD_ATTR 6 (run) 44 PUSH_NULL 46 LOAD_NAME 2 (async_test) 48 PUSH_NULL 50 CALL 0 58 CALL 1 66 POP_TOP 68 LOAD_CONST 0 (None) 70 RETURN_VALUE Disassembly of <code object async_gen at 0x103781c50, file "1176_async.py", line 3>: 3 0 RETURN_GENERATOR 2 POP_TOP L1: 4 RESUME 0 4 6 LOAD_SMALL_INT 4 8 CALL_INTRINSIC_1 4 (INTRINSIC_ASYNC_GEN_WRAP) 10 YIELD_VALUE 0 12 RESUME 5 14 POP_TOP 16 LOAD_CONST 0 (None) 18 RETURN_VALUE -- L2: 20 CALL_INTRINSIC_1 3 (INTRINSIC_STOPITERATION_ERROR) 22 RERAISE 1 ExceptionTable: L1 to L2 -> L2 [0] lasti Disassembly of <code object async_test at 0x1037c5b80, file "1176_async.py", line 6>: 6 0 RETURN_GENERATOR 2 POP_TOP L1: 4 RESUME 0 7 6 LOAD_GLOBAL 1 (async_gen + NULL) 16 CALL 0 24 GET_AITER L2: 26 GET_ANEXT 28 LOAD_CONST 0 (None) L3: 30 SEND 3 (to L6) L4: 34 YIELD_VALUE 1 L5: 36 RESUME 3 38 JUMP_BACKWARD_NO_INTERRUPT 5 (to L3) L6: 40 END_SEND L7: 42 STORE_FAST 0 (i) 8 44 LOAD_GLOBAL 3 (print + NULL) 54 LOAD_FAST 0 (i) 56 LOAD_SMALL_INT 8 58 BINARY_OP 0 (+) 62 CALL 1 70 POP_TOP 72 JUMP_BACKWARD 25 (to L2) 7 L8: 76 CLEANUP_THROW L9: 78 JUMP_BACKWARD_NO_INTERRUPT 20 (to L6) L10: 80 END_ASYNC_FOR 82 LOAD_CONST 0 (None) 84 RETURN_VALUE -- L11: 86 CALL_INTRINSIC_1 3 (INTRINSIC_STOPITERATION_ERROR) 88 RERAISE 1 ExceptionTable: L1 to L2 -> L11 [0] lasti L2 to L4 -> L10 [1] L4 to L5 -> L8 [3] L5 to L7 -> L10 [1] L7 to L8 -> L11 [0] lasti L8 to L9 -> L10 [1] L10 to L11 -> L11 [0] lasti 

Events:

3.14.0a3+ (heads/pr/128445:cbe6c7c085c, Jan 4 2025, 05:39:38) [Clang 16.0.0 (clang-1600.0.26.6)] PY_START: 1176_async.py@0 #0 LINE: 1176_async.py #1 LINE: 1176_async.py #3 LINE: 1176_async.py #6 LINE: 1176_async.py #10 PY_START: 1176_async.py@4 #6 LINE: 1176_async.py #7 PY_START: 1176_async.py@4 #3 LINE: 1176_async.py #4 LINE: 1176_async.py #8 12 JUMP: 1176_async.py@72->26 #8->7 LINE: 1176_async.py #7 PY_RESUME: 1176_async.py@12 #4 PY_RETURN: 1176_async.py@18 #4 PY_RETURN: 1176_async.py@84 #7 PY_RETURN: 1176_async.py@70 #10 

Here there are no BRANCH events at all. I would expect some for the async for line, analogous to how synchronous for is evented.

@markshannon
Copy link
MemberAuthor

These examples are really helpful, but I don't think they are relevant to this PR.
Could you add these examples to the issue, otherwise they get lost.

The ifdebug.py example is a case of the NOT_TAKEN not being removed when the corresponding JUMP is removed. The 1176_async.py is a result of the weird way that async for is currently implemented. I've added tasks for both.

* the FOR_ITER as the previous instruction.
* This has the benign side effect that if value is
* finalized it will see the location as the FOR_ITER's.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just teach POP_ITER to look at prev_instr-1?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prev_instr is the previously executed instruction, which is probably not the previous instruction in the code.
There is no way for POP_ITER to tell where the jump came from if we overwrite frame->instr_ptr

@markshannonmarkshannon merged commit f826bec into python:mainJan 6, 2025
67 checks passed
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
@markshannonmarkshannon deleted the better-instrument-for-iter branch January 10, 2025 16:22
Yhg1s added a commit to Yhg1s/cpython that referenced this pull request Jan 21, 2025
@esc
Copy link
Contributor

esc commented Jun 24, 2025

The POP_ITER byetcode appears undocumented when looking at: https://docs.python.org/3.14/library/dis.html

Is this intentional or do you want me to raise an issue about this? Or has an issue perhaps been raised already?

facebook-github-bot pushed a commit to facebookincubator/cinderx that referenced this pull request Aug 28, 2025
Summary: See [GH-128445](python/cpython#128445). Reviewed By: alexmalyshev Differential Revision: D81142355 fbshipit-source-id: 33c7eb51bd64722229df88691619a3e9fcbede6b
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

@markshannon@iritkatriel@nedbat@esc