GH-113464: Generate a more efficient JIT #118512

brandtbucher · 2024-05-02T15:51:04Z

This breaks up the JIT into smaller functions, reduces a lot of branching in hot inner loops, and generally makes the C code cleaner (and probably faster).

Currently, we generate declarative structures at build time that we then loop over in order to emit the desired machine code at runtime. For example, the _STORE_FAST stencil looks like this:

static const unsigned char _STORE_FAST_code_body[61] = {
    0x50, 0x48, 0x8b, 0x45, 0xf8, 0x48, 0x83, 0xc5, 0xf8, 0x0f, 0xb7, 0x0d,
    0x00, 0x00, 0x00, 0x00, 0x49, 0x8b, 0x7c, 0xcd, 0x48, 0x49, 0x89, 0x44,
    0xcd, 0x48, 0x48, 0x85, 0xff, 0x74, 0x0f, 0x48, 0x8b, 0x07, 0x85, 0xc0,
    0x78, 0x08, 0x48, 0xff, 0xc8, 0x48, 0x89, 0x07, 0x74, 0x07, 0x58, 0xff,
    0x25, 0x00, 0x00, 0x00, 0x00, 0xff, 0x15, 0x00, 0x00, 0x00, 0x00, 0x58,
};
static const Hole _STORE_FAST_code_holes[4] = {
    {0xc, HoleKind_R_X86_64_GOTPCREL, HoleValue_DATA, NULL, -0x4},
    {0x31, HoleKind_R_X86_64_GOTPCRELX, HoleValue_DATA, NULL, 0x4},
    {0x37, HoleKind_R_X86_64_GOTPCRELX, HoleValue_DATA, NULL, 0xc},
};
static const unsigned char _STORE_FAST_data_body[25] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
static const Hole _STORE_FAST_data_holes[4] = {
    {0x0, HoleKind_R_X86_64_64, HoleValue_OPARG, NULL, 0x0},
    {0x8, HoleKind_R_X86_64_64, HoleValue_CONTINUE, NULL, 0x0},
    {0x10, HoleKind_R_X86_64_64, HoleValue_ZERO, &_Py_Dealloc, 0x0},
};

This very general approach means that we have a lot of complex logic in our hot inner loop to decode instructions and set up values for patching that may not even be needed. It's also very branchy, since we're essentially "interpreting" the array of holes for each instruction.
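To make the cost of that approach concrete, here is a minimal sketch of a data-driven emitter that copies a template and then walks an array of holes. It is not the code from Python/jit.c: the `Sketch*` types, the two hole kinds, and the `values` lookup array are simplified assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real JIT's Hole types. */
typedef enum { KIND_ABS_64, KIND_REL_32 } SketchHoleKind;
typedef enum { VALUE_OPARG, VALUE_CONTINUE, VALUE_DATA } SketchHoleValue;

typedef struct {
    size_t offset;          /* where in the body to patch */
    SketchHoleKind kind;    /* how to encode the value */
    SketchHoleValue value;  /* which runtime value to look up */
    int64_t addend;         /* constant added to that value */
} SketchHole;

/* Copy the template, then "interpret" every hole at runtime.
 * values[] is indexed by SketchHoleValue and must be filled in for
 * every instruction, even if a given stencil never uses some entries. */
static void
sketch_emit_generic(unsigned char *dst,
                    const unsigned char *body, size_t body_size,
                    const SketchHole *holes, size_t nholes,
                    const uint64_t values[3])
{
    memcpy(dst, body, body_size);
    for (size_t i = 0; i < nholes; i++) {
        const SketchHole *hole = &holes[i];
        assert(hole->offset < body_size);
        unsigned char *location = dst + hole->offset;
        uint64_t value = values[hole->value] + (uint64_t)hole->addend;
        switch (hole->kind) {  /* one branch per hole, per instruction */
            case KIND_ABS_64: {
                memcpy(location, &value, sizeof(value));
                break;
            }
            case KIND_REL_32: {
                /* Store the low 32 bits of the distance to the target. */
                uint32_t relative =
                    (uint32_t)(value - (uint64_t)(uintptr_t)location);
                memcpy(location, &relative, sizeof(relative));
                break;
            }
        }
    }
}
```

Every emitted instruction pays for the loop, the value lookup, and the `switch`, which is the "interpreting" overhead described above.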

With this PR, jit_stencils.h instead contains the following function:

void
emit__STORE_FAST(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, uintptr_t instruction_starts[])
{
    const unsigned char code_body[60] = {
        0x50, 0x48, 0x8b, 0x45, 0xf8, 0x48, 0x83, 0xc5, 0xf8, 0x0f, 0xb7, 0x0d,
        0x00, 0x00, 0x00, 0x00, 0x49, 0x8b, 0x7c, 0xcd, 0x48, 0x49, 0x89, 0x44,
        0xcd, 0x48, 0x48, 0x85, 0xff, 0x74, 0x0f, 0x48, 0x8b, 0x07, 0x85, 0xc0,
        0x78, 0x08, 0x48, 0xff, 0xc8, 0x48, 0x89, 0x07, 0x74, 0x07, 0x58, 0xff,
        0x25, 0x00, 0x00, 0x00, 0x00, 0xff, 0x15, 0x00, 0x00, 0x00, 0x00, 0x58,
    };
    const unsigned char data_body[24] = {
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    };
    memcpy(data, data_body, sizeof(data_body));
    patch_64(data + 0x0, instruction->oparg);
    patch_64(data + 0x8, (uintptr_t)code + sizeof(code_body));
    patch_64(data + 0x10, (uintptr_t)&_Py_Dealloc);
    memcpy(code, code_body, sizeof(code_body));
    patch_32r(code + 0xc, (uintptr_t)data + -0x4);
    patch_x86_64_32rx(code + 0x31, (uintptr_t)data + 0x4);
    patch_x86_64_32rx(code + 0x37, (uintptr_t)data + 0xc);
}

This function is called directly to emit the machine code for every _STORE_FAST instruction, and hardcodes the logic for all of the necessary copies and patches. The result is one indirect call, no unnecessary branching, and (in my opinion) cleaner code, since a lot of the tricky logic is now hidden away in generated files.
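As a rough illustration of the "one indirect call" shape, the sketch below dispatches through a table of per-instruction emitter functions. The PR text does not spell out how the generated functions are looked up, so the function-pointer table, the `sketch_*` names, and the two toy opcodes are all assumptions, not the actual contents of jit_stencils.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 64-bit value at an arbitrary location. */
static void
sketch_patch_64(unsigned char *location, uint64_t value)
{
    memcpy(location, &value, sizeof(value));
}

typedef void (*sketch_emitter)(unsigned char *data, uint64_t oparg);

/* One generated function per instruction: every copy and patch is
 * hardcoded, so there is no hole array left to interpret at runtime. */
static void
sketch_emit_a(unsigned char *data, uint64_t oparg)
{
    static const unsigned char data_body[16] = {0};
    memcpy(data, data_body, sizeof(data_body));
    sketch_patch_64(data + 0x0, oparg);
    sketch_patch_64(data + 0x8, (uint64_t)(uintptr_t)data + sizeof(data_body));
}

static void
sketch_emit_b(unsigned char *data, uint64_t oparg)
{
    static const unsigned char data_body[8] = {0};
    (void)oparg;  /* this toy instruction has nothing to patch */
    memcpy(data, data_body, sizeof(data_body));
}

enum { SKETCH_OP_A, SKETCH_OP_B, SKETCH_OP_COUNT };

/* Assumed dispatch mechanism: a table indexed by opcode. */
static const sketch_emitter sketch_emitters[SKETCH_OP_COUNT] = {
    [SKETCH_OP_A] = sketch_emit_a,
    [SKETCH_OP_B] = sketch_emit_b,
};

/* The per-instruction work in the compile loop is a single indirect call. */
static void
sketch_compile_one(int opcode, unsigned char *data, uint64_t oparg)
{
    assert(opcode >= 0 && opcode < SKETCH_OP_COUNT);
    sketch_emitters[opcode](data, oparg);
}
```

Because each emitter knows its own offsets and values at build time, an instruction that needs no patches (like the second toy opcode here) does no patching work at all.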

I know this is right before feature freeze, but I'd really like to get this in 3.13 since it will make backporting any fixes much easier. It doesn't change the actual jitted code in any way.

Note to reviewers: the diff is a bit messy, so it may make more sense to compare the before-vs-after files side-by-side instead.

Issue: JIT Compilation #113464

brandtbucher · 2024-05-02T15:52:11Z

@savannahostrowski, I'd love to get your review of this if you have a few cycles.

savannahostrowski

A couple of comments and questions but after sitting and reading through this code a bunch over the last week or two, I'm excited about how much more readable this will get with this change! 💆‍♀️

Python/jit.c

Tools/jit/_stencils.py

savannahostrowski · 2024-05-02T23:26:09Z

Tools/jit/_writer.py

     """Yield a JIT compiler line-by-line as a C header file."""
-    yield from _dump_header()
-    for opname, group in groups.items():
+    for opname, group in sorted(groups.items()):


Is there a reason that this needs to be sorted?

brandtbucher · May 3, 2024

Nope, I just like it that way (if you couldn't tell by now). ;)

Tools/jit/_stencils.py

savannahostrowski

Thanks for adding in the comment about the naming conventions - I think that helps! Otherwise, this looks pretty solid to me (barring some Windows CI failures). Lots of moving things into functions, but it's a whole lot more readable! 🎉

brandtbucher · 2024-05-03T23:40:53Z

Windows JIT CI fixed in GH-118564.

brandtbucher and others added 16 commits May 1, 2024 15:15

Replace stencils with dedicated writer functions
2901caf

Generate patching logic
f30fa64

Cleanup
23e211c

uint64_t -> uintptr_t
431fbed

uintptr_t -> uint64_t
3e6b25c

Linting
82030c8

Restore AArch64 pair folding
236af82

Cleanup
3b7e693

Add missing relocations
fbb97fc

Fix AArch64 folds
c40bb34

Dedent
bd570b5

Use a single array of structs
b2fd9d2

Move C initializer formation to StencilGroup
7aa12a2

Add comment on why data is first
9ec64ac

Silence warnings
eb0826f

Add missing space
a04d7f8

brandtbucher added the performance, interpreter-core, and build labels May 2, 2024

brandtbucher self-assigned this May 2, 2024

bedevere-app bot added the awaiting core review label May 2, 2024

bedevere-app bot mentioned this pull request May 2, 2024
JIT Compilation #113464
Closed

brandtbucher requested a review from markshannon May 2, 2024 15:51

brandtbucher added the skip news label May 2, 2024

savannahostrowski reviewed May 2, 2024

brandtbucher added 2 commits May 3, 2024 14:30

Clarify which patch functions are relaxing (and what that means)
b919fcc

Exaplain why GOT is commented out
46adf09

savannahostrowski approved these changes May 3, 2024

brandtbucher merged commit 1b7e5e6 into python:main May 3, 2024

bedevere-app bot removed the awaiting core review label May 3, 2024

SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
pythonGH-113464: Generate a more efficient JIT (pythonGH-118512)
1493daa

brandtbucher added the topic-JIT label May 31, 2024
