@brandtbucher
Member

brandtbucher commented Feb 12, 2025

@pitrou pointed out that the JIT's stencils are bloated with zeroed bytes. Since we request fresh pages of memory for JIT code, it's guaranteed to be zeroed anyways, so we can save space in the file and operations at runtime by eliding the writes where appropriate.

Here's a before-and-after for one of our most common uops, _CHECK_VALIDITY:

Before:

    void
    emit__CHECK_VALIDITY(
        unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
        const _PyUOpInstruction *instruction, jit_state *state)
    {
        //
        // _CHECK_VALIDITY.o: file format elf64-x86-64
        //
        // Disassembly of section .text:
        //
        // 0000000000000000 <_JIT_ENTRY>:
        // 0: 48 8b 05 00 00 00 00          movq    (%rip), %rax            # 0x7 <_JIT_ENTRY+0x7>
        // 0000000000000003:  R_X86_64_REX_GOTPCRELX _JIT_EXECUTOR-0x4
        // 7: f6 40 22 01                   testb   $0x1, 0x22(%rax)
        // b: 75 06                         jne     0x13 <_JIT_ENTRY+0x13>
        // d: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x13 <_JIT_ENTRY+0x13>
        // 000000000000000f:  R_X86_64_GOTPCRELX _JIT_JUMP_TARGET-0x4
        // 13: ff 25 00 00 00 00            jmpq    *(%rip)                 # 0x19 <_JIT_ENTRY+0x19>
        // 0000000000000015:  R_X86_64_GOTPCRELX _JIT_CONTINUE-0x4
        const unsigned char code_body[19] = {
            0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
            0x40, 0x22, 0x01, 0x75, 0x06, 0xff, 0x25, 0x00,
            0x00, 0x00, 0x00,
        };
        // 0: EXECUTOR
        // 8: JUMP_TARGET
        const unsigned char data_body[16] = {
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        };
        memcpy(data, data_body, sizeof(data_body));
        patch_64(data + 0x0, (uintptr_t)executor);
        patch_64(data + 0x8, state->instruction_starts[instruction->jump_target]);
        memcpy(code, code_body, sizeof(code_body));
        patch_x86_64_32rx(code + 0x3, (uintptr_t)data + -0x4);
        patch_x86_64_32rx(code + 0xf, (uintptr_t)data + 0x4);
    }

After:

    void
    emit__CHECK_VALIDITY(
        unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
        const _PyUOpInstruction *instruction, jit_state *state)
    {
        //
        // _CHECK_VALIDITY.o: file format elf64-x86-64
        //
        // Disassembly of section .text:
        //
        // 0000000000000000 <_JIT_ENTRY>:
        // 0: 48 8b 05 00 00 00 00          movq    (%rip), %rax            # 0x7 <_JIT_ENTRY+0x7>
        // 0000000000000003:  R_X86_64_REX_GOTPCRELX _JIT_EXECUTOR-0x4
        // 7: f6 40 22 01                   testb   $0x1, 0x22(%rax)
        // b: 75 06                         jne     0x13 <_JIT_ENTRY+0x13>
        // d: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x13 <_JIT_ENTRY+0x13>
        // 000000000000000f:  R_X86_64_GOTPCRELX _JIT_JUMP_TARGET-0x4
        // 13: ff 25 00 00 00 00            jmpq    *(%rip)                 # 0x19 <_JIT_ENTRY+0x19>
        // 0000000000000015:  R_X86_64_GOTPCRELX _JIT_CONTINUE-0x4
        const unsigned char code_body[19] = {
            0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
            0x40, 0x22, 0x01, 0x75, 0x06, 0xff, 0x25,
        };
        // 0: EXECUTOR
        // 8: JUMP_TARGET
        patch_64(data + 0x0, (uintptr_t)executor);
        patch_64(data + 0x8, state->instruction_starts[instruction->jump_target]);
        memcpy(code, code_body, sizeof(code_body));
        patch_x86_64_32rx(code + 0x3, (uintptr_t)data + -0x4);
        patch_x86_64_32rx(code + 0xf, (uintptr_t)data + 0x4);
    }
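
The data half of this change can be sketched in isolation. This is only an illustration under stated assumptions, not the JIT's actual code: `calloc` stands in for the freshly requested zeroed pages, `sketch_patch_64` is a hypothetical stand-in for `patch_64`, and the other function names are made up for the example. It shows that skipping the `memcpy` of an all-zero `data_body` leaves the same bytes in place once the patches are applied.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DATA_SIZE 16  /* size of data_body in the stencil above */

/* Hypothetical stand-in for patch_64: store a 64-bit value at `location`. */
static void sketch_patch_64(unsigned char *location, uint64_t value)
{
    memcpy(location, &value, sizeof(value));
}

/* "Before": copy an all-zero template into place, then patch the holes. */
static void emit_data_before(unsigned char *data, uint64_t executor, uint64_t target)
{
    static const unsigned char data_body[DATA_SIZE] = {0};
    memcpy(data, data_body, sizeof(data_body));
    sketch_patch_64(data + 0x0, executor);
    sketch_patch_64(data + 0x8, target);
}

/* "After": the destination is assumed to be zeroed already, so only patch. */
static void emit_data_after(unsigned char *data, uint64_t executor, uint64_t target)
{
    sketch_patch_64(data + 0x0, executor);
    sketch_patch_64(data + 0x8, target);
}

/* Returns 1 if both emitters leave identical bytes, 0 if not, -1 on allocation failure. */
int data_emitters_agree(uint64_t executor, uint64_t target)
{
    /* calloc is an assumption standing in for fresh, zero-filled pages. */
    unsigned char *a = calloc(DATA_SIZE, 1);
    unsigned char *b = calloc(DATA_SIZE, 1);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    emit_data_before(a, executor, target);
    emit_data_after(b, executor, target);
    int same = memcmp(a, b, DATA_SIZE) == 0;
    free(a);
    free(b);
    return same;
}
```

The equivalence only holds because the destination starts out zeroed; on recycled memory the "after" variant would leave stale bytes wherever no patch is written.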

@brandtbucher added the skip news, interpreter-core (Objects, Python, Grammar, and Parser dirs), build (The build process and cross-build), and topic-JIT labels Feb 12, 2025
@brandtbucher self-assigned this Feb 12, 2025
@bedevere-app (bot) mentioned this pull request Feb 12, 2025
@pitrou
Member

> Since we request fresh pages of memory for JIT code, it's guaranteed to be zeroed anyways, so we can save space in the file and operations at runtime by eliding the writes where appropriate.

Note that even without that property, you could simply have issued a memset instead of copying from a statically-allocated area of zeros :)
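
A small sketch of that alternative, assuming nothing about the real stencil code (the buffer size and function name here are hypothetical): on memory that is not already zeroed, a `memset` yields the same bytes as a `memcpy` from a static block of zeros, without storing those zeros in the binary.

```c
#include <string.h>

#define REGION_SIZE 16  /* hypothetical size, matching data_body above */

/* Returns 1 if memset and memcpy-from-zeros leave identical bytes. */
int zero_fill_variants_agree(void)
{
    static const unsigned char zeros[REGION_SIZE] = {0};
    unsigned char via_memcpy[REGION_SIZE];
    unsigned char via_memset[REGION_SIZE];

    /* Simulate non-zeroed (recycled) memory with a garbage pattern. */
    memset(via_memcpy, 0xAA, sizeof(via_memcpy));
    memset(via_memset, 0xAA, sizeof(via_memset));

    /* Variant 1: copy from a statically allocated area of zeros. */
    memcpy(via_memcpy, zeros, sizeof(zeros));
    /* Variant 2: no template needed, just clear the region. */
    memset(via_memset, 0, sizeof(via_memset));

    return memcmp(via_memcpy, via_memset, REGION_SIZE) == 0;
}
```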

> Here's a before-and-after for one of our most common uops, _CHECK_VALIDITY:

It seems strange to have a dedicated µop doing just this :) Is there documentation for µops somewhere?

@brandtbucher
Member, Author

brandtbucher commented Feb 12, 2025

> It seems strange to have a dedicated µop doing just this :)

Does it? The role of this uop is to quickly test a single bit of state to verify that our optimizer's assumptions still hold. This needs to happen in lots of different places (a single Py_DECREF can change the world), so it helps to have a small check for it that can be put anywhere.
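
As a rough sketch of what such a check amounts to (the struct, field, and function names below are hypothetical and much simpler than CPython's real `_PyExecutorObject`): the uop tests one bit and either continues to the next uop or jumps to its deoptimization target, which mirrors the `testb`/`jne` pair in the disassembly above.

```c
#include <stdint.h>

/* Hypothetical, simplified executor; the real layout is an internal detail. */
typedef struct {
    uint8_t valid;  /* bit 0 is set while the optimizer's assumptions hold */
} sketch_executor;

typedef enum {
    SKETCH_CONTINUE,  /* fall through to the next uop */
    SKETCH_DEOPT      /* leave the trace via the jump target */
} sketch_next;

/* The whole check: one load, one bit test, one branch. */
static inline sketch_next sketch_check_validity(const sketch_executor *executor)
{
    return (executor->valid & 1) ? SKETCH_CONTINUE : SKETCH_DEOPT;
}

/* Anything that breaks an assumption only needs to clear the bit. */
static inline void sketch_invalidate(sketch_executor *executor)
{
    executor->valid &= (uint8_t)~1u;
}
```

Because the check is this small, it can be repeated after any operation that might run arbitrary code without adding much to the trace.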

> Is there documentation for µops somewhere?

The general format and approach are documented in InternalDocs/jit.md. The individual uops aren't documented publicly, since they're a very unstable, low-level implementation detail of an experimental feature. If there's a real need to internally document each of the 296 uops we currently have, we can probably find the time to do it. But most of them are either simple enough to follow (like type or dictionary version checks), or are identical to a full bytecode instruction that's already documented.

@savannahostrowski
Member

savannahostrowski left a comment

Smaller stencils 🎉

@TeamSpen210

In the stripped version, code_body is still set to have the original length. Looks like the format string wasn't updated?

@brandtbucher
Member, Author

Yeah, that's expected. There are places where we use sizeof(code_body), and those are a bit more disruptive to change. I felt it wasn't worth it... the real wins come from saving space in the file, and removing entire memcpy calls.
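
For readers wondering why keeping the declared length is safe: in C, an array with fewer initializers than elements has its remaining elements zero-initialized, so `sizeof(code_body)` is unchanged and the implicit tail is still zero. A minimal check, using the bytes from the stripped stencil above (the helper function names are hypothetical):

```c
#include <stddef.h>

/* Declared with 19 elements, but only the first 15 are listed. */
static const unsigned char code_body[19] = {
    0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
    0x40, 0x22, 0x01, 0x75, 0x06, 0xff, 0x25,
};

/* sizeof reflects the declared length, not the number of initializers. */
size_t code_body_size(void)
{
    return sizeof(code_body);
}

/* Returns 1 if the four elements without initializers are all zero. */
int code_body_tail_is_zero(void)
{
    for (size_t i = 15; i < sizeof(code_body); i++) {
        if (code_body[i] != 0x00) {
            return 0;
        }
    }
    return 1;
}
```

So existing uses of `sizeof(code_body)` keep working, and the only thing that shrinks is the text of the generated source.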

@brandtbucher merged commit 05e89c3 into python:main Feb 13, 2025
66 checks passed