@sweeneyde
Member

sweeneyde commented Apr 1, 2022

Most code won't do y = L.append(x) or otherwise use the return value, so PRECALL_NO_KW_LIST_APPEND is almost always followed by POP_TOP. We can verify this at specialization time.

This saves a Py_INCREF(Py_None), a SET_TOP(Py_None), and POP_TOP's Py_DECREF(POP()); DISPATCH();.
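A quick way to see the pattern described above is to disassemble both forms with the standard `dis` module. This is only an illustrative sketch: the helper `opname_after_call` and the two sample functions are made up for the example, and the opcode names around the call (`CALL_METHOD`, `PRECALL`/`CALL`, ...) vary between CPython versions. It looks at unspecialized bytecode and does not exercise the specialization itself.

```python
import dis


def opname_after_call(func):
    """Return the name of the instruction right after the first CALL* opcode."""
    instructions = list(dis.get_instructions(func))
    for index, instr in enumerate(instructions):
        # Matches CALL, CALL_METHOD and CALL_FUNCTION; PRECALL does not match.
        if instr.opname.startswith("CALL"):
            return instructions[index + 1].opname
    return None


def statement_form(arr, x):
    arr.append(x)  # result discarded, so the compiler emits POP_TOP


def value_used(arr, x):
    y = arr.append(x)  # result kept (it is always None)
    return y


print(opname_after_call(statement_form))
print(opname_after_call(value_used))
```

The statement form should report `POP_TOP`, while the assignment form reports some kind of store instruction instead; the latter is the uncommon case that the specialization has to rule out.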

Some microbenchmarks:

from pyperf import Runner, perf_counter

def bench_append(loops, length):
    src = list(map(float, range(length)))
    arr = []
    t0 = perf_counter()
    for i in range(loops):
        arr.clear()
        for x in src:
            arr.append(x)
    return perf_counter() - t0

def bench_append_less_gc(loops, length):
    src = list(map(float, range(length)))
    out = [None] * loops
    t0 = perf_counter()
    for i in range(loops):
        arr = []
        for x in src:
            arr.append(x)
        out[i] = arr
    return perf_counter() - t0

runner = Runner()
for n in [100, 1_000, 10_000, 100_000]:
    runner.bench_time_func(f"append {n}", bench_append, n, inner_loops=n)
    runner.bench_time_func(f"append-less-gc {n}", bench_append_less_gc, n, inner_loops=n)

From GCC, --enable-optimizations, --with-lto:

- append 100000: 14.9 ns +- 0.3 ns -> 13.3 ns +- 0.4 ns: 1.12x faster
- append 10000: 15.1 ns +- 0.3 ns -> 13.6 ns +- 0.5 ns: 1.11x faster
- append-less-gc 100000: 16.4 ns +- 0.5 ns -> 14.9 ns +- 0.4 ns: 1.10x faster
- append 1000: 15.6 ns +- 0.3 ns -> 14.2 ns +- 0.3 ns: 1.09x faster
- append 100: 18.9 ns +- 0.6 ns -> 17.3 ns +- 0.6 ns: 1.09x faster
- append-less-gc 100: 27.4 ns +- 1.1 ns -> 25.2 ns +- 1.2 ns: 1.09x faster
- append-less-gc 10000: 19.2 ns +- 0.3 ns -> 17.8 ns +- 0.2 ns: 1.08x faster
- append-less-gc 1000: 22.0 ns +- 0.6 ns -> 20.8 ns +- 0.3 ns: 1.06x faster

Geometric mean: 1.09x faster
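As a sanity check on the summary figure, the geometric mean can be recomputed from the eight per-benchmark ratios listed above. This is a rough sketch that uses the rounded ratios as printed; pyperf presumably works from the unrounded timings, so the last digits may differ slightly.

```python
from statistics import geometric_mean

# Speedup ratios copied from the benchmark results above (rounded as printed).
speedups = [1.12, 1.11, 1.10, 1.09, 1.09, 1.09, 1.08, 1.06]

overall = geometric_mean(speedups)
print(f"Geometric mean: {overall:.2f}x faster")
```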

https://bugs.python.org/issue47009

@markshannon
Member

Looks good. I'm a bit wary of specialized superinstructions, but this seems solid.
I can imagine cases where list.append() wouldn't be followed by a POP_TOP, but they are contrived and highly unlikely.

markshannon merged commit 6c6e040 into python:main Apr 5, 2022
@tiran
Member

tiran commented Apr 5, 2022

The assert is failing on the s390x Fedora buildbot: https://buildbot.python.org/all/#/builders/232/builds/524

_bootstrap_python: Python/ceval.c:5045: _PyEval_EvalFrameDefault: Assertion `next_instr[-1] == POP_TOP' failed.
make: *** [Makefile:1204: Python/frozen_modules/io.h] Aborted (core dumped)

@markshannon
Member

Strange. The bytecode is exactly the same on all platforms.

sweeneyde deleted the listappend_pop branch April 5, 2022 22:00
5 participants

@sweeneyde @markshannon @tiran @the-knights-who-say-ni @bedevere-bot