Skip to content

Conversation

@sweeneyde
Copy link
Member

@sweeneydesweeneyde commented Mar 14, 2022

@sweeneyde
Copy link
MemberAuthor

sweeneyde commented Mar 14, 2022

Benchmarks are good:

frompyperfimportRunner, perf_counterdefbench_listcomp(loops, length): src=list(map(float, range(length))) t0=perf_counter() foriinrange(loops): [xforxinsrc] returnperf_counter() -t0defbench_append(loops, length): src=list(map(float, range(length))) t0=perf_counter() foriinrange(loops): arr= [] forxinsrc: arr.append(x) returnperf_counter() -t0runner=Runner() fornin [100, 1_000, 10_000, 100_000]: runner.bench_time_func(f"listcomp {n}", bench_listcomp, n) runner.bench_time_func(f"append {n}", bench_append, n)

Results from GCC on WSL with --enable-optimizations --with-lto

Faster (8): - listcomp 10000: 118 us +- 2 us -> 92.6 us +- 1.5 us: 1.28x faster - listcomp 100000: 1.16 ms +- 0.02 ms -> 916 us +- 26 us: 1.27x faster - listcomp 1000: 12.3 us +- 0.2 us -> 9.89 us +- 0.41 us: 1.25x faster - listcomp 100: 1.59 us +- 0.03 us -> 1.32 us +- 0.05 us: 1.21x faster - append 100000: 1.69 ms +- 0.05 ms -> 1.45 ms +- 0.04 ms: 1.17x faster - append 10000: 168 us +- 4 us -> 145 us +- 3 us: 1.16x faster - append 1000: 17.4 us +- 0.3 us -> 15.2 us +- 0.6 us: 1.14x faster - append 100: 2.03 us +- 0.06 us -> 1.81 us +- 0.08 us: 1.12x faster Geometric mean: 1.20x faster 

@sweeneydesweeneyde requested a review from methaneMarch 14, 2022 05:16
@sweeneydesweeneyde added the performance Performance or resource usage label Mar 14, 2022
Copy link
Member

@markshannonmarkshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be simpler, and just as fast, by renaming app1 to _PyList_AppendTakeRef (with appropriate refcount adjustments).
No need for the inline functions.

PyAPI_FUNC(int)
_PyList_AppendTakeRefListResize(PyListObject*self, PyObject*newitem);

staticinlineint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an inline function.
Let the LTO pass decide if it wants to inline it.

app1(PyListObject*self, PyObject*v)
/* internal, used by _PyList_AppendTakeRef */
int
_PyList_AppendTakeRefListResize(PyListObject*self, PyObject*newitem)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the extra function?
Converting app1 to _PyList_AppendTakeRef would seem to be enough.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried something like that and the results are on the bpo issue, but it seems there is a significant benefit to having the separate as-small-as-possible function that always gets inlined as opposed to a slightly longer version that we let the compiler figure out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the compiler isn't inlining the function because it isn't hot enough.
PRECALL_NO_KW_LIST_APPEND and LIST_APPEND are only 0.3% of all instructions executed, so maybe it is best not to inline.

Do you have benchmark numbers for the standard suite?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, IMO there is a relative lack of list comprehensions in the pyperformance suite.

@sweeneyde
Copy link
MemberAuthor

@markshannon Since pyperformance results were negligible and this has a significant speedup on operations that live in hot loops (albeit not in pyperformance), is it okay if I merge this?

@markshannon
Copy link
Member

Sorry, this dropped off my radar.

@markshannonmarkshannon merged commit a0ea7a1 into python:mainApr 1, 2022
@sweeneydesweeneyde deleted the listappend branch April 1, 2022 16:13
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performancePerformance or resource usage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

@sweeneyde@markshannon@kumaraditya303@the-knights-who-say-ni@bedevere-bot