⚡️ Don't memoize `SequenceSet#string` on normalized sets#554

nevans · 2025-11-22T19:24:37Z

Not duplicating the data in `@tuples` and `@string` saves memory. For large sequence sets, these memory savings can be substantial.

But this is a tradeoff: it saves time when the string is never used, but costs more time when the string is used more than once, because it is regenerated on each use. Working with set operations can create many ephemeral sets, so avoiding unintentional string generation can save a lot of time.
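To make the tradeoff concrete, here is a toy sketch. `ToyStringBuilder`, `MemoizingSet`, and `OnDemandSet` are hypothetical names, and storing sorted, coalesced `[min, max]` pairs in `@tuples` is an assumption for illustration, not a description of the real `Net::IMAP::SequenceSet` internals.

```ruby
# Hypothetical toy classes illustrating the tradeoff; NOT the real
# Net::IMAP::SequenceSet internals. @tuples is assumed to hold sorted,
# coalesced [min, max] pairs.
module ToyStringBuilder
  # Number of times this set has generated its string.
  attr_reader :builds

  private

  # Renders @tuples in normal form: "n" for single numbers, "min:max" for ranges.
  def build_string
    @builds += 1
    @tuples.map { |min, max| min == max ? min.to_s : "#{min}:#{max}" }.join(",")
  end
end

# Memoizes: the first call stores a second copy of the data in @string,
# which then lives as long as the set does.
class MemoizingSet
  include ToyStringBuilder

  def initialize(tuples)
    @tuples = tuples
    @string = nil
    @builds = 0
  end

  def string
    @string ||= build_string
  end
end

# Doesn't memoize: only @tuples is kept, and the string is rebuilt per call.
class OnDemandSet
  include ToyStringBuilder

  def initialize(tuples)
    @tuples = tuples
    @builds = 0
  end

  def string
    build_string
  end
end

tuples = [[1, 3], [5, 5], [7, 10]]
memo = MemoizingSet.new(tuples)
lazy = OnDemandSet.new(tuples)
3.times { memo.string; lazy.string }
puts memo.builds # => 1, but "1:3,5,7:10" is now retained alongside @tuples
puts lazy.builds # => 3, with nothing extra retained
```

A set that is created and discarded without ever asking for its string never calls `build_string` in either class; the difference is that once something does ask, `MemoizingSet` keeps the extra copy alive while `OnDemandSet` pays again on every call.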

Also, by quickly scanning the entries after a string is parsed, we can bypass the merge algorithm for normalized strings. But this does cause a small penalty for non-normalized strings.
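The scan-then-merge idea can be sketched as a single linear pass over the parsed entries: accept them as-is only if every entry is already written in normal form and starts more than one past the end of the previous entry; otherwise fall back to sort-and-merge. This is a minimal sketch assuming numeric entries only (no `*`), and `parse_entries`, `normalized_entries?`, `merge_entries`, and `tuples_from_string` are hypothetical helpers, not net-imap's API.

```ruby
# Minimal sketch, assuming numeric entries only (no "*").
# All method names here are hypothetical, not part of net-imap.

# Parses "1:3,5" into [[1, 3], [5, nil]], keeping the order and form as written.
def parse_entries(str)
  str.split(",").map do |entry|
    lo, hi = entry.split(":", 2).map { |s| Integer(s, 10) }
    [lo, hi]
  end
end

# One linear pass: true only when every entry is already in normal form
# (single numbers as "n", ranges as "min:max" with min < max) and each entry
# starts more than one past the end of the previous entry, i.e. the entries
# are sorted, disjoint, and non-adjacent, so nothing needs merging.
def normalized_entries?(entries)
  prev_max = nil
  entries.all? do |lo, hi|
    next false if hi && hi <= lo # "5:5" or "3:1" is valid input, but not normal
    next false if prev_max && lo <= prev_max + 1
    prev_max = hi || lo
    true
  end
end

# Slow path: order each range, sort, then coalesce overlapping/adjacent ranges.
def merge_entries(entries)
  ranges = entries.map { |lo, hi| [lo, hi || lo].minmax }.sort
  ranges.each_with_object([]) do |(lo, hi), merged|
    if merged.any? && lo <= merged.last[1] + 1
      merged.last[1] = [merged.last[1], hi].max
    else
      merged << [lo, hi]
    end
  end
end

def tuples_from_string(str)
  entries = parse_entries(str)
  if normalized_entries?(entries)
    entries.map { |lo, hi| [lo, hi || lo] } # fast path: no sorting or merging
  else
    merge_entries(entries) # the scan above was wasted work in this case
  end
end

tuples_from_string("1:3,5,7:10")     # => [[1, 3], [5, 5], [7, 10]] (fast path)
tuples_from_string("7:10,3:1,4,5:5") # => [[1, 5], [7, 10]] (merged)
```

The scan stops at the first entry that fails, so the penalty for a non-normalized string is at most one extra pass before the merge it would have needed anyway.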

**Please note:** It _is still possible_ to create a memoized string on a normalized SequenceSet with `#append`. For example: create a monotonically sorted SequenceSet with a non-normal final entry, then call `#append` with an adjacently following entry. `#append` coalesces the final entry and converts it into normal form, but doesn't check whether the _preceding entries_ of the SequenceSet are normalized.
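The scenario in that note can be reproduced with a toy model. `ToyAppendSet` is hypothetical: it assumes monotonically sorted, numeric-only input, keeps a copy of the input string only when that string is not already in normal form, and its `append` only handles a number that directly follows the final entry. It mirrors the behavior described here; it is not net-imap's implementation.

```ruby
# Hypothetical toy model of the #append caveat; not net-imap's implementation.
# Assumes the input string is monotonically sorted and uses numbers only.
class ToyAppendSet
  def initialize(str)
    @tuples = str.split(",").map do |entry|
      lo, hi = entry.split(":", 2).map { |s| Integer(s, 10) }
      [lo, hi || lo]
    end
    # Keep a copy of the input only when it isn't already in normal form.
    @string = str == normal_string ? nil : str
  end

  # True when a separate string copy is being kept alongside @tuples.
  def memoized?
    !@string.nil?
  end

  def string
    @string || normal_string
  end

  # "n" for single numbers, "min:max" for ranges, comma-separated.
  def normal_string
    @tuples.map { |lo, hi| lo == hi ? lo.to_s : "#{lo}:#{hi}" }.join(",")
  end

  # Only handles the case from the note: num directly follows the final entry.
  def append(num)
    lo, hi = @tuples.last
    raise ArgumentError, "toy only handles adjacent appends" unless num == hi + 1

    @tuples.last[1] = num # coalesce into the final entry
    return self unless @string

    # Rewrite only the final entry in normal form. The preceding entries are
    # not rechecked, so @string stays memoized even if it is now normal.
    prefix = @string.rpartition(",").first
    final = "#{lo}:#{num}"
    @string = prefix.empty? ? final : "#{prefix},#{final}"
    self
  end
end

set = ToyAppendSet.new("1:3,5:5") # final entry "5:5" is not in normal form
set.memoized?                     # => true
set.append(6)
set.string                        # => "1:3,5:6", which is already normal form
set.memoized?                     # => true, so the extra copy is still kept
```

With `ToyAppendSet.new("1:1,5:5")`, the same `append(6)` yields `"1:1,5:6"`, which is still not normal, so keeping the copy is correct there. Telling the two cases apart requires rescanning the preceding entries, which is exactly the check `#append` skips.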

Benchmarks

Results from benchmarks/sequence_set-normalize.yml

There is still room for improvement here, because `#normalize` generates the normalized string for comparison rather than just reparsing the string.

| benchmark | local (i/s) | v0.5.12 (i/s) | result |
|---|---:|---:|---|
| normal | 19938.9 | 2988.7 | v0.5.12 6.67x slower |
| frozen and normal | 17011413.5 | 3574.4 | v0.5.12 4759.30x slower |
| unsorted | 19434.9 | 2957.5 | v0.5.12 6.57x slower |
| abnormal | 19835.9 | 3037.1 | v0.5.12 6.53x slower |

Results from benchmarks/sequence_set-new.yml

Note that this benchmark doesn't use `SequenceSet::new`; it uses `SequenceSet::[]`, which freezes the result. In this case, the benchmark result differences are mostly driven by improved performance of `#freeze`.

| n | input | local (i/s) | v0.5.12 (i/s) | result |
|---:|---|---:|---:|---|
| 10 | ints (sorted) | 118753.9 | 85411.4 | v0.5.12 1.39x slower |
| 10 | string (sorted) | 122746.3 | 123087.2 | local 1.00x slower |
| 10 | ints (shuffled) | 105919.2 | 79294.5 | v0.5.12 1.34x slower |
| 10 | string (shuffled) | 108086.2 | 114826.6 | local 1.06x slower |
| 100 | ints (sorted) | 16418.4 | 11864.2 | v0.5.12 1.38x slower |
| 100 | string (sorted) | 18161.7 | 15219.3 | v0.5.12 1.19x slower |
| 100 | ints (shuffled) | 16640.1 | 11815.8 | v0.5.12 1.41x slower |
| 100 | string (shuffled) | 14512.8 | 14755.8 | local 1.02x slower |
| 1,000 | ints (sorted) | 1722.2 | 1229.0 | v0.5.12 1.40x slower |
| 1,000 | string (sorted) | 1862.1 | 1543.2 | v0.5.12 1.21x slower |
| 1,000 | ints (shuffled) | 1684.9 | 1252.3 | v0.5.12 1.35x slower |
| 1,000 | string (shuffled) | 1424.6 | 1467.3 | local 1.03x slower |
| 10,000 | ints (sorted) | 158.1 | 127.9 | v0.5.12 1.24x slower |
| 10,000 | string (sorted) | 187.7 | 143.4 | v0.5.12 1.31x slower |
| 10,000 | ints (shuffled) | 145.8 | 114.5 | v0.5.12 1.27x slower |
| 10,000 | string (shuffled) | 136.9 | 138.4 | local 1.01x slower |
| 100,000 | ints (sorted) | 14.9 | 10.6 | v0.5.12 1.40x slower |
| 100,000 | string (sorted) | 19.2 | 14.0 | v0.5.12 1.37x slower |

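The new code is ~1-6% slower for shuffled strings, but faster in nearly every other case: ~24-41% faster for integer inputs, whether sorted or shuffled (note that unsorted non-string inputs create a sorted set), and ~19-37% faster for sorted strings with 100 or more entries. For n=10 sorted strings, the two versions are effectively tied.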
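For a finer-grained view, the percentages can be recomputed from the raw i/s figures in the `benchmarks/sequence_set-new.yml` results. This small sketch follows the benchmark output's own convention of dividing the faster rate by the slower one; `RESULTS` and `percent_diff` are hypothetical names, and the three groups (all integer inputs, sorted strings with n >= 100, shuffled strings) are one assumed way to slice the data.

```ruby
# [local, v0.5.12] throughput pairs in i/s, copied from the
# sequence_set-new.yml results and grouped by input type.
RESULTS = {
  "ints (sorted and shuffled)" => [
    [118_753.9, 85_411.4], [105_919.2, 79_294.5], [16_418.4, 11_864.2],
    [16_640.1, 11_815.8], [1_722.2, 1_229.0], [1_684.9, 1_252.3],
    [158.1, 127.9], [145.8, 114.5], [14.9, 10.6]
  ],
  "string (sorted, n >= 100)" => [
    [18_161.7, 15_219.3], [1_862.1, 1_543.2], [187.7, 143.4], [19.2, 14.0]
  ],
  "string (shuffled)" => [
    [108_086.2, 114_826.6], [14_512.8, 14_755.8], [1_424.6, 1_467.3], [136.9, 138.4]
  ]
}.freeze

# Same convention as the "N.NNx slower" results: faster rate / slower rate,
# expressed as a rounded percentage.
def percent_diff(local_ips, old_ips)
  slow, fast = [local_ips, old_ips].minmax
  ((fast / slow - 1) * 100).round
end

RESULTS.each do |name, pairs|
  pcts = pairs.map { |local, old| percent_diff(local, old) }
  direction =
    if pairs.all? { |local, old| local > old } then "faster"
    elsif pairs.all? { |local, old| local < old } then "slower"
    else "mixed"
    end
  puts "#{name}: local is #{pcts.min}-#{pcts.max}% #{direction}"
end
# ints (sorted and shuffled): local is 24-41% faster
# string (sorted, n >= 100): local is 19-37% faster
# string (shuffled): local is 1-6% slower
```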

📚 Update SequenceSet#normalize rdoc

nevans changed the title ~~⚡️ Don't store SequenceSet#string when normalized~~ ⚡️ Don't memoize SequenceSet#string on normalized sets Nov 22, 2025

nevans force-pushed the sequence_set/drop-normalized-string branch 4 times, most recently from 20ac793 to 2baf04d November 24, 2025 22:32

nevans force-pushed the sequence_set/drop-normalized-string branch from 2baf04d to 8ed52bf November 25, 2025 14:54

nevans mentioned this pull request Nov 25, 2025
⚡ Faster SequenceSet#normalize when frozen #556
Merged

nevans merged commit 82ccb37 into master Nov 25, 2025
32 checks passed

nevans deleted the sequence_set/drop-normalized-string branch November 25, 2025 18:31

nevans added the `performance` label (related to CPU use, memory use, latency, etc.) Nov 25, 2025

nevans added the `sequence-set` label (any code involving the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class) Dec 10, 2025
