@nevans commented Nov 22, 2025

Not duplicating the data in `@tuples` and `@string` saves memory. For large sequence sets, the savings can be substantial.

But this is a tradeoff: it saves time when the string is never used, but costs time when the string is used more than once, because it is regenerated on each use. Set operations can create many ephemeral sets, so avoiding unintentional string generation can save a lot of time.
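The tradeoff can be illustrated with a toy model. This is only a sketch, not the `Net::IMAP::SequenceSet` implementation: the `LazySet` class, its `memoize:` option, and the `string_builds` counter are hypothetical names invented for this example.

```ruby
# Toy model only: NOT the Net::IMAP::SequenceSet implementation.
# LazySet, memoize:, and string_builds are hypothetical names.
class LazySet
  attr_reader :string_builds

  def initialize(tuples, memoize: false)
    @tuples = tuples        # e.g. [[1, 3], [5, 5]]
    @memoize = memoize
    @string = nil           # only populated when memoizing
    @string_builds = 0      # counts how often the string is generated
  end

  # Returns the stored string if there is one; otherwise generates it,
  # and stores it only when memoization is enabled.
  def string
    return @string if @string
    str = build_string
    @string = str if @memoize
    str
  end

  private

  def build_string
    @string_builds += 1
    @tuples.map { |min, max| min == max ? min.to_s : "#{min}:#{max}" }.join(",")
  end
end

lazy = LazySet.new([[1, 3], [5, 5]])
memo = LazySet.new([[1, 3], [5, 5]], memoize: true)
3.times { lazy.string; memo.string }

puts "lazy builds: #{lazy.string_builds}"  # 3: regenerated on every call
puts "memo builds: #{memo.string_builds}"  # 1: generated once, then stored
```

The non-memoizing variant holds only one copy of the data, which is what matters when the string is never requested or the set is short-lived.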

Also, by quickly scanning the entries after a string is parsed, we can bypass the merge algorithm for normalized strings. But this does cause a small penalty for non-normalized strings.
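A minimal sketch of that idea, assuming entries are plain `[min, max]` integer pairs and ignoring `*`. The helper names `normalized_entries?`, `merge_entries`, and `to_tuples` are hypothetical and are not the net-imap internals.

```ruby
# Hypothetical helpers for illustration; not the net-imap internals.
# Entries are [min, max] integer pairs, in the order they were parsed.

# One linear pass. Here "normalized" is assumed to mean: every entry has
# min <= max, and each entry starts at least two past the previous entry's
# end (sorted, with no overlapping or adjacent entries).
def normalized_entries?(entries)
  prev_max = nil
  entries.all? do |min, max|
    ok = min <= max && (prev_max.nil? || min > prev_max + 1)
    prev_max = max
    ok
  end
end

# Fallback path: sort, then coalesce overlapping or adjacent entries.
def merge_entries(entries)
  sorted = entries.map { |a, b| a <= b ? [a, b] : [b, a] }.sort
  sorted.each_with_object([]) do |(min, max), merged|
    if merged.any? && min <= merged.last[1] + 1
      merged.last[1] = [merged.last[1], max].max
    else
      merged << [min, max]
    end
  end
end

# Normalized input skips the merge entirely; other input pays for the
# scan and then for the merge.
def to_tuples(entries)
  normalized_entries?(entries) ? entries : merge_entries(entries)
end

p to_tuples([[1, 3], [5, 9]])          # already normalized, returned as-is
p to_tuples([[5, 9], [1, 3], [4, 4]])  # unsorted and adjacent, so merged
```

The extra scan is the small penalty for non-normalized input: it is wasted work whenever the merge has to run anyway.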

Please note: It is still possible to create a memoized string on a normalized SequenceSet with `#append`. For example: create a monotonically sorted SequenceSet with a non-normal final entry, then call `#append` with an adjacently following entry. `#append` coalesces the final entry and converts it into normal form, but doesn't check whether the preceding entries of the SequenceSet are normalized.

Benchmarks

Results from benchmarks/sequence_set-normalize.yml

There is still room for improvement here, because `#normalize` generates the normalized string for comparison rather than just reparsing the string.

    normal
          local:    19938.9 i/s
        v0.5.12:     2988.7 i/s - 6.67x slower

    frozen and normal
          local: 17011413.5 i/s
        v0.5.12:     3574.4 i/s - 4759.30x slower

    unsorted
          local:    19434.9 i/s
        v0.5.12:     2957.5 i/s - 6.57x slower

    abnormal
          local:    19835.9 i/s
        v0.5.12:     3037.1 i/s - 6.53x slower

Results from benchmarks/sequence_set-new.yml

Note that this benchmark doesn't use `SequenceSet::new`; it uses `SequenceSet::[]`, which freezes the result. In this case, the benchmark result differences are mostly driven by improved performance of `#freeze`.

    n=     10 ints   (sorted)
          local: 118753.9 i/s
        v0.5.12:  85411.4 i/s - 1.39x slower
    n=     10 string (sorted)
        v0.5.12: 123087.2 i/s
          local: 122746.3 i/s - 1.00x slower
    n=     10 ints   (shuffled)
          local: 105919.2 i/s
        v0.5.12:  79294.5 i/s - 1.34x slower
    n=     10 string (shuffled)
        v0.5.12: 114826.6 i/s
          local: 108086.2 i/s - 1.06x slower
    n=    100 ints   (sorted)
          local:  16418.4 i/s
        v0.5.12:  11864.2 i/s - 1.38x slower
    n=    100 string (sorted)
          local:  18161.7 i/s
        v0.5.12:  15219.3 i/s - 1.19x slower
    n=    100 ints   (shuffled)
          local:  16640.1 i/s
        v0.5.12:  11815.8 i/s - 1.41x slower
    n=    100 string (shuffled)
        v0.5.12:  14755.8 i/s
          local:  14512.8 i/s - 1.02x slower
    n=  1,000 ints   (sorted)
          local:   1722.2 i/s
        v0.5.12:   1229.0 i/s - 1.40x slower
    n=  1,000 string (sorted)
          local:   1862.1 i/s
        v0.5.12:   1543.2 i/s - 1.21x slower
    n=  1,000 ints   (shuffled)
          local:   1684.9 i/s
        v0.5.12:   1252.3 i/s - 1.35x slower
    n=  1,000 string (shuffled)
        v0.5.12:   1467.3 i/s
          local:   1424.6 i/s - 1.03x slower
    n= 10,000 ints   (sorted)
          local:    158.1 i/s
        v0.5.12:    127.9 i/s - 1.24x slower
    n= 10,000 string (sorted)
          local:    187.7 i/s
        v0.5.12:    143.4 i/s - 1.31x slower
    n= 10,000 ints   (shuffled)
          local:    145.8 i/s
        v0.5.12:    114.5 i/s - 1.27x slower
    n= 10,000 string (shuffled)
        v0.5.12:    138.4 i/s
          local:    136.9 i/s - 1.01x slower
    n=100,000 ints   (sorted)
          local:     14.9 i/s
        v0.5.12:     10.6 i/s - 1.40x slower
    n=100,000 string (sorted)
          local:     19.2 i/s
        v0.5.12:     14.0 i/s - 1.37x slower

The new code is ~1-6% slower for shuffled strings, but ~30-40% faster for sorted sets (note that unsorted non-string inputs create a sorted set).

@nevans changed the title from "⚡️ Don't store SequenceSet#string when normalized" to "⚡️ Don't memoize SequenceSet#string on normalized sets" Nov 22, 2025
@nevans force-pushed the sequence_set/drop-normalized-string branch 4 times, most recently from 20ac793 to 2baf04d, November 24, 2025 22:32
📚 Update SequenceSet#normalize rdoc
@nevans force-pushed the sequence_set/drop-normalized-string branch from 2baf04d to 8ed52bf, November 25, 2025 14:54
@nevans merged commit 82ccb37 into master Nov 25, 2025
32 checks passed
@nevans deleted the sequence_set/drop-normalized-string branch November 25, 2025 18:31
@nevans added the "performance" label (related to CPU use, memory use, latency, etc) Nov 25, 2025
@nevans added the "sequence-set" label (Any code the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class.) Dec 10, 2025