Skip to content

Conversation

@nevans
Copy link
Collaborator

@nevansnevans commented Aug 7, 2025

When using SequenceSet to store mailbox state for mailboxes with many millions of messages, the last thing you want to do is accidentally generate a string and output that!

So this establishes a maximum number of entries to output as an inspect string, currently 512. Above that number, we output only a much smaller number of entries from the head and tail, currently 16 from each end.

Net::IMAP::SequenceSet(((1..5000) % 2).to_a).inspect#=> #<Net::IMAP::SequenceSet 2500 entries "1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,...(2468 entries omitted)...,4969,4971,4973,4975,4977,4979,4981,4983,4985,4987,4989,4991,4993,4995,4997,4999">

A note on the implementation

When the sequence set has an internal denormalized string, we use regexps to grab the first and last groups of entries from that.

At first, I used a simple Regexp match anchored to \z to grab the tail entries from the string. But that was shockingly slow even when the string isn't extremely large.

Next I benchmarked replacing Regexp#match with String#rpartition and that performed fine, using the same naive regexps. String#rindex was also fine. They are both slower than Regexp#match anchored to \A, but not too bad and not dramatically slower based on string size. I found myself wishing for a String#rscan or String#rsplit. 😆

Then I benchmarked replacing \d+ with \d{0,10} or [1-9]\d{1,9}, and those both made the regexp fast... but Regexp.linear_time? doesn't like nested quantifiers. Fully expanding both the \d+ and the (entry){100} keeps it fast and satisfies Regexp.linear_time?.

But that regexp is UGLY! 🙁

@nevansnevans changed the title ⚡🔎 Abridged SequenceSet#inspect output⚡🔎 Abridged SequenceSet#inspect outputAug 7, 2025
@nevansnevansforce-pushed the sequence_set/abridged-inspect branch from a9f8422 to ef3a6e7CompareAugust 7, 2025 22:01
When using SequenceSet to store mailbox state for mailboxes with many millions of messages, the last thing you want to do is accidentally generate a string and output that! So this establishes a maximum number of entries to output as an inspect string, currently 512. Above that number, we output only a much smaller number of entries from the head and tail, currently 16 from each end. When the sequence set has an internal denormalized string, we use regexps to grab the first and last groups of entries from that. A note on the implementation: At first, I used a simple Regexp match anchored to `\z` to grab the tail entries from the string. But that was shockingly slow even when the string isn't extremely large. Next I benchmarked replacing `Regexp#match` with `String#rpartition` and that performed fine, using the same naive regexps. `String#rindex` was also fine. They are both slower than `Regexp#match` anchored to `\A`, but not too bad and not dramatically slower based on string size. Then I benchmarked replacing `\d+` with `\d{0,10}` or `[1-9]\d{1,9}`, and those both made the regexp fast... but `Regexp.linear_time?` doesn't like nested quantifiers. Fully expanding both the `\d+` and the `(entry){100}` keeps it fast _and_ satisfies `Regexp.linear_time?`. But that regexp is UGLY! 🙁
@nevansnevansforce-pushed the sequence_set/abridged-inspect branch from ef3a6e7 to 3585984CompareAugust 7, 2025 22:13
@nevansnevans merged commit bac67ca into masterAug 7, 2025
37 checks passed
@nevansnevans deleted the sequence_set/abridged-inspect branch August 7, 2025 22:39
#
# This unrolls all nested quantifiers. It's much harder to read, but it's
# also the fastest out of all the versions I tested.
nz_uint32=/[1-9](?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d)?)?)?)?)?)?)?)?)?/
Copy link
CollaboratorAuthor

@nevansnevansAug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b:

# max digits in nz-number(2**32 - 1).digits.count=>10# to unroll /[1-9]\d{0,9}//[1-9]#{9.times.inject(//){/(?:\d#{_1.source})?/}.source}/ == /[1-9](?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d)?)?)?)?)?)?)?)?)?/=>true

@nevansnevans changed the title ⚡🔎 Abridged SequenceSet#inspect output⚡🔎 Abridge SequenceSet#inspect output for more than 512 entriesAug 28, 2025
@nevansnevans added the enhancement New feature or request label Sep 30, 2025
@nevansnevans added the sequence-set Any code the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class. label Dec 10, 2025
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancementNew feature or requestsequence-setAny code the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@nevans