Skip to content

Conversation

@Brijesh-Thakkar
Copy link

Which issue does this PR close?

Rationale for this change

The octet_length scalar function showed significant performance degradation in
Spark workloads when executed via Comet, as reported in the Comet performance EPIC.

The existing implementation relied on the generic Arrow length kernel for array
inputs, which introduces unnecessary overhead in vectorized execution. Since
octet_length semantics require computing the number of bytes in UTF-8 strings,
this can be implemented more efficiently using Arrow’s concrete string array APIs.

Optimizing this function in DataFusion improves performance for downstream projects
such as Comet and Spark without changing behavior or semantics.

What changes are included in this PR?

  • Replaced the use of the generic Arrow length kernel for array inputs in
    octet_length
  • Added a specialized implementation for:
    • StringArray
    • LargeStringArray
    • StringViewArray
  • Computed byte lengths directly using value_length, avoiding unnecessary
    indirection and overhead
  • Left the scalar execution path unchanged

Are these changes tested?

Yes.

  • Existing unit tests for octet_length were executed and pass successfully
  • Core integration tests exercising octet_length also pass
  • No new tests were required, as existing coverage already validates correctness
    across scalar and array inputs, including UTF-8 and null handling

Are there any user-facing changes?

No.

This change is purely a performance optimization and does not affect:

  • SQL syntax
  • Function semantics
  • Return types
  • Error behavior

CopilotAI review requested due to automatic review settings December 31, 2025 13:39
@github-actionsgithub-actionsbot added the functions Changes to functions implementation label Dec 31, 2025
Copy link
Contributor

CopilotAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the octet_length function for string arrays by replacing the generic Arrow length kernel with specialized implementations that directly compute byte lengths using Arrow's concrete string array APIs.

Key changes:

  • Removed dependency on Arrow's generic length kernel for array inputs
  • Added specialized manual loop implementations for StringArray, LargeStringArray, and StringViewArray
  • Used value_length() method for StringArray/LargeStringArray and value().len() for StringViewArray

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Brijesh-Thakkar
Copy link
Author

@andygrove Please review this PR

@Brijesh-Thakkar
Copy link
Author

@andygrove Hi Andy, I’ve addressed the CI failures (rustfmt + clippy warnings) and pushed the fixes.
This PR optimizes octet_length by avoiding generic kernels for string arrays and using direct length access.
All tests are passing locally. Would appreciate your review when you have time.

@Jefffrey
Copy link
Contributor

The existing implementation relied on the generic Arrow length kernel for array inputs, which introduces unnecessary overhead in vectorized execution. Since octet_length semantics require computing the number of bytes in UTF-8 strings, this can be implemented more efficiently using Arrow’s concrete string array APIs.

I'm curious about this claim; do you have benchmarks to show a speedup here?

@Brijesh-Thakkar
Copy link
Author

I don’t have concrete benchmarks yet — the motivation here was based on reducing obvious overhead (type dispatch + generic kernel) and aligning with patterns used in other string functions in the codebase that operate directly on concrete string arrays.

Happy to add a small micro-benchmark comparing the previous implementation vs this one for StringArray / StringViewArray if that would be useful, or adjust the PR if you’d prefer to keep the generic kernel without measured data.

@Jefffrey
Copy link
Contributor

I don’t have concrete benchmarks yet — the motivation here was based on reducing obvious overhead (type dispatch + generic kernel) and aligning with patterns used in other string functions in the codebase that operate directly on concrete string arrays.

Happy to add a small micro-benchmark comparing the previous implementation vs this one for StringArray / StringViewArray if that would be useful, or adjust the PR if you’d prefer to keep the generic kernel without measured data.

Performance improvement PRs should come with benchmark results (unless the improvement is clearly obvious in code). Would you be able to provide some benchmark results to see if this PR gives us a performance benefit?

@Brijesh-Thakkar
Copy link
Author

Yes, that makes sense — I’ll add a small benchmark comparing the previous implementation and this one for StringArray and StringViewArray, and update the PR with the results.

Thanks for the guidance.

@JefffreyJefffrey marked this pull request as draft January 3, 2026 07:57
@Brijesh-Thakkar
Copy link
Author

@Jefffrey How can I run benchmarks locally??

@Jefffrey
Copy link
Contributor

@Jefffrey How can I run benchmarks locally??

See some examples of microbenchmarks here: #19551

They should be able to be run using cargo bench command (ensuring you select the correct benchmark, etc.)

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functionsChanges to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@Brijesh-Thakkar@Jefffrey