@RKSimon
Collaborator

RKSimon commented Nov 26, 2025

Recognise 'LSB' style AVGFLOOR patterns.

Alive2: https://alive2.llvm.org/ce/z/nfSSk_

Fixes #53648
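The arithmetic behind the new pattern can be checked independently of LLVM. The following is a minimal sketch, not code from this PR: it compares the existing `(A & B) + ((A ^ B) >> 1)` form and the new 'LSB' form `((A >> 1) + (B >> 1)) + (A & B & 1)` against a widened reference for every pair of unsigned 8-bit values. The function names are hypothetical and only the unsigned (AVGFLOORU) case is covered; the signed case uses arithmetic shifts instead.

```cpp
#include <cstdint>

// Reference: floor((A + B) / 2), computed in a wider type so the sum cannot wrap.
inline std::uint8_t avgFloorURef(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((unsigned(A) + unsigned(B)) >> 1);
}

// Form already recognised before this PR: (A & B) + ((A ^ B) >> 1).
inline std::uint8_t avgFloorUXor(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((A & B) + ((A ^ B) >> 1));
}

// 'LSB' form recognised by this PR: ((A >> 1) + (B >> 1)) + (A & B & 1).
inline std::uint8_t avgFloorULsb(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((A >> 1) + (B >> 1) + (A & B & 1));
}

// Exhaustively compare all three forms over every pair of 8-bit inputs.
inline bool checkAllU8() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      const auto X = static_cast<std::uint8_t>(A);
      const auto Y = static_cast<std::uint8_t>(B);
      const std::uint8_t Ref = avgFloorURef(X, Y);
      if (avgFloorUXor(X, Y) != Ref || avgFloorULsb(X, Y) != Ref)
        return false;
    }
  }
  return true;
}
```

Calling `checkAllU8()` from a small driver is enough to confirm the identity for i8; the Alive2 link above covers the general proof.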

Recognise 'LSB' style AVGFLOOR patterns. I've attempted to use the m_Reassociatable* pattern matchers, but encountered an issue in that we can't correctly match m_Value/m_Deferred pairs in the same reassociation as it appears that we have no guarantees on the order of matching. I'll raise a bug for this, and in the meantime we have the pattern in the test_lsb_i32 tests to show the missed matching opportunity. Fixes llvm#53648
@llvmbot
Member

llvmbot commented Nov 26, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Recognise 'LSB' style AVGFLOOR patterns.

I've attempted to use the m_Reassociatable* pattern matchers, but encountered an issue in that we can't correctly match m_Value/m_Deferred pairs in the same reassociation as it appears that we have no guarantees on the order of matching. I'll raise a bug for this, and in the meantime we have the pattern in the test_lsb_i32 tests to show the missed matching opportunity.

Fixes #53648


Full diff: https://github.com/llvm/llvm-project/pull/169644.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+16-4)
  • (modified) llvm/test/CodeGen/X86/avgfloors-scalar.ll (+27-47)
  • (modified) llvm/test/CodeGen/X86/avgflooru-scalar.ll (+22-54)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6b79dbb46cadc..813cbeafeaec9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -3154,19 +3154,31 @@ SDValue DAGCombiner::visitADDLike(SDNode *N) {
 }
 
 // Attempt to form avgfloor(A, B) from (A & B) + ((A ^ B) >> 1)
+// Attempt to form avgfloor(A, B) from ((A >> 1) + (B >> 1)) + (A & B & 1)
 SDValue DAGCombiner::foldAddToAvg(SDNode *N, const SDLoc &DL) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N0.getValueType();
   SDValue A, B;
 
+  // FIXME: m_ReassociatableAdd can't handle m_Value/m_Deferred mixing.
   if ((!LegalOperations || hasOperation(ISD::AVGFLOORU, VT)) &&
-      sd_match(N, m_Add(m_And(m_Value(A), m_Value(B)),
-                        m_Srl(m_Xor(m_Deferred(A), m_Deferred(B)), m_One())))) {
+      (sd_match(N,
+                m_Add(m_And(m_Value(A), m_Value(B)),
+                      m_Srl(m_Xor(m_Deferred(A), m_Deferred(B)), m_One()))) ||
+       sd_match(N, m_Add(m_Add(m_Srl(m_Value(A), m_One()),
+                               m_Srl(m_Value(B), m_One())),
+                         m_ReassociatableAnd(m_Deferred(A), m_Deferred(B),
+                                             m_One()))))) {
     return DAG.getNode(ISD::AVGFLOORU, DL, VT, A, B);
   }
   if ((!LegalOperations || hasOperation(ISD::AVGFLOORS, VT)) &&
-      sd_match(N, m_Add(m_And(m_Value(A), m_Value(B)),
-                        m_Sra(m_Xor(m_Deferred(A), m_Deferred(B)), m_One())))) {
+      (sd_match(N,
+                m_Add(m_And(m_Value(A), m_Value(B)),
+                      m_Sra(m_Xor(m_Deferred(A), m_Deferred(B)), m_One()))) ||
+       sd_match(N, m_Add(m_Add(m_Sra(m_Value(A), m_One()),
+                               m_Sra(m_Value(B), m_One())),
+                         m_ReassociatableAnd(m_Deferred(A), m_Deferred(B),
+                                             m_One()))))) {
     return DAG.getNode(ISD::AVGFLOORS, DL, VT, A, B);
   }
 
diff --git a/llvm/test/CodeGen/X86/avgfloors-scalar.ll b/llvm/test/CodeGen/X86/avgfloors-scalar.ll
index fd303192e6c50..c8bbc875834d1 100644
--- a/llvm/test/CodeGen/X86/avgfloors-scalar.ll
+++ b/llvm/test/CodeGen/X86/avgfloors-scalar.ll
@@ -38,26 +38,20 @@ define i8 @test_fixed_i8(i8 %a0, i8 %a1) nounwind {
 define i8 @test_lsb_i8(i8 %a0, i8 %a1) nounwind {
 ; X86-LABEL: test_lsb_i8:
 ; X86:       # %bb.0:
-; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    sarb %dl
-; X86-NEXT:    andb %cl, %al
-; X86-NEXT:    sarb %cl
-; X86-NEXT:    addb %dl, %cl
-; X86-NEXT:    andb $1, %al
-; X86-NEXT:    addb %cl, %al
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $al killed $al killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i8:
 ; X64:       # %bb.0:
-; X64-NEXT:    movl %edi, %eax
-; X64-NEXT:    sarb %al
-; X64-NEXT:    andb %sil, %dil
-; X64-NEXT:    sarb %sil
-; X64-NEXT:    addb %sil, %al
-; X64-NEXT:    andb $1, %dil
-; X64-NEXT:    addb %dil, %al
+; X64-NEXT:    movsbl %sil, %ecx
+; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $al killed $al killed $eax
 ; X64-NEXT:    retq
   %s0 = ashr i8 %a0, 1
   %s1 = ashr i8 %a1, 1
@@ -124,26 +118,17 @@ define i16 @test_lsb_i16(i16 %a0, i16 %a1) nounwind {
 ; X86:       # %bb.0:
 ; X86-NEXT:    movswl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movswl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    sarl %edx
-; X86-NEXT:    andl %ecx, %eax
-; X86-NEXT:    sarl %ecx
-; X86-NEXT:    addl %edx, %ecx
-; X86-NEXT:    andl $1, %eax
 ; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
 ; X86-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i16:
 ; X64:       # %bb.0:
-; X64-NEXT:    movswl %si, %eax
-; X64-NEXT:    movswl %di, %ecx
-; X64-NEXT:    sarl %ecx
-; X64-NEXT:    sarl %eax
+; X64-NEXT:    movswl %si, %ecx
+; X64-NEXT:    movswl %di, %eax
 ; X64-NEXT:    addl %ecx, %eax
-; X64-NEXT:    andl %esi, %edi
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addl %edi, %eax
+; X64-NEXT:    shrl %eax
 ; X64-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X64-NEXT:    retq
   %s0 = ashr i16 %a0, 1
@@ -316,21 +301,19 @@ define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, %ebx
-; X86-NEXT:    sarl %ebx
-; X86-NEXT:    shldl $31, %eax, %edi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movl %eax, %ebx
+; X86-NEXT:    xorl %esi, %ebx
 ; X86-NEXT:    movl %ecx, %edx
+; X86-NEXT:    xorl %edi, %edx
+; X86-NEXT:    shrdl $1, %edx, %ebx
+; X86-NEXT:    andl %edi, %ecx
 ; X86-NEXT:    sarl %edx
-; X86-NEXT:    shldl $31, %esi, %ecx
-; X86-NEXT:    addl %edi, %ecx
-; X86-NEXT:    adcl %ebx, %edx
 ; X86-NEXT:    andl %esi, %eax
-; X86-NEXT:    andl $1, %eax
-; X86-NEXT:    addl %ecx, %eax
-; X86-NEXT:    adcl $0, %edx
+; X86-NEXT:    addl %ebx, %eax
+; X86-NEXT:    adcl %ecx, %edx
 ; X86-NEXT:    popl %esi
 ; X86-NEXT:    popl %edi
 ; X86-NEXT:    popl %ebx
@@ -338,14 +321,11 @@ define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
 ;
 ; X64-LABEL: test_lsb_i64:
 ; X64:       # %bb.0:
-; X64-NEXT:    movq %rdi, %rcx
-; X64-NEXT:    sarq %rcx
-; X64-NEXT:    andl %esi, %edi
 ; X64-NEXT:    movq %rsi, %rax
-; X64-NEXT:    sarq %rax
-; X64-NEXT:    addq %rcx, %rax
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addq %rdi, %rax
+; X64-NEXT:    andq %rdi, %rax
+; X64-NEXT:    xorq %rdi, %rsi
+; X64-NEXT:    sarq %rsi
+; X64-NEXT:    addq %rsi, %rax
 ; X64-NEXT:    retq
   %s0 = ashr i64 %a0, 1
   %s1 = ashr i64 %a1, 1
diff --git a/llvm/test/CodeGen/X86/avgflooru-scalar.ll b/llvm/test/CodeGen/X86/avgflooru-scalar.ll
index 9ae4492bb4cd4..7ad10164ad484 100644
--- a/llvm/test/CodeGen/X86/avgflooru-scalar.ll
+++ b/llvm/test/CodeGen/X86/avgflooru-scalar.ll
@@ -40,24 +40,18 @@ define i8 @test_lsb_i8(i8 %a0, i8 %a1) nounwind {
 ; X86:       # %bb.0:
 ; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    shrb %dl
-; X86-NEXT:    andb %cl, %al
-; X86-NEXT:    shrb %cl
-; X86-NEXT:    addb %dl, %cl
-; X86-NEXT:    andb $1, %al
-; X86-NEXT:    addb %cl, %al
+; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $al killed $al killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i8:
 ; X64:       # %bb.0:
-; X64-NEXT:    movl %edi, %eax
-; X64-NEXT:    shrb %al
-; X64-NEXT:    andb %sil, %dil
-; X64-NEXT:    shrb %sil
-; X64-NEXT:    addb %sil, %al
-; X64-NEXT:    andb $1, %dil
-; X64-NEXT:    addb %dil, %al
+; X64-NEXT:    movzbl %sil, %ecx
+; X64-NEXT:    movzbl %dil, %eax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $al killed $al killed $eax
 ; X64-NEXT:    retq
   %s0 = lshr i8 %a0, 1
   %s1 = lshr i8 %a1, 1
@@ -124,26 +118,17 @@ define i16 @test_lsb_i16(i16 %a0, i16 %a1) nounwind {
 ; X86:       # %bb.0:
 ; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    shrl %edx
-; X86-NEXT:    andl %ecx, %eax
-; X86-NEXT:    shrl %ecx
-; X86-NEXT:    addl %edx, %ecx
-; X86-NEXT:    andl $1, %eax
 ; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
 ; X86-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i16:
 ; X64:       # %bb.0:
-; X64-NEXT:    movzwl %si, %eax
-; X64-NEXT:    movzwl %di, %ecx
-; X64-NEXT:    shrl %ecx
-; X64-NEXT:    shrl %eax
+; X64-NEXT:    movzwl %si, %ecx
+; X64-NEXT:    movzwl %di, %eax
 ; X64-NEXT:    addl %ecx, %eax
-; X64-NEXT:    andl %esi, %edi
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addl %edi, %eax
+; X64-NEXT:    shrl %eax
 ; X64-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X64-NEXT:    retq
   %s0 = lshr i16 %a0, 1
@@ -300,40 +285,23 @@ define i64 @test_fixed_i64(i64 %a0, i64 %a1) nounwind {
 define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
 ; X86-LABEL: test_lsb_i64:
 ; X86:       # %bb.0:
-; X86-NEXT:    pushl %ebx
-; X86-NEXT:    pushl %edi
-; X86-NEXT:    pushl %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, %ebx
-; X86-NEXT:    shrl %ebx
-; X86-NEXT:    shldl $31, %eax, %edi
-; X86-NEXT:    movl %ecx, %edx
-; X86-NEXT:    shrl %edx
-; X86-NEXT:    shldl $31, %esi, %ecx
-; X86-NEXT:    addl %edi, %ecx
-; X86-NEXT:    adcl %ebx, %edx
-; X86-NEXT:    andl %esi, %eax
-; X86-NEXT:    andl $1, %eax
-; X86-NEXT:    addl %ecx, %eax
-; X86-NEXT:    adcl $0, %edx
-; X86-NEXT:    popl %esi
-; X86-NEXT:    popl %edi
-; X86-NEXT:    popl %ebx
+; X86-NEXT:    addl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    adcl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    setb %dl
+; X86-NEXT:    movzbl %dl, %edx
+; X86-NEXT:    shldl $31, %eax, %edx
+; X86-NEXT:    shldl $31, %ecx, %eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i64:
 ; X64:       # %bb.0:
-; X64-NEXT:    movq %rdi, %rcx
-; X64-NEXT:    shrq %rcx
-; X64-NEXT:    andl %esi, %edi
 ; X64-NEXT:    movq %rsi, %rax
-; X64-NEXT:    shrq %rax
-; X64-NEXT:    addq %rcx, %rax
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addq %rdi, %rax
+; X64-NEXT:    andq %rdi, %rax
+; X64-NEXT:    xorq %rdi, %rsi
+; X64-NEXT:    shrq %rsi
+; X64-NEXT:    addq %rsi, %rax
 ; X64-NEXT:    retq
   %s0 = lshr i64 %a0, 1
   %s1 = lshr i64 %a1, 1

@RKSimon
Collaborator Author

ping?

EVT VT = N0.getValueType();
SDValue A, B;

// FIXME: m_ReassociatableAdd can't handle m_Value/m_Deferred mixing.
Member

Just a general comment / thinking out loud on this issue: I guess the reason it doesn't handle m_Value / m_Deferred in the same reassociatable expression right now is that reassociatable matching is done in two phases, in which the individual m_Value and m_Deferred matchers are completely detached.
That being said, even if we somehow make them not detached, so that all m_Value and m_Deferred matchers are triggered in the same pattern matching expression / chain, how should we make sense of the permutation case where an m_Deferred might appear before its m_Value (I guess this is related to your comment about "no guarantees on the order of matching")? More specifically, if m_Deferred appears before m_Value, the preceding m_Deferred might accidentally grab the bound value populated by the previous permutation.
I think we might be able to solve this problem by imposing some partial order during permutation for reassociatable expressions that contain m_Value and m_Deferred.
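The stale-binding hazard described in this comment can be reproduced with a toy model. The sketch below is an illustration only and does not use or mirror the real SDPatternMatch code: plain `int` leaves stand in for SDValues, and three hypothetical patterns stand in for `m_Value(A)`, `m_Deferred(A)` and `m_One()`. `matchLeafMajor` walks the leaves and tries whichever pattern fits, so the "deferred" pattern can run before the "bind" pattern and read a value left over from an abandoned attempt. `matchPatternMajor` always evaluates the patterns in declaration order, which is one possible way to impose the partial order suggested above.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy patterns over ints: bind A, compare against the bound A, match constant 1.
struct ToyPatterns {
  int A = 0; // Storage written by the "bind" pattern.
  std::vector<std::function<bool(int)>> Pats;

  ToyPatterns() {
    Pats.push_back([this](int V) { A = V; return true; }); // like m_Value(A)
    Pats.push_back([this](int V) { return V == A; });      // like m_Deferred(A)
    Pats.push_back([](int V) { return V == 1; });          // like m_One()
  }
  ToyPatterns(const ToyPatterns &) = delete; // lambdas capture `this`
};

// Hazardous order: for each leaf, try every unused pattern.
static bool leafMajor(ToyPatterns &P, const std::vector<int> &Leaves,
                      std::size_t LeafIdx, std::vector<bool> &UsedPat) {
  if (LeafIdx == Leaves.size())
    return true;
  for (std::size_t I = 0; I < P.Pats.size(); ++I) {
    if (UsedPat[I] || !P.Pats[I](Leaves[LeafIdx]))
      continue;
    UsedPat[I] = true;
    if (leafMajor(P, Leaves, LeafIdx + 1, UsedPat))
      return true;
    UsedPat[I] = false; // Backtrack, but A keeps its old value.
  }
  return false;
}

// Ordered variant: for each pattern in declaration order, pick an unused leaf,
// so the bind pattern always runs before the deferred comparison.
static bool patternMajor(ToyPatterns &P, const std::vector<int> &Leaves,
                         std::size_t PatIdx, std::vector<bool> &UsedLeaf) {
  if (PatIdx == P.Pats.size())
    return true;
  for (std::size_t I = 0; I < Leaves.size(); ++I) {
    if (UsedLeaf[I] || !P.Pats[PatIdx](Leaves[I]))
      continue;
    UsedLeaf[I] = true;
    if (patternMajor(P, Leaves, PatIdx + 1, UsedLeaf))
      return true;
    UsedLeaf[I] = false;
  }
  return false;
}

bool matchLeafMajor(const std::vector<int> &Leaves) {
  ToyPatterns P;
  if (Leaves.size() != P.Pats.size())
    return false;
  std::vector<bool> Used(P.Pats.size(), false);
  return leafMajor(P, Leaves, 0, Used);
}

bool matchPatternMajor(const std::vector<int> &Leaves) {
  ToyPatterns P;
  if (Leaves.size() != P.Pats.size())
    return false;
  std::vector<bool> Used(Leaves.size(), false);
  return patternMajor(P, Leaves, 0, Used);
}
```

With leaves `{3, 1, 4}` there is no pair of equal values, so the pattern "A, A, 1" should not match. In this toy model the leaf-major walk still reports a match, because the deferred comparison on `3` sees the `A = 3` left behind by an earlier, abandoned binding. The pattern-major walk rejects it and still accepts `{3, 1, 3}`.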

Collaborator Author

Not sure - hence I punted this to #169645 - I can find a workaround if necessary as a followup, but I'd much prefer to handle this completely with m_Reassociatable matchers if possible.

Collaborator Author

It looks like @bermondd is working on this at #170061 (new contributors not being able to assign reviewers is REALLY annoying).

Member

@mshockwave left a comment

LGTM

@RKSimon enabled auto-merge (squash) December 10, 2025 08:12
@RKSimon merged commit 804e768 into llvm:main Dec 10, 2025
9 of 10 checks passed
@RKSimon deleted the dag-avgfloor-reassoc branch December 10, 2025 13:51
@nikic
Contributor

nikic commented Dec 10, 2025

Surprisingly, this change seems to have measurable compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=a108881b24ecfea8d194b33dd9fb211943065bca&to=804e768bda1697ce0449eec844bd9ac4f29aed64&stat=instructions:u

I suspect that ReassociatableOpc_match is very expensive. Might it make sense to make this less generic and just support 3-operand reassociation? That's all you need here, and in that case you only need to match two variants without any complex processing.
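To make the "two variants" remark concrete: a binary operation with exactly three leaves can only have the shape `op(op(x, y), z)` or `op(x, op(y, z))`. The following is a rough sketch of that idea on a toy expression tree. The `Node` type, the opcode constants and `collectThreeLeaves` are all hypothetical and are not part of SDPatternMatch; a real matcher would still need to try the patterns against the three leaves in each order.

```cpp
#include <array>

// Toy expression node: Opcode == Leaf means a leaf, otherwise a binary op.
struct Node {
  int Opcode;
  int Value;
  const Node *LHS;
  const Node *RHS;
};

constexpr int Leaf = 0;
constexpr int And = 1;

inline bool isOp(const Node *N, int Opc) { return N && N->Opcode == Opc; }

// Extract the three leaves of op(op(x, y), z) or op(x, op(y, z)) without any
// recursion. Returns false if N has fewer or more than three leaves under Opc.
inline bool collectThreeLeaves(const Node &N, int Opc,
                               std::array<const Node *, 3> &Out) {
  if (N.Opcode != Opc)
    return false;
  const bool LeftNested = isOp(N.LHS, Opc);
  const bool RightNested = isOp(N.RHS, Opc);
  if (LeftNested == RightNested)
    return false; // Neither nested (2 leaves) or both nested (4+ leaves).
  const Node *Inner = LeftNested ? N.LHS : N.RHS;
  const Node *Outer = LeftNested ? N.RHS : N.LHS;
  if (isOp(Inner->LHS, Opc) || isOp(Inner->RHS, Opc))
    return false; // The inner op nests again, so there are 4+ leaves.
  Out = {Inner->LHS, Inner->RHS, Outer};
  return true;
}
```

The trade-off raised later in the thread still applies: a fixed 3-operand helper like this is cheap, but it cannot serve callers that pass four or more patterns.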

@RKSimon
Collaborator Author

CC @bermondd @mshockwave - thoughts?

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Dec 10, 2025
The use of nested m_Reassociatable matchers by llvm#169644 can result in high compile times as the inner m_Reassociatable call is being repeated a lot while the outer call is trying to match
@bermondd
Contributor

bermondd commented Dec 10, 2025

I'll start working to make this more efficient tonight. These results look pretty bad.

My first guess is that the template recursion that was done may bear some of the fault for this compile time impact here. The number of functions instantiated grows linearly with the number of patterns provided, so maybe a different approach where this isn't necessary could help.

That being said, I personally think reducing the scope of ReassociatableOpc_match to only 3 patterns wouldn't be the right approach. The struct has tried to match any number of patterns since before #170061, so reducing the domain could break any usages that involve 4 or more, if there are any.

@mikaelholmen
Collaborator

mikaelholmen commented Dec 11, 2025

I've seen errors like

LLVM ERROR: SmallVector unable to grow. Requested capacity (4294967296) is larger than maximum value for size type (4294967295) 

when compiling for my out of tree target with this patch. When it fails (after a long time) there are a lot of

#117 0x0000560facf1edc5 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 3u>&) DAGCombiner.cpp:0:0
#118 0x0000560facf1edc5 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 3u>&) DAGCombiner.cpp:0:0

calls on the stack.
Unfortunately I don't have a reproducer to share (at the moment at least).

@RKSimon
Collaborator Author

That can happen in cases where SmallVector<> screws up the CalculateSmallVectorDefaultInlinedElements calculation - please can you try with this local change:

diff --git a/llvm/include/llvm/CodeGen/SDPatternMatch.h b/llvm/include/llvm/CodeGen/SDPatternMatch.h
index dda3b3827c7a..4fe4fdcc023e 100644
--- a/llvm/include/llvm/CodeGen/SDPatternMatch.h
+++ b/llvm/include/llvm/CodeGen/SDPatternMatch.h
@@ -1310,7 +1310,7 @@ template <typename... PatternTs> struct ReassociatableOpc_match {
   bool match(const MatchContext &Ctx, SDValue N) {
     constexpr size_t NumPatterns = std::tuple_size_v<std::tuple<PatternTs...>>;
 
-    SmallVector<SDValue> Leaves;
+    SmallVector<SDValue, 4> Leaves;
     collectLeaves(N, Leaves);
     if (Leaves.size() != NumPatterns)
       return false;
@@ -1323,7 +1323,7 @@ template <typename... PatternTs> struct ReassociatableOpc_match {
                                     Patterns);
   }
 
-  void collectLeaves(SDValue V, SmallVector<SDValue> &Leaves) {
+  void collectLeaves(SDValue V, SmallVectorImpl<SDValue> &Leaves) {
     if (V->getOpcode() == Opcode) {
       for (size_t I = 0, N = V->getNumOperands(); I < N; I++)
         collectLeaves(V->getOperand(I), Leaves);

(note @bermondd is working on removing the SmallVector entirely)

@mikaelholmen
Collaborator

> That can happen in cases where SmallVector<> screws up the CalculateSmallVectorDefaultInlinedElements calculation - please can you try with this local change:

Thanks, unfortunately it didn't help.

> (note @bermondd is working on removing the SmallVector entirely)

Sounds good. Maybe I'll just wait and see how it behaves after that improvement then.
(We reverted the patch downstream so we're not suffering right now, I just hope there won't be other conflicting changes to that code before the problem is fixed.)

@bermondd
Contributor

> (note @bermondd is working on removing the SmallVector entirely)

I'll open a PR for this ASAP. I have further changes in mind to improve compile time, but maybe it's better to separate this into two PRs, so at least this reported issue is fixed sooner.

@mgorny
Member

This change is also causing crashes on 32-bit x86:

FAIL: LLVM :: CodeGen/X86/2009-03-23-MultiUseSched.ll (26976 of 62615)
******************** TEST 'LLVM :: CodeGen/X86/2009-03-23-MultiUseSched.ll' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 3
/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/llc < /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll -mtriple=x86_64-linux -mcpu=corei7 -relocation-model=static | /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/FileCheck /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
# executed command: /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/llc -mtriple=x86_64-linux -mcpu=corei7 -relocation-model=static
# .---command stderr------------
# | terminate called after throwing an instance of 'std::bad_alloc'
# |   what():  std::bad_alloc
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0. Program arguments: /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/llc -mtriple=x86_64-linux -mcpu=corei7 -relocation-model=static
# | 1. Running pass 'Function Pass Manager' on module '<stdin>'.
# | 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@foo'
# | #0 0xfffffffff22c2780 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0xe4d780)
# | #1 0xfffffffff22c2cff PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
# | #2 0xfffffffff22bf358 llvm::sys::RunSignalHandlers() (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0xe4a358)
# | #3 0xfffffffff22bf4e6 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
# | #4 0xfffffffff7f4d5a0 (linux-gate.so.1+0x5a0)
# | #5 0xfffffffff7f4d579 (linux-gate.so.1+0x579)
# | #6 0xfffffffff1014b97 (/usr/lib/libc.so.6+0x94b97)
# | #7 0xfffffffff0fb9081 raise (/usr/lib/libc.so.6+0x39081)
# | #8 0xfffffffff0f9fdd9 abort (/usr/lib/libc.so.6+0x1fdd9)
# | #9 0xfffffffff1264fe2 (/usr/lib/gcc/x86_64-pc-linux-gnu/15/32/libstdc++.so.6+0x7dfe2)
# | #10 0xfffffffff12801ad (/usr/lib/gcc/x86_64-pc-linux-gnu/15/32/libstdc++.so.6+0x991ad)
# | #11 0xfffffffff12648e2 std::unexpected() (/usr/lib/gcc/x86_64-pc-linux-gnu/15/32/libstdc++.so.6+0x7d8e2)
# | #12 0xfffffffff12804ee (/usr/lib/gcc/x86_64-pc-linux-gnu/15/32/libstdc++.so.6+0x994ee)
# | #13 0xfffffffff195ff0d LLVMInstallFatalErrorHandler.cold ErrorHandling.cpp:0:0
# | #14 0xfffffffff221d212 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned int, unsigned int) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0xda8212)
# | #15 0xfffffffff2d9d107 llvm::SmallVectorTemplateBase<llvm::SDValue, true>::push_back(llvm::SDValue) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1928107)
# | #16 0xfffffffff2dcda82 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 6u>&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1958a82)
# | #17 0xfffffffff2dcd833 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 6u>&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1958833)
# | #18 0xfffffffff2dcd833 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 6u>&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1958833)
# | #19 0xfffffffff2dcd833 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 6u>&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1958833)
# | #20 0xfffffffff2e58f9e bool llvm::SDPatternMatch::sd_context_match<llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>&, llvm::SDPatternMatch::BasicMatchContext>(llvm::SDNode*, llvm::SDPatternMatch::BasicMatchContext const&, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x19e3f9e)
# | #21 0xfffffffff2e59ca6 (anonymous namespace)::DAGCombiner::visitADD(llvm::SDNode*) DAGCombiner.cpp:0:0
# | #22 0xfffffffff2e5b2d1 .L54765 DAGCombiner.cpp:0:0
# | #23 0xfffffffff2e5d4a3 (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) DAGCombiner.cpp:0:0
# | #24 0xfffffffff2e5f410 llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::BatchAAResults*, llvm::CodeGenOptLevel) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x19ea410)
# | #25 0xfffffffff30d2b5e llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1c5db5e)
# | #26 0xfffffffff30d739e llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1c6239e)
# | #27 0xfffffffff30d9157 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1c64157)
# | #28 0xfffffffff560b826 (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) X86ISelDAGToDAG.cpp:0:0
# | #29 0xfffffffff30c02ab llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x1c4b2ab)
# | #30 0xfffffffff29124cb llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x149d4cb)
# | #31 0xfffffffff24c4895 llvm::FPPassManager::runOnFunction(llvm::Function&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x104f895)
# | #32 0xfffffffff24c4a86 llvm::FPPassManager::runOnModule(llvm::Module&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x104fa86)
# | #33 0xfffffffff24c53ae llvm::legacy::PassManagerImpl::run(llvm::Module&) (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/../lib/libLLVM.so.22.0git804e768b+0x10503ae)
# | #34 0x565c27a4 main (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/llc+0x157a4)
# | #35 0xfffffffff0fa2043 (/usr/lib/libc.so.6+0x22043)
# | #36 0xfffffffff0fa2108 __libc_start_main (/usr/lib/libc.so.6+0x22108)
# | #37 0x565c33b7 _start (/var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/llc+0x163b7)
# `-----------------------------
# error: command failed with exit status: -6
# executed command: /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/FileCheck /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line: /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm_build-abi_x86_32.x86/bin/FileCheck /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
# `-----------------------------
# error: command failed with exit status: 2

--

********************

RKSimon pushed a commit that referenced this pull request Dec 14, 2025
…nown at compile time (#172064) Following the suggestions in #170061, I replaced `SmallVector<SDValue>` with `std::array<SDValue, NumPatterns>` and `SmallBitVector` with `Bitset<NumPatterns>`. I had to make some changes to the `collectLeaves` and `reassociatableMatchHelper` functions. In `collectLeaves` specifically, I changed the return type so I could propagate a failure in case the number of found leaves is greater than the number of expected patterns. I also added a new unit test that, together with the one already present in the previous line, checks if the matching fails in the cases where the number of patterns is less or more than the number of leaves. I don't think this is going to completely address the increased compile time reported in #169644, but hopefully it leads to an improvement.
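The bounded-collection idea in this commit message can be sketched on a toy expression tree. This is an assumption-laden illustration, not the code from #172064: `Expr`, the opcode constants and both helper functions are hypothetical, and only the two ideas named in the message are modelled, namely a fixed-size `std::array` sized by the pattern count and a `collectLeaves` that returns `false` as soon as it finds more leaves than patterns instead of growing a vector.

```cpp
#include <array>
#include <cstddef>

// Toy expression node: Opcode == LeafOpc means a leaf, otherwise a binary op.
struct Expr {
  int Opcode;
  int Value;
  const Expr *LHS;
  const Expr *RHS;
};

constexpr int LeafOpc = 0;
constexpr int AddOpc = 1;

// Collect the leaves of a chain of Opc nodes into a fixed-size array.
// Returns false as soon as a leaf would not fit, so an oversized (or
// pathological) tree is rejected early without any allocation.
template <std::size_t NumPatterns>
bool collectLeaves(const Expr *E, int Opc,
                   std::array<const Expr *, NumPatterns> &Leaves,
                   std::size_t &Count) {
  if (E->Opcode == Opc)
    return collectLeaves(E->LHS, Opc, Leaves, Count) &&
           collectLeaves(E->RHS, Opc, Leaves, Count);
  if (Count == NumPatterns)
    return false; // More leaves than patterns: propagate the failure.
  Leaves[Count++] = E;
  return true;
}

// True only if Root has exactly NumPatterns leaves under Opc.
template <std::size_t NumPatterns>
bool hasExactlyNLeaves(const Expr *Root, int Opc) {
  std::array<const Expr *, NumPatterns> Leaves{};
  std::size_t Count = 0;
  return collectLeaves<NumPatterns>(Root, Opc, Leaves, Count) &&
         Count == NumPatterns;
}
```

Both failure directions mentioned in the message are covered: too many leaves fail inside `collectLeaves`, and too few fail the final `Count == NumPatterns` check.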
anonymouspc pushed a commit to anonymouspc/llvm that referenced this pull request Dec 15, 2025
…nown at compile time (llvm#172064) Following the suggestions in llvm#170061, I replaced `SmallVector<SDValue>` with `std::array<SDValue, NumPatterns>` and `SmallBitVector` with `Bitset<NumPatterns>`. I had to make some changes to the `collectLeaves` and `reassociatableMatchHelper` functions. In `collectLeaves` specifically, I changed the return type so I could propagate a failure in case the number of found leaves is greater than the number of expected patterns. I also added a new unit test that, together with the one already present in the previous line, checks if the matching fails in the cases where the number of patterns is less or more than the number of leaves. I don't think this is going to completely address the increased compile time reported in llvm#169644, but hopefully it leads to an improvement.
@mikaelholmen
Collaborator

I've verified that the problems I saw go away with #172064.
Thanks!

RKSimon added a commit that referenced this pull request Dec 15, 2025
The use of nested m_Reassociatable matchers by #169644 can result in high compile times as the inner m_Reassociatable call is being repeated a lot while the outer call is trying to match. Place the inner m_ReassociatableAnd at the beginning of the pattern so it is not repeatedly matched in recursion.

Labels

backend:X86, llvm:SelectionDAG (SelectionDAGISel as well)

Development

Successfully merging this pull request may close these issues.

Suboptimal code generation for unsigned non-overflowing average

7 participants

@RKSimon @llvmbot @nikic @bermondd @mikaelholmen @mgorny @mshockwave