MachineSink's SinkAndFold and the Duplicate Use Bug

This is a personal study note — I write these to solidify my own understanding. If you spot anything wrong or have thoughts, feel free to reach out. Layout, formatting, and grammar assisted by Claude.

While working on a project that generates LLVM IR programmatically, I encountered a segmentation fault deep inside LLVM's AArch64 backend during code generation. The bug is in MachineSink.cpp's PerformSinkAndFold optimization. It crashes when a load or store instruction uses the same virtual register in two operand slots.

The Crash

The following valid LLVM IR crashes llc on AArch64:

define i64 @test(i64 %0) {
  %mul  = mul i64 2114293064, %0
  %mask = and i64 %mul, 4294967295
  %addr = add i64 %mask, %mask        ; same SSA value as both operands
  %ptr  = inttoptr i64 %addr to ptr
  %val  = load i32, ptr %ptr, align 4
  %ext  = zext i32 %val to i64
  ret i64 %ext
}

$ llc -mtriple=aarch64-linux-gnu test.ll -o /dev/null
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/
Stack dump:
2. Running pass 'Machine code sinking' on function '@test'
#4 llvm::MachineInstr::getMF()
#5 llvm::AArch64InstrInfo::emitLdStWithAddr()
#6 MachineSinking::PerformSinkAndFold()

The crash is a null pointer dereference inside getMF(), which calls getParent()->getParent() on a MachineInstr whose parent has been set to null.

Some important observations:

This reproduces with pure llc — no custom passes or calling conventions involved.
The x86-64 backend does not crash on the same IR.
Both -global-isel and SelectionDAG paths crash on AArch64.
The bug exists in LLVM 22.1.0-rc2 and was fixed in LLVM 23.0.0git.

Reducing the Reproducer

Through binary search, I narrowed the minimal trigger. Each of these alone does not crash:

; Just add X, X -> load: works fine
%addr = add i64 %0, %0
%ptr = inttoptr i64 %addr to ptr
%val = load i32, ptr %ptr, align 4

; mul -> add X, X -> load: works fine
%mul = mul i64 2114293064, %0
%addr = add i64 %mul, %mul
%ptr = inttoptr i64 %addr to ptr
%val = load i32, ptr %ptr, align 4

; and -> add X, X -> load: works fine
%mask = and i64 %0, 4294967295
%addr = add i64 %mask, %mask
%ptr = inttoptr i64 %addr to ptr
%val = load i32, ptr %ptr, align 4

But the combination of mul -> and -> add X, X -> load crashes:

%mul  = mul i64 2114293064, %0
%mask = and i64 %mul, 4294967295
%addr = add i64 %mask, %mask
%ptr  = inttoptr i64 %addr to ptr
%val  = load i32, ptr %ptr, align 4

Background: What MachineSink's SinkAndFold Does

MachineSink is a backend pass that moves instructions closer to their uses to reduce register pressure. PerformSinkAndFold goes further: it eliminates instructions entirely by folding their operations into the addressing modes of consuming loads or stores.

For example, AArch64 has an addressing mode LDR W0, [Xbase, Windex, UXTW] where UXTW means "take the 32-bit index register, zero-extend it to 64 bits, then add to base." This performs a truncation inline as part of the address calculation. If MachineSink can fold a MOV_W (32-bit truncation) into this addressing mode, it eliminates the MOV_W entirely:

; Before: 2 instructions
%vreg2 = MOV_W %vreg1              ; truncate to 32 bits
LDR W0, [%vreg2]                   ; load

; After: 1 instruction
LDR W0, [Xbase, W_vreg1, UXTW]    ; truncation folded into addressing mode

What Happens Step by Step

Step 1: Instruction Selection Folds the ADD into the LDR

The AArch64 instruction selector sees add i64 %mask, %mask feeding a load and recognizes that AArch64's register-offset addressing mode LDR Wn, [Xbase, Xindex] can compute base + index as part of the load. It folds the add into the LDR:

%vreg1 = MUL X0, #2114293064
%vreg2 = MOV_W %vreg1                   ; and 0xFFFFFFFF (truncate via 32-bit move)
%vreg3 = LDR W0, [%vreg2, %vreg2]       ; load from vreg2 + vreg2

The add instruction is gone. The LDR now uses %vreg2 in two operand slots — once as base, once as index. This is valid AArch64: LDR W0, [X8, X8] loads from address X8 + X8.

Step 2: MachineSink Considers Folding MOV_W

MachineSink examines %vreg2 = MOV_W %vreg1 and asks: "Can I fold this truncation into every instruction that uses %vreg2?"

It wants to use AArch64's UXTW addressing mode — LDR W0, [Xbase, Windex, UXTW] — which performs the zero-extension inline. If all uses can absorb this, the MOV_W can be deleted.

Step 3: Building the SinkInto List (The Bug)

To find all fold targets, the code iterates over every operand that uses %vreg2:

for (MachineOperand &MO : MRI->use_nodbg_operands(%vreg2)) {
    MachineInstr &UseInst = *MO.getParent();
    // Can we fold into this instruction?
    if (canFoldIntoAddrMode(UseInst, ...))
        SinkInto.emplace_back(&UseInst, AM);
}

The critical detail: use_nodbg_operands iterates over operands, not instructions. The LDR W0, [%vreg2, %vreg2] has two operands that reference %vreg2:

Iterator Step	Operand Visited	`MO.getParent()` Returns	Action
1	Base operand (`%vreg2`)	`&LDR`	`canFoldIntoAddrMode` -> true -> `SinkInto.push_back(&LDR)`
2	Index operand (`%vreg2`)	`&LDR` (same pointer)	`canFoldIntoAddrMode` -> true -> `SinkInto.push_back(&LDR)`

After this loop, SinkInto contains two entries pointing to the same MachineInstr:

SinkInto = [ {&LDR, AM}, {&LDR, AM} ]

The code assumed each use-operand belongs to a different instruction. This assumption holds for nearly all real code — almost no instruction uses the same register in two operand slots. But LDR [%vreg2, %vreg2] violates it.

Step 4: Processing the List (The Crash)

The processing loop creates a replacement instruction and deletes the original for each entry:

for (auto &[SinkDst, MaybeAM] : SinkInto) {
    New = TII->emitLdStWithAddr(*SinkDst, MaybeAM);   // create folded replacement
    SinkDst->eraseFromParent();                         // delete original
}

Iteration 1 — SinkDst = &LDR (alive):

emitLdStWithAddr(LDR, AM) calls LDR.getParent() -> returns the basic block. Creates a new LDR W0, [%vreg1, W_vreg1, UXTW]. Inserts it. Returns successfully.
SinkDst->eraseFromParent() removes the old LDR from its basic block. The LDR's parent pointer becomes null.

Iteration 2 — SinkDst = &LDR (dead, same pointer):

emitLdStWithAddr(LDR, AM) calls LDR.getMF(), which calls LDR.getParent(). Returns null. Dereferences null -> segfault.

// MachineInstr.cpp
const MachineFunction *MachineInstr::getMF() const {
    return getParent()->getParent();   // getParent() returns NULL -> crash
}

The code didn't know that both entries pointed to the same instruction. It processed the first entry, deleted the instruction, then tried to process the second entry on an instruction that no longer exists.

The Upstream Fix (LLVM 22 -> 23)

Between LLVM 22.1.0-rc2 and LLVM 23.0.0git, a fix was added to PerformSinkAndFold. The fix adds four lines to the operand iteration loop, before SinkInto.emplace_back:

LLVM 22 (vulnerable):

} else if (UseInst.mayLoadOrStore()) {
    ExtAddrMode AM;
    if (!TII->canFoldIntoAddrMode(UseInst, Reg, MI, AM))
        return false;
    MaybeAM = AM;
}

LLVM 23 (fixed):

} else if (UseInst.mayLoadOrStore()) {
    // NEW: If the destination instruction contains more than one use of the
    // register, we won't be able to remove the original instruction, so
    // don't sink.
    if (llvm::count_if(UseInst.operands(), [Reg](const MachineOperand &MO) {
          return MO.isReg() && MO.getReg() == Reg;
        }) > 1)
      return false;
    ExtAddrMode AM;
    if (!TII->canFoldIntoAddrMode(UseInst, Reg, MI, AM))
        return false;
    MaybeAM = AM;
}

Before adding the LDR to SinkInto, the fix counts how many operands in the LDR use the register. If the answer is more than 1, the function returns false immediately. SinkInto stays empty. No crash.

The fix is conservative: it bails out of the entire PerformSinkAndFold rather than trying to deduplicate. This is correct because if the same register appears in two operand slots of a load/store, folding the definition into the addressing mode wouldn't eliminate the original instruction anyway — the other operand still needs it.

The Workaround

For projects that need to build against LLVM 22 or earlier, the workaround is to prevent add i64 %X, %X from appearing in the IR. Since X + X is semantically identical to X << 1, replacing:

%addr = add i64 %mask, %mask

with:

%addr = shl i64 %mask, 1

avoids the issue entirely. With shl, the instruction selector produces LSL + LDR with separate registers, and MachineSink's operand iterator never encounters a load with duplicate register uses.

Related Blog