When ELF Section Symbols Have Names: A Subtle Relocation Bug in Multi-Batch LLVM Compilation
This is a personal study note — I write these to solidify my own understanding. If you spot anything wrong or have thoughts, feel free to reach out. Layout, formatting, and grammar assisted by Claude.
Context
I'm working on a project that generates LLVM IR programmatically and compiles it to AArch64 machine code. The input is large — millions of IR functions — so compilation is split into batches. Each batch compiles independently into an ELF object, and the results are concatenated into a single final object file before linking.
This post documents a bug that took a few days to track down: constant pool loads were silently fetching wrong data, but only in binaries large enough to span many batches. The root cause touches ELF symbol types, section-relative relocations, and a subtle difference between what LLVM's SymbolRef::getName() returns for section symbols versus what you might expect.
What I Saw
The crash was a SIGSEGV in a struct array indexing sequence:
add x9, x9, x9, lsl #1
add x9, x19, x9, lsl #2
stur w1, [x9, #4] ; → SIGSEGV
GDB showed x9 = 0x03020100. That's the bytes {0x00, 0x01, 0x02, 0x03} in little-endian — suspiciously structured, not a valid pointer. Tracing backwards through the register chain:
- x9 got its value from a memory load (ldp)
- The loaded memory was written by a vector store (str q)
- The stored vector's lane 1 should have been 0xFFFFFFFF but was 0x03020100
- That lane came from v1, which was set by fmov s1, w0 (where w0 = 0xFFFFFFFF)
- After the fmov, an ldr q1, [x12, #3216] instruction overwrote v1 with constant pool data
The ldr q1 was a legitimate constant pool load — LLVM generated it to materialize a constant for the tail call at the end of the function. The instruction itself was correct. But it loaded from the wrong address because the relocation addend pointed to offset 0x10 in the global constant pool section (a TBL shuffle mask from a different batch) instead of offset 0xB0 (the intended {0xFFFFFFFF, 0, 0, 0} constant).
Comparing the standalone llc output (correct) against the actual binary (wrong) confirmed the instructions were identical — only the resolved constant pool address differed. This pointed to the relocation pipeline.
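To make the two numeric observations above concrete, here is a minimal sketch, not code from the project. It assumes the shuffle mask is the identity byte pattern {0, 1, ..., 15} (the text only shows its first lanes) and that the bad value reached the two add instructions as an index; `laneLE`, `identityMask`, and `storeAddress` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble 32-bit lane `lane` of a 16-byte vector image, little-endian.
inline std::uint32_t laneLE(const std::array<std::uint8_t, 16> &bytes,
                            std::size_t lane) {
  assert(lane < 4 && "a 128-bit vector has four 32-bit lanes");
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < 4; ++i)
    value |= static_cast<std::uint32_t>(bytes[lane * 4 + i]) << (8 * i);
  return value;
}

// Identity byte mask {0, 1, 2, ..., 15}; an assumed example of a TBL mask.
inline std::array<std::uint8_t, 16> identityMask() {
  std::array<std::uint8_t, 16> mask{};
  for (std::size_t i = 0; i < 16; ++i)
    mask[i] = static_cast<std::uint8_t>(i);
  return mask;
}

// Mirrors the crashing sequence, assuming x9 holds an index on entry:
//   add  x9, x9, x9, lsl #1    x9 = index * 3
//   add  x9, x19, x9, lsl #2   x9 = base + index * 12
//   stur w1, [x9, #4]          store to the field at +4
inline std::uint64_t storeAddress(std::uint64_t base, std::uint64_t index) {
  std::uint64_t times3 = index + (index << 1);
  return base + (times3 << 2) + 4;
}
```

Under those assumptions, `laneLE(identityMask(), 0)` yields 0x03020100, and `storeAddress(base, 0x03020100)` lands roughly 600 MB past `base`, which is how a 12-byte-per-element array store turns into a fault.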
Background: How LLVM Materializes Constants
When LLVM lowers IR to AArch64 machine code, it frequently encounters vector or floating-point constants that can't be encoded as immediates. A constant like <i64 4294967295, i64 0> (the 128-bit value {0xFFFFFFFF, 0, 0, 0} when viewed as four 32-bit lanes) has no single AArch64 instruction that can produce it directly. Instead, LLVM places the constant in a read-only data section — typically .rodata.cst16 for 16-byte entries — and emits a load instruction to fetch it at runtime. The naming convention .rodata.cstN means "read-only data, fixed-size constant entries of N bytes" — so .rodata.cst4 holds 32-bit floats, .rodata.cst8 holds 64-bit doubles, and .rodata.cst16 holds 128-bit vector constants [5].
On AArch64, loading from a constant pool requires two instructions working together:
adrp x9, .LCPI0_0 ; compute the 4KB page containing the constant
ldr q1, [x9, :lo12:.LCPI0_0] ; load from the page offset
Each instruction carries a relocation that the linker resolves:
- R_AARCH64_ADR_PREL_PG_HI21 on adrp: resolves the upper bits (which 4KB page)
- R_AARCH64_LDST128_ABS_LO12_NC on ldr: resolves the lower 12 bits (offset within the page)
Both relocations reference the same symbol and addend so they collectively point to the same constant.
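The split between the two relocations can be sketched as plain arithmetic. This is an illustration under stated assumptions, not linker code: S is the symbol address, A the addend, and P the address of the adrp instruction, following the usual relocation notation. The function names and the addresses used in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>

// Page(x): clear the low 12 bits to get the 4KB page base.
inline std::uint64_t page(std::uint64_t address) {
  return address & ~static_cast<std::uint64_t>(0xFFF);
}

inline std::uint64_t target(std::uint64_t S, std::int64_t A) {
  return S + static_cast<std::uint64_t>(A);
}

// R_AARCH64_ADR_PREL_PG_HI21: Page(S + A) - Page(P), held as a page count.
inline std::int64_t adrpPageCount(std::uint64_t S, std::int64_t A,
                                  std::uint64_t P) {
  std::int64_t delta = static_cast<std::int64_t>(page(target(S, A))) -
                       static_cast<std::int64_t>(page(P));
  return delta / 4096;
}

// R_AARCH64_LDST128_ABS_LO12_NC: (S + A) & 0xFFF, held divided by 16.
inline std::uint64_t ldst128Imm(std::uint64_t S, std::int64_t A) {
  assert((target(S, A) & 0xF) == 0 &&
         "the scaled immediate needs a 16-byte-aligned target");
  return (target(S, A) & 0xFFF) >> 4;
}

// What adrp followed by ldr q ends up addressing at runtime.
inline std::uint64_t effectiveAddress(std::uint64_t S, std::int64_t A,
                                      std::uint64_t P) {
  std::uint64_t pageBase =
      page(P) + static_cast<std::uint64_t>(adrpPageCount(S, A, P) * 4096);
  return pageBase + ldst128Imm(S, A) * 16;
}
```

With a hypothetical section address of 0x412000 and an instruction at 0x400120, an addend of 0xB0 resolves to 0x4120B0 while an addend of 0x10 resolves to 0x412010. Both relocations consume the same addend, so a wrong addend moves the whole load without making either instruction look wrong.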
Background: ELF Symbols, String Tables, and Section Symbols
To understand the bug, you need to understand how ELF stores symbol names and how section symbols work.
Symbol tables and string tables
An ELF object file contains one or more symbol tables (.symtab, .dynsym). Each entry in the table is a fixed-size struct (Elf64_Sym) with fields including [0] [2]:
- st_name: an index into the symbol string table (.strtab)
- st_info: encodes the symbol's type (STT_NOTYPE, STT_FUNC, STT_OBJECT, STT_SECTION, STT_FILE, etc.) and binding (STB_LOCAL, STB_GLOBAL, STB_WEAK)
- st_shndx: the index of the section the symbol is defined in
- st_value: the symbol's value (an address or offset)
The string table is a flat byte array of null-terminated strings. The ELF spec defines that index 0 always points to a null byte, so st_name = 0 means "this symbol has no name" (or equivalently, its name is the empty string) [1].
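A string table lookup is small enough to sketch directly. The table contents below are a made-up example, and `makeExampleStrtab` and `nameAt` are hypothetical helpers, not part of any ELF library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical table: '\0', "main", '\0', ".LCPI0_0", '\0'.
inline std::vector<char> makeExampleStrtab() {
  static const char raw[] = "\0main\0.LCPI0_0"; // literal adds the final '\0'
  return std::vector<char>(raw, raw + sizeof(raw));
}

// Resolve an st_name-style index to the string that starts there.
inline std::string nameAt(const std::vector<char> &strtab,
                          std::uint32_t index) {
  assert(!strtab.empty() && strtab.back() == '\0' &&
         "table must end with a null byte");
  assert(index < strtab.size() && "st_name must point inside the table");
  return std::string(strtab.data() + index); // copies up to the next '\0'
}
```

In this example table, index 1 resolves to "main", index 6 to ".LCPI0_0", and index 0 to the empty string, which is the "no name" case that matters for the rest of this post.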
Section symbols (STT_SECTION)
Section symbols are a special type. The ELF specification describes them as symbols that "exist primarily for relocation and normally have STB_LOCAL binding" [3]. Their purpose is to represent an entire section so that relocations can reference offsets within that section. By convention, section symbols have st_name = 0 — their identity comes from st_shndx, which tells you which section they represent.
If you need the section's name, you look it up through the section header string table (.shstrtab), which is a different string table from .strtab. Storing the section name in .strtab as well would just duplicate a string that already exists in .shstrtab, so section symbols normally carry no name of their own [0] [1].
So at the raw ELF level, calling getName() on a section symbol should return an empty string — the symbol's st_name is 0, pointing to the null byte at the start of .strtab.
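For reference, here is a sketch of how a symbol's type is read from st_info at the raw level. `Sym64` is a local mirror of the Elf64_Sym layout so the example does not depend on a system header. The helper names are hypothetical, and the constants follow the standard ELF values.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of Elf64_Sym (same field order and widths).
struct Sym64 {
  std::uint32_t st_name;  // index into .strtab, 0 means "no name"
  std::uint8_t st_info;   // binding in the high nibble, type in the low nibble
  std::uint8_t st_other;
  std::uint16_t st_shndx; // index of the section the symbol belongs to
  std::uint64_t st_value;
  std::uint64_t st_size;
};
static_assert(sizeof(Sym64) == 24, "Elf64_Sym is 24 bytes");

constexpr std::uint8_t kSttFunc = 2;    // STT_FUNC
constexpr std::uint8_t kSttSection = 3; // STT_SECTION
constexpr std::uint8_t kStbLocal = 0;   // STB_LOCAL
constexpr std::uint8_t kStbGlobal = 1;  // STB_GLOBAL

constexpr std::uint8_t symType(std::uint8_t info) {
  return static_cast<std::uint8_t>(info & 0x0F);
}
constexpr std::uint8_t symBinding(std::uint8_t info) {
  return static_cast<std::uint8_t>(info >> 4);
}
constexpr std::uint8_t makeInfo(std::uint8_t binding, std::uint8_t type) {
  return static_cast<std::uint8_t>((binding << 4) | (type & 0x0F));
}

// A symbol is a section symbol because of its type, whatever its name is.
inline bool isSectionSymbol(const Sym64 &sym) {
  return symType(sym.st_info) == kSttSection;
}
```

A typical section symbol in this model has st_name = 0, local binding, type STT_SECTION, and st_shndx set to the section it stands for.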
How relocations use section symbols
ELF relocations reference symbols. There are broadly two patterns:
Named symbols: The relocation targets a specific named symbol (like .LCPI0_0) that lives at a fixed offset within its section. The addend is typically 0 or a small adjustment relative to that symbol.
Section symbols: The relocation targets the section itself via an STT_SECTION symbol. The addend is the byte offset within that section where the target data lives.
LLVM's AArch64 backend uses section symbols for constant pool references. A relocation like .rodata.cst16 + 0x20 means "the data at byte offset 0x20 within the .rodata.cst16 section."
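The two patterns describe the same target in different ways, which a small sketch can show. It assumes a relocatable object, where a symbol's value is its offset within its section. `RelocTarget`, `viaNamedSymbol`, and `viaSectionSymbol` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct RelocTarget {
  std::uint64_t symbolOffset; // st_value of the referenced symbol
  std::int64_t addend;        // r_addend of the relocation
};

// Final address once the section has been placed at sectionAddress.
inline std::uint64_t resolve(const RelocTarget &r,
                             std::uint64_t sectionAddress) {
  return sectionAddress + r.symbolOffset +
         static_cast<std::uint64_t>(r.addend);
}

// Named-symbol form: the symbol sits at the data, the addend is 0.
inline RelocTarget viaNamedSymbol(std::uint64_t offsetInSection) {
  return {offsetInSection, 0};
}

// Section-symbol form: the symbol sits at offset 0, the addend carries
// the whole offset.
inline RelocTarget viaSectionSymbol(std::uint64_t offsetInSection) {
  return {0, static_cast<std::int64_t>(offsetInSection)};
}
```

Both forms resolve to the same address, but in the section-symbol form the offset lives entirely in the addend. Anything that moves data within the section therefore has to rewrite the addend.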
The Multi-Batch Problem
When compilation is split into batches, each batch produces its own .rodata.cst16 section with entries starting at offset 0. The final object file concatenates all batches' constant pool data into a single .rodata.cst16 section:
Final .rodata.cst16 layout:
[Batch 12 data: 16 bytes] global offsets 0x00–0x0F
[Batch 19 data: 144 bytes] global offsets 0x10–0x9F
[Batch 20 data: 48 bytes] global offsets 0xA0–0xCF
[Batch 21 data: 16 bytes] global offsets 0xD0–0xDF
A relocation from Batch 20 with a batch-local addend of 0x10 needs to become a global addend of 0xB0 (0xA0 + 0x10), because Batch 20's data starts at offset 0xA0 in the concatenated section.
The code tracks this with an accumulated offset per section name:
// After processing each batch's relocations, update the accumulated offset
uint64_t &accum = AccumulatedConstPoolSize[sectionName];
accum += batchDataSize;
And before appending a batch's relocations to the global list, section-relative addends are shifted:
if (R.IsSectionReloc && R.SymName.starts_with(".rodata.cst")) {
  R.Addend += AccumulatedConstPoolSize[R.SymName];
}
This logic is correct. The bug was in how IsSectionReloc was determined.
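As a self-contained check of the arithmetic in this section, here is a minimal sketch of the rebasing step. It is a simplified stand-in for the real pipeline: `ConstPoolRebaser` and `appendBatch` are hypothetical names, and each batch is assumed to contribute to a single constant pool section.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct RelocEntry {
  std::string SymName;
  std::int64_t Addend;
  bool IsSectionReloc;
};

class ConstPoolRebaser {
public:
  // Shift this batch's section-relative addends by the bytes already
  // emitted for that section, then account for the batch's own data.
  void appendBatch(std::vector<RelocEntry> &relocs, const std::string &section,
                   std::uint64_t batchDataSize) {
    for (RelocEntry &R : relocs) {
      // rfind(prefix, 0) == 0 is a pre-C++20 spelling of starts_with(prefix).
      if (R.IsSectionReloc && R.SymName.rfind(".rodata.cst", 0) == 0)
        R.Addend += static_cast<std::int64_t>(Accumulated[R.SymName]);
    }
    Accumulated[section] += batchDataSize;
  }

  std::uint64_t accumulated(const std::string &section) const {
    auto it = Accumulated.find(section);
    return it == Accumulated.end() ? 0 : it->second;
  }

private:
  std::map<std::string, std::uint64_t> Accumulated;
};
```

Feeding it batch sizes of 16, 144, and 48 bytes in order turns Batch 20's local addend 0x10 into 0xB0, matching the layout above. The same run with IsSectionReloc set to false leaves the addend at 0x10.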
The Root Cause
In January 2019, an LLVM commit changed ELFObjectFile::getSymbolName() to automatically resolve section symbol names [4]. The motivation was reasonable: llvm-nm was printing empty names for section symbols, which made its output less useful than GNU nm. The fix added a fallback — if the symbol's name from the string table is empty and the symbol type is STT_SECTION, look up the section name from the section header string table and return that instead:
Expected<StringRef> Name = ESym->getName(*SymStrTabOrErr);
// If the symbol name is empty use the section name.
if ((!Name || Name->empty()) && ESym->getType() == ELF::STT_SECTION) {
  StringRef SecName;
  Expected<section_iterator> Sec = getSymbolSection(Sym);
  if (Sec && !(*Sec)->getName(SecName))
    return SecName;
}
return Name;
This is a convenience for tools like llvm-nm and llvm-objdump, but it has a side effect: any code that calls SymbolRef::getName() and checks for an empty result to detect section symbols will no longer work. The name is no longer empty — it's ".rodata.cst16" or ".text" or whatever the section is called. The LLVM reviewers noted this risk at the time — reviewer jhenderson specifically warned about code that might "use the name in some kind of map" which "could cause issues e.g. with relocations" — but concluded it was "probably mostly fine" [4].
My extraction code relied on exactly this pattern:
object::symbol_iterator SymIter = Reloc.getSymbol();
if (SymIter != Obj.symbol_end()) {
  Expected<StringRef> SymNameOrErr = SymIter->getName();
  if (SymNameOrErr)
    RE.SymName = SymNameOrErr->str();
  // BUG: This never triggers for .rodata.cst16 relocations
  // because getName() returns ".rodata.cst16", not ""
  if (RE.SymName.empty()) {
    RE.IsSectionReloc = true;
    // ... get section name ...
  }
}
Since getName() returned ".rodata.cst16" (not empty), IsSectionReloc was never set to true. The downstream addend adjustment was skipped entirely. Batch-local addends passed through unchanged to the final object file.
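The failure mode can be reproduced without LLVM in a few lines. The sketch below is a mock, not LLVM's API: `MockSymbol` and `getNameWithFallback` are hypothetical and only imitate the fallback behavior described above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr std::uint8_t kSttNoType = 0;  // STT_NOTYPE
constexpr std::uint8_t kSttSection = 3; // STT_SECTION

// Hypothetical stand-in for what a reader library knows about one symbol.
struct MockSymbol {
  std::string strtabName;  // what st_name resolves to in .strtab
  std::string sectionName; // name of the section st_shndx refers to
  std::uint8_t type;       // low nibble of st_info
};

// Imitates the post-change behavior: an unnamed section symbol reports
// its section's name instead of an empty string.
inline std::string getNameWithFallback(const MockSymbol &sym) {
  if (sym.strtabName.empty() && sym.type == kSttSection)
    return sym.sectionName;
  return sym.strtabName;
}

// The detection my code used: only works if names stay empty.
inline bool isSectionByName(const MockSymbol &sym) {
  return getNameWithFallback(sym).empty();
}

// Detection by symbol type: independent of how names are reported.
inline bool isSectionByType(const MockSymbol &sym) {
  return sym.type == kSttSection;
}
```

For an unnamed STT_SECTION symbol attached to .rodata.cst16, `isSectionByName` returns false and `isSectionByType` returns true, which is exactly the gap the bug fell into.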
What Went Wrong Concretely
For Batch 12 (the first batch with constant pool data), the accumulated offset was 0. No adjustment was needed, so everything worked by accident.
For Batch 20 (accumulated offset = 160 bytes), a function needed two constants:
| Constant | Batch-local offset | Correct global offset | Actual global offset |
|---|---|---|---|
| {0xFFFFFFFF, 0, 0, 0} | 0x10 | 0xB0 | 0x10 (WRONG) |
| {0, 0, 0xFFFFFFFF, 0} | 0x20 | 0xC0 | 0x20 (WRONG) |
Global offset 0x10 contained a TBL byte-shuffle mask ({0x03020100, 0x07060504, ...}) from Batch 19 — completely unrelated data. The function loaded this garbage instead of the intended constant, corrupting a vector register. The corruption propagated through several subsequent functions until a downstream instruction used the wrong value as an array index, computed an out-of-range address, and triggered the segmentation fault I started with.
I confirmed this by adding debug logging to the relocation adjustment code, which showed IsSectionReloc was false for every .rodata.cst16 relocation:
RELOC: sym=.rodata.cst16 isSec=0 addend=0 ← should be isSec=1
RELOC: sym=.rodata.cst16 isSec=0 addend=16 ← addend never adjusted
The Fix
Check the ELF symbol type — the authoritative source for symbol classification, defined in the st_info field [0] — instead of relying on name emptiness:
object::symbol_iterator SymIter = Reloc.getSymbol();
if (SymIter != Obj.symbol_end()) {
  Expected<StringRef> SymNameOrErr = SymIter->getName();
  if (SymNameOrErr)
    RE.SymName = SymNameOrErr->str();
  // Detect section symbols by ELF type, not by name emptiness.
  // LLVM's getName() returns the section name for STT_SECTION symbols,
  // so the name is NOT empty — we must check the type directly.
  object::ELFSymbolRef ELFSym(*SymIter);
  bool isSectionSym = (ELFSym.getELFType() == ELF::STT_SECTION);
  if (isSectionSym || RE.SymName.empty()) {
    RE.IsSectionReloc = true;
    if (RE.SymName.empty()) {
      Expected<object::section_iterator> SecOrErr = SymIter->getSection();
      if (SecOrErr && *SecOrErr != Obj.section_end()) {
        Expected<StringRef> SecNameOrErr = (*SecOrErr)->getName();
        if (SecNameOrErr)
          RE.SymName = SecNameOrErr->str();
      }
    }
  }
}
After the fix, the debug log showed the adjustment working:
RELOC: sym=.rodata.cst16 isSec=1 addend=0 ← correctly detected
RELOC: sym=.rodata.cst16 isSec=1 addend=16 ← addend will be shifted by accumulated offset
Why This Bug Hid for So Long
Scale threshold. The bug only manifests when multiple batches produce .rodata.cst16 data. In a 25-batch translation of a 2.47-million-address binary, only 4 batches had constant pool sections. Smaller benchmarks either had no SSE shuffle instructions generating constant pool entries, or all their constants fit within a single batch where the accumulated offset is zero.
Silent corruption. Most wrong constants don't cause crashes. A vector register loaded with a TBL mask instead of an all-ones constant will produce incorrect computation results, but unless that result feeds into an address calculation or a branch condition, execution continues with silently wrong data. Only a specific chain — wrong constant → wrong register value → used as array index → out-of-range address → SIGSEGV — made the bug observable.
API behavior changed under our feet. Before the change reviewed as D57105 (January 2019), SymbolRef::getName() did return empty strings for section symbols, so checking name.empty() was a valid detection method. Code written (or patterns learned) before that change would have worked correctly. The change altered the semantics of getName() without updating all downstream consumers; the LLVM reviewers acknowledged this risk but deemed it acceptable for the llvm-nm use case that motivated the change [4].
Correct standalone behavior. Compiling any individual function's IR with llc produced correct code. The bug existed purely in the multi-batch concatenation pipeline, making it invisible to unit tests or single-function verification.
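One pipeline-level check that might have flagged this earlier is to assert that every rebased constant pool addend lands inside the byte range its own batch contributed. This is a sketch of the idea, not something the pipeline had at the time; `BatchRange` and `addendInBatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct BatchRange {
  std::uint64_t start; // first global offset owned by the batch
  std::uint64_t size;  // number of bytes the batch contributed
};

// True if a global addend points into the half-open range
// [start, start + size) owned by the batch that emitted the relocation.
inline bool addendInBatch(std::int64_t addend, const BatchRange &batch) {
  if (addend < 0)
    return false;
  std::uint64_t offset = static_cast<std::uint64_t>(addend);
  return offset >= batch.start && offset - batch.start < batch.size;
}
```

With Batch 20 owning offsets 0xA0 through 0xCF, the unadjusted addend 0x10 fails the check while the corrected 0xB0 passes. The check cannot catch the first batch, where local and global offsets coincide, which matches the "worked by accident" case described earlier.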
References
[0] The ELF specification defines the symbol table entry structure (Elf64_Sym) and all symbol types including STT_SECTION: ELF spec — Symbol Table
[1] The ELF string table specification explains the index-0-is-null convention — "a string whose index is zero specifies either no name or a null name, depending on the context": ELF spec — String Table
[2] The Linux man page elf(5) provides a practical reference for Elf64_Sym struct layout and all symbol types and bindings: elf(5) — Linux manual page
[3] The Oracle Linker and Libraries Guide describes STT_SECTION as "associated with a section" where "symbol table entries of this type exist primarily for relocation and normally have STB_LOCAL binding": Oracle — Symbol Table Section
[4] LLVM review D57105 (January 2019) changed getSymbolName() to return the section name for STT_SECTION symbols with empty st_name. Reviewer jhenderson warned that "some code may be using the name in some kind of map" which "could cause issues e.g. with relocations": LLVM D57105
[5] The .rodata.cstN naming convention originates from GCC's constant merging support — ".rodata.cstN is for fixed size readonly constants N bytes in size (and aligned to the same size)": LSB mailing list — ELF special sections