Debugging Random Program Behavior
I was looking at a weird coredump the other day. From the core, the program was trying to write to virtual address 0x6 and crashed on memcpy.
There’s a piece of code that looks like this:
if (a == 1) {
    do_foo();
} else {
    do_bar();
}
And from the coredump, `a` is indeed 1. However, execution took the `else` branch and crashed on memcpy in `do_bar`.
- Disassembled the code in `gdb` using `disassemble /s`. It’s so much better than plain `disassemble` or even `disassemble /m`. With link-time optimization, more functions get inlined, which makes plain assembly harder to read. `disassemble /s` annotates each block of instructions with a reference back to the source file, which makes the assembly much easier to follow.
- Read the assembly to find where the bogus address came from. It boiled down to a single instruction, `lea 0x2(%r14),%r15`, where `%r15` was supposed to be set to `%r14+2` but was set to `0x6` instead. `gdb` is able to provide register values for each frame by unwinding the stack. So it looks like some kind of CPU/firmware bug.
- It’s running an E5-2680 v4, at microcode version `0xb000014` according to `/proc/cpuinfo`.
- It’s Broadwell according to https://ark.intel.com/products/91754/Intel-Xeon-Processor-E5-2680-v4-35M-Cache-2-40-GHz-.
- Got BIOS version `Version: F06_3B06` from `dmidecode`.
- Once you have enough keywords, Google is your friend. And… there you go https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=842796! Looks like I should try updating the firmware.
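If you need to check the microcode version on many machines, the `/proc/cpuinfo` lookup above can be scripted. Here is a minimal sketch in C, not the way I actually did it (I just read the file by hand). The helper name `cpuinfo_field` is made up for illustration, and it assumes the usual x86 Linux layout where each line looks like `key<tabs>: value`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of the first "key : value" line in a
 * /proc/cpuinfo-style file into out (always NUL-terminated, truncated if
 * needed). Returns 0 on success, -1 if the file cannot be opened or the key
 * is not present.
 */
int cpuinfo_field(const char *path, const char *key, char *out, size_t outlen)
{
    char line[4096]; /* the "flags" line can be long, so leave some room */
    size_t keylen = strlen(key);
    int rc = -1;
    FILE *f;

    if (outlen == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        const char *p;
        size_t n;

        if (strncmp(line, key, keylen) != 0)
            continue;

        /* Require only blanks between the key and ':' so that the key
         * "model" does not match the "model name" line. */
        p = line + keylen;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p != ':')
            continue;
        p++;
        while (*p == ' ' || *p == '\t')
            p++;

        /* Copy the value without the trailing newline. */
        n = strcspn(p, "\r\n");
        if (n >= outlen)
            n = outlen - 1;
        memcpy(out, p, n);
        out[n] = '\0';
        rc = 0;
        break;
    }

    fclose(f);
    return rc;
}
```

Calling `cpuinfo_field("/proc/cpuinfo", "microcode", buf, sizeof buf)` fills `buf` with the same string you would see in the file, and the `"model name"` key gives the CPU model to search for. Only the first matching line is returned, which is normally enough since all cores report the same microcode revision.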