Debugging Random Program Behavior
I was looking at a weird coredump the other day. According to the core, the program was trying to write to virtual address `0x6` and crashed in `memcpy`.
There’s a piece of code that looks like this:

    if (a == 1) {
        do_foo();
    } else {
        do_bar();
    }

And from the coredump, `a` is indeed `1`. However, execution took the `else` branch and crashed on `memcpy` in `do_bar`.

- Disassembled the code in `gdb` using `disassemble /s`. It’s so much better than plain `disassemble` or even `disassemble /m`. With link-time optimization, more functions get inlined, which makes reading plain assembly harder. `disassemble /s` annotates each block of instructions with a reference back to the source file, which makes the assembly much easier to follow.
- Read the assembly to find where the bogus address came from. It boiled down to a single instruction, `lea 0x2(%r14),%r15`, where `%r15` was supposed to be set to `%r14+2` but was set to `0x6` instead. `gdb` is able to provide register values for each frame by unwinding the stack. So it looks like some kind of CPU/firmware bug.
- It’s running an E5-2680 v4 at microcode version `0xb000014`, according to `/proc/cpuinfo`.
. - It’s Broadwell according to https://ark.intel.com/products/91754/Intel-Xeon-Processor-E5-2680-v4-35M-Cache-2-40-GHz-.
- Got the BIOS version, `Version: F06_3B06`, from `dmidecode`.
- Once you have enough keywords, Google is your friend. And… there you go: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=842796. Looks like I should try updating the firmware.
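
If you need to check the microcode version on more than one machine, the `/proc/cpuinfo` lookup can be scripted. Here is a minimal sketch in C, not something from the original investigation. It assumes the x86 Linux layout, where each logical CPU has a line such as `microcode : 0xb000014`, and it only reports the first entry it finds. The helper names `parse_microcode_line` and `read_microcode` are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one /proc/cpuinfo line.
 * Returns 1 and stores the revision in *rev if the line is a
 * "microcode : 0x..." entry, 0 otherwise. */
int parse_microcode_line(const char *line, unsigned long *rev)
{
    static const char key[] = "microcode";

    assert(line != NULL && rev != NULL);

    /* The field name sits at the start of the line. */
    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return 0;

    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;

    char *end;
    errno = 0;
    /* Base 0 lets strtoul accept the 0x prefix; leading spaces are skipped. */
    unsigned long value = strtoul(colon + 1, &end, 0);
    if (errno != 0 || end == colon + 1)
        return 0;

    *rev = value;
    return 1;
}

/* Hypothetical helper: scan a cpuinfo-style file and return the first
 * microcode revision found. Returns 1 on success, 0 otherwise. */
int read_microcode(const char *path, unsigned long *rev)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;

    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof(line), fp) != NULL)
        found = parse_microcode_line(line, rev);

    fclose(fp);
    return found;
}
```

Calling `read_microcode("/proc/cpuinfo", &rev)` and printing `rev` with `%#lx` gives a value you can compare against the revisions mentioned in a bug report. The BIOS version still has to come from `dmidecode`, which normally needs root.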