KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs

<span class="small_caps">KLEE</span>: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 · 9 · 10 · 11 · 12 · 13 — 13 · 14 · 15 · 16

Symbolic execution dynamic analysis tool, modified EXE (See EXE paper).

1 It found 56 serious bugs, including 3 in Coreutils. Symbolic input initially allowed to be “anything”. 2 main problems of symbolic execution: (1) the exponential number of paths and (2) interacts with its surrounding environment (OS, the network, the user).

2 3 Non-determinism in checked code have produced false positives in practice. KLEE analyses LLVM bytecode. 4 Given more than one active path, KLEE must pick which one to execute first. At each dangerous operation KLEE checks if any possible value allowed by the current path condition would cause an error.

KLEE is a hybrid between OS for SE and interpreter. Each symbolic process (KLEE called it state) has a register file, stack, heap, program counter, and path condition. KLEE interprets LLVM bytecode with bit-level accuracy. Symbolic storage locations refer to expressions (trees!) instead of raw data values. If all given operands are concrete KLEE performs the operation natively. KLEE queries the constraint solver to determine if the branch condition is either provably true or provably false along the current path; if so, the instruction pointer is updated to the appropriate location; if both branches are possible, KLEE clones the state. It runs STP on every branch? Dangerous operations implicitly generate branches. Load and store instructions also generate checks. 5 When a dereferenced pointer p can refer to n objects, KLEE clones the current state n times. There are a lot of recorded states (symbolic processes) during execution: hunderds of thousands in few minutes. In KLEE heap is implemented as immutable map and portions of it is shared between states (EXE used only one process per state).

Query optimizations:

Search strategies:

When program reads from environment, KLEE want to return all values that the read could produce. Calls are redirected to models (need to understand semantics well enough). Files have fixed size. Calling open() with symbolic name matches N symbolic files and one fail answer. 8 Standart library contains a huge number of functions! KLEE also simulates environmental failures (such as no more space on disk). 9 10 KLEE only checks for low-level errors and violations of user-level asserts, while developer tests validate expected behavior. So, you can't say DA is better than manual tests. 11 12 Symbolic execution tools have significant trick. They run path before branch only once. Is it mean SE is better than Avalanche style? 13 KLEE can be used for verify functional equivalence using assert(f(x) == f'(x)). Think about using DA for this for different languages. 13 14 15 16

Sergey Vartanov, 2007–2020