KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
Symbolic execution dynamic analysis tool, modified EXE (See EXE paper).
1 It found 56 serious bugs, including 3 in Coreutils. Symbolic input initially allowed to be “anything”. 2 main problems of symbolic execution: (1) the exponential number of paths and (2) interacts with its surrounding environment (OS, the network, the user).
2 3 Non-determinism in checked code have produced false positives in practice. KLEE analyses LLVM bytecode. 4 Given more than one active path, KLEE must pick which one to execute first. At each dangerous operation KLEE checks if any possible value allowed by the current path condition would cause an error.
KLEE is a hybrid between OS for SE and interpreter. Each symbolic process (KLEE called it state) has a register file, stack, heap, program counter, and path condition. KLEE interprets LLVM bytecode with bit-level accuracy. Symbolic storage locations refer to expressions (trees!) instead of raw data values. If all given operands are concrete KLEE performs the operation natively. KLEE queries the constraint solver to determine if the branch condition is either provably true or provably false along the current path; if so, the instruction pointer is updated to the appropriate location; if both branches are possible, KLEE clones the state. It runs STP on every branch? Dangerous operations implicitly generate branches. Load and store instructions also generate checks. 5 When a dereferenced pointer p can refer to n objects, KLEE clones the current state n times. There are a lot of recorded states (symbolic processes) during execution: hunderds of thousands in few minutes. In KLEE heap is implemented as immutable map and portions of it is shared between states (EXE used only one process per state).
Query optimizations:
- Expression Rewriting. (2x − x) to x.
- Constraint Set Simplification. x < 10 to true if there was x = 5.
- Implied Value Concretization. Replace x with 9 if there is x + 1 = 10.
- Constraint Independence. i < j, j < 20, k > 0 to i < j, j < 20 and k > 0.
- Counter-example Cache. They said it is possible to build a more sophisticated cache! 6
- When a subset of a constraint set has no solution, then neither does the original constraint set.
- When a superset of a constraint set has a solution, that solution also satisfies the original constraint set.
- When a subset of a constraint set has a solution, it is likely that this is also a solution for the original set. Really? Do you have a proof?
Search strategies:
- Random Path Selection.
- 7 Coverage Optimized Search.
When program reads from environment, KLEE want to return all values that the read could produce. Calls are redirected to models (need to understand semantics well enough). Files have fixed size. Calling open() with symbolic name matches N symbolic files and one fail answer. 8 Standart library contains a huge number of functions! KLEE also simulates environmental failures (such as no more space on disk). 9 10 KLEE only checks for low-level errors and violations of user-level asserts, while developer tests validate expected behavior. So, you can't say DA is better than manual tests. 11 12 Symbolic execution tools have significant trick. They run path before branch only once. Is it mean SE is better than Avalanche style? 13 KLEE can be used for verify functional equivalence using assert(f(x) == f'(x)). Think about using DA for this for different languages. 13 14 15 16
Sergey Vartanov, 2007–2020