S2E: A Platform for In-Vivo Multi-Path Analysis of Software Systems
1 In-vivo means with environment (in-vitro means without) because they perform both kernel-mode and user-mode binary analysis. Two main ideas: selective symbolic execution (S2E, SSE) and relaxed execution consistency model. Conceptually, they have one path explorer and modular path analyzers. It is a framework for analysis tools construction:
- reason about entire families of paths,
- take in account whole environment,
- 2 operate on binary level.
Also tool should have selectors to specify path of interest (e.g. paths that touch a specific memory object, or paths influenced by a specific parameter). S2E also has Windows VM. They use term symbolic execution tree as owr path tree. Following that tree they copy program state at every branch. But copy-on-write is used to make this process efficient. 3 S2E turns multi-path mode off whenever possible. When do this, it converts symbolic variable x to some concrete value (solving constraints). Here execution consistency model needed. When go back to symbolic execution, you replace concrete values with symbolic variables but may add some constraints (e.g. x ⩽ 15) from knowledge about environment. They use solving only when go from symbolic to concrete? 4 Whenever you select concrete value for symbolic variable you can go back and change this value to another.
Execution consistency
In normal execution program state is always consistent (exists a feasible path from start to this point). But, they said, for many analyses this assumption is unnecessarily strong. (Like in unit testing, when you rely on unit interface, not all possible values.) So, there are paths that
- statically feasible, they include
- locally feasible, they include
- globally feasible.
- Strict consistency (SC):
- SC-CE: just concrete execution, no internal structure got, black-box random fuzzing.
- SC-UE: collecting info (e.g. PC) while executing the unit. Environment is black-box. DART, KLEE.
- SC-SE: info about whole system.
- Local consistency (LC): replace result of environment calls with symbolic values with valid constraints.
- Relaxed consistency (RC):
- RC-OC: relax some constraints at unit/environment boundary.
- RC-CC: relax some constraints anywhere–provided by static analysis.
They said, our concretization is “blind”, but their is not. 6 Their engine allows to implement any of these engines. 8 S2E is based on KLEE and QEMU. 9 For saving and restoring states S2E uses QEMU's snapshot mechanism.
3 tools on top of S2E:
- testing of drivers. 10 720 LOC of C++ code. Plugins: CodeSelector
- reverse engineering of drivers,
- performance profiling.
11 S2E executes 1000 times more instructions in concrete mode than in symbolic mode. 12 Parsers are the bane of SE engines, so S2E just skips parsing stage.
Sergey Vartanov, 2007–2020