Abstraction and introduction
importance of binary analysis:
- 很多情况下只用运行才能发现代码的某些特性。
- 解释型语言需要binary program 解释或 just-in-time 来及时编译。(都需要binary)
- OS核心以及一些需要考虑效率的应用都是用C/C++实现的。
- 很多网络上的应用资源都被限制,其基础框架都需要用C/C++来实现。
- compiler and tool chains are not bug-free。
low-level language所带来的unsecurity的严重性:
- 尽管有许多措施来防止,buffer overflow仍旧是是最容易见到的software flaw之一。
- memory corruption vulnerabilities 是剩余的最容易见到的flaw部分。
- 不光在PC 端software,远程应用如smartlocks,pacemarks和手机都是饱受折磨的对象。
problem of current binary analysis techniques:
- many implementations of binary analysis techniques begin and end their existence as a research prototype/. (waste time for startup cost)
- the system is often unavailable to the public. reproduce the system is time-wasting and replicating results is impractical.
- comparations between different implementation details and different evaluation datasets are not persuasive.
contribution:
- reproduce many existing approaches in offensive binary analysis in a single coherent framework.
- show the difficulties of combining diverse binary analysis techniques and applying them on a large scale.
- open source for future use.
- 使用统一的框架,同意比较标准DARPA CGC。
automated binary analysis
本章节说明了需要克服的困难以及为什么DARPA是一个好的比较标准。
trade offs
Replayability, Semantic insight
Replayability
- high replayability -> low coverage.
- low replayability -> high false positives
Semantic insight
- semantic insight -> large amount of data (dynamic analysis)
- simpler data domains -> low semantic insight (static analysis)
replayability + semantic insight = problem of scability.
Environment model: modern OS is extremely complex so that semantic analysis is difficulit.
The DARPA Cyber Grand Challenge
特点:
- create a symple OS as environment specifically fo the CGC.
- binaries have a wide range of complexity.
BACKGROUND: static vulnerability discovery
categories: graphs VS data
drawback: not replayable, operate on simpler data domains->low semantic insite.
result: high rate of false positives.
Recovering Control flow(CFG)
two properties of CFG recovery analysis: soundness(true positive rate) and completness(inverse of false positive rate).
Graph-based vulnerability discovery(program property graphs): build a model of a bug, represented by a set of nodes in a control-flow or data-dependency graph. However, only pre-existing vulnerability will be discovered.
vulnerability detection with data modeling
value-set analysis: approximate the state of program at any given point.
BACKGROUND: dynamic vulnerability discovery
concret and symbolic execution.
high replayable but vary in terms of semantic insight.
dynamic concrete execution
fuzzing
malformed input is provided to trigger a crash. suffer from test-case requirement.
- coverage-based fuzzing: maximize the amount of code executed. problem: lack of semantic insight(what part of input).
- taint-based fuzzing: unaware of how to mutate this input.
dynamic symbolic execution
track the state of registers and memory. whenever a condition branch is reached, execution forks and follows both path.
- classical dynamic symbolic execution. performing path exploration until a vulnerable state is identified. problem: scability, bugs found are shallow.
- symbolic assisted fuzzing: use in-depth understanding of analyzed program to mutate inputs.
- under constrained symbolic execution: execute only part of application. problem: not possible to ensure a proper context->false positive. no replayability of the bugs.
BACKGROUND: exploitation
goal: reproducing an identified crash. automatically generating the exploit to verify the security impoact of the crash and hardening the exploit to make it rsilient in the presence of modern mitigation techniques.
简单来讲,这个section会介绍一些techniques,其非但可以找到漏洞,还可以复现漏洞并且延申到现代防御系统中。
crash reproduction
(possible improvement:现代分析手段通常都是在非现实环境中运行的。如fuzzy会hard-code一些source of randomness。原因:1. assumption that same impute will resule same result。2. modeling randomness in other techniques is not well-explored research area)
两种类型:missing data 和 missing relationships
missing data:guess correct response -> not replaible. (possible improvement: not implemented in angr)
missing relationships: low semantic insight are unable to recoverthe relationships between data retrieved. (replayer will compute the preconditions on program path.)
exploit generation
automatically convert crashing input into an exploit for the application.
exploit hardening
non-executable stack regions and address space layout randomization have severely reduced the efficacy of traditional exploits.
automatically harden the exploits generated using current techniques against such defenses.
translate shellcode based exploit into an equivalent exploit utilizing return oriented programming.
Analysis engine
终于到了重头戏,angr的设计和实现。
设计部分主要说了几个设计目标:
- cross architecture support
- cross platform support
- support for different analysis paradigms: provide different types of memory models as well as data domains.
- usability
然后,paper中吐槽了不同的实现难度很难估计(possible improvement?)
之后是冗长的子模块和open source release。略过。
IMPLEMENTATION: CFG recovery
CFGAccurate
CFGAccurate结合了forced execution,lightweight backward slicing,symbolic execution 和value set analysis。(possible improvement: effectiveness?)
CFGFast
function identification,recursive disassembly he indirect jump resolution。快速的recover a cfg with a high coverage without a concern for understanding the researchability of functions.
IMPLEMENTATION: value set analysis
VSA 分析主要是通过value set abstract domain来估计每一时刻registers的值从而分析crah。
本篇paper做了一些提升:
- Creating a discrete set of strided-intervals
- Applying an algebraic solver to path predicates
- Adopting a signedness-agnostic domain
IMPLEMENTATION: under constrained symbolic execution
IMPLEMENTATION: symbolic assisted fuzzing
IMPLEMENTATION: crash reproduction
IMPLEMENTATION: exploit
IMPLEMENTATION: exploit hardening
Comparative evaluation
由于使用同种引擎且分享90%的相同代码,比较是有意义的。
CFG recovery
CFGAccurate,CFGFast VS IDA Pro 6.9
possible improvement: ground truth CFG information is not available.
using relative number of recovered basic blocks and control flow transfers between the results of IDA and our CFG recovery.
completeness: CFGFast 比IDA找出了更多的block和edges。(CFGAccurate?)
reachability information: CFGAccurate does not leverate ad hoc heuristics -> worse result than IDA. (actually, in the table, both result and time are worse than IDA: possible improvement)
Evaluation of vulnerability analysis techniques
under-constrained symbolic execution would result in a large number of false positives. result are not replayable.
Static buffer overlap detection: also not replayable and suffer from false positives.
possible improvement: the improvement of static analysis techniques on real binaries appears to be an area in need of research attention.