Scarlet Serenade

Static Analysis

Control Flow Graphs (CFG)

Construction

Nodes (Basic Blocks

INPUT: a sequencce of three-address instructions of P OUTPUT: a list of basic blocks of P METHOD:

  1. determine the leads in P
    • the first instruction in P is a leader
    • any target instruction of a conditional or unconditional jump is a leader
    • any instruction that immediately follows a conditional or unconditional jump is a leader
  2. build BBs for P
    • a BB consists of a leader and all its subsequent instructions until the next leader

Edges

  • There is a conditional or unconditional jump from the end of A to the beginning of B
  • B immediately follows A in the original order of instructions and A does not end in an unconditional jump

Data Flow Analysis

设数据流分析的值域是 ,可定义一个 k-tuple: 。是集合 的一个元素,表示每次迭代后 个节点整体的值。

每一次迭代可看作是 映射到新的 ,通过转换规则和控制流来映射,记作函数

通过不断迭代,直到相邻两次迭代的 k-tuple 值一样,算法结束。

数据流分析可以看做是迭代算法对格点 利用转换规则和 meet/join 操作。

Data Flow Analysis - Applications

Each node is associated with a Transfer Function according to the semantics of statements.
Each program point is associated with a data-flow value that represents an abstraction of the set of all possible program states that can be observed

  • Forward Analysis: OUT[stm] = f(IN[stm])
  • Backward Analysis: IN[stm] = f(OUT[stm])

Data Flow analysis is to find a solution to a set of safe-approximation directed constraints on the IN[stm]'s and OUT[stm]'s, for all statements stm

Control Flow's Constraints

  • Control flow within a Basic Block: for all
  • Control flow among Basic Blocks:
    • &
    • for Forward Analysis:
      • a predecessor of
    • for Backward Analysis

大多数情况下,optimization application 都需要一个 conservative approximations. 如果我们拿到了错误的信息,那么我们的优化就可能是 unsound,并且会影响到程序本来的语义。

一个 canonical choice 是为每个名为 的变量 introduce a target 为每个 allocation site introduce a target 其中 i 是一个唯一的 index

points to analysis 发生在 syntax tree 上,因为发生在 control flow analysis 之前或同时进行。 points to analysis 的结果是一个函数 返回 set of possible pointer targets。 如果我们想知道两个指针 & 是否可能是 aliases,那么一个安全的做法就是比较其交集

Andersen’s Algorithm

对每个 variable named ,用集合 表示所有可能的 pointer targets

分析假设程序已经倍 normalized,也就是说 pointer manipulation 仅限于以下几种 1 ,生成 constraints: 2 ,生成 constraints: 3 ,生成 constraints: 4 ,生成 constraints: 5 ,生成 constraints: 6 ,生成 constraints: which 可以安全的被忽略

最后我们得到 5 种 constraints,最后解这些约束就得到算法的结果

Steensgaard’s Algorithm

另一个更粗略度一点的算法,by viewing assignments as being bidirectional.

TODO

解这些约束就得到算法的结果,The resulting points-to function is defined as:

Interprocedural Points-To Analysis

函数指针也可能有 indirect references,我们需要同时做 control flow analysis and the points-to analysis。例如:(***x)(1,2,3);

我们可以对程序做一个简化,使得 function calls 总是形如 id1 = (id2)(a1, ..., an);。类似的 all return expressions are assumed to be just variables.

到目前为止我们都只是把 heap 看作一个 amorphous 结构,几乎只关注了 stack based vars. 我们可以用 shape analysis 对堆进行更细致的分析。

Shape graphs 是一个有向图,其每一个节点都是一个 pointer targets。Shape graphs 的 order 根据 inclusion of their sets of edges 定义。Thus, is the graph without edges and is the completely connected graph.
pointer targets 表示在执行期间可能创建的 memory cells,边表示两个 cell 间可能包含一个引用。