Syllabus

Overview

Intra-procedural dataflow analyses. Virtually all data-flow analyses are formulated as fixed-point iterations that propagate static program abstractions through a control-flow graph using flow functions over an abstract domain. The so-called monotone framework provides a unified conceptual framework that allows program analysts to define their analyses in a template-driven style. We will discuss how this framework works, why it works, and what constraints it imposes on the flow functions and abstract domains the user needs to supply. (For instance, to guarantee termination of the fixed-point iteration, the flow functions must be monotone and the abstract domain must satisfy the ascending-chain condition.)
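
To make this concrete, the following is a minimal sketch (in Java) of a monotone-framework worklist solver for a forward may-analysis. It assumes a powerset domain of string-named facts, set union as the merge operator, and a CFG given as successor lists; all names are illustrative, not a specific library API.

    import java.util.*;
    import java.util.function.BiFunction;

    /** Minimal worklist solver sketch: propagates sets of facts forward through a CFG
     *  until a fixed point is reached. Assumes monotone flow functions over a finite
     *  powerset domain, merged by set union. */
    public class WorklistSolver {

        public static Map<Integer, Set<String>> solve(
                int numNodes,
                Map<Integer, List<Integer>> successors,
                BiFunction<Integer, Set<String>, Set<String>> flow, // node + IN facts -> OUT facts
                Set<String> entryFacts) {

            Map<Integer, Set<String>> in = new HashMap<>();
            for (int n = 0; n < numNodes; n++) in.put(n, new HashSet<>());
            in.get(0).addAll(entryFacts);               // node 0 is the entry node

            Deque<Integer> worklist = new ArrayDeque<>();
            for (int n = 0; n < numNodes; n++) worklist.add(n);

            while (!worklist.isEmpty()) {
                int n = worklist.poll();
                Set<String> out = flow.apply(n, in.get(n));
                for (int succ : successors.getOrDefault(n, List.of())) {
                    // merge = set union; re-process the successor only if its input grew
                    if (in.get(succ).addAll(out)) worklist.add(succ);
                }
            }
            return in;
        }

        public static void main(String[] args) {
            // Tiny CFG with a loop: 0 -> 1 -> 2 -> 1. The fact "def@1" is generated at node 1.
            Map<Integer, List<Integer>> succ = Map.of(0, List.of(1), 1, List.of(2), 2, List.of(1));
            BiFunction<Integer, Set<String>, Set<String>> flow = (n, facts) -> {
                Set<String> out = new HashSet<>(facts);
                if (n == 1) out.add("def@1");
                return out;
            };
            System.out.println(solve(3, succ, flow, Set.of("param")));
        }
    }

Because the domain is finite and the sets only ever grow under the union merge, the iteration must eventually stop; this is exactly the termination argument that the ascending-chain condition generalizes.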

Off-the-shelf call-graph and pointer analyses. We will first talk about how the monotone framework can be used to solve intra-procedural data-flow analysis problems, i.e., analyzing one single method at a time. Such analyses have their limitations, but they have the advantage of being easy to write and fast to execute, and on top of that they can be computed fully modularly. To extend intra-procedural analyses to inter-procedural analyses, i.e., analyses that operate on the whole program, one requires a so-called call graph, which represents the program's calling structure. We will thus explain current call-graph construction algorithms, their properties and relative tradeoffs. As students will learn, all precise call-graph analyses for object-oriented languages also require a points-to analysis, which determines which (types of) objects a program variable may point to. We will thus discuss important points-to analysis algorithms as well. The algorithms we discuss at this point are “off-the-shelf” algorithms that can be used for many purposes, only one of which is call-graph construction.
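
As a small illustration of the simplest family of call-graph algorithms, the following Java sketch resolves the possible targets of a virtual call in the style of Class Hierarchy Analysis (CHA): every override of the called method in the receiver's static type or one of its subtypes is kept as a possible target. The class hierarchy and method names here are hypothetical.

    import java.util.*;

    /** Minimal sketch of CHA-style call-target resolution over a hypothetical hierarchy. */
    public class ChaSketch {

        // direct subclasses of each class in the (hypothetical) hierarchy
        static final Map<String, List<String>> SUBCLASSES = Map.of(
                "Shape", List.of("Circle", "Square"),
                "Circle", List.of(),
                "Square", List.of());

        // methods each class itself declares (defines or overrides)
        static final Map<String, Set<String>> DECLARED = Map.of(
                "Shape", Set.of("area"),
                "Circle", Set.of("area"),
                "Square", Set.of("area", "perimeter"));

        /** Possible targets of a call receiver.method() where receiver's static type is staticType. */
        static Set<String> resolveTargets(String staticType, String method) {
            Set<String> targets = new HashSet<>();
            Deque<String> work = new ArrayDeque<>(List.of(staticType));
            while (!work.isEmpty()) {
                String c = work.poll();
                if (DECLARED.getOrDefault(c, Set.of()).contains(method))
                    targets.add(c + "." + method);          // this class provides an implementation
                work.addAll(SUBCLASSES.getOrDefault(c, List.of()));
            }
            return targets;
        }

        public static void main(String[] args) {
            // CHA treats every subtype of Shape as a possible receiver, so all overrides remain
            // as targets: Shape.area, Circle.area and Square.area (in some order).
            System.out.println(resolveTargets("Shape", "area"));
        }
    }

A points-to analysis would narrow this set further by considering which objects the receiver variable can actually point to, which is why precise call-graph construction depends on it.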

Inter-procedural dataflow analyses. Next we will discuss how to extend the monotone framework to solve inter-procedural analysis problems using a call graph, assuming this call graph was constructed ahead of time. This mainly involves modeling the passing of parameters and return values. We will briefly hint at complications arising from unstructured inter-procedural flows such as exceptions or multi-threaded executions. More importantly, though, students will learn that inter-procedural analyses in the traditional monotone framework have the inherent limitation that they cannot distinguish realizable paths from unrealizable paths. We will discuss how this problem can be addressed by taking context information into account.
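
The following hypothetical Java snippet illustrates the problem of unrealizable paths: both call sites invoke the same callee id(), so a context-insensitive analysis merges what flows into and out of id() across the two calls. All names are illustrative.

    /** Hypothetical taint-analysis example: a context-insensitive analysis may report
     *  that 'clean' is tainted via the unrealizable path that enters id() from the
     *  first call site but returns to the second. */
    class UnrealizablePathExample {

        static Object id(Object o) { return o; }           // shared by both callers

        static Object getTaintedInput() { return "user input"; }

        public static void main(String[] args) {
            Object tainted = id(getTaintedInput());   // call site 1: taint flows in and back out
            Object clean   = id("constant");          // call site 2: without context information,
                                                      // the taint arriving at id() from call site 1
                                                      // also appears to flow back to 'clean'
            System.out.println(tainted + " / " + clean);
        }
    }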

Call-strings approach vs. functional approach. One commonly distinguishes two different classes of approaches to conducting context-sensitive inter-procedural data-flow analyses. The call-strings approach matches calls and returns by tagging data-flow values with a call string, while the functional approach computes summary functions for methods that are then re-applied in each calling context. Both approaches have their tradeoffs, which we will discuss in detail. First, we will talk about the (more traditional) call-strings approach.

Call strings and value-based termination. The call-strings approach has the advantage of being very generally applicable and relatively easy to understand; however, it also has the disadvantage of being very expensive to compute. The problem is that programs can expose a number of calling contexts that is exponential in the size of the call graph (which is already large). We will discuss how value-based termination can limit the number of calling contexts to consider, without any loss of precision. Next, we switch to discussing the functional approach…
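
Before turning to the functional approach, the following minimal Java sketch illustrates the idea behind value-based termination: analysis results are cached per (method, incoming data-flow value) rather than per call string, so any two call strings that reach a method with the same value share a single analysis run. The string-set domain and method names are illustrative assumptions.

    import java.util.*;

    /** Minimal sketch of value-based termination for a call-strings analysis. */
    public class ValueBasedTermination {

        // cache: method -> (entry value -> exit value)
        private final Map<String, Map<Set<String>, Set<String>>> cache = new HashMap<>();

        Set<String> analyze(String method, Set<String> entryValue) {
            Map<Set<String>, Set<String>> perMethod =
                    cache.computeIfAbsent(method, m -> new HashMap<>());
            Set<String> cached = perMethod.get(entryValue);
            if (cached != null) {
                return cached;                        // same entry value seen before: reuse the result
            }
            Set<String> exitValue = runFlowFunctions(method, entryValue);
            perMethod.put(entryValue, exitValue);
            return exitValue;
        }

        // placeholder for propagating the entry value through the method's body
        private Set<String> runFlowFunctions(String method, Set<String> entryValue) {
            return new HashSet<>(entryValue);
        }

        public static void main(String[] args) {
            ValueBasedTermination solver = new ValueBasedTermination();
            // two different call strings, same incoming value: the second call hits the cache
            System.out.println(solver.analyze("foo", Set.of("x")));
            System.out.println(solver.analyze("foo", Set.of("x")));
        }
    }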

IFDS and IDE. Reps, Sagiv and Horwitz have developed two frameworks for particularly efficient inter-procedural, context-sensitive data-flow analysis using the functional approach. IFDS solves Inter-procedural Finite Distributive Subset problems, in which the analysis domain is finite (albeit possibly large) and in which the merge operator must be set union. Within these constraints, IFDS defines a very efficient solution algorithm by reducing the program-analysis problem to a pure graph-reachability problem. This is achieved by encoding not only the program but also the flow functions in a graph representation. IDE extends IFDS by computing distributive functions as values, which are then, in a second step, applied to a special value domain. IDE is more expressive than IFDS and can thus be used to solve certain classes of problems more precisely and/or efficiently than a pure IFDS implementation.
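
The following minimal Java sketch hints at how IFDS encodes a distributive flow function as graph edges: each incoming fact, including the special zero fact that is always reachable, is mapped to the set of facts it produces, and these per-fact edges are what the graph-reachability algorithm traverses. The taint facts x and y are hypothetical.

    import java.util.*;

    /** Sketch of one IFDS flow function, represented fact-by-fact as edges. */
    public class IfdsFlowFunctionSketch {

        static final String ZERO = "0";   // the tautological fact, always reachable

        /** Flow function for the statement  y = x  in a taint analysis:
         *  ZERO keeps ZERO alive, x stays tainted and its taint is copied to y,
         *  an existing taint of y is killed (y is overwritten), everything else is identity. */
        static Set<String> flowAssignYFromX(String incomingFact) {
            switch (incomingFact) {
                case ZERO: return Set.of(ZERO);
                case "x":  return Set.of("x", "y");
                case "y":  return Set.of();
                default:   return Set.of(incomingFact);
            }
        }

        public static void main(String[] args) {
            // The "edges" of this flow function, as IFDS would materialize them in its graph:
            for (String source : List.of(ZERO, "x", "y", "z")) {
                System.out.println(source + " -> " + flowAssignYFromX(source));
            }
        }
    }

Because the function is distributive, it is fully described by these single-fact edges, which is what makes the reduction to graph reachability possible.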

SPLlift. As a great example showcasing the difference between IFDS and IDE, we will present SPLlift, a framework for analyzing software product lines developed at TU Darmstadt. SPLlift takes an IFDS analysis for regular Java programs as input and converts it into a version that can automatically be applied to entire Java-based product lines. The converted solver is, in essence, an IDE solver.

Value contexts. In early 2013, researchers demonstrated a novel approach to inter-procedural static analysis that combines the generality of the call-strings method with some of the efficiency advantages of the functional approach. We will discuss this new method and its properties in detail.

Problem of context reification. One property of the functional approach, and also of value contexts, is that context strings (in terms of lists of method signatures) are never made explicit. On the one hand this is a valuable property, as it makes the approaches relatively memory-efficient. On the other hand, one seldom-discussed problem with this property is that many context-sensitive off-the-shelf pointer analyses require a context string as input. As a result, when using the functional approach, one often cannot easily use such pointer analyses without further, potentially runtime-expensive modifications.

Integration of demand-driven pointer analyses. One solution to the problem is not to execute an off-the-shelf pointer analysis ahead of time at all, but rather to use a demand-driven pointer analysis that runs in lockstep, i.e., through an indirect recursion, with the regular inter-procedural data-flow analysis. Such alias analyses can be formulated as IFDS/IDE problems themselves, which makes it possible to execute them within the same IFDS/IDE solver. One important difference between demand-driven alias analyses and ahead-of-time analyses is that the former usually model aliases through so-called access paths, while the latter model aliases through allocation sites. From the analyses’ respective points of view both formulations are natural, but the access-path formulation does have some advantages. We will discuss those advantages in detail. In addition, we will be discussing the need for so-called object-sensitive analyses.
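
To illustrate the access-path abstraction mentioned above, here is a minimal Java sketch: an access path is a base local variable followed by a sequence of field dereferences, typically truncated at some bound k so that the abstraction stays finite. The field names and the bound k = 3 are illustrative.

    import java.util.*;

    /** Minimal sketch of the access-path abstraction, e.g. x.f.g, with k-limiting. */
    public final class AccessPath {
        private static final int K = 3;     // k-limiting keeps the set of access paths bounded

        private final String base;          // local variable, e.g. "x"
        private final List<String> fields;  // field sequence, e.g. ["f", "g"]

        public AccessPath(String base, List<String> fields) {
            this.base = base;
            // truncate to at most K fields
            this.fields = List.copyOf(fields.subList(0, Math.min(fields.size(), K)));
        }

        /** The access path obtained by dereferencing one more field, e.g. x.f.g -> x.f.g.h */
        public AccessPath append(String field) {
            List<String> longer = new ArrayList<>(fields);
            longer.add(field);
            return new AccessPath(base, longer);
        }

        @Override public String toString() {
            return base + (fields.isEmpty() ? "" : "." + String.join(".", fields));
        }

        public static void main(String[] args) {
            AccessPath x = new AccessPath("x", List.of());
            // prints "x.f.g.h": the fourth field is dropped by the k-limit
            System.out.println(x.append("f").append("g").append("h").append("i"));
        }
    }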

Problem of path matching. One problem with the above demand-driven approach is that multiple analyses execute within the same solver. When doing so, it is advisable to ensure that both analyses consistently visit only the same paths, as otherwise one will again have to deal with false results caused by infeasible paths, a problem that context-sensitive analyses are actually trying to avoid.

Current and “eternal” limitations. Although static analysis is a very powerful tool, and inter-procedural analysis particularly so, there are some important theoretical and practical limitations that students should be aware of when using static analyses out in the field. Some limitations are “eternal”, such as the inability to deal with reflection, dynamic class loading and eval. Other limitations, especially regarding particular client analyses, are at the forefront of current research and may be lifted in the future. We will discuss such limitations and approaches that try to at least ease the problems they cause.