INTRODUCTION
The advent of large multi-core systems has renewed interest in programming models that simplify parallel programming. Recursive parallel programming models, as exemplified by Cilk [4], put forth a simple concurrency model. A
programmer divides the given work into smaller work units
whose results are recursively combined to provide the final
result. This divide-and-conquer strategy simplifies the programmer’s effort in specifying concurrency. Often, the parallel program is simply a concurrency-annotated version of a sequential recursive program (a principle referred to as “serial elision” in Cilk).
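As a minimal illustration of this divide-and-conquer style, the following Python sketch sums an array recursively. It is not Cilk code: the `spawn` helper is a hypothetical thread-based stand-in for Cilk's spawn and sync, and the names `parallel_sum` and `grain` are assumptions introduced only for this example.

```python
import threading


def spawn(fn, *args):
    """Hypothetical stand-in for a Cilk spawn: run fn(*args) in a new thread.

    Returns a `sync` function that waits for the child and yields its result.
    """
    box = {}

    def run():
        box["value"] = fn(*args)

    child = threading.Thread(target=run)
    child.start()

    def sync():
        child.join()
        return box["value"]

    return sync


def parallel_sum(xs, lo, hi, grain=1024):
    """Sum xs[lo:hi] by recursively splitting the range into two work units."""
    if hi - lo <= grain:
        # Coarse-grained base case: do the work sequentially.
        return sum(xs[lo:hi])
    mid = (lo + hi) // 2
    left = spawn(parallel_sum, xs, lo, mid, grain)  # child may run in parallel
    right = parallel_sum(xs, mid, hi, grain)        # parent continues with the other half
    return left() + right                           # sync, then combine the results


data = list(range(10000))
total = parallel_sum(data, 0, len(data))
```

Calling `parallel_sum` directly instead of through `spawn`, and dropping the matching sync, gives back the ordinary sequential recursion, which is the idea behind serial elision.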
However, this divide-and-conquer strategy can impose additional constraints on concurrency in the program. Specifically, all work represented by two work units is ordered if
any operation in one unit depends on any operation in the
other. In general, Cilk-like programming models represent
the computation as a series-parallel directed acyclic graph
(dag). Computations whose dependence structure is modeled by a more general dag need to be embedded into an
SP-dag by introducing additional edges. These additional edges can significantly limit the available parallelism and thus scalability.
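To make the cost of such an embedding concrete, consider a sketch under assumptions that are not taken from this paper: an n x n wavefront computation in which cell (i, j) depends on cells (i-1, j) and (i, j-1), with n a power of two. A recursive formulation would split the grid into quadrants and run the top-left quadrant, then the top-right and bottom-left quadrants in parallel, then the bottom-right quadrant. The functions `wavefront_span` and `sp_span` below are hypothetical helpers that compute the critical path length (span) of the general dag and of this series-parallel schedule, counting one unit per cell.

```python
def wavefront_span(n):
    """Longest dependence chain in the general dag of an n x n wavefront grid."""
    span = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            up = span[i - 1][j] if i > 0 else 0
            left = span[i][j - 1] if j > 0 else 0
            # A cell can start once both of its predecessors have finished.
            span[i][j] = 1 + max(up, left)
    return span[n - 1][n - 1]


def sp_span(n):
    """Span of the recursive quadrant schedule (n must be a power of two).

    Top-left runs first, then top-right and bottom-left in parallel,
    then bottom-right, so three half-sized spans are chained in series.
    """
    if n == 1:
        return 1
    half = sp_span(n // 2)
    return half + max(half, half) + half


for n in (8, 64):
    print(n, wavefront_span(n), sp_span(n))
# prints:
# 8 15 27
# 64 127 729
```

In this example the general dag has a span of 2n - 1, while the series-parallel schedule has a span of n^(log2 3), so the gap widens as the problem grows. The extra edges force every cell of a quadrant to finish before any cell of the next quadrant starts.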
In this paper, we try to answer the question: is it possible
to achieve the best performance for arbitrary computation
dags while retaining Cilk’s simplicity? We present an approach to speculatively recover the concurrency lost due to
a recursive parallel program specification. We exploit several
characteristics of the structured expression of concurrency in
Cilk programs. In many Cilk computations, the dag structure is independent of the problem size, enabling us to capture data-independent concurrency constraints efficiently.
Unlike speculation for loop programs, relaxing such concurrency constraints in Cilk exposes significant additional
work and concurrency therein, which, in turn, greatly increases the benefits achievable from optimistic parallelization. Finally, scalable Cilk programs employ coarse-grained
base cases that can be efficiently annotated to track data
accesses and detect conflicts.
We design a runtime system that enables speculative execution of Cilk programs. We present schemes that employ
increasing degrees of speculation to explore additional opportunities for relaxing concurrency constraints. These include single-task, multi-task (a.k.a. deep), and parallel speculation. We develop a predictor to efficiently and accurately
identify opportunities for speculation while considerably reducing the number of mis-speculations. We design a user-level API to annotate Cilk programs and achieve efficient
data versioning and conflict detection under speculation.
We evaluate the various schemes and demonstrate that
speculation can significantly reduce idle times induced by
concurrency constraints and improve upon original Cilk programs. We demonstrate that the speculation framework incurs low space and time overheads. In addition, we show
that speculation incurs low overheads in the absence of profitable speculation opportunities. Evaluation of the speculation predictor shows it can precisely identify speculation
opportunities and keep mis-speculations low.