January 29, Tuesday
12:00 – 13:00
Revisiting Data Dependencies and their Applicability to Parallelization of Modern Software
Computer Science seminar
Lecturer: Omer Tripp
Affiliation: School of Computer Science, Tel Aviv University
Location: 202/37
Host: Dr. Danny Hendler
Parallelizing compilers are known to work well for scientific
computations, such as matrix and vector manipulations. However, modern
software has long moved on. The current landscape of applications
written in high-level languages, like Java and C#, introduces new and
important challenges. The compiler needs to handle multiple levels of
abstraction, extensive pointer-based computations over dynamic memory,
unbounded data structures, and deployment-specific program behaviors.
Data dependence analysis – tracking whether statements in the
sequential code depend on each other due to interfering accesses to
some shared memory location – has been the foundation of parallelizing
compilers. The fundamental observation, going back (at least) 30
years, is that two code blocks with no (transitive) data dependencies
between them can be executed in parallel, resulting in the same final
state as executing them sequentially. Owing to the challenges listed
above, parallelizing compilers are able to prove this precondition
only in rare cases: the candidate code blocks are often dependent, and
even when they are not, the compiler's (static) dependence analysis is
typically too conservative to prove independence, failing due to spurious dependencies.
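To make this concrete, here is a minimal Java sketch (my illustration, not code from the talk): the first pair of loops is connected by a genuine flow (read-after-write) dependence and must run in order, while the last loop's iterations touch disjoint memory locations and may safely execute in parallel.

import java.util.stream.IntStream;

public class DependenceDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        int[] a = new int[n];
        int[] b = new int[n];

        // Dependent blocks: the second loop reads a[], which the first
        // loop writes. This flow dependence forces sequential execution.
        for (int i = 0; i < n; i++) a[i] = i * 2;
        for (int i = 0; i < n; i++) b[i] = a[i] + 1;

        // Independent iterations: each index writes a distinct location,
        // so the loop carries no dependence and parallel execution yields
        // the same final state as sequential execution.
        int[] c = new int[n];
        IntStream.range(0, n).parallel().forEach(i -> c[i] = i + 1);
    }
}

A spurious dependence arises when the analysis cannot rule out such interference statically: given a method void update(Node p, Node q) { p.val = 1; q.val = 2; }, a conservative analysis must assume p and q may alias and report a conflict, even if the actual arguments are always distinct objects at run time.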
I will propose a new view of data dependencies, showing that they
remain a useful tool for reasoning about parallelism, albeit in
different ways and in a different form. I will show how data
dependencies can be lifted to higher levels of abstraction, per the
abstractions governing the structure of the application, as well as
how effective use can be made of partial yet precise dependencies to
guide the behavior of a parallelization system while preserving its
correctness (i.e., serializability guarantees). This can be done in
more than one way, including (i) building specialized, client-specific
conflict-detection oracles, (ii) synthesizing concurrency monitors
that predict the available parallelism per input data and/or computation phase, and (iii) finding true, semantic dependencies that limit parallelism.
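As one hypothetical illustration of lifting dependencies to a higher level of abstraction (my sketch, not code from the talk): two insertions of distinct keys into a shared set conflict at the memory level (both may update the same size field or hash bucket), yet at the level of the Set abstraction they commute, so a system reasoning about abstract operations rather than raw memory accesses can run them concurrently.

import java.util.Set;
import java.util.HashSet;
import java.util.concurrent.ConcurrentHashMap;

public class AbstractionDemo {
    public static void main(String[] args) throws InterruptedException {
        // Memory-level view: HashSet.add() mutates shared internals
        // (size counter, bucket array), so a byte-level dependence
        // analysis must report a conflict between these two calls.
        Set<Integer> lowLevel = new HashSet<>();
        lowLevel.add(1);
        lowLevel.add(2);

        // Abstract view: add(1) and add(2) commute as Set operations;
        // either order, or concurrent execution on a thread-safe
        // implementation, yields the same abstract state {1, 2}.
        Set<Integer> highLevel = ConcurrentHashMap.newKeySet();
        Thread t1 = new Thread(() -> highLevel.add(1));
        Thread t2 = new Thread(() -> highLevel.add(2));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(highLevel.equals(Set.of(1, 2))); // prints true
    }
}

By contrast, add(x) followed by remove(x) do not commute at the abstract level either; such a pair exhibits a true, semantic dependence of the kind named in item (iii).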