Speeding up dynamic compilation: concurrent and parallel dynamic compilation
Abstract
The main challenge faced by a dynamic compilation system is to detect and
translate frequently executed program regions into highly efficient native code
as fast as possible. To reduce dynamic compilation latency effectively, a dynamic
compilation system must improve its workload throughput, i.e. compile
more application hotspots per unit of time. Because the time spent on dynamic
compilation adds to the overall execution time, the dynamic compiler is often
decoupled from the main execution loop and runs in a separate thread
to reduce this overhead.
This thesis proposes innovative techniques aimed at effectively speeding
up dynamic compilation. The first contribution is a generalised region
recording scheme optimised for program representations that require dynamic
code discovery (e.g. binary program representations). The second contribution
reduces dynamic compilation cost by incrementally compiling several
hot regions in a concurrent and parallel task farm. Taken together, the combination
of generalised light-weight code discovery, large translation units,
dynamic work scheduling, and concurrent and parallel dynamic compilation
ensures timely and efficient processing of compilation workloads. Compared
to state-of-the-art dynamic compilation approaches, speedups of up to 2.08×
are demonstrated on industry-standard benchmarks such as BioPerf, SPEC
CPU 2006, and EEMBC.
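To make the task-farm idea concrete, the following is a minimal sketch, not the thesis implementation. It assumes that hot regions arrive as (region id, execution count) pairs and that hotter regions should be compiled first. The names compile_region and run_task_farm are hypothetical, and compile_region is only a stand-in for a real JIT back end.

```python
import itertools
import queue
import threading


def compile_region(region_id):
    # Hypothetical stand-in for a JIT back end; a real system would emit
    # native code for the region here.
    return f"native<{region_id}>"


def run_task_farm(hot_regions, num_workers=4):
    """Compile (region_id, execution_count) pairs in a farm of worker threads.

    Returns (translated, order): a dict mapping each region id to its compiled
    handle, and the list of region ids in the order they were compiled.
    """
    work = queue.PriorityQueue()
    seq = itertools.count()  # tie-breaker so region ids are never compared
    translated, order = {}, []
    lock = threading.Lock()

    def worker():
        while True:
            _, _, region_id = work.get()
            if region_id is None:  # sentinel: no more work for this worker
                return
            code = compile_region(region_id)
            with lock:
                translated[region_id] = code
                order.append(region_id)

    # Assumed scheduling policy: hotter regions get a smaller key and are
    # therefore dequeued first.
    for region_id, count in hot_regions:
        work.put((-count, next(seq), region_id))
    # One sentinel per worker, sorted after all real work items.
    for _ in range(num_workers):
        work.put((float("inf"), next(seq), None))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return translated, order
```

In this sketch the caller waits for all workers to finish, whereas a decoupled dynamic compiler would keep executing the application while regions are compiled in the background. Note also that threads in CPython illustrate the concurrent structure only; truly parallel compilation would need processes or a runtime without a global interpreter lock.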
Next, innovative applications of the proposed dynamic compilation scheme
to speed up architectural and micro-architectural performance modelling are
demonstrated. The main contribution in this context is to exploit runtime
information to dynamically generate optimised code that accurately models
architectural and micro-architectural components. Consequently, compilation
units are larger and more complex, resulting in increased compilation
latencies. Large and complex compilation units present an ideal use case for
our concurrent and parallel dynamic compilation infrastructure. We demonstrate
that our novel micro-architectural performance modelling is faster than
state-of-the-art FPGA-based simulation, whilst providing the same level of
accuracy.