Mixed-Parallel Heterogenous Partitioning

Mixed Parallel Heterogeneous Partitioning

PI’s: Dirk Grunwald and Jeremy Siek
Students: Joseph Blomsted
Many complex applications contain mixed parallel behavior that can be expressed as a combination of task, data, and pipeline parallelism. To achieve the best performance, these applications must be carefully mapped to each target platform; the capabilities of the target system greatly influence the ratio of mixed-parallel behavior that should be exploited. Heterogeneous systems also require additional decisions concerning how code is scheduled across the available devices, as well as balancing the interdependence between device scheduling and parallelization strategy selection.
Conceptually, the program decomposition process occurs in three high-level phases. This holds true whether the decomposition is performed manually by a programmer or automatically through a parallel computing framework or compiler.  First, code regions must be identified that have the appropriate granularity and execution characteristics to benefit from parallel execution. Then, a parallelization strategy (none, task, data, or pipeline) must be selected for each code region or group of related regions. Finally, the code regions must be converted into parallel code and scheduled to the available resources, either as static threads or entities within a parallel runtime system. This is often a challenging task because the appropriate parallelization strategy, granularity, and degree of nested parallelism for a region of code depends on the capabilities of the target system. The challenge is increased for heterogeneous parallel systems, which augment general-purpose computers with GPUs and other accelerators. These systems necessitate the additional task of scheduling regions across the different devices, and further complicate the parallelization strategy and granularity decisions
that are influenced by this schedule.
The Mixed-Parallel Heterogeneous Partitioning (MPHP) project  is building a new program decomposition and scheduling algorithm that partitions a program into appropriately sized regions that are scheduled across the available resources using a mix of the three parallelization strategies. Unlike existing techniques, the presented algorithm does not evaluate the different parallelization strategies in a fixed order, but treats all three strategies equally.  Furthermore, the system addresses the interdependence between device scheduling and strategy selection, reducing both decisions to a common problem that allows both issues to be explored simultaneously.