Dr. Claude Diderich's PhD Thesis

Automatic Data Distribution for Distributed Memory Parallel Computers

Dr. Claude G. Diderich
Mülibachstrasse 49, CH-8805 Richterswil, Switzerland
E-mail: e-Mail

Abstract

In recent years the need for techniques to parallelize numerical applications, for instance in a numerical array computation, has increased. When parallelizing nested loops for distributed memory parallel computers, two major problems have to be solved: the scheduling of the loop iterations and the mapping of the computations and data elements onto the processors. The scheduling functions must satisfy all the data dependences existing in the sequential loop nests. The mapping functions must ensure that a given degree of parallelism is obtained. Furthermore they should minimize the amount of communication overhead due to non local data references.

In this thesis we study the alignment problem, that is, the problem of mapping computation and data onto the processors, in a linear algebra framework. We start by studying the communication-free alignment problem. We present and extend the algorithm for solving this problem introduced by Bau, Kodukula, Kotylar, Pingali, and Stodghill (1994). In a second step we introduce the constant-degree parallelism alignment problem. It is the problem of finding computation and data mapping functions that minimize the number of remote data accesses for a given degree of parallelism. The problem is shown to be NP-hard. An exact implicit enumeration algorithm is presented. It proceeds by enumerating all interesting subsets of alignment constraints to be satisfied. To allow large alignment problems to be solved we present an efficient heuristic and apply it to various benchmarks.

We study the relation between the scheduling and the alignment problem. Compatible combinations of alignment and scheduling functions are characterized. We present an algorithm to compute a valid multi-dimensional scheduling function for a given computation mapping function. This algorithm is used as a building block to find a pair of compatible scheduling and alignment functions subject to a given cost function.

Various approaches to the alignment problem are surveyed. The way by which the alignment problem is approached in this thesis is compared with related work. We show that our framework is more general than other approaches using a linear algebra framework.

ACM Categories and Subject Descriptors: C.1.2 [Processor architecture]: Multiple Data Stream Architectures (Multiprocessors) - Multiple-instruction-stream, multiple-data-stream processors (MIMD); D.1.3 [Programming techniques]: Concurrent Programming - Parallel programming; D.3.4 [Programming languages]: Processors - Compilers; D.3.4 [Programming languages]: Processors - Optimization; D.4.2 [Operating systems]: Storage Management - Distributed memories; F.1.2 [Computation by abstract devices]: Modes of Computation - Parallelism and concurrency; I.2.8 [Artificial intelligence]: Problem Solving, Control Methods, and Search - Backtracking; I.2.8 [Artificial intelligence]: Problem Solving, Control Methods, and Search - Heuristic methods.

General Terms: Algorithms, Languages, Performance, Theory.

Additional keywords and phrases: Alignment problem, automatic parallelization, computation mapping, data mapping, nested loops, scheduling functions.

If you interested in obtaining a paper or electronic copy of the thesis, please contact the author at the above mentioned e-mail address.