This paper studies a special instance of the graph partitioning problem motivated by an application in parallel processing. Given a parallel computation represented by a weighted task graph, we consider the problem of mapping each node of the graph to a processor in a linear array. We focus on one type of computation, the grid structured computation (GSC), whose task graph is a grid of nodes.
The general task graph mapping problem is known to be intractable, so past research has either proposed heuristics for the general problem or optimally solved a constrained version of it. Our contributions fall into both categories. We weaken past constraints and optimally solve a less constrained problem than any solved optimally before, and we also present and analyze a simple greedy heuristic.
Optimal solutions have been given in the past under the contiguity constraint that each partition must consist of entire columns (or rows) of the GSC. We show that a more efficient solution can be found by relaxing this constraint to allow parts of consecutive columns to be mapped to a processor; we call this weaker contiguity constraint the part-column constraint.
Our first result shows that finding an optimal mapping remains NP-complete under the contiguity constraint, which simply requires adjacent nodes to be mapped to the same or adjacent processors. We then design an \(O(M^2p)\)-time, \(O(Mp)\)-space algorithm that finds the optimal part-column partitioning of a grid of \(M\) modules onto a linear array of \(p\) processors. We also present a simple greedy \(O(M)\) heuristic for part-column partitioning whose solution is within a factor of two of the optimum.
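To make the stated complexity bounds concrete, the following is a minimal sketch under simplifying assumptions. It is not the algorithm or heuristic of this paper. We assume that the grid nodes are listed in column-major order, that a part-column partition then corresponds to cutting this list into at most \(p\) contiguous segments, and that the cost of a mapping is only the heaviest processor load (communication weights are ignored). The function names `optimal_bottleneck` and `greedy_bottleneck` are hypothetical. The first is a textbook chain-partitioning dynamic program with \(O(M^2p)\) time and \(O(Mp)\) space; the second is a generic \(O(M)\) greedy fill whose load cap, the average load plus the heaviest node, is at most twice the optimum under these assumptions.

```python
from itertools import accumulate
from math import inf


def optimal_bottleneck(weights, p):
    """Chain-partitioning DP (illustrative, not the paper's algorithm).

    best[k][i] is the smallest achievable maximum load when the first i
    nodes are cut into k contiguous (possibly empty) segments.
    Time O(M^2 p), space O(M p) for M = len(weights).
    """
    m = len(weights)
    prefix = [0, *accumulate(weights)]  # prefix[i] = sum of first i weights
    best = [[inf] * (m + 1) for _ in range(p + 1)]
    best[0][0] = 0
    for k in range(1, p + 1):
        for i in range(m + 1):
            for j in range(i + 1):  # processor k receives nodes j..i-1
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i] = cand
    return best[p][m]


def greedy_bottleneck(weights, p):
    """Greedy fill (illustrative): one pass, O(M) time.

    Assumes positive integer weights. A processor is closed as soon as
    the next node would push its load above total/p + heaviest. Every
    closed processor then carries more than total/p, so at most p
    processors are used, and the cap is at most twice the optimum
    because the optimum is at least total/p and at least heaviest.
    """
    total, heaviest = sum(weights), max(weights)
    loads = [0]
    for w in weights:
        # Integer form of: loads[-1] + w > total / p + heaviest
        if p * (loads[-1] + w - heaviest) > total:
            loads.append(w)  # start filling the next processor
        else:
            loads[-1] += w
    return max(loads)


if __name__ == "__main__":
    column_major_weights = [4, 1, 3, 2, 5, 2, 1, 6]  # made-up example data
    print(optimal_bottleneck(column_major_weights, 3))
    print(greedy_bottleneck(column_major_weights, 3))
```

The greedy cap is deliberately loose: it trades solution quality for a single pass over the nodes, which is the kind of trade-off the comparison between a heuristic and an optimal algorithm is meant to quantify.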
Loosening the past constraints is shown to yield a forty percent improvement in some cases. Further experimental results compare the proposed heuristic with the optimal algorithm.