Parallel SpGEMM
This is the Spring 2007 project of Aydin Buluc and Fenglin Liao.
Project Description
The project is to write a 2D parallel implementation of the sparse matrix times sparse matrix (SpGEMM) routine. It is planned to be implemented in C++ using MPI. The project will have multiple iterations:
1) The simplest decomposition is the regular block checkerboard partitioning [1]. Here we assume the existence of a fast sequential algorithm for sparse matrix multiplication that is used as a black box (see the first sketch after this list).
2) The second way to decompose the data is to try assigning an equal number of edges to each processor. The initial idea is to map each nonzero of the matrix to a point in the rectangular (m x n) region and use a quadtree [2] data structure (see the quadtree sketch after this list). Here one may or may not use a sequential matrix multiplication as a black box [Yet to be explored].
3) The third and hardest way is to assign edges to processors such that the following metric is minimized:
\min\Big(\max_i\big(\mathrm{communication}(i) + \mathrm{computation}(i)\big)\Big)
In other words, we try to load balance across the processors. The load balancing is generally done using graph partitioning [3] [4], but it is unclear how to do it for Parallel SpGEMM. Whether this load balancing is going to be done statically or dynamically is yet to be determined; a small sketch of the cost being minimized is given below.
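A minimal sketch of how iteration 1 might look, assuming a SUMMA-style stage loop over a sqrt(p) x sqrt(p) process grid. The Csr container and the helpers bcast_csr and multiply_into are illustrative stand-ins rather than project code, and the final pass that sums duplicate (i,j) contributions is omitted:

```cpp
#include <mpi.h>
#include <cmath>
#include <vector>

struct Csr {                          // minimal CSR stand-in (hypothetical)
    int rows = 0, cols = 0;
    std::vector<int> rowptr{0}, colind;
    std::vector<double> val;
};
struct Triple { int i, j; double v; };

// Broadcast a CSR block: header first, then the three arrays.
static void bcast_csr(Csr& m, int root, MPI_Comm comm) {
    int hdr[3] = {m.rows, m.cols, static_cast<int>(m.colind.size())};
    MPI_Bcast(hdr, 3, MPI_INT, root, comm);
    m.rows = hdr[0]; m.cols = hdr[1];
    m.rowptr.resize(hdr[0] + 1);
    m.colind.resize(hdr[2]);
    m.val.resize(hdr[2]);
    MPI_Bcast(m.rowptr.data(), hdr[0] + 1, MPI_INT, root, comm);
    MPI_Bcast(m.colind.data(), hdr[2], MPI_INT, root, comm);
    MPI_Bcast(m.val.data(), hdr[2], MPI_DOUBLE, root, comm);
}

// Stand-in for the sequential black box: emits partial products as COO
// triples; duplicate (i,j) entries are summed in a later pass (omitted).
static void multiply_into(const Csr& a, const Csr& b, std::vector<Triple>& out) {
    for (int i = 0; i < a.rows; ++i)
        for (int t = a.rowptr[i]; t < a.rowptr[i + 1]; ++t) {
            const int k = a.colind[t];
            for (int u = b.rowptr[k]; u < b.rowptr[k + 1]; ++u)
                out.push_back({i, b.colind[u], a.val[t] * b.val[u]});
        }
}

// myA and myB are this processor's blocks A(i,j) and B(i,j).
std::vector<Triple> checkerboard_spgemm(const Csr& myA, const Csr& myB, MPI_Comm world) {
    int p, rank;
    MPI_Comm_size(world, &p);
    MPI_Comm_rank(world, &rank);
    const int q = static_cast<int>(std::lround(std::sqrt(p)));  // assume p is a perfect square

    int dims[2] = {q, q}, periods[2] = {0, 0}, coords[2];
    MPI_Comm grid, rowComm, colComm;
    MPI_Cart_create(world, 2, dims, periods, 0, &grid);  // reorder off: grid rank == world rank
    MPI_Cart_coords(grid, rank, 2, coords);
    int keepCols[2] = {0, 1}, keepRows[2] = {1, 0};
    MPI_Cart_sub(grid, keepCols, &rowComm);              // processors in my grid row
    MPI_Cart_sub(grid, keepRows, &colComm);              // processors in my grid column

    std::vector<Triple> myC;
    for (int k = 0; k < q; ++k) {                        // stage k
        Csr aPanel = (coords[1] == k) ? myA : Csr{};
        Csr bPanel = (coords[0] == k) ? myB : Csr{};
        bcast_csr(aPanel, k, rowComm);                   // A(i,k) along processor row i
        bcast_csr(bPanel, k, colComm);                   // B(k,j) along processor column j
        multiply_into(aPanel, bPanel, myC);              // C(i,j) += A(i,k) * B(k,j)
    }
    return myC;
}
```

With reordering disabled, rank k inside rowComm is exactly the processor whose column coordinate is k, which is what lines the two broadcasts up with A(i,k) and B(k,j).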
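For iteration 2, one possible shape for the quadtree is sketched below, again with hypothetical names (Nonzero, QuadNode, build); how the resulting leaves are assigned to processors, and whether a sequential multiply is still used underneath, is the part that is yet to be explored:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Nonzero { int row, col; double val; };

struct QuadNode {
    int r0, c0, r1, c1;           // half-open region [r0,r1) x [c0,c1)
    std::vector<Nonzero> points;  // filled only at leaves
    std::vector<QuadNode> kids;   // empty, or exactly four children
};

// Split the (m x n) region until no node holds more than `capacity`
// nonzeros; the leaves are the units later handed out to processors.
void build(QuadNode& node, std::vector<Nonzero> pts, std::size_t capacity) {
    // stop on a 1x1 region so duplicate-free input cannot recurse forever
    const bool degenerate = (node.r1 - node.r0 <= 1) && (node.c1 - node.c0 <= 1);
    if (pts.size() <= capacity || degenerate) {
        node.points = std::move(pts);                  // leaf
        return;
    }
    const int rm = node.r0 + (node.r1 - node.r0) / 2;  // row midpoint
    const int cm = node.c0 + (node.c1 - node.c0) / 2;  // column midpoint
    node.kids = { {node.r0, node.c0, rm, cm, {}, {}},   // NW
                  {node.r0, cm, rm, node.c1, {}, {}},   // NE
                  {rm, node.c0, node.r1, cm, {}, {}},   // SW
                  {rm, cm, node.r1, node.c1, {}, {}} }; // SE
    std::vector<Nonzero> buckets[4];
    for (const Nonzero& nz : pts) {
        const int which = (nz.row >= rm ? 2 : 0) + (nz.col >= cm ? 1 : 0);
        buckets[which].push_back(nz);
    }
    for (int k = 0; k < 4; ++k)
        build(node.kids[k], std::move(buckets[k]), capacity);
}
```

One possibility is to choose capacity near nnz/p, which yields on the order of p leaves that can then be handed out (for instance in Z-order) so that each processor receives roughly the same number of edges.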
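Whatever partitioner iteration 3 ends up using, the objective itself is cheap to evaluate for any candidate assignment. A tiny sketch of the bottleneck cost to be minimized, with Load as a hypothetical per-processor estimate:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-processor cost estimates for one candidate partition.
struct Load { double communication, computation; };

// max_i(communication(i) + computation(i)); the partitioner searches for
// the assignment of edges to processors that minimizes this value.
double bottleneck(const std::vector<Load>& loads) {
    double worst = 0.0;
    for (const Load& l : loads)
        worst = std::max(worst, l.communication + l.computation);
    return worst;
}
```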