Parallel SpGEMM

From Cs240aproject

(Difference between revisions)
Line 9: Line 9:
1) The simplest decomposition is the regular block checkerboard partitioning [http://www.parallab.uib.no/projects/molecul/matrix/main/node12.html]. Here we assume assumes the existence of a fast sequential algorithm for sparse matrix multiplication that is used as a block box.  
1) The simplest decomposition is the regular block checkerboard partitioning [http://www.parallab.uib.no/projects/molecul/matrix/main/node12.html]. Here we assume assumes the existence of a fast sequential algorithm for sparse matrix multiplication that is used as a block box.  
-
2) The second way to decompose to data is to try assigning equal number of edges to each processor. The initial idea is to map each non-zero of the matrix to a point in the rectangular (m x n) region and use a quadtree data structure. Here one may or may not use a sequential matrix multiplication as block box.
+
 
 +
2) The second way to decompose to data is to try assigning equal number of edges to each processor. The initial idea is to map each non-zero of the matrix to a point in the rectangular (m x n) region and use a quadtree data structure. Here one may or may not use a sequential matrix multiplication as block box [Yet to be explored].
 +
 
 +
3) The third and the hardest way is to assign edges to processors such that the following metric is minimized.
 +
  <math>min{max_i{communication(i) + computation(i)}}</math>

Revision as of 08:47, 9 May 2007

This is the Spring 2007 project of Aydin Buluc and Fenglin Liao.




Project Description

The project is to write a 2D parallel implementation of the sparse matrix times sparse matrix routine. It is projected to be implemented in C++ using MPI. The project will have multiple iterations:

1) The simplest decomposition is the regular block checkerboard partitioning [1]. Here we assume assumes the existence of a fast sequential algorithm for sparse matrix multiplication that is used as a block box.

2) The second way to decompose to data is to try assigning equal number of edges to each processor. The initial idea is to map each non-zero of the matrix to a point in the rectangular (m x n) region and use a quadtree data structure. Here one may or may not use a sequential matrix multiplication as block box [Yet to be explored].

3) The third and the hardest way is to assign edges to processors such that the following metric is minimized.

 Failed to parse (Can't write to or create math temp directory): min{max_i{communication(i) + computation(i)}}
Personal tools