Parallel SpGEMM

From Cs240aproject

Revision as of 05:31, 16 October 2007 by Aydozz

This is the Spring 2007 project of Aydin Buluc and Fenglin Liao.

Download Final Paper from here (pdf format)(ps format)

Download Final Presentation from here (ppt format)

Download Progress report from here (ps format)




Project Description

The project is to write a 2D parallel implementation of the sparse matrix times sparse matrix routine. It will be implemented in C++ using MPI. The project will have multiple iterations:

1) The simplest decomposition is regular block checkerboard partitioning, with a SUMMA-like algorithm on top of it. Here we assume the existence of a fast sequential algorithm for sparse matrix multiplication that is used as a black box.

2) The second way to decompose the data is to try assigning an equal number of edges to each processor. The initial idea is to map each nonzero of the matrix to a point in the rectangular (m x n) region and use a quadtree [1] data structure. Here one may or may not use a sequential matrix multiplication as a black box [yet to be explored].

3) The third and hardest way is to assign edges to processors such that the following metric is minimized:

 min( max_i( communication(i) + computation(i) ) )


In other words, we try to balance the load across the processors. Load balancing is generally done using graph partitioning [2] [3], but it is unclear how to do this for parallel SpGEMM. Whether the load balancing will be done statically or dynamically is yet to be determined.
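The fast sequential kernel assumed as a black box in iteration 1 can be sketched as Gustavson's row-by-row algorithm over CSR matrices with a dense accumulator. This is only an illustrative sketch (the `Csr` struct and `spgemm` name are ours, not the project's actual code); the 2D SUMMA-like algorithm would invoke such a routine on each pair of local blocks.

```cpp
#include <vector>

// Minimal CSR (compressed sparse row) matrix; a hypothetical helper
// for illustration only -- the project's data structures may differ.
struct Csr {
    int rows, cols;
    std::vector<int> rowptr;   // size rows+1; nonzero range of each row
    std::vector<int> colind;   // column index of each nonzero
    std::vector<double> val;   // value of each nonzero
};

// Sequential SpGEMM C = A * B, row by row, using a dense accumulator.
// Runs in time proportional to the number of multiplications (flops).
Csr spgemm(const Csr& A, const Csr& B) {
    Csr C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.rowptr.assign(A.rows + 1, 0);
    std::vector<double> acc(B.cols, 0.0);   // accumulator for one row of C
    std::vector<bool> used(B.cols, false);  // which columns were touched
    for (int i = 0; i < A.rows; ++i) {
        std::vector<int> marked;
        for (int k = A.rowptr[i]; k < A.rowptr[i + 1]; ++k) {
            int j = A.colind[k];            // A(i,j) scales row j of B
            double a = A.val[k];
            for (int l = B.rowptr[j]; l < B.rowptr[j + 1]; ++l) {
                int c = B.colind[l];
                if (!used[c]) { used[c] = true; marked.push_back(c); }
                acc[c] += a * B.val[l];
            }
        }
        for (int c : marked) {              // flush accumulator into C
            C.colind.push_back(c);
            C.val.push_back(acc[c]);
            acc[c] = 0.0;
            used[c] = false;
        }
        C.rowptr[i + 1] = static_cast<int>(C.colind.size());
    }
    return C;
}
```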



Resources

Boost.MPI [4]. Note that data pointed to by pointers in a class is transmitted automatically, due to the fourth requirement of the serialization library: "Deep pointer save and restore. That is, save and restore of pointers saves and restores the data pointed to."
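To make the "deep pointer save and restore" guarantee concrete, here is a stdlib-only sketch that imitates what Boost.Serialization does under the hood: the pointed-to data is written out and a fresh copy is allocated on restore, rather than the raw pointer value being transmitted. This is not Boost code; the `Node`/`Payload` names and the stream-based save/load are our own illustration.

```cpp
#include <sstream>
#include <memory>

struct Payload { int x; };

// A class whose member is a pointer to heap data. A deep save writes
// the pointed-to contents; a deep restore allocates new storage and
// fills it -- so sender and receiver each own independent copies.
struct Node {
    std::unique_ptr<Payload> payload;

    void save(std::ostream& os) const {   // deep save: write *payload
        os << payload->x;
    }
    void load(std::istream& is) {         // deep restore: allocate, then fill
        payload = std::make_unique<Payload>();
        is >> payload->x;
    }
};

// Simulate a send/receive round trip through a byte stream.
int roundtrip(int x) {
    Node a;
    a.payload = std::make_unique<Payload>();
    a.payload->x = x;
    std::stringstream buf;
    a.save(buf);   // "transmit"
    Node b;
    b.load(buf);   // receiver reconstructs its own copy of the data
    return b.payload->x;
}
```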

Boost Build v2 [5]

Serialization of Derived Pointers

SDSC Resources [6]


Caveats

1) When mixing MPI with threads, you will be oversubscribing the nodes (running more runnable threads than processors) without Open MPI figuring this out. For example, if each of your processes has 3 threads, 2 of which contain MPI_SEND and MPI_RECV operations, you would want a thread to yield() when it blocks on an MPI_RECV or MPI_SEND. By default, however, MPI runs in aggressive mode when the number of slots equals the number of MPI processes [7].

So you should manually set mpi_yield_when_idle to 1 when oversubscribing implicitly:

"mpirun --mca mpi_yield_when_idle 1 -np 16 ./testpar"


Warning: This degrades performance when the number of threads equals the number of processors!

 mpi_yield_when_idle 1 --> 4.13 sec
 mpi_yield_when_idle 0 --> 2.95 sec


Installing BOOST.MPI to DataStar

1) Download boost into Project directory.

2) Make sure mpCC works.

3) Go to "../boost/tools/jam/src" folder

4) Type "./build.sh vacpp"

5) Go to "../boost/tools/jam/src/bin.aixppc" folder and copy the "bjam" executable to "../boost" directory. (i.e. top-level boost directory)

6) Copy "../boost/tools/build/v2/user-config.jam" to $HOME and add the line "using mpi ;"

7) "using mpi ;" will probably fail, so you might need to configure MPI yourself. To do that, you need to know which libraries are related to MPI. Such libraries are inside the PE (parallel environment) folder of POE: "/usr/lpp/ppe.poe/lib"

using mpi : : <find-shared-library>library1 <find-shared-library>library2 <find-shared-library>library3 ;
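For instance, if the MPI libraries found under "/usr/lpp/ppe.poe/lib" were named mpi_r and vtd_r (hypothetical names -- check that directory for the actual ones on your system), the line in user-config.jam might read:

```
using mpi : : <find-shared-library>mpi_r <find-shared-library>vtd_r ;
```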

8) Type "bjam --with-mpi --toolset=vacpp" in your top-level boost directory.

To see what is going on: "bjam --with-mpi --toolset=vacpp --debug-configuration 2> debugconferr.txt"



Accessing source

svn checkout svn+ssh://gauss.cs.ucsb.edu/home/aydin/CS240/SVN
