Implementing and optimizing a Sparse Matrix-Vector Multiplication with UPC
Author
Lagraviere, Jeremie Alexandre Emilien; Prugger, Martina; Einkemmer, Lukas; Langguth, Johannes; Ha, Hoai Phuong; Cai, Xing
Abstract
Programmability and performance-per-watt are the major challenges
of the race to Exascale. In this study we focus on Partitioned Global
Address Space (PGAS) languages, using UPC as a particular example. This
category of parallel languages provides ease of programming as a strong advantage
over the classic Message Passing Interface (MPI). PGAS also has
advantages over classic shared-memory programming (OpenMP),
since a PGAS program is by nature meant to run on both single-node and
multi-node machines without changes to the code. Our goal in this technical report
is to use UPC to implement a memory-bound problem that involves
irregular inter-thread communication. To represent this problem we
perform a Sparse Matrix-Vector multiplication (SpMV) over unstructured
data. We implemented several versions of the UPC-SpMV at different
levels of code complexity. In this technical report, we describe
these versions of the UPC-SpMV and present a set of results for
single-node and multi-node hardware scenarios.
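
For readers unfamiliar with the kernel, the following is a minimal sequential sketch of SpMV in plain C. It is not one of the report's UPC versions. The compressed sparse row (CSR) layout and the names spmv_csr, row_ptr, col_idx and val are assumptions made for illustration; the abstract does not state which storage format the report uses. The indirect read x[col_idx[k]] is the access that, in a distributed setting, may refer to data owned by another thread, which is where irregular communication would arise.

```c
#include <stddef.h>

/*
 * Illustrative only: computes y = A * x for a sparse matrix A stored in
 * CSR format (an assumed layout, not taken from the report).
 *   row_ptr has n_rows + 1 entries; row i owns val[row_ptr[i] .. row_ptr[i+1]-1]
 *   col_idx[k] is the column of the nonzero val[k]
 */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,
              const size_t *col_idx,
              const double *val,
              const double *x,
              double *y)
{
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* Indirect, data-dependent read of x: few flops per byte loaded. */
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

Each nonzero contributes one multiply and one add but requires loading a value, a column index and an entry of x, which is why the kernel is typically limited by memory bandwidth rather than by arithmetic.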