Implementing and optimizing a Sparse Matrix-Vector Multiplication with UPC
Author: Lagraviere, Jeremie Alexandre Emilien; Prugger, Martina; Einkemmer, Lukas; Langguth, Johannes; Ha, Hoai Phuong; Cai, Xing
Programmability and performance per watt are the major challenges in the race to Exascale. In this study we focus on Partitioned Global Address Space (PGAS) languages, using UPC as a particular example. This category of parallel languages offers ease of programming as a strong advantage over the classic Message Passing Interface (MPI). PGAS also has advantages over classic shared-memory programming (OpenMP), since a PGAS program is by nature meant to run on both single-node and multi-node machines without code changes. Our goal in this technical report is to use UPC to implement a memory-bound problem that involves irregular inter-thread communication. To represent this problem we perform a Sparse Matrix-Vector multiplication (SpMV) over unstructured data. We implemented several versions of the UPC-SpMV at different levels of code complexity. In this technical report, we describe these versions of the UPC-SpMV and present a set of results for single-node and multi-node hardware scenarios.
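For readers unfamiliar with the kernel, the following is a minimal sequential sketch of SpMV in plain C. It assumes the compressed sparse row (CSR) storage format, which this abstract does not specify, and the names spmv_csr, row_ptr, col_idx and val are illustrative, not taken from the report. It is not one of the UPC versions described in the report.

```c
#include <stddef.h>

/* Illustrative sketch only: computes y = A * x for a sparse matrix A
 * with n rows, stored in CSR format (an assumption, not necessarily
 * the format used in the report).
 *   row_ptr: n + 1 entries; row i owns nonzeros row_ptr[i] .. row_ptr[i+1]-1
 *   col_idx: column index of each nonzero
 *   val:     value of each nonzero
 */
void spmv_csr(size_t n, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            /* Indirect read of x: the access pattern depends on the
             * sparsity structure, which is what makes it irregular. */
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

The kernel performs only two floating-point operations per nonzero while loading a value, a column index and an entry of x, which is why it is memory bound. If the rows were distributed across threads in a parallel version, the indirect reads x[col_idx[k]] could touch entries owned by other threads, which is presumably the source of the irregular inter-thread communication mentioned above.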