Irreg-Simd

Artifact of paper "Exploiting Recent SIMD Architectural Advances for Irregular Applications"

View the Project on GitHub lcchen008/irreg-simd

About

This repository accompanies the paper Exploiting Recent SIMD Architectural Advances for Irregular Applications, published at the 2016 International Symposium on Code Generation and Optimization (CGO 2016), and is provided for reference while reading the paper.

The code passed the Artifact Evaluation of CGO 2016. The code base covers both data preparation and program execution, automated by bash scripts. Please read EXECUTION_INSTRUCTION.pdf for instructions on how to run the code.

Authors

What the paper is about

A broad class of applications involve indirect or data-dependent memory accesses and are referred to as irregular applications. Recent developments in SIMD architectures, specifically the emergence of wider SIMD lanes, the combination of SIMD parallelism with many-core MIMD parallelism, and more flexible programming APIs, provide new opportunities as well as challenges for this class of applications. In this paper, we propose a general optimization methodology to effectively optimize different subclasses of irregular applications. Based on the observation that all applications with indirect memory accesses can be viewed as sparse matrix computations, we design an optimization methodology with three sub-steps: 1) locality enhancement through tiling, 2) data access pattern identification, and 3) write conflict removal at both SIMD and MIMD levels.

This method has been applied to unstructured grids, molecular dynamics, and graph applications, in addition to sparse matrix computations. The speedups achieved by our single-threaded vectorized code over serial code are up to 9.05x, whereas the overall speedup when utilizing both SIMD and MIMD parallelism (61 cores on an Intel Xeon Phi) with our approach is up to 467.1x. Further optimization using matrix reordering on irregular reductions and graph algorithms achieves an incremental speedup of up to 1.69x, though at a relatively high preprocessing cost. Moreover, SpMM using our approach outperforms routines from a highly optimized commercial library by up to 2.81x.
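To make the write-conflict problem from sub-step 3 concrete, the sketch below (hypothetical illustration, not the paper's actual code) simulates an irregular reduction `y[col[i]] += val[i]`. When several SIMD lanes in one vector step share the same index `col[i]`, a plain vector scatter would lose updates; one common remedy, in the spirit of the paper's conflict removal, is to combine conflicting lanes in registers first so each address is written only once per scatter. The lane width and function names here are illustrative choices.

```python
SIMD_WIDTH = 8  # illustrative lane count, e.g. 512-bit vectors of doubles

def irregular_reduction(col, val, n):
    """Serial reference: y[col[i]] += val[i] with indirect writes."""
    y = [0.0] * n
    for c, v in zip(col, val):
        y[c] += v
    return y

def conflict_free_reduction(col, val, n):
    """Simulated SIMD execution: process lanes in groups of SIMD_WIDTH;
    merge duplicate indices within a group (in-register combine) so the
    final scatter touches each output slot at most once per vector step."""
    y = [0.0] * n
    for base in range(0, len(col), SIMD_WIDTH):
        lanes = {}
        for i in range(base, min(base + SIMD_WIDTH, len(col))):
            # lanes hitting the same index are reduced before the scatter
            lanes[col[i]] = lanes.get(col[i], 0.0) + val[i]
        for c, v in lanes.items():  # conflict-free scatter
            y[c] += v
    return y
```

On real hardware this lane-combining step maps to masked operations or conflict-detection instructions rather than a dictionary; the sketch only shows why removing intra-vector conflicts preserves the serial result.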