CNIC Makes New Progress in Research and Development of Sparse Matrix-Matrix Multiplication (SpMM) Operators
Sparse Matrix-Matrix Multiplication (SpMM) is widely used in fields such as graph neural networks (GNNs), sparse linear algebra solvers, and scientific simulations. SpMM accounts for 30% to 50% of the total computation time in large-scale simulations and engineering computations, and for 50% to 80% in GNNs, making it a critical operator in both scientific computing and deep learning.
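For readers unfamiliar with the operator, the following is a minimal reference sketch of SpMM: a sparse matrix stored in the standard CSR (compressed sparse row) format is multiplied by a dense matrix. It is plain Python written for clarity only, it is not the Acc-SpMM implementation, and the function name `spmm_csr` and the example matrices are illustrative assumptions.

```python
def spmm_csr(row_ptr, col_idx, values, dense, n_cols):
    """Compute C = A @ B, where A is sparse (CSR) and B is dense (list of rows).

    row_ptr, col_idx, values: the three standard CSR arrays of A.
    dense: B as a list of rows, each of length n_cols.
    """
    m = len(row_ptr) - 1
    out = [[0.0] * n_cols for _ in range(m)]
    for i in range(m):
        # Only the stored nonzeros of row i are visited.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = values[p]
            b_row = dense[col_idx[p]]  # row of B selected by the column index
            for j in range(n_cols):
                out[i][j] += a * b_row[j]
    return out


# Hypothetical example:
# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = spmm_csr(row_ptr, col_idx, values, B, 2)
print(C)  # [[11.0, 14.0], [15.0, 18.0], [19.0, 28.0]]
```

The rows of `B` that are read depend on `col_idx`, that is, on where the nonzeros of `A` happen to fall. This data-dependent indexing is what makes memory access in SpMM irregular.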
Tensor Cores are currently the core computational units for matrix multiplication on GPUs. However, challenges such as significant storage overhead for sparse matrices, irregular memory access patterns, and insufficient overlap between data loading and computation hinder their performance in SpMM scenarios.
To address these challenges, researchers from the AI Departments of CNIC developed the Acc-SpMM algorithm library, together with a series of systematic optimization strategies. These include novel compressed storage formats for sparse matrices, data reordering algorithms, pipelining, and adaptive load balancing mechanisms. Experimental results demonstrate that Acc-SpMM outperforms cuSPARSE across various datasets.
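To illustrate why load balancing matters for SpMM in general, the sketch below splits the rows of a CSR matrix into contiguous groups with roughly equal numbers of nonzeros, so that no worker is stuck with all the dense rows. This is a generic textbook-style technique shown under stated assumptions, not the adaptive mechanism used in Acc-SpMM. The function name `partition_rows_by_nnz` and the example `row_ptr` are hypothetical.

```python
from bisect import bisect_left


def partition_rows_by_nnz(row_ptr, n_parts):
    """Split rows into n_parts contiguous ranges with similar nonzero counts.

    row_ptr is the CSR row-pointer array; row_ptr[i] is the number of
    nonzeros stored before row i. Returns a list of (start_row, end_row)
    half-open ranges. Balance is limited by row granularity.
    """
    total = row_ptr[-1]
    m = len(row_ptr) - 1
    bounds = [0]
    for k in range(1, n_parts):
        target = total * k / n_parts
        # First row boundary at which the cumulative nonzero count reaches the target.
        r = bisect_left(row_ptr, target)
        r = min(max(r, bounds[-1]), m)  # keep boundaries monotone and in range
        bounds.append(r)
    bounds.append(m)
    return [(bounds[i], bounds[i + 1]) for i in range(n_parts)]


# Hypothetical matrix: row 0 holds 8 nonzeros, rows 1-4 hold 1 each, row 5 holds 4.
row_ptr = [0, 8, 9, 10, 11, 12, 16]
parts = partition_rows_by_nnz(row_ptr, 2)
nnz_per_part = [row_ptr[end] - row_ptr[start] for start, end in parts]
print(parts)         # [(0, 1), (1, 6)]
print(nnz_per_part)  # [8, 8]
```

Splitting the same matrix evenly by row count (three rows per worker) would instead assign 10 nonzeros to one worker and 6 to the other.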
This work has been accepted by the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP '25), a highly prestigious conference ranked CCF-A. The research was supported by the National Key Research and Development Program of China (Grant No. 2023YFB3002100) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB0500103).
Figure: Overview of Acc-SpMM
Figure: The reordering method of Acc-SpMM
The first author of the paper is Haisha Zhao, a master's student at CNIC, supervised by Researcher Chunbao Zhou. Master's student San Li is co-first author. The corresponding author is Jue Wang, a senior engineer at CNIC.
Related Achievements:
Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang. Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores. ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP ’25).