
Research progress in efficient 3D convolution algorithms

Date: Sep 04, 2024

The rise of High-Performance Computing (HPC) and Artificial Intelligence (AI) has greatly expanded the applications of three-dimensional convolutional neural networks (3D CNNs). However, 3D convolution is computationally expensive and remains a major performance bottleneck in many of these applications.

The latest-generation Sunway supercomputer has demonstrated strong computational capability in the HPC+AI domain. Researchers from our High-Performance Computing Department have recently proposed a high-performance 3D convolution algorithm for the latest Sunway processor, the SW26010Pro. The work designs a three-level blocking algorithm for 3D convolution on the SW26010Pro (Fig. 1), together with a novel RMA scatter-communication scheme (Fig. 2) and DMA memory-access optimizations that make full use of the on-chip network bandwidth. Pipeline optimizations further improve execution efficiency by overlapping and hiding memory-access latency. Experimental results show that the implementation achieves an average speedup of 2.54x over an im2col+sgemm implementation built on the xMath2.0 library, and reaches up to 2.12 Tflop/s in single precision, 92% of the theoretical peak.
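As a quick consistency check on the reported figures (assuming the 92% efficiency is a rounded value), the single-precision peak implied by 2.12 Tflop/s at 92% of peak is roughly:

$$
P_{\text{peak}} \approx \frac{2.12\ \text{Tflop/s}}{0.92} \approx 2.30\ \text{Tflop/s}
$$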

Fig. 1. Three-level blocking of 3D convolution

Fig. 2. RMA scatter-communication scheme
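To make the im2col+sgemm baseline concrete: im2col unfolds every filter-sized window of the input volume into one column of a large matrix, so that the whole 3D convolution becomes a single matrix multiplication. The pure-Python sketch below is a minimal illustration under stated assumptions (stride 1, no padding, nested lists instead of arrays). The function names `conv3d_direct`, `im2col3d`, `gemm` and `conv3d_im2col` are hypothetical, and the naive `gemm` merely stands in for an optimized sgemm such as the one in xMath2.0. It shows the general technique only; it is not the xMath2.0 code or the authors' Sunway implementation.

```python
from itertools import product


def conv3d_direct(x, w):
    """Direct 3D convolution (cross-correlation), stride 1, no padding.

    x is indexed x[c][d][h][col]   (C channels of a D x H x W volume)
    w is indexed w[k][c][t][r][s]  (K filters of size C x T x R x S)
    Returns y[k][od][oh][ow] with OD = D-T+1, OH = H-R+1, OW = W-S+1.
    """
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    K, T, R, S = len(w), len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    OD, OH, OW = D - T + 1, H - R + 1, W - S + 1
    y = [[[[0.0] * OW for _ in range(OH)] for _ in range(OD)] for _ in range(K)]
    for k, od, oh, ow in product(range(K), range(OD), range(OH), range(OW)):
        acc = 0.0
        for c, t, r, s in product(range(C), range(T), range(R), range(S)):
            acc += w[k][c][t][r][s] * x[c][od + t][oh + r][ow + s]
        y[k][od][oh][ow] = acc
    return y


def im2col3d(x, T, R, S):
    """Unfold every T x R x S window into one column.

    The result has C*T*R*S rows and OD*OH*OW columns, so each input
    element is copied into up to T*R*S columns.
    """
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    OD, OH, OW = D - T + 1, H - R + 1, W - S + 1
    positions = list(product(range(OD), range(OH), range(OW)))
    return [[x[c][od + t][oh + r][ow + s] for od, oh, ow in positions]
            for c, t, r, s in product(range(C), range(T), range(R), range(S))]


def gemm(a, b):
    """Naive matrix product a (m x p) @ b (p x n), standing in for sgemm."""
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(len(b))) for j in range(n)]
            for row in a]


def conv3d_im2col(x, w):
    """3D convolution as im2col followed by a single GEMM."""
    K, C = len(w), len(w[0])
    T, R, S = len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    D, H, W = len(x[0]), len(x[0][0]), len(x[0][0][0])
    OD, OH, OW = D - T + 1, H - R + 1, W - S + 1
    cols = im2col3d(x, T, R, S)                       # (C*T*R*S) x (OD*OH*OW)
    wmat = [[w[k][c][t][r][s]                         # K x (C*T*R*S), same (c,t,r,s) order
             for c, t, r, s in product(range(C), range(T), range(R), range(S))]
            for k in range(K)]
    out = gemm(wmat, cols)                            # K x (OD*OH*OW)
    # Reshape each output row back into an OD x OH x OW volume.
    return [[[[out[k][(od * OH + oh) * OW + ow] for ow in range(OW)]
              for oh in range(OH)] for od in range(OD)] for k in range(K)]


# Small demo with integer-valued inputs, so both summation orders are exact.
C, D, H, W, K, T, R, S = 2, 4, 4, 4, 3, 3, 3, 3
x = [[[[float((c + 3 * d + 5 * h + 7 * col) % 11) for col in range(W)]
       for h in range(H)] for d in range(D)] for c in range(C)]
w = [[[[[float((k + c + t + 2 * r + 3 * s) % 5 - 2) for s in range(S)]
        for r in range(R)] for t in range(T)] for c in range(C)] for k in range(K)]
print(conv3d_direct(x, w) == conv3d_im2col(x, w))   # True
cols = im2col3d(x, T, R, S)
print(C * D * H * W, len(cols) * len(cols[0]))      # 128 432
```

The demo also shows the cost of unfolding: the 128-element input becomes a 432-entry matrix, because each element is replicated into up to T*R*S columns. This extra memory footprint and traffic is a commonly cited drawback of im2col, and it is one reason direct, blocked 3D convolution kernels can be competitive with GEMM-based ones.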

The paper was accepted by and presented at the 53rd International Conference on Parallel Processing (ICPP) (CCF B). The first author is Li Jialin, a PhD student in the High-Performance Computing Department supervised by Prof. Zhang Jian, who is also the corresponding author. This work is supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (grant number XDB0500101).

Reference:

Li J, Feng Z, Gao Y, et al. High-Performance 3D convolution on the Latest Generation Sunway Processor[C]//Proceedings of the 53rd International Conference on Parallel Processing. 2024: 241-251.


