An Efficient Method for Optimizing PETSc on The Sunway TaihuLight System
Letian Kang 1, Zhi-Jie Wang 2, Zhe Quan 1, Weigang Wu 3, Song Guo 4, Kenli Li 1 *, Keqin Li 5
1. College of Information Science and Engineering, Hunan University, Changsha 410000
2. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510000
3. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000
4. Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong 999077
5. Department of Computer Science, State University of New York, United States 10589
*Corresponding author
Funding: Key Projects of the National Natural Science Foundation (No. 61432005, 91430214), the National Science Fund for Distinguished Young Scholars (No. 61625202)
Opened online: 14 May 2018
Accepted by: none
Citation: Letian Kang, Zhi-Jie Wang, Zhe Quan. An Efficient Method for Optimizing PETSc on The Sunway TaihuLight System [OL]. [14 May 2018] http://en.paper.edu.cn/en_releasepaper/content/4744816
 
 
High performance computing platforms bring great benefits for processing a wide range of ubiquitous computing tasks. The Sunway TaihuLight supercomputer is a novel high performance computing platform, ranked No. 1 on the TOP500 list. In this paper, we focus on how to optimize the Portable, Extensible Toolkit for Scientific Computation (PETSc) running on such supercomputers. The main motivations for this study are twofold: (i) PETSc is widely and frequently used in many scientific research fields such as biology, fusion, artificial intelligence, and geosciences; and (ii) the current core PETSc implementation does not fully utilize the potential of the Sunway TaihuLight system, especially its powerful SW26010 processor. To make PETSc highly efficient, the central idea of our optimizations is to improve the performance of its time-consuming and frequently used computation components (e.g., the matrix and vector modules). To this end, we propose (i) accelerating kernel codes with computing processing elements (CPEs), for which a new compression format and targeted optimizations for vector and matrix operations are devised; and (ii) using more efficient memory access schemes. We have implemented our proposals and evaluated their effectiveness and efficiency through a real-world application, Structural Finite Element Analysis (SFEA). We obtain a speedup of 16 to 32 times on a single SW26010 processor. As an additional finding, the results also show high scalability on over 8,000 computing nodes, i.e., 532,500 cores.
Keywords: High Performance Computing; PETSc; SW26010 processor; TaihuLight supercomputer
 
 
 
