1. A Research of Parallel A* Search Method on Multi-GPU
YAO Yapeng, Jianhua Sun
Computer Science and Technology, 12 May 2020
Abstract: The A* algorithm is an important research direction in artificial intelligence, and graphics processing units (GPUs) are being applied in ever more research fields. This paper proposes a parallel A* search algorithm based on multiple GPUs, exploring how A* search can be executed efficiently on a multi-GPU architecture. Because of the limits of GPU on-chip memory and computing power, single-GPU A* search hits performance bottlenecks once the data reaches a certain scale, which seriously degrades execution efficiency. Building on the heterogeneous memory structure of the multi-GPU architecture, this paper designs different partitioning methods for two data sets commonly used in A* search, grid graphs and sliding puzzles, and uses multiple priority queues to improve GPU parallelism. The method achieves good results on 8-connected grid graphs and sliding puzzles, and a series of comparative experiments verifies its effectiveness and shows that it outperforms current A* search methods.
To cite this article: YAO Yapeng, Jianhua Sun. A Research of Parallel A* Search Method on Multi-GPU [OL]. [12 May 2020] http://en.paper.edu.cn/en_releasepaper/content/4752062
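The multi-priority-queue idea can be illustrated with a serial sketch: successors are hashed across k queues, and each iteration pops one node from every non-empty queue as a batch, mimicking the parallel expansion the paper performs on GPUs. This is a hypothetical illustration of the general scheme, not the paper's implementation, and batch expansion can in principle return a slightly suboptimal path.

```python
import heapq

def astar_multiqueue(grid, start, goal, k=4):
    """A* on an 8-connected grid (0 = free, 1 = wall) with k priority queues.

    Each iteration pops one node from every non-empty queue and expands
    the batch together -- a serial sketch of the multi-priority-queue
    scheme used to raise GPU parallelism. Returns path length or None.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: max(abs(p[0] - goal[0]), abs(p[1] - goal[1]))  # Chebyshev, admissible
    queues = [[] for _ in range(k)]
    heapq.heappush(queues[0], (h(start), 0, start))
    best = {start: 0}
    while any(queues):
        batch = [heapq.heappop(q) for q in queues if q]
        for f, g, node in batch:
            if node == goal:
                return g
            if g > best.get(node, float("inf")):
                continue  # stale queue entry
            x, y = node
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    if dx == dy == 0:
                        continue
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < rows and 0 <= ny < cols and grid[nx][ny] == 0:
                        ng = g + 1
                        if ng < best.get((nx, ny), float("inf")):
                            best[(nx, ny)] = ng
                            # hash successors across queues to balance load
                            qi = (nx * cols + ny) % k
                            heapq.heappush(queues[qi], (ng + h((nx, ny)), ng, (nx, ny)))
    return None
```

On an open 3x3 grid the diagonal path from corner to corner has length 2, which the sketch finds.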
2. Address Randomization for Dynamic Memory Allocators on the GPU
SUN Jianhua, PENG Can
Computer Science and Technology, 4 April 2019
Abstract: GPUs are widely used in multi-user environments such as the cloud thanks to their rich thread-level parallelism. In such environments, multiple kernels can execute in parallel, which makes it possible for one kernel to exploit a buffer overflow to attack other kernels on the same GPU. The limited existing work focuses on detecting buffer overflows rather than preventing them. Address randomization is an effective approach to preventing memory-related attacks on the CPU, but current GPUs lack similar support against the growing threat of memory overflows. In this paper, we propose an address randomization method for dynamic memory allocation on the GPU. We have implemented and compared different pseudo-random algorithms on the GPU and integrated the address randomization into an existing allocator. We also present a detailed analysis of the security of the proposed address randomization. Experimental evaluations show that our randomized algorithm adds less than 20% overhead on top of existing memory allocators.
To cite this article: SUN Jianhua, PENG Can. Address Randomization for Dynamic Memory Allocators on the GPU [OL]. [4 April 2019] http://en.paper.edu.cn/en_releasepaper/content/4748245
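The core idea, randomizing where each allocation lands so an overflowing kernel cannot predict its neighbour's address, can be sketched with a toy bump allocator driven by a cheap PRNG of the xorshift family the paper compares. The class and parameter names here are hypothetical, and the real allocator operates on device memory, not Python integers.

```python
def xorshift32(state):
    """One xorshift32 step -- a cheap PRNG suitable for per-thread GPU use."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state & 0xFFFFFFFF

class RandomizedAllocator:
    """Toy bump allocator that randomizes each allocation's placement.

    Instead of handing out consecutive addresses, it inserts a random,
    alignment-preserving gap before every block, so an overflow from one
    allocation cannot reliably reach a known neighbour. Sketch only.
    """
    def __init__(self, base=0x10000, seed=2463534242, align=16, max_gap_blocks=8):
        self.cursor = base
        self.state = seed
        self.align = align
        self.max_gap_blocks = max_gap_blocks

    def alloc(self, size):
        self.state = xorshift32(self.state)
        gap = (self.state % self.max_gap_blocks) * self.align
        # round the randomized address up to the alignment boundary
        addr = (self.cursor + gap + self.align - 1) // self.align * self.align
        # bump the cursor past the block, keeping it aligned
        self.cursor = addr + (size + self.align - 1) // self.align * self.align
        return addr
```

Successive allocations remain aligned and non-overlapping, but the gaps between them vary with the PRNG stream.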
3. A Fast and Secure GPU Memory Allocator
CHEN Hao, WU Jiang
Computer Science and Technology, 29 March 2019
Abstract: Graphics processing units (GPUs) are widely used for general-purpose computing in areas such as scientific computing and deep learning. To offer more flexibility in GPU programming, dynamic memory allocation has been introduced in GPU programming frameworks such as CUDA. However, the dynamic memory allocator in CUDA is inefficient in highly concurrent environments, so several dynamic memory allocators have recently been proposed to improve the performance of dynamic memory management. These allocators focus only on performance and ignore security. In this paper, we propose a fast and secure GPU memory allocator based on ScatterAlloc. To protect efficiently against memory attacks such as buffer overflows, our allocator combines several key techniques: canary-based memory protection (with two modes, detection-on-free and always-on detection), address compression, and over-provisioning. Experimental results show that the allocator effectively detects buffer overflow errors while remaining approximately 100 times faster than the CUDA toolkit allocator.
To cite this article: CHEN Hao, WU Jiang. A Fast and Secure GPU Memory Allocator [OL]. [29 March 2019] http://en.paper.edu.cn/en_releasepaper/content/4748201
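Canary-based protection in its detection-on-free mode can be shown on a toy heap: a known word is written just past each user block, and free() checks whether it survived. This is a generic illustration of the technique (class and constants are hypothetical), not the paper's ScatterAlloc-based allocator.

```python
CANARY = b"\xDE\xAD\xBE\xEF"

class CanaryHeap:
    """Toy bump allocator illustrating canary-based overflow detection.

    A canary word is placed right after each user block; free() checks it
    (the detection-on-free mode). The real allocator works on raw GPU
    memory, not a Python bytearray.
    """
    def __init__(self, size=1024):
        self.mem = bytearray(size)
        self.cursor = 0
        self.blocks = {}  # addr -> user-visible size

    def alloc(self, size):
        addr = self.cursor
        self.cursor += size + len(CANARY)
        self.mem[addr + size:addr + size + len(CANARY)] = CANARY
        self.blocks[addr] = size
        return addr

    def write(self, addr, data):
        # unchecked store, like a device-side write through a raw pointer
        self.mem[addr:addr + len(data)] = data

    def free(self, addr):
        size = self.blocks.pop(addr)
        # intact canary => no overflow past this block
        return bytes(self.mem[addr + size:addr + size + len(CANARY)]) == CANARY
```

Writing one byte past an 8-byte block clobbers the canary, and the corruption is reported at free time.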
4. An Efficient Method for Optimizing PETSc on The Sunway TaihuLight System
Letian Kang, Zhi-Jie Wang, Zhe Quan, Weigang Wu, Song Guo, Kenli Li, Keqin Li
Computer Science and Technology, 7 May 2018
Abstract: High performance computing platforms can bring great benefits to various ubiquitous computing tasks. The Sunway TaihuLight supercomputer is a novel high performance computing platform ranked No. 1 on the TOP500 list. In this paper, we focus on optimizing the Portable, Extensible Toolkit for Scientific Computation (PETSc) on supercomputers. The motivations for this study are twofold: (i) PETSc is widely and frequently used in many scientific research fields such as biology, fusion, artificial intelligence, and geosciences; and (ii) the current PETSc kernels do not fully exploit the potential of the Sunway TaihuLight system, especially its powerful SW26010 processor. To make PETSc efficient, the central idea of our optimizations is to improve the performance of its time-consuming and frequently used computation components (e.g., the matrix and vector modules). To this end, we propose (i) accelerating kernel code with the computing processing elements (CPEs), for which we devise a new compression format and targeted optimizations for vector and matrix operations; and (ii) using more efficient memory access schemes. We have implemented our proposals and evaluated their effectiveness and efficiency on a real-world application, Structural Finite Element Analysis (SFEA), obtaining a 16- to 32-fold speedup on a single SW26010 processor. As an additional finding, the results also show high scalability on over 8,000 computing nodes, i.e., 532,500 cores.
To cite this article: Letian Kang, Zhi-Jie Wang, Zhe Quan, et al. An Efficient Method for Optimizing PETSc on The Sunway TaihuLight System [OL]. [7 May 2018] http://en.paper.edu.cn/en_releasepaper/content/4744816
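The matrix kernels the paper accelerates typically build on a compressed sparse row (CSR) representation; a plain serial sparse matrix-vector product is the reference point such optimizations start from. This is the textbook CSR kernel, not the paper's new compression format.

```python
def csr_spmv(indptr, indices, data, x):
    """Sparse matrix-vector product y = A @ x with A stored in CSR form.

    indptr[row]..indptr[row+1] delimits row's nonzeros; indices holds
    their column positions and data their values. This is the kind of
    PETSc kernel offloaded to the SW26010 computing processing elements.
    """
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        s = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            s += data[k] * x[indices[k]]
        y[row] = s
    return y
```

For A = [[1, 0, 2], [0, 3, 0]] and x = [1, 1, 1], the product is [3.0, 3.0].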
5. Parallelizing the Count-Min Sketch Algorithm on the Multi-core Processors
ZHANG Yu
Computer Science and Technology, 14 December 2015
Abstract: In high-speed network monitoring, ever-growing traffic calls for a high-performance way to compute item frequencies. The increasing number of cores in commodity multi-core processors opens new opportunities for parallelization. In this paper, we present a novel method that exploits the parallel capability of multi-core processors to speed up the well-known Count-Min sketch algorithm. The proposed parallel algorithm distributes the input data stream evenly across sub-threads, each of which runs the original Count-Min sketch algorithm on its sub-stream. Counters in a local Count-Min sketch whose frequency increments exceed a pre-defined threshold are sent to a merging thread, which returns estimated frequencies satisfying the (epsilon, delta)-approximation requirement. We present correctness and complexity analyses, and experiments with real traffic traces confirm them, demonstrating excellent performance and the effects of the parameters. The results show that the proposed parallel Count-Min sketch algorithm achieves near-linear speedup at the cost of greater memory use.
To cite this article: ZHANG Yu. Parallelizing the Count-Min Sketch Algorithm on the Multi-core Processors [OL]. [14 December 2015] http://en.paper.edu.cn/en_releasepaper/content/4668107
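The building block each sub-thread runs is the standard Count-Min sketch: d hash rows of w counters, updated on every item and queried by taking the row-wise minimum, which never underestimates the true frequency. A minimal serial version (hash choice and sizes are illustrative, not the paper's):

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min sketch: d rows of w counters.

    Each sub-thread in the parallel scheme keeps a local sketch like
    this and forwards heavy counters to a merging thread; here we show
    only the core update/query with the usual overestimate guarantee.
    """
    def __init__(self, w=272, d=5):
        self.w, self.d = w, d
        self.table = [[0] * w for _ in range(d)]

    def _cells(self, item):
        for row in range(self.d):
            # salt the hash per row to get d independent-ish functions
            h = hashlib.blake2b(item.encode(), digest_size=8, salt=bytes([row])).digest()
            yield row, int.from_bytes(h, "big") % self.w

    def update(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def query(self, item):
        # the minimum over rows never underestimates the true frequency
        return min(self.table[row][col] for row, col in self._cells(item))
```

A query is bounded below by the item's true count and above by the total stream weight, which is what the (epsilon, delta) analysis formalizes.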
6. Rough Set based K-Modes Clustering Algorithm with Hadoop Cloud Platform
ZHANG Lisheng, ZHANG Jiong, LEI Dajiang
Computer Science and Technology, 21 April 2015
Abstract: The traditional K-Modes clustering algorithm can neither handle massive amounts of data efficiently nor calculate the dissimilarity between the attributes of data objects accurately. To solve these problems, this paper proposes a K-Modes clustering algorithm based on rough sets and the MapReduce programming model of cloud computing. First, a rough set model recalculates the dissimilarity between data object attributes, improving the accuracy of the distance computation; the algorithm is then parallelized on the Hadoop platform using the MapReduce programming model. Experiments on clustering high-dimensional massive data show that the improved algorithm reduces computation time and produces effective clustering results, with better stability and scalability.
To cite this article: ZHANG Lisheng, ZHANG Jiong, LEI Dajiang. Rough Set based K-Modes Clustering Algorithm with Hadoop Cloud Platform [OL]. [21 April 2015] http://en.paper.edu.cn/en_releasepaper/content/4639491
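For context, plain K-Modes uses a simple mismatch count as its dissimilarity, the measure the paper replaces with a rough-set weighted variant, and alternates assignment with per-attribute mode updates. A serial sketch of one iteration (function names are illustrative):

```python
from collections import Counter

def hamming(a, b):
    """Plain K-Modes dissimilarity: number of mismatched categorical
    attributes. (The paper substitutes a rough-set weighted variant.)"""
    return sum(x != y for x, y in zip(a, b))

def kmodes_step(points, modes):
    """One assignment + mode-update iteration of K-Modes over tuples of
    categorical attributes."""
    clusters = [[] for _ in modes]
    for p in points:
        nearest = min(range(len(modes)), key=lambda i: hamming(p, modes[i]))
        clusters[nearest].append(p)
    new_modes = []
    for idx, cluster in enumerate(clusters):
        if not cluster:
            new_modes.append(modes[idx])  # keep an empty cluster's mode
            continue
        # per attribute, the most frequent category becomes the new mode
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))
        new_modes.append(mode)
    return clusters, new_modes
```

In the MapReduce version, the assignment loop becomes the map phase and the per-cluster mode computation the reduce phase.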
7. A GPU-based Native Bayesian algorithm for document classification
Yang Chengpeng, Gao Zhanchun, Jiang Yanjun
Computer Science and Technology, 19 November 2013
Abstract: Document classification is one of the most important tasks in text mining. Classification algorithms usually represent all features, or all feature weights, as high-dimensional vectors, which makes them computationally expensive. This paper implements a GPU-based Bayesian algorithm with NVIDIA's Compute Unified Device Architecture (CUDA) and applies it to document classification. In our experiments, the GPU-based program achieves up to a 50-fold speedup over the CPU-based program.
To cite this article: Yang Chengpeng, Gao Zhanchun, Jiang Yanjun. A GPU-based Native Bayesian algorithm for document classification [OL]. [19 November 2013] http://en.paper.edu.cn/en_releasepaper/content/4570429
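The per-class, per-word counting and log-probability sums that such a classifier parallelizes are those of standard multinomial naive Bayes; a compact serial version, assuming that is the underlying model (the paper's title says "Native Bayesian"):

```python
import math
from collections import Counter

def train_nb(docs):
    """Train a multinomial naive Bayes model.

    docs: list of (tokens, label). Returns, per label, the log prior,
    Laplace-smoothed per-word log-likelihoods, and the unseen-word mass.
    """
    labels = Counter(lab for _, lab in docs)
    counts = {lab: Counter() for lab in labels}
    for toks, lab in docs:
        counts[lab].update(toks)
    vocab = set().union(*(c.keys() for c in counts.values()))
    model = {}
    for lab in labels:
        total = sum(counts[lab].values())
        denom = total + len(vocab)  # Laplace smoothing
        lik = {w: math.log((counts[lab][w] + 1) / denom) for w in vocab}
        model[lab] = (math.log(labels[lab] / len(docs)), lik, math.log(1 / denom))
    return model

def classify(model, toks):
    """Pick the label with the highest posterior log-score."""
    def score(lab):
        prior, lik, unseen = model[lab]
        return prior + sum(lik.get(w, unseen) for w in toks)
    return max(model, key=score)
```

Scoring every document against every class is embarrassingly parallel, which is why the counting and summation map well onto CUDA threads.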
8. Metadata-intensive I/O Optimizations in Parallel File Systems
Xie Ke, Li Xiuqiao, Wu Qimeng, Xiao Limin, Ruan Li
Computer Science and Technology, 26 April 2013
Abstract: As parallel file systems grow in size, metadata I/O performance becomes critical for overall performance. Metadata-intensive applications issue many metadata I/O requests that carry small amounts of data, making metadata access the system bottleneck. We propose an optimization for metadata-intensive I/O based on aggregating and merging requests. Extensive simulations show that the aggregate throughput of intensive file creation can be increased by up to 15.28 times, and average response time decreased by up to 99.27 percent, when the aggregation period and request interval are set to 0.8 ms and 0.025 ms, respectively. Simulations also show that the aggregate throughput of intensive metadata access can be increased by up to 8.45 times, and average response time decreased by up to 99.02 percent, when the merging period and request interval are set to 0.4 ms and 0.025 ms, respectively. Meanwhile, experiments show that our method scales well with the number of metadata servers and clients.
To cite this article: Xie Ke, Li Xiuqiao, Wu Qimeng, et al. Metadata-intensive I/O Optimizations in Parallel File Systems [OL]. [26 April 2013] http://en.paper.edu.cn/en_releasepaper/content/4539533
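The aggregation side of the idea is simple to sketch: requests arriving within one aggregation period of a batch's first request are shipped as a single combined RPC. This is a hypothetical time-window batcher illustrating the mechanism, not the paper's simulator.

```python
def aggregate_requests(requests, period):
    """Group metadata requests into batches by aggregation period.

    requests: list of (arrival_time_ms, op), sorted by arrival time.
    All requests within `period` ms of a batch's first request form one
    aggregated batch -- e.g. a 0.8 ms period over 0.025 ms inter-arrivals
    in the paper's file-creation experiments.
    """
    batches, current, window_start = [], [], None
    for t, op in requests:
        if window_start is None or t - window_start >= period:
            if current:
                batches.append(current)  # close the expired window
            current, window_start = [], t
        current.append(op)
    if current:
        batches.append(current)
    return batches
```

Fewer, larger RPCs trade a bounded wait (at most one period) for much lower per-request overhead at the metadata server.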
9. Enumerate Strongly Connected Components of Large-scale Graph with MapReduce
Lu Lv, Lei Xie
Computer Science and Technology, 25 December 2012
Abstract: Enumerating the strongly connected components of a directed graph is a fundamental problem in graph theory. The standard serial algorithm for strongly connected components is based on depth-first search, which is difficult to parallelize for large-scale graphs. In this paper, we propose a nearly linear, parallel, bi-directional label propagation algorithm to enumerate the strongly connected components of large-scale graphs on the MapReduce framework. The algorithm is suitable for large-scale graphs, and experiments show its efficiency and scalability.
To cite this article: Lu Lv, Lei Xie. Enumerate Strongly Connected Components of Large-scale Graph with MapReduce [OL]. [25 December 2012] http://en.paper.edu.cn/en_releasepaper/content/4502357
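The MapReduce-friendly alternative to DFS rests on a reachability fact: a pivot's strongly connected component is exactly the intersection of its forward- and backward-reachable sets. A serial sketch of that forward-backward scheme (the paper's version propagates labels in both directions in parallel rather than using explicit searches):

```python
def sccs_fwbw(nodes, edges):
    """Enumerate SCCs via the forward-backward reachability scheme.

    Repeatedly pick a pivot, intersect its forward and backward
    reachable sets within the remaining nodes, emit that SCC, and
    continue on the remainder.
    """
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    def reach(start, adj, allowed):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v in allowed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    out, remaining = [], set(nodes)
    while remaining:
        pivot = next(iter(remaining))
        # SCC(pivot) = forward-reachable ∩ backward-reachable
        scc = reach(pivot, succ, remaining) & reach(pivot, pred, remaining)
        out.append(scc)
        remaining -= scc
    return out
```

Both reachability sweeps are label propagations, so each maps naturally onto rounds of MapReduce message passing.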
10. MPI and OpenMP Paradigms and its application of Solving Large Scale Banded Linear Systems
Lei Xu, Hanyuan Zheng, Zhixiang Liu, Weibing Feng, Wu Zhang
Computer Science and Technology, 6 January 2012
Abstract: This paper discusses the performance of the MPI+OpenMP hybrid programming paradigm and its different implementations. We design a multi-granularity parallel algorithm for solving large-scale banded linear systems and compare its performance with a pure MPI algorithm on the high performance computer of Shanghai University. The results indicate that the hybrid algorithm has better scalability and speedup.
To cite this article: Lei Xu, Hanyuan Zheng, Zhixiang Liu, et al. MPI and OpenMP Paradigms and its application of Solving Large Scale Banded Linear Systems [OL]. [6 January 2012] http://en.paper.edu.cn/en_releasepaper/content/4460077
© 2003–2012 Sciencepaper Online, unless otherwise stated.