	1. A Research of Parallel A* Search Method on Multi-GPU
	YAO Yapeng,Jianhua Sun,Jianhua Sun
	Computer Science and Technology 12 May 2020

	Abstract: A* search is an important research topic in artificial intelligence, and graphics processing units (GPUs) are increasingly applied across research fields. This paper proposes a parallel A* search algorithm for multi-GPU architectures and explores how to execute it efficiently. Because a single GPU is limited in on-chip memory and computing power, A* search on a single GPU runs into performance bottlenecks once the data reaches a certain scale, which seriously degrades execution efficiency. Based on the heterogeneous memory structure of the multi-GPU architecture, this paper designs different partitioning methods for two data sets commonly used in A* search, grid graphs and sliding puzzles, and uses a multi-priority queue to improve GPU parallelism. The method achieves good results on 8-connected grid graphs and sliding puzzles. A series of comparative experiments verifies the effectiveness of the proposed method and shows that it outperforms current A* search methods.
	TO cite this article:YAO Yapeng,Jianhua Sun,Jianhua Sun. A Research of Parallel A* Search Method on Multi-GPU[OL].[12 May 2020] http://en.paper.edu.cn/en_releasepaper/content/4752062
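
The abstract of entry 1 mentions a multi-priority queue on 8-connected grid graphs. As a rough illustration only, the sketch below emulates that idea on the CPU in a single thread: the open list is split into several heaps, one node is popped from each heap per round (the step that could run as parallel GPU threads), and successors are spread over the heaps by hashing. Reading "multi-priority queue" as several independent heaps, the hash-based distribution, the octile heuristic, and the name `multi_queue_astar` are all assumptions made for this example; it is not the authors' implementation and it does not model multi-GPU partitioning.

```python
import heapq
import math

# 8-connected moves: straight steps cost 1, diagonal steps cost sqrt(2).
# Assumption: diagonal moves may cut past wall corners.
MOVES = [(dx, dy, math.hypot(dx, dy))
         for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def octile(a, b):
    """Octile distance, an admissible heuristic for the move costs above."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)


def multi_queue_astar(grid, start, goal, num_queues=4):
    """Return the shortest path cost on a grid of strings ('#' = wall), or None."""
    rows, cols = len(grid), len(grid[0])
    queues = [[] for _ in range(num_queues)]   # the split open list
    best_g = {start: 0.0}                      # best known cost per cell
    heapq.heappush(queues[0], (octile(start, goal), 0.0, start))
    best_goal = math.inf

    while any(queues):
        # Stop once no queued entry can beat the best goal cost found so far.
        min_f = min(q[0][0] for q in queues if q)
        if best_goal <= min_f:
            return best_goal
        # Extract phase: one node per non-empty queue.
        batch = [heapq.heappop(q) for q in queues if q]
        # Expand phase: done sequentially here; independent per node.
        for _, g, node in batch:
            if g > best_g.get(node, math.inf):
                continue                       # stale entry, a better path exists
            if node == goal:
                best_goal = min(best_goal, g)
                continue
            x, y = node
            for dx, dy, cost in MOVES:
                nxt = (x + dx, y + dy)
                if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue
                if grid[nxt[0]][nxt[1]] == '#':
                    continue
                ng = g + cost
                if ng < best_g.get(nxt, math.inf):
                    best_g[nxt] = ng
                    target = hash(nxt) % num_queues   # spread work over queues
                    heapq.heappush(queues[target], (ng + octile(nxt, goal), ng, nxt))
    return best_goal if best_goal < math.inf else None
```

Because several nodes are expanded per round, the first goal popped is not necessarily optimal; the sketch therefore keeps searching until the smallest f-value left in any queue is no better than the best goal cost found.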


	2. Address Randomization for Dynamic Memory Allocators on the GPU
	SUN Jianhua,PENG Can
	Computer Science and Technology 04 April 2019

	Abstract: GPUs are widely used in multi-user environments such as the cloud because of their rich thread-level parallelism. In such environments, multiple kernels can execute in parallel, and a kernel may exploit a buffer overflow to attack other kernels on the same GPU. The limited existing work focuses on detecting buffer overflows rather than preventing them. Address randomization is an effective approach to preventing memory-related attacks on the CPU, but current GPUs lack similar support for defending against the growing threat of memory overflows. In this paper, we propose an address randomization method for dynamic memory allocation on the GPU. We have implemented and compared different pseudo-random algorithms on the GPU and integrated address randomization into an existing allocator. We discuss and analyze the security of the proposed address randomization in detail. Experimental evaluations show that the overhead incurred by our randomized algorithm is less than 20% on top of existing memory allocators.
	TO cite this article:SUN Jianhua,PENG Can. Address Randomization for Dynamic Memory Allocators on the GPU[OL].[ 4 April 2019] http://en.paper.edu.cn/en_releasepaper/content/4748245
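
To make the idea in the abstract of entry 2 concrete, here is a minimal CPU-side sketch of address randomization in an allocator: instead of handing out the first free slot, the allocator picks a free slot at random, so the address of the next allocation is hard to predict. The fixed-size slot layout and the class name `RandomizedSlotAllocator` are assumptions for illustration; the abstract does not describe the authors' allocator design or which pseudo-random algorithms they compared.

```python
import random


class RandomizedSlotAllocator:
    """Fixed-size slot allocator that returns a randomly chosen free slot."""

    def __init__(self, num_slots, slot_size, rng=None):
        self.slot_size = slot_size
        self.free_slots = list(range(num_slots))
        # SystemRandom draws from the OS entropy source; a seeded
        # random.Random can be passed in for reproducible experiments.
        self.rng = rng or random.SystemRandom()

    def malloc(self):
        """Return the byte offset of a random free slot, or None if full."""
        if not self.free_slots:
            return None
        i = self.rng.randrange(len(self.free_slots))
        # Swap the chosen slot to the end so removal is O(1).
        self.free_slots[i], self.free_slots[-1] = self.free_slots[-1], self.free_slots[i]
        return self.free_slots.pop() * self.slot_size

    def free(self, address):
        """Return the slot containing `address` to the free list."""
        self.free_slots.append(address // self.slot_size)
```

With a first-fit policy, two consecutive allocations are usually adjacent, which is what makes an overflow from one buffer into the next predictable; the random pick removes that regularity at the cost of one random draw per allocation.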


	3. A Fast and Secure GPU Memory Allocator
	CHEN Hao,WU Jiang
	Computer Science and Technology 29 March 2019

	Abstract: Graphics Processing Units (GPUs) are widely used for general-purpose computing in areas such as scientific computing and deep learning. To offer more flexibility in GPU programming, dynamic memory allocation has been introduced in GPU programming frameworks such as CUDA. However, the dynamic memory allocator in CUDA is inefficient in highly concurrent environments, so several dynamic memory allocators have recently been proposed to improve the performance of dynamic memory management. These allocators focus only on achieving higher performance and ignore security issues. In this paper, we propose a fast and secure GPU memory allocator based on ScatterAlloc. To protect efficiently against memory attacks such as buffer overflows, our allocator combines several key techniques: canary-based memory protection (with two options, detection-on-free and always-on detection), address compression, and over-provisioning. Experimental results show that the allocator can effectively detect buffer overflow errors while remaining approximately 100 times faster than the CUDA toolkit allocator.
	TO cite this article:CHEN Hao,WU Jiang. A Fast and Secure GPU Memory Allocator[OL].[29 March 2019] http://en.paper.edu.cn/en_releasepaper/content/4748201
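
The canary-based protection with detection-on-free mentioned in the abstract of entry 3 can be illustrated with a small CPU-side sketch: each block is followed by a random canary, writes are left unchecked, and `free` verifies that the canary is intact. The bump-pointer heap, the 8-byte canary, and the class name `CanaryHeap` are assumptions for illustration; the sketch is unrelated to ScatterAlloc internals and omits address compression and over-provisioning.

```python
import secrets

CANARY_SIZE = 8  # bytes placed directly after every user block (assumed size)


class CanaryHeap:
    """Toy bump allocator over a bytearray with canary checks on free."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.next_free = 0
        self.blocks = {}  # offset -> (user size, expected canary)

    def malloc(self, size):
        total = size + CANARY_SIZE
        if self.next_free + total > len(self.memory):
            raise MemoryError("heap exhausted")
        offset = self.next_free
        canary = secrets.token_bytes(CANARY_SIZE)
        self.memory[offset + size:offset + total] = canary
        self.blocks[offset] = (size, canary)
        self.next_free += total
        return offset

    def write(self, offset, data):
        # Deliberately unchecked, like a raw pointer write in device code.
        self.memory[offset:offset + len(data)] = data

    def free(self, offset):
        # Detection-on-free: compare the stored canary with the expected one.
        size, canary = self.blocks.pop(offset)
        stored = bytes(self.memory[offset + size:offset + size + CANARY_SIZE])
        if stored != canary:
            raise RuntimeError(f"buffer overflow detected in block at {offset}")
```

Checking only on `free` keeps the write path untouched, so an overflow is caught late but cheaply; an always-on variant would instead validate canaries during the block's lifetime, trading extra checks for earlier detection.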


	4. An uncoupled and lightweight Virtualized Network Function Controller
	LI Kai,Wang Chun
	Computer Science and Technology 24 January 2019

	Abstract: NFV (Network Function Virtualization) is a powerful emerging technique with widespread applicability. In this paper, we propose a solution that removes the tight coupling between the VNF (Virtualized Network Function) and the VNFM (Virtualized Network Function Manager) in current NFV practice. Our solution is a VNFC (Virtualized Network Function Controller), which automates the deployment of VNFs into Docker containers and uses an improved compression algorithm to achieve fast inter-node migration of VNFs. Our experiments show that Docker delivers 3 to 4 times the performance of VMs and that the improved compression algorithm effectively improves the compression ratio.
	TO cite this article:LI Kai,Wang Chun. An uncoupled and lightweight Virtualized Network Function Controller[OL].[24 January 2019] http://en.paper.edu.cn/en_releasepaper/content/4747144


	5. Research and design based on path switching strategy of extended LDP protocol
	YU Zhiji,ZHANG Haiyang
	Computer Science and Technology 09 January 2019

	Abstract: The Label Distribution Protocol (LDP) is a signaling and control protocol for Multiprotocol Label Switching (MPLS). How to perform path switching is a key issue in the extended LDP protocol (which includes p2mp-ldp and mp2mp-ldp). Under route flapping, path switching may waste resources unnecessarily, and the resource loss rate is an important indicator in many network protocols. This paper therefore first examines path switching scenarios under route flapping in the extended LDP protocol, then abstracts the path switching strategy into a general model and explores the resource loss rate that the strategy causes on that model. A delayed switching strategy is then proposed on the general model, and its feasibility is demonstrated by the reduction in resource loss rate. Finally, the paper compares the normal and delayed switching strategies and demonstrates the practicality of the technique through an implementation on the Quagga platform. The delayed switching strategy presented in this paper applies not only to LDP but also to other complex network protocols.
	TO cite this article:YU Zhiji,ZHANG Haiyang. Research and design based on path switching strategy of extended LDP protocol[OL].[ 9 January 2019] http://en.paper.edu.cn/en_releasepaper/content/4746973


	6. MPLS Multicast tree label retention policy based on derivation state search
	YAN Yuan,MA Yue
	Computer Science and Technology 07 January 2019

	Abstract: This paper studies the key techniques and problems of label retention for MPLS-based multipoint-to-multipoint (MP2MP) multicast trees and proposes a multicast tree label retention strategy based on derivation state search. The unicast LDP module in Quagga is extended to support the proposed MP2MP label retention strategy. Performance analysis and simulation results show that, compared with the traditional MP2MP conservative label retention mode, the logically derived MPLS MP2MP label retention strategy cuts off traffic in time and achieves good convergence performance when the link converges.
	TO cite this article:YAN Yuan,MA Yue. MPLS Multicast tree label retention policy based on derivation state search[OL].[ 7 January 2019] http://en.paper.edu.cn/en_releasepaper/content/4746966


	7. An Efficient Method for Optimizing PETSc on The Sunway TaihuLight System
	Letian Kang,Zhi-Jie Wang,Zhe Quan,Weigang Wu,Song Guo,Kenli Li,Keqin Li
	Computer Science and Technology 07 May 2018

	Abstract: High performance computing platforms bring great benefits for processing a wide range of ubiquitous computing tasks. The Sunway TaihuLight supercomputer is a novel high performance computing platform that is ranked No. 1 on the TOP500 list. In this paper, we focus on how to optimize the Portable and Extensible Toolkit for Scientific Computation (PETSc) running on supercomputers. The motivation for this study is twofold: (i) PETSc is widely and frequently used in scientific research fields such as biology, fusion, artificial intelligence, and geosciences; and (ii) the current core PETSc does not fully utilize the potential of the Sunway TaihuLight system, especially its powerful SW26010 processor. To make PETSc highly efficient, the central idea of our optimizations is to improve the performance of time-consuming and frequently used computation components (e.g., the matrix and vector modules). To this end, we propose (i) accelerating kernel codes with computing processing elements (CPEs), for which we devise a new compression format and targeted optimizations for vector and matrix operations; and (ii) using more efficient memory access schemes. We have implemented our proposals and evaluated their effectiveness and efficiency through a real-world application, Structural Finite Element Analysis (SFEA). We obtain a speedup of 16 to 32 times on a single SW26010 processor. As an additional finding, the results show high scalability on over 8,000 computing nodes, i.e., 532,500 cores.
	TO cite this article:Letian Kang,Zhi-Jie Wang,Zhe Quan, et al. An Efficient Method for Optimizing PETSc on The Sunway TaihuLight System[OL].[ 7 May 2018] http://en.paper.edu.cn/en_releasepaper/content/4744816


	8. Measurement Study of Today's Immersive Mobile VR Systems
	SUN Linhui
	Computer Science and Technology 21 December 2017

	Abstract: Because of their better mobility and lower prices, mobile VR systems have attracted a large number of VR applications. However, the performance of mobile VR systems is still not clearly understood. To address this problem, this paper presents experiments that measure the performance and overhead of mobile VR systems. The experimental results show that today's commodity smartphones cannot provide a comfortable immersive VR experience because of their restricted hardware resources. Based on the experiments, this paper also discusses major challenges and future directions for further optimization of mobile VR.
	TO cite this article:SUN Linhui. Measurement Study of Today's Immersive Mobile VR Systems[OL].[21 December 2017] http://en.paper.edu.cn/en_releasepaper/content/4742761


	9. Dynamic Granule Substitution and Granule Tree in GranuleJS
	ZENG Qing-Hua, WU Wen-Bin, ZHAO Yin-Liang, SUN Li-Yu
	Computer Science and Technology 18 December 2017

	Abstract: Modern software systems usually run in varied external environments and are required to adjust their behavior automatically as the surrounding environment changes. To deal with this problem, granule-oriented programming was proposed, and GranuleJ, built on top of Java, was designed and implemented through language extensions. However, such an implementation on a static language infrastructure is often expensive and incurs a fairly high performance overhead in granule substitution, where granules, as compositional code units, are composed at runtime. In this paper, we propose an extension language called GranuleJS, based on the dynamic language JavaScript, that fully leverages the dynamic nature of JavaScript. We design a granule tree for granule substitution with two optimized approaches, the lazy chain approach and the marking approach. Moreover, because the dynamic nature of the granule tree can break the invariance of context variable string paths, we design a fast similar-granule search algorithm. The performance evaluation shows that GranuleJS has fairly low performance overhead.
	TO cite this article:ZENG Qing-Hua, WU Wen-Bin, ZHAO Yin-Liang, et al. Dynamic Granule Substitution and Granule Tree in GranuleJS[OL].[18 December 2017] http://en.paper.edu.cn/en_releasepaper/content/4742873


	10. A Fitness Test Programming Model for Dynamic Evolution In Context-aware Applications
	ZENG Qing-Hua, ZHAO Yin-Liang, SUN Li-Yu, WU Wen-Bin
	Computer Science and Technology 18 December 2017

	Abstract: Modern context-aware applications are increasingly expected to adapt their behavior autonomously to changes in the surrounding environment at run-time, especially in pervasive and ubiquitous computing. However, because unforeseen context variations mostly arise at run-time and are unknown to developers, applications are subject to hard-to-predict behavioral changes that cannot be explicitly specified after initial deployment. At the language level, language extension is an efficient and prompt way to build such adaptable applications, but most existing context-based languages provide only anticipated adaptation, which is predefined at design time, and lack language abstractions that support context uncertainty at run-time; as a result, a program may run in the wrong context. In this work, we propose a fitness test programming model to address the problem. The model uses implicit context checks to detect unfitness between a program's behavior and its related context, and then automatically adjusts the behavior to fit the current context, so that the program evolves dynamically. GranuleJ introduces context variables to identify context changes clearly, fitness tests that rely on context variables to detect the adaptation points where unsuitable behavior occurs, and granules that modularize behavior variations as reusable building blocks that can be freely assembled or disassembled at run-time. We have implemented the GranuleJ language framework and validated the feasibility and effectiveness of the model through a case study and performance evaluation.
	TO cite this article:ZENG Qing-Hua, ZHAO Yin-Liang, SUN Li-Yu, et al. A Fitness Test Programming Model for Dynamic Evolution In Context-aware Applications[OL].[18 December 2017] http://en.paper.edu.cn/en_releasepaper/content/4742869
