1. 3D object detection-based vehicle localization system for bus stations
Huang Xingbin, Wen Zhigang
Computer Science and Technology, 03 March 2022
Abstract: Vehicle localization technology is used ever more widely in daily life. 3D object detection methods based on LiDAR sensors achieve high detection accuracy, but LiDAR sensors are too expensive for wide deployment. Image-based 3D object detection methods reduce the cost but often perform poorly because they lack depth information. In this paper, we propose a 3D object detection framework named Depth-Guided and Depth-Aware (DGDA), which simultaneously exploits the perspective information of RGB images and the depth information of depth maps for 3D detection. Experiments on the KITTI dataset show that DGDA outperforms most existing image-based 3D object detection algorithms. Notably, traditional image-based 3D object detection techniques are designed only for images captured from a driving perspective. To apply 3D detection to vehicle localization in bus-station surveillance video, we also propose an angle-conversion localization algorithm and combine it with the DGDA framework to build an end-to-end vehicle localization system for bus stations.
To cite this article: Huang Xingbin, Wen Zhigang. 3D object detection-based vehicle localization system for bus stations [OL]. [3 March 2022] http://en.paper.edu.cn/en_releasepaper/content/4756382
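The abstract does not spell out the angle-conversion localization algorithm. For a fixed surveillance camera, a common way to turn a detected vehicle's image position into station-floor coordinates is a planar homography; the sketch below illustrates only that general idea, and the function name, the matrix `H`, and the sample points are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: mapping an image point to ground-plane coordinates
# with a 3x3 planar homography H. This is NOT the DGDA paper's actual
# angle-conversion algorithm, only a common baseline for the same task.

def apply_homography(H, x, y):
    """Project pixel (x, y) through H and dehomogenize."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

# The identity homography leaves points unchanged.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(apply_homography(I, 320.0, 240.0))  # (320.0, 240.0)
```

In practice `H` would be calibrated once per camera from a few known point correspondences between the image and the station floor.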
2. Multi-scale convolutional recurrent neural network for monaural speech enhancement
Shibo Wei, Ting Jiang
Computer Science and Technology, 14 February 2022
Abstract: The convolutional recurrent network (CRN) has achieved much success in speech enhancement. The common approach first uses convolution layers to downsample and compress the speech, then uses an LSTM to model the compressed features as a sequence and integrate the global information of the utterance, and finally recovers the resolution with deconvolution layers. However, downsampling may lose speech feature information, so this paper proposes a multi-scale convolutional recurrent network (MS-CRN). In this model, an LSTM with residual connections performs sequence modeling on the downsampled output of each layer, and its results are concatenated with the output of the corresponding deconvolution layer to form the input of the next deconvolution layer. This structure combines sequence features at different scales to further exploit the global information of the speech. Experimental results show that, under the same conditions, MS-CRN obtains higher scores than the original CRN on metrics such as the scale-invariant signal-to-distortion ratio (SI-SDR).
To cite this article: Shibo Wei, Ting Jiang. Multi-scale convolutional recurrent neural network for monaural speech enhancement [OL]. [14 February 2022] http://en.paper.edu.cn/en_releasepaper/content/4756218
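For reference, the SI-SDR metric used above can be computed directly from a target waveform and an estimate: project the estimate onto the target, then compare the scaled target against the residual distortion. A minimal pure-Python sketch (the function name and `eps` handling are our own choices):

```python
import math

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB.

    s_target = (<estimate, target> / ||target||^2) * target
    SI-SDR   = 10 * log10(||s_target||^2 / ||estimate - s_target||^2)
    """
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    scale = dot / (target_energy + eps)
    s_target = [scale * t for t in target]
    e_noise = [e - s for e, s in zip(estimate, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(n * n for n in e_noise)
    return 10.0 * math.log10(num / (den + eps) + eps)
```

Because of the projection, rescaling the estimate does not change the score, which is why the metric is called scale-invariant.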
3. Research on Multi-Object Tracking Algorithm Based on Deep Feature
SUN Qing-Hong, DONG Yuan
Computer Science and Technology, 23 January 2022
Abstract: This paper proposes a practical multiple object tracking system based on DeepSort (Simple Online and Realtime Tracking with a Deep Association Metric). The system adopts the tracking-by-detection framework, which divides optimization into two parts: detection and tracking. Detection quality is a key factor in the overall performance of the algorithm; comparing speed and accuracy, SDP-CRC and YOLOv3 are suitable detectors for multi-object tracking scenes. Additionally, this project proposes a new pre-processing method based on NMS (non-maximum suppression), which improves tracking performance by up to 3.5% in terms of MOTA (Multiple Object Tracking Accuracy). For the tracking component, this project replaces the traditional Kalman motion estimation with a new estimation method based on a visual object tracker (SiamRPN). The experimental results show that this method improves MOTA by 0.3%, although it runs nearly five times slower, a cost that a more refined implementation should reduce. In future work, performance can be further improved by updating the template frame appropriately, using the appearance information provided by the SiamRPN tracker in the matching process, and improving the robustness of the tracker. Additionally, this project optimizes the detection-to-tracker association algorithm using positional and velocity relationships, reducing complexity by avoiding multi-dimensional matrix operations.
To cite this article: SUN Qing-Hong, DONG Yuan. Research on Multi-Object Tracking Algorithm Based on Deep Feature [OL]. [23 January 2022] http://en.paper.edu.cn/en_releasepaper/content/4756184
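The abstract does not detail its NMS-based pre-processing. As background, here is a minimal sketch of the standard greedy NMS that such a method would build on (the `(x1, y1, x2, y2)` box format and the threshold value are illustrative):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, then discard any
    remaining box whose IoU with it exceeds thresh; repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

A detector's raw output typically contains many near-duplicate boxes per object; NMS collapses each cluster to its single most confident detection before the tracker's association step.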
4. Ship Detection with Optical Image Based on Attention and Loss Improved YOLO v3
Yao Chen, Yuanyuan Qiao
Computer Science and Technology, 20 January 2022
Abstract: Object detection is a critical research topic in computer vision, and its results have been widely applied in recent years. As a subtask of object detection, ship detection has important research significance. Most existing ship detection research is based on SAR (Synthetic Aperture Radar) images; however, the imaging mechanism of SAR differs from that of optical cameras, so results obtained on SAR images cannot be transferred directly to optical images. Compared with SAR images, optical images carry more image feature information, which helps the algorithm learn ship features, and optical-image ship detection has greater commercial value: companies only need a simple optical camera rather than an expensive device such as radar, making the approach more reusable. This paper conducts optical-image ship detection experiments, applies the YOLO v3 algorithm to the ship detection task, and introduces an attention mechanism into the residual blocks of the DarkNet-53 backbone, achieving better performance. At the same time, this paper optimizes the recently proposed CIoU loss function, which already outperforms ln-norm losses, and presents the AIoU loss function on this basis. By adding an area penalty term, the AIoU loss improves both accuracy and convergence speed compared with the CIoU loss. Applied to ship detection in optical images and compared with the GIoU, DIoU, and CIoU loss functions, it achieves better results in the bounding-box regression of ship detection.
To cite this article: Yao Chen, Yuanyuan Qiao. Ship Detection with Optical Image Based on Attention and Loss Improved YOLO v3 [OL]. [20 January 2022] http://en.paper.edu.cn/en_releasepaper/content/4756133
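The exact form of the AIoU area penalty is not given in the abstract. As a point of reference, here is a minimal sketch of the GIoU loss, one of the baselines the paper compares against (pure Python, boxes as `(x1, y1, x2, y2)`):

```python
def giou_loss(a, b):
    """GIoU loss for axis-aligned boxes (x1, y1, x2, y2): 1 - GIoU.

    GIoU subtracts from IoU the fraction of the smallest enclosing box
    not covered by the union, so distant non-overlapping boxes still
    receive a useful gradient signal (loss > 1)."""
    # intersection and union
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / (union + 1e-9)
    # smallest axis-aligned box enclosing both
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * \
             (max(a[3], b[3]) - min(a[1], b[1]))
    giou = iou - (c_area - union) / (c_area + 1e-9)
    return 1.0 - giou
```

CIoU further adds center-distance and aspect-ratio terms, and AIoU, per the abstract, adds an area penalty on top of CIoU; those extra terms are omitted here since their exact formulation is the paper's contribution.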
5. FGSD: A Dataset For Fine-Grained Ship Detection In High Resolution Satellite Images
Kaiyan Chen, Ming Wu, Jiaming Liu, Chuang Zhang
Computer Science and Technology, 19 March 2021
Abstract: Ship detection in high-resolution remote sensing images is an important task that contributes to sea-surface regulation. Complex backgrounds and the unusual viewing angle make ship detection depend on high-quality datasets to a certain extent. However, few existing ship detection datasets provide both precise classification and accurate localization of ships. To further promote ship detection research, we introduce a new fine-grained ship detection dataset named FGSD. The dataset collects high-resolution remote sensing images containing ship samples from multiple large ports around the world. Ship samples are finely categorized and annotated with both horizontal and rotated bounding boxes. To describe ships in more detail, we put forward a new representation of ship orientation. To support future research, docks are annotated as a new class. FGSD also provides rich per-image information, including the source port, the resolution, and the corresponding Google Earth resolution level of each image. To the best of our knowledge, FGSD is currently the most comprehensive ship detection dataset, and it will be made available soon. Baselines for FGSD are also provided in this paper.
To cite this article: Kaiyan Chen, Ming Wu, et al. FGSD: A Dataset For Fine-Grained Ship Detection In High Resolution Satellite Images [OL]. [19 March 2021] http://en.paper.edu.cn/en_releasepaper/content/4754102
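FGSD's rotated-box annotation format is not specified in the abstract. A common parameterization for rotated boxes is `(cx, cy, w, h, theta)`, which converts to four corner points as sketched below; this convention is illustrative and not necessarily the one FGSD uses.

```python
import math

def rotated_box_corners(cx, cy, w, h, theta):
    """Corner points of a box centred at (cx, cy) with size (w, h),
    rotated counter-clockwise by theta radians about its centre."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2),
            (w / 2, h / 2), (-w / 2, h / 2)]
    # rotate each corner offset, then translate to the centre
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in half]
```

With `theta = 0` this degenerates to the ordinary horizontal box, which is why datasets can carry both annotation styles from one parameterization.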
6. HCADecoder: A Hybrid CTC-Attention Decoder for Chinese Text Recognition
CAI Si-Qi, XUE Wen-Yuan, LI Qing-Yong
Computer Science and Technology, 17 March 2021
Abstract: Text recognition has attracted much attention and achieved exciting results on several commonly used public English datasets in recent years. However, most well-established methods, such as connectionist temporal classification (CTC)-based and attention-based methods, pay less attention to the challenges of Chinese scenes, especially long text sequences. In this paper, we exploit the characteristics of the Chinese word frequency distribution and propose a hybrid CTC-attention decoder (HCADecoder) supervised with bigram mixture labels for Chinese text recognition. Specifically, we first add high-frequency bigram subwords to the original unigram labels to construct mixture bigram labels, which shortens the decoding length. Then, in the decoding stage, the CTC module outputs a preliminary result in which confused predictions are replaced with bigram subwords, and the attention module uses this preliminary result to produce the final output. Experimental results on four Chinese datasets demonstrate the effectiveness of the proposed method for Chinese text recognition, especially for long texts. Code will be made publicly available.
To cite this article: CAI Si-Qi, XUE Wen-Yuan, LI Qing-Yong. HCADecoder: A Hybrid CTC-Attention Decoder for Chinese Text Recognition [OL]. [17 March 2021] http://en.paper.edu.cn/en_releasepaper/content/4753921
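The bigram mixture labels above can be pictured as a greedy left-to-right merge of high-frequency character pairs into single tokens. The sketch below is our reading of the idea, not the paper's actual labeling code; the greedy scan order and the vocabulary format are assumptions.

```python
def to_mixture_label(chars, bigram_vocab):
    """Greedily merge adjacent characters whose concatenation appears
    in bigram_vocab into a single bigram token, shortening the label
    sequence the decoder must emit (hypothetical merging scheme)."""
    out, i = [], 0
    while i < len(chars):
        if i + 1 < len(chars) and chars[i] + chars[i + 1] in bigram_vocab:
            out.append(chars[i] + chars[i + 1])
            i += 2
        else:
            out.append(chars[i])
            i += 1
    return out
```

Shorter label sequences are the point: both CTC alignment and attention decoding degrade as the output length grows, which is exactly the long-text weakness the paper targets.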
7. Spatially Face Manipulation on Autoencoder Space
Sun Jiangyue, Deng Weihong
Computer Science and Technology, 23 January 2021
Abstract: While the quality of autoencoder image reconstruction and the disentanglement of autoencoder representations have improved tremendously in recent years, the ability to manipulate the output image by controlling the latent space is still limited, and manipulation of specific regions of an image remains understudied. This paper presents two novel face editing strategies that change the semantic information of arbitrary regions of an image by manipulating spatially disentangled representations of face images. The first introduces a new normalization, adaptive region normalization (AdaRN), to allow representation collaging; the second shows that the principal components computed by patch principal component analysis (patch PCA) carry meaningful information, allowing a specific region of an image to be edited and its semantics controlled. Both strategies build on a recently proposed, well-trained autoencoder called the swapping autoencoder and can edit face images over arbitrary regions with weak supervision. Experiments on the FFHQ dataset show that arbitrary regions such as the mouth, eyes, and eyebrows can be edited naturally with our strategies, and extensive results suggest that they not only edit face images flexibly but also require less effort for image labeling and model training.
To cite this article: Sun Jiangyue, Deng Weihong. Spatially Face Manipulation on Autoencoder Space [OL]. [23 January 2021] http://en.paper.edu.cn/en_releasepaper/content/4753406
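As background for the patch-PCA strategy, here is a minimal sketch of editing along a principal direction of flattened patches. It operates on raw arrays for simplicity, whereas the paper works on swapping-autoencoder feature patches; every name and parameter here is illustrative.

```python
import numpy as np

def patch_pca_edit(patches, component=0, strength=1.0):
    """Sketch of patch-PCA editing: flatten and centre the patches,
    find principal directions with SVD, then shift every patch along
    one chosen component. (Illustrative only; not the paper's code.)"""
    X = patches.reshape(len(patches), -1).astype(float)
    mean = X.mean(axis=0)
    # rows of Vt are the principal directions in flattened-patch space
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    edited = X + strength * Vt[component]
    return edited.reshape(patches.shape)

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 4, 4))       # 16 toy 4x4 patches
edited = patch_pca_edit(patches, component=0, strength=2.0)
```

The claim the paper tests is that such directions correspond to semantic attributes of the region, so moving along one of them edits the region's appearance coherently.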
8. An improved Faster R-CNN network for aeroengine fuse fracture detection
Liao Minjie, Bo Lin, Wu Xialing, Liu Qunyang, Wu Wenhong
Computer Science and Technology, 13 December 2020
Abstract: To meet the needs of aeroengine fuse fracture detection in practical applications, an improved Faster R-CNN small-target detection network is proposed. First, an FPN feature pyramid is added to improve the extraction of small-target features; then ROI Align replaces ROI pooling to reduce the loss of small-target feature information. Experiments on a fuse fracture dataset show that the improved network exceeds Faster R-CNN by 5.76% mAP. The experimental results show that the improved network is more capable and has practical application prospects in computer-vision-based aeroengine fuse fracture detection.
To cite this article: Liao Minjie, Bo Lin, Wu Xialing, et al. An improved Faster R-CNN network for aeroengine fuse fracture detection [OL]. [13 December 2020] http://en.paper.edu.cn/en_releasepaper/content/4753217
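ROI Align, which the paper substitutes for ROI pooling, avoids quantizing ROI boundaries to integer grid cells by bilinearly sampling inside each output bin. A minimal sketch with one sample per bin (real implementations average several samples per bin, and the feature-map/ROI formats here are illustrative):

```python
def bilinear(fm, y, x):
    """Bilinearly sample feature map fm (list of rows) at real (y, x)."""
    h, w = len(fm), len(fm[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
    bot = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
    return top * (1 - dy) + bot * dy

def roi_align(fm, roi, out_size=2):
    """Minimal ROI Align: split roi = (y1, x1, y2, x2) into an
    out_size x out_size grid and bilinearly sample each bin centre,
    keeping sub-pixel ROI boundaries instead of rounding them."""
    y1, x1, y2, x2 = roi
    bh, bw = (y2 - y1) / out_size, (x2 - x1) / out_size
    return [[bilinear(fm, y1 + (i + 0.5) * bh, x1 + (j + 0.5) * bw)
             for j in range(out_size)] for i in range(out_size)]
```

For the tiny fuse fractures targeted here, the rounding in classic ROI pooling can shift a region by a pixel or more, a large error relative to the object; ROI Align removes that quantization.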
9. Domain adaptive image retrieval based on region of interest
Zhao Zhen, Ai Xinbo
Computer Science and Technology, 03 March 2020
Abstract: With the recent explosive growth of image data, retrieving relevant images has become an urgent problem. However, image retrieval often faces the following difficulties: current retrieval models give little consideration to local regions of interest, and when images come from two different domain distributions, cross-domain retrieval cannot be performed effectively. In view of these problems, this paper puts forward a domain-adaptive image retrieval method based on regions of interest: object detection extracts the regions of interest in an image and filters out interfering background information; a feature fusion method combines the features of multiple target regions; and, at the same time, aligning the domain structure of images from different domains enables cross-domain retrieval. We evaluate the effectiveness of our method on the PASCAL VOC dataset.
To cite this article: Zhao Zhen, Ai Xinbo. Domain adaptive image retrieval based on region of interest [OL]. [3 March 2020] http://en.paper.edu.cn/en_releasepaper/content/4750996
10. Sparse SPMAE: An Effective Method for Pose-Invariant Face Recognition
WANG Shaoying, FAN Chunxiao, MING Yue
Computer Science and Technology, 02 February 2020
Abstract: Pose-invariant face recognition (PIFR) is a challenging problem because it is difficult to learn geometrically invariant feature representations for large-pose face images. On the one hand, pose variation of faces is a highly non-linear process; on the other, pose variation causes the loss of a large proportion of discriminative face information, and with insufficient training data, complex models risk overfitting. To address these problems, this paper proposes a pose-invariant face recognition method based on sparse multiple auto-encoders, named Sparse Stacked Progressive Multiple Auto-Encoders (Sparse SPMAE): (1) the non-linear transformation from posed face images to frontal ones is divided into several nearly linear steps; by stacking multiple auto-encoders with adaptive hidden layers, face normalization is implemented step by step in a reverse, progressive manner; (2) sparsity constraints are introduced into the multiple auto-encoders to learn discriminative information effectively and reduce the risk of overfitting in the Sparse SPMAE models. As a result, Sparse SPMAE obtains good accuracy on three classic datasets.
To cite this article: WANG Shaoying, FAN Chunxiao, MING Yue. Sparse SPMAE: An Effective Method for Pose-Invariant Face Recognition [OL]. [2 February 2020] http://en.paper.edu.cn/en_releasepaper/content/4750536
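The sparsity constraint in a sparse auto-encoder is typically the KL divergence between a target activation rate and each hidden unit's mean activation. The abstract does not specify Sparse SPMAE's exact constraint, so the sketch below shows only that standard penalty term.

```python
import math

def kl_sparsity_penalty(mean_activations, rho=0.05):
    """Sum of KL(rho || rho_hat) over hidden units: zero when every
    unit's mean activation rho_hat equals the target rate rho, and
    growing as activations drift away (standard sparse-AE penalty)."""
    total = 0.0
    for rho_hat in mean_activations:
        rho_hat = min(max(rho_hat, 1e-8), 1 - 1e-8)  # avoid log(0)
        total += rho * math.log(rho / rho_hat) \
               + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))
    return total
```

Added to the reconstruction loss with a small weight, this pushes most hidden units toward near-zero average activation, which is what forces the encoder to keep only the most discriminative features and limits overfitting.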
© 2003-2012 Sciencepaper Online, unless otherwise stated.