There are 19 papers published in this subject since this site started.
1. Multi-Object Tracking with decoupled representations and unreliable detections in complex scenes
XIAO Yuan, ZOU Qi
Computer Science and Technology, 21 March 2024
Abstract: The Joint-Detection-and-Embedding paradigm achieves fast tracking by learning detection and Re-ID features simultaneously. However, it still suffers performance degradation in complex scenes and misalignment between detection and Re-ID features. In this paper, we propose a decoupling module based on a channel-wise attention mechanism to obtain task-aligned features that serve the different demands of detection and Re-ID. To improve data association, we fuse motion, location, and appearance information and perform a two-round matching for high- and low-confidence detections using the Motion-GIoU matrix and the Embedding-GIoU matrix, respectively. Additionally, we apply camera motion compensation to obtain a more accurate motion estimate, yielding more robust tracking in camera-motion and low-frame-rate scenes. Extensive experiments show that our proposed method outperforms a wide range of existing methods on the MOTChallenge and HiEve datasets.
To cite this article: XIAO Yuan, ZOU Qi. Multi-Object Tracking with decoupled representations and unreliable detections in complex scenes [OL]. [21 March 2024] http://en.paper.edu.cn/en_releasepaper/content/4763002
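The Motion-GIoU and Embedding-GIoU matrices above both build on the generalized IoU between boxes. As a rough illustration only (not the authors' code; the fusion with motion and appearance cues is omitted), a GIoU-based cost matrix for an assignment solver can be sketched as:

```python
import numpy as np

def giou(box_a, box_b):
    """Generalized IoU between two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box penalizes distant, non-overlapping pairs
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / c_area

def giou_cost_matrix(tracks, detections):
    """Cost matrix (1 - GIoU)/2 in [0, 1], usable by assignment solvers."""
    return np.array([[(1.0 - giou(t, d)) / 2.0 for d in detections]
                     for t in tracks])
```

GIoU lies in (-1, 1], so the (1 - GIoU)/2 mapping gives a nonnegative cost that still separates disjoint box pairs by their distance.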
2. Research on the hippocampus medical imaging segmentation method for small samples
QI Shu-Wen, JIANG Zhu-qing
Computer Science and Technology, 16 March 2024
Abstract: The hippocampus is located between the thalamus and the medial temporal lobe. It is mainly responsible for cognition, learning, and long- and short-term memory, and it is closely related to many diseases such as Alzheimer's disease and temporal lobe epilepsy. Therefore, accurate segmentation of the hippocampal structure in magnetic resonance (MR) imaging is of great significance for the diagnosis of brain injury and the prediction of brain disease in clinical medicine. In recent years, the rapid development of deep learning has brought brand-new changes to the field of hippocampal segmentation. Deep learning is data-driven, and the quantity and quality of data directly affect the accuracy of hippocampal segmentation. However, because MR imaging is difficult to acquire and manual annotation is expensive, hippocampus MR imaging is relatively scarce, which limits the performance of deep learning models on hippocampal segmentation tasks. To overcome the challenges of small-sample scenarios and improve segmentation accuracy, this paper proposes a data augmentation method that expands the data (brain MR images) and labels (hippocampus masks) simultaneously, alleviating both data scarcity and annotation scarcity. Experiments show that the proposed method effectively improves the accuracy of hippocampal segmentation.
To cite this article: QI Shu-Wen, JIANG Zhu-qing. Research on the hippocampus medical imaging segmentation method for small samples [OL]. [16 March 2024] http://en.paper.edu.cn/en_releasepaper/content/4762832
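The paired expansion of data and labels described above can be illustrated with a minimal sketch (the transforms here are hypothetical; the paper's augmentation may differ): the same random flip and rotation are applied to the MR slice and its mask so the pair stays aligned.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the SAME random flip/rotation to an MR slice and its
    hippocampus mask, so data and label stay aligned."""
    if rng.random() < 0.5:                 # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = rng.integers(0, 4)                 # random multiple-of-90-degree rotation
    image = np.rot90(image, k)
    mask = np.rot90(mask, k)
    return image.copy(), mask.copy()
```

Because both arrays undergo the identical pixel permutation, any per-pixel relation between image and mask survives the augmentation.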
3. Intensity-driven bounding box supervised brain white matter hyperintensities segmentation algorithm
Cheng Ao
Computer Science and Technology, 8 March 2024
Abstract: White matter hyperintensities (WMHs) serve as a crucial imaging feature for assessing cerebral white matter abnormalities, and accurate segmentation of WMHs is of great importance for tracking disease progression, evaluating treatment effects, and studying various neurological and geriatric disorders. Presently, deep learning-based methods for WMHs segmentation rely heavily on extensive pixel-level annotation of training data. However, the irregular shapes, random distribution, and fuzzy boundaries characteristic of WMHs make acquiring precise pixel-level labels prohibitively costly. To mitigate the reliance on pixel-level annotations, this paper introduces an intensity-driven bounding box supervised brain white matter hyperintensities segmentation algorithm (IDBB), which substitutes weak bounding box labels for precise labels during model training. IDBB employs an intensity-based adaptive thresholding method to generate pixel-level pseudo-labels from bounding box labels and trains the segmentation network using both Dice loss and cross-entropy loss. Additionally, this paper introduces a WMHs segmentation dataset containing bounding box labels of various sizes, serving as a benchmark for bounding box supervised WMHs segmentation. Results demonstrate that the proposed method achieves a Dice similarity coefficient (DSC) comparable to 90% of that of fully supervised methods, surpassing other weakly supervised approaches. Experimental validation illustrates the effectiveness of the proposed method in reducing annotation costs while achieving satisfactory segmentation performance.
To cite this article: Cheng Ao. Intensity-driven bounding box supervised brain white matter hyperintensities segmentation algorithm [OL]. [8 March 2024] http://en.paper.edu.cn/en_releasepaper/content/4762582
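The core intensity-driven idea, turning a bounding box into a pixel-level pseudo-label by thresholding intensities inside the box, can be sketched as follows (a simplified per-box mean-plus-k·std threshold; IDBB's adaptive thresholding is more involved):

```python
import numpy as np

def box_pseudo_label(image, boxes, k=0.5):
    """Generate a pixel-level pseudo-label from bounding-box labels by
    thresholding intensities inside each box. Boxes are [y1, x1, y2, x2];
    WMHs appear as bright voxels on FLAIR-like images."""
    pseudo = np.zeros(image.shape, dtype=np.uint8)
    for y1, x1, y2, x2 in boxes:
        patch = image[y1:y2, x1:x2]
        thresh = patch.mean() + k * patch.std()   # per-box intensity threshold
        pseudo[y1:y2, x1:x2] = (patch >= thresh).astype(np.uint8)
    return pseudo
```

Everything outside the boxes is treated as background, so the pseudo-label never leaks beyond the annotated regions.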
4. An attention-enhanced neural network with distillation training for barcode detection
Wang Zijian, Zhou Xiaoguang
Computer Science and Technology, 3 April 2023
Abstract: Barcodes play an essential role in our daily life, and localizing or detecting them in real scenes in a fast and robust way has many practical applications. Recently, some deep learning-based methods have shown great potential in object detection. However, because barcodes may be placed at any angle, vertical bounding boxes cannot capture accurate orientation and scale information. In this paper, we propose a barcode detector that performs dense prediction to accurately locate the pixels belonging to the barcode region. For better detection performance, we design a spatial attention module that adaptively integrates global information and can be easily plugged into the prediction backbone. Meanwhile, we employ a knowledge distillation training strategy to train a small student network with the help of a heavy teacher network. Extensive experimental results demonstrate that our method runs at real-time speed in CPU environments and locates barcodes in images with complex scenes.
To cite this article: Wang Zijian, Zhou Xiaoguang. An attention-enhanced neural network with distillation training for barcode detection [OL]. [3 April 2023] http://en.paper.edu.cn/en_releasepaper/content/4759920
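The knowledge distillation strategy mentioned above typically minimizes a temperature-scaled KL divergence between teacher and student outputs. A generic sketch follows (the paper's exact objective may also include a hard-label term):

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax at temperature t."""
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, t=2.0):
    """Soft-target distillation: KL(teacher || student) at temperature t,
    scaled by t^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q))) * t * t)
```

The higher temperature softens the teacher's distribution, exposing its "dark knowledge" about wrong-class similarities to the small student.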
5. YoloDepth: Yolo with Monocular Depth Estimation for Object Distance Measurement
Chen Fei-Yang, Jiao Ji-Chao
Computer Science and Technology, 13 February 2023
Abstract: The environmental perception system is an important part of autonomous driving. A high-precision, real-time perception system helps the vehicle make feasible decisions and reasonable plans for its next step while driving. We propose a multi-task environmental perception network (YoloDepth) that simultaneously performs traffic object detection and distance measurement. It consists of an encoder for feature extraction and two task-specific decoders. Our model performs excellently on the COCO 2017 object detection dataset and the KITTI monocular depth estimation dataset, achieving state-of-the-art speed and accuracy, and can process both visual perception tasks simultaneously in real time on the embedded device Jetson AGX Xavier (18.3 FPS) while maintaining high accuracy.
To cite this article: Chen Fei-Yang, Jiao Ji-Chao. YoloDepth: Yolo with Monocular Depth Estimation for Object Distance Measurement [OL]. [13 February 2023] http://en.paper.edu.cn/en_releasepaper/content/4759099
6. Pseudo-label-based Decoupling Domain Adaptation for Long-tail Distribution with Domain Discrepancy
Liu YiChen, Wu ZhenYu
Computer Science and Technology, 13 February 2023
Abstract: In real-world scenarios, machine learning tasks suffer from long-tail distribution or domain discrepancy, and many recent works have proposed effective methods for each challenge separately. However, few studies have addressed the two problems simultaneously, even though both long-tail distribution and domain discrepancy can harm the generalization of machine learning models. Thus, based on an upper-bound error theory, this paper gives a design principle for the long-tail distribution with domain discrepancy problem (LT-DD) and, following that principle, proposes a pseudo-label-based decoupling domain adaptation method (PLD-DA). PLD-DA follows a two-stage domain adaptation framework: the first stage trains a domain-invariant feature extractor on the original long-tail dataset, and the second stage adjusts the classifier with a reweighting method. To improve the classifier's confidence, pseudo-label information from the target domain is introduced and a self-learning strategy is used. Experiments show that our method achieves a well-transferred feature extractor and a confident, unbiased classifier simultaneously on LT-DD tasks, improving generalization compared to end-to-end rebalancing domain adaptation methods.
To cite this article: Liu YiChen, Wu ZhenYu. Pseudo-label-based Decoupling Domain Adaptation for Long-tail Distribution with Domain Discrepancy [OL]. [13 February 2023] http://en.paper.edu.cn/en_releasepaper/content/4759091
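The second-stage classifier reweighting can be illustrated with a standard long-tail recipe, "effective number" class weights (an assumption for illustration; the paper's reweighting formula may differ):

```python
import numpy as np

def class_reweight(labels, beta=0.999):
    """Effective-number class weights for rebalancing a classifier head
    on a long-tail label distribution: w_c ~ (1 - beta) / (1 - beta^n_c),
    normalized so the weights average to 1."""
    counts = np.bincount(labels)                       # samples per class
    eff = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff                                      # rare classes get larger weight
    return w * len(w) / w.sum()
```

These weights would scale the per-class loss while the feature extractor from stage one stays frozen.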
7. No-reference video quality assessment based on human attention system for background replacement applications
WANG Yinan, WANG Jing, SHEN Qiwei
Computer Science and Technology, 25 February 2022
Abstract: With the wide application of image and video background replacement in scenes such as short-video production and high-definition video conferencing, more and more background replacement algorithms and video creations are produced, but the quality of the resulting images and videos varies greatly. Evaluating quality after background replacement therefore has important guiding significance for both industry and academia. In the background replacement scenario, the factors affecting video quality include the distortion of video frames, inter-frame jitter, composition, and chroma harmony. Among these, the accuracy and quality of video frames are a very important evaluation dimension. In this paper, we propose a deep learning algorithm based on a visual attention mechanism to assess accuracy quality in video background replacement applications. First, a convolutional neural network (CNN) is designed to extract distortion features; then spatial saliency features and temporal motion features are fused through the attention mechanism. Finally, the model is fitted to human subjective perception of video accuracy to evaluate the perceived accuracy of background-replaced videos.
To cite this article: WANG Yinan, WANG Jing, SHEN Qiwei. No-reference video quality assessment based on human attention system for background replacement applications [OL]. [25 February 2022] http://en.paper.edu.cn/en_releasepaper/content/4756403
8. Adaptive and Attention-joint Supervision for Weakly Supervised Segmentation
Ma Yue, Wan Hongjiang
Computer Science and Technology, 23 February 2022
Abstract: Image-level weakly supervised semantic segmentation faces the great challenge of compensating for missing mask labels. Methods based on image-level labels primarily use the class activation map (CAM) to approximate the segmentation mask. Because pseudo masks focus only on the class-specific discriminative regions of objects, various methods have been explored to expand them toward the ground truth. Contemporary methods tend to use a lower threshold to distinguish objects from backgrounds so that CAMs achieve higher object coverage. However, too much background is then misclassified into the pseudo masks, and excessive noise is propagated into downstream training. To address this, we propose the Adaptive and Attention-joint Supervision method (AAJS). AAJS divides the classification network into two branches and imposes expansion and convergence tendencies on their class-specific features, i.e., the regions activated by the CAM. Each branch is adaptively constrained by the other based on the confidence of its features. The class-specific features are thereby enriched, yielding a more accurate CAM at a lower threshold. Moreover, we propose an adaptive feature dropout method to prevent the classification network from relying too heavily on discriminative regions. Evaluated on PASCAL VOC 2012, AAJS matches or exceeds the state-of-the-art performance of existing methods.
To cite this article: Ma Yue, Wan Hongjiang. Adaptive and Attention-joint Supervision for Weakly Supervised Segmentation [OL]. [23 February 2022] http://en.paper.edu.cn/en_releasepaper/content/4756323
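The CAM-to-pseudo-mask baseline the abstract builds on, thresholding a normalized class activation map, can be sketched as follows (the threshold value is illustrative; lowering it expands object coverage at the cost of background leakage, which is the trade-off AAJS targets):

```python
import numpy as np

def cam_to_pseudo_mask(cam, threshold=0.3):
    """Normalize a class activation map to [0, 1] and threshold it
    into a binary pseudo segmentation mask."""
    cam = np.maximum(cam, 0.0)                     # ReLU, as in the CAM recipe
    cam = cam / (cam.max() + 1e-8)                 # normalize to [0, 1]
    return (cam >= threshold).astype(np.uint8)
```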
9. P-SANET: A HIGH-PRECISION REALTIME POINT CLOUD SEMANTIC SEGMENTATION FRAMEWORK
GOU Xiaofeng, JIAO Jichao, ZHANG Chengkai
Computer Science and Technology, 10 January 2022
Abstract: Perception in autonomous systems is an important task that guides decision execution. A lidar point cloud is one type of data for perception tasks: it is rich in original information, easy to collect, and convenient to store. Compared to camera images, point clouds contain precise spatial information and adapt to various environments; nevertheless, more information means more computing-power consumption. Processing speed and accuracy are two key metrics of a neural network framework, and traditional methods pay the price of reduced accuracy for increased processing speed. Although some frameworks preprocess the point cloud into a projected image, the 2D image tensor still contains a large number of redundant channel features under the traditional 2D convolution operation. In this paper, we propose a point cloud semantic segmentation framework: we replace the standard convolutional layer with a new sub-module, which greatly reduces the amount of computation, and we introduce a sub-module to fuse the coordinate values and intermediate tensors. The framework is divided into three parts: a spherical projection preprocessing module, an encoder-decoder module, and a data post-processing module. We conduct experiments on the SemanticKITTI dataset, and the results show that our framework outperforms other frameworks in both prediction accuracy and prediction speed. We also use a sparse point cloud dataset to test the generalization of our framework, and the experiments show that it performs better than other frameworks. Code is available at: https://github.com/windtries/P-SANet
To cite this article: GOU Xiaofeng, JIAO Jichao, ZHANG Chengkai. P-SANET: A HIGH-PRECISION REALTIME POINT CLOUD SEMANTIC SEGMENTATION FRAMEWORK [OL]. [10 January 2022] http://en.paper.edu.cn/en_releasepaper/content/4755947
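The spherical projection preprocessing module maps each lidar point to a pixel of a range image via its azimuth and elevation angles. A minimal sketch (the field-of-view values below are typical for a 64-beam sensor, not taken from the paper):

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) lidar point cloud onto an h x w range image.
    Each pixel stores the range of the point that lands on it."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)             # range of each point
    yaw = np.arctan2(y, x)                         # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-8))     # elevation angle
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (pitch - fd) / (fu - fd)) * (h - 1)).astype(int)   # row
    v = ((0.5 * (1.0 - yaw / np.pi)) * (w - 1)).astype(int)        # column
    u, v = np.clip(u, 0, h - 1), np.clip(v, 0, w - 1)
    image = np.zeros((h, w), dtype=float)
    image[u, v] = r
    return image
```

In practice, extra channels (x, y, z, intensity) are usually stacked alongside the range channel before feeding the encoder-decoder.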
10. LCPN: Lightweight Single-Person Pose Estimation Based on Cascaded Pyramid Network
Meng Ruoli, Fang Wei
Computer Science and Technology, 26 February 2021
Abstract: The task of human pose estimation has improved greatly in recent years. However, many challenges remain in applying it in practice, such as limited network bandwidth and privacy and security risks. In this paper, we propose a lightweight human pose estimation model called LCPN, which uses depthwise separable convolutions instead of standard convolutions to lighten the network. In addition, we combine heatmap prediction and coordinate regression in the keypoint prediction stage, further improving the efficiency of the network. The proposed approach achieves an excellent trade-off between speed and accuracy on the LSP and MPII datasets and is well suited to edge devices with lower computing power.
To cite this article: Meng Ruoli, Fang Wei. LCPN: Lightweight Single-Person Pose Estimation Based on Cascaded Pyramid Network [OL]. [26 February 2021] http://en.paper.edu.cn/en_releasepaper/content/4753756
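The parameter saving from depthwise separable convolution, which motivates the lightweight design above, is easy to quantify (a back-of-envelope sketch; bias terms ignored):

```python
def conv_params(c_in, c_out, k):
    """Parameters in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a pointwise 1 x 1 convolution."""
    return k * k * c_in + c_in * c_out
```

For a typical 3 x 3 layer the separable form needs roughly 1/c_out + 1/k² of the standard parameters, which is where most of LCPN's size reduction would come from.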
© 2003-2012 Sciencepaper Online, unless otherwise stated.