	1. Double attention module for rapid multi-scale object detection
	LIANG Jiaqi,MA Yue
	Computer Science and Technology 27 January 2022

	Abstract: For object detection, many methods based on deep convolutional neural networks have greatly improved detection speed while maintaining accuracy. However, many of them still fail to focus accurately on multi-scale objects, and they capture insufficient features because they lack global context information. To address these issues, we propose an effective architecture called DoubleS-AM, which consists of a Spatial Pyramid Pooling Attention Module (SPP-AM) and a Self Weight Attention Module (SW-AM). It uses attention mechanisms to capture important information in the feature maps at two levels: the channel level and the spatial level. Specifically, the channel-level module (SPP-AM) adaptively attends to multi-scale objects by weighting channels that carry feature information from different receptive fields, while the spatial-level module (SW-AM) captures the global context similarity distribution of deeper features to enhance the semantic information of shallower features via a feature pyramid. Combining the two modules, we design end-to-end trainable networks that emphasize useful information while producing reliable and rapid predictions. We conduct extensive experiments against state-of-the-art baselines and significantly improve object detection results. On PASCAL VOC2007, the mAP (IoU=0.5) of the networks we design increases by 4%. On MS COCO2017, the mAP increases by about 3% and the APS of our networks increases by 3%-5%, which indicates a significant effect on multi-scale detection, especially small object detection.
	To cite this article: LIANG Jiaqi,MA Yue. Double attention module for rapid multi-scale object detection[OL].[27 January 2022] http://en.paper.edu.cn/en_releasepaper/content/4756196
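
The abstract above reports mAP at IoU=0.5. As a reminder of what that threshold means, here is a minimal sketch of the intersection-over-union test for two axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the function names are assumptions made for illustration; they are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def matches_ground_truth(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7, so that pair would not count as a match at the 0.5 threshold.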


	2. SOFT-AlignUNet: A Lightweight Transformer with Feature Alignment
	WU Rui-Jia,ZHANG Hong-Gang
	Computer Science and Technology 04 January 2022

	Abstract: The transformer, the prevalent backbone architecture in natural language processing, has been adopted in various vision tasks since the vision transformer was proposed. Its performance has been shown to be roughly on par with that of CNNs, and even better given a large enough dataset. However, the original vision transformer suffers from its straightforward structure, which requires a large number of parameters and expensive computation, especially in dense prediction tasks. This paper concentrates on medical image semantic segmentation. In medical scenarios, UNet remains the popular backbone, and many researchers have recently proposed transformer-CNN hybrid or pure-transformer UNet models. However, the inherent feature misalignment caused by resizing and concatenating feature maps has received little attention. In this paper, a lightweight transformer-CNN hybrid UNet, SOFT-AlignUNet (SOFT-AU), is proposed to address these issues. On the one hand, a novel softmax-free transformer, which reduces the computational cost to linear in the number of patches, is introduced into the UNet architecture to greatly reduce computation. On the other hand, feature misalignment is taken into account, and a river-like Feature Alignment Flow is proposed to generate spatial deviations and correct the features. The architecture achieves strongly competitive results on the public Synapse and DRIVE datasets with a very small model size and low computational requirements. The results show that it is a promising network for future real-world deployment.
	To cite this article: WU Rui-Jia,ZHANG Hong-Gang. SOFT-AlignUNet: A Lightweight Transformer with Feature Alignment[OL].[4 January 2022] http://en.paper.edu.cn/en_releasepaper/content/4755984


	3. When Tiramisu meet WGAN: A Model for Portrait Matting
	ZHANG Jia-xuan,Peng Hai-peng
	Computer Science and Technology 24 December 2021

	Abstract: Image matting is an important topic in computer vision and has been used in a wide variety of applications. In particular, portrait matting is widely used in film, advertising, short video and other areas. Matting is not a simple image segmentation problem: a good matting algorithm must handle hair and other fine details accurately. In this paper, we combine the fully convolutional dense network named Tiramisu with a generative adversarial network (GAN). Tiramisu produces a coarse alpha channel and RGB channels with chromatic aberration. We then use the GAN structure to refine the details: the network adjusts the boundary of the alpha channel and the color values of the other three channels. However, training instability is a known shortcoming of GANs. To address this weakness, the Wasserstein distance is introduced into the GAN loss function, which resolves the unstable training of the GAN and largely resolves the mode collapse problem. We compare the proposed approach with several state-of-the-art image matting methods on synthetic datasets. The main difference between our method and the others is that we generate an RGBA image rather than only an alpha channel, which makes full use of the strengths of GANs in image generation. The results show that our model outperforms most of its competitors in image quality and is superior to all of them in real-time performance.
	To cite this article: ZHANG Jia-xuan,Peng Hai-peng. When Tiramisu meet WGAN: A Model for Portrait Matting[OL].[24 December 2021] http://en.paper.edu.cn/en_releasepaper/content/4755954


	4. Adaptive Margin of Triplet-Center Loss for Deep Metric Learning
	YAO Li,ZHANG Bin
	Computer Science and Technology 06 January 2021

	Abstract: Most pair-based loss functions require manually tuning uniform thresholds between pairs to optimize the network parameters. However, these hyper-parameters are fixed, which is unreasonable because different pairs of classes differ in similarity. Moreover, tuning the hyper-parameters for each task to find suitable values costs considerable time and effort. Therefore, this paper proposes a novel loss named adaptive margin of triplet-center loss (AMTCL), which learns a specific margin for the center of each class while maintaining inter-class separation, enhancing the discriminative power of the features and reducing the tuning burden. The proposed AMTCL obtains state-of-the-art performance on four image retrieval benchmarks. Without bells and whistles, the proposed loss needs only a few lines of code and can be easily implemented in existing networks.
	To cite this article: YAO Li,ZHANG Bin. Adaptive Margin of Triplet-Center Loss for Deep Metric Learning[OL].[6 January 2021] http://en.paper.edu.cn/en_releasepaper/content/4753303
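
To make the idea of a per-class margin concrete, the following is a rough sketch of a triplet-center style hinge loss in which each class has its own margin. It is not the AMTCL formulation from the paper, whose details the abstract does not give. The squared Euclidean distance, the function names, and the treatment of margins as a plain list (rather than learned parameters) are all assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def triplet_center_loss(feature, label, centers, margins):
    """Hinge loss pulling `feature` toward its own class center and pushing
    it at least margins[label] further away from the nearest other center.

    centers: one center vector per class (at least two classes).
    margins: one margin per class (hypothetical; shown here as fixed numbers).
    """
    positive = squared_distance(feature, centers[label])
    negative = min(
        squared_distance(feature, center)
        for index, center in enumerate(centers)
        if index != label
    )
    return max(0.0, positive + margins[label] - negative)
```

With centers `[[0, 0], [4, 0]]` and a sample `[1, 0]` of class 0, a margin of 5 gives zero loss (1 + 5 - 9 is negative), while a margin of 10 gives a loss of 2. This shows how the choice of margin per class changes which samples still contribute to training.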


	5. Medical Image Segmentation based on Octave Convolution
	Zhang Qiong,Tan Guanghua
	Computer Science and Technology 10 June 2020

	Abstract: Medical image segmentation is a vital research field in computer vision, and achieving fast and accurate segmentation is of great importance. Deep-learning-based image segmentation can be described as an encoder-decoder architecture, and the most classic existing encoder-decoder model is U-Net. However, U-Net cannot solve the blurred-boundary problem when predicting segmentation results for high-resolution images. Therefore, this paper proposes a deep learning method based on boundary information. It adopts Octave convolution to decompose the features into low-frequency and high-frequency components. The low spatial frequency component is used to segment the smoothly changing structures in the original image, and the high spatial frequency component is used to segment the rapidly changing fine details; the fine-detail segmentation then serves as the constraint condition. The constraint is realized by concatenating the smooth-structure segmentation with the fine-detail segmentation, and the segmentation of the whole image is obtained by feeding the concatenated result into a convolutional layer for class prediction. This paper also considers the class imbalance problem in multi-class segmentation and proposes giving more weight to rare classes. Because the approach adopts Octave convolution and the same encoder-decoder design as U-Net, it is called Oct-UNet. The proposed method not only achieves better results than U-Net but also has fewer parameters. Experiments verify the effectiveness of the proposed approach.
	To cite this article: Zhang Qiong,Tan Guanghua. Medical Image Segmentation based on Octave Convolution[OL].[10 June 2020] http://en.paper.edu.cn/en_releasepaper/content/4752349


	6. DCDN: Double Cross & Deep Network for News Recommandation
	Zhihong Yang,Yulong Wang
	Computer Science and Technology 18 March 2020

	Abstract: Recommendation systems are widely used in Internet products, and recommendation algorithms are receiving more and more attention from researchers. This paper proposes the Double Cross & Deep Network (DCDN) algorithm for news recommendation. Building on the DCN network, the algorithm introduces a new double-cross deep network that extracts the features of "related news" in the recommended candidate set separately and explicitly crosses them with the user information and the seed news information, respectively. The two Cross networks and the Deep networks of DCDN are independent of each other. The Cross Network is used to obtain cross information between features, and the Deep Network is used to model high-order nonlinear features. Users can change the network parameters according to the prediction requirements. SR-Cross ensures the correlation between seed news and recommended news, and UR-Cross combines with the user profile to improve users' reading interest. Experiments on two real data sets show that the DCDN algorithm achieves better accuracy than other deep learning models and is practical in engineering while maintaining speed.
	To cite this article: Zhihong Yang,Yulong Wang. DCDN: Double Cross & Deep Network for News Recommandation[OL].[18 March 2020] http://en.paper.edu.cn/en_releasepaper/content/4751250


	7. Multimodal Information Fusion Based Housing Prices Prediction
	CHANG Cheng,ZHANG Zhongbao
	Computer Science and Technology 27 May 2019

	Abstract: Housing price prediction has attracted much attention and has been studied for a long time. The value of a house is influenced by a wealth of determinants, some of which are irregular or cannot even be quantified. Moreover, housing prices fluctuate considerably in reality. Designing an accurate, multi-dimensional method for estimating housing prices therefore remains a challenging task. Previous work on this problem values each house independently and uses structured features (such as floors and the number of rooms) to produce a valuation. However, this assumption does not hold in reality, since housing prices are strongly related to time characteristics, and unstructured features (visual information) that affect transaction prices are ignored. To address these limitations, we rethink the housing price prediction problem and leverage a multimodal fusion framework that takes two other important factors into consideration: the time series of house prices and the unstructured information about the house. We design an efficient differential housing price prediction model based on multimodal deep learning. In this framework, we first propose an advanced time series correlation technique to improve the prediction of the average house price over a given time scale. Next, we design an efficient image algorithm to mine more useful features from the unstructured information. The final prediction is the sum of the mean and the difference in house prices. We further refine the deep features of house prices and the features of the house itself in order to provide better, richer housing price predictions. Through extensive experiments on real-world datasets, we demonstrate that our algorithm performs better than the baseline and state-of-the-art approaches.
	To cite this article: CHANG Cheng,ZHANG Zhongbao. Multimodal Information Fusion Based Housing Prices Prediction[OL].[27 May 2019] http://en.paper.edu.cn/en_releasepaper/content/4748737


	8. PA-RetinaNet: Path Augmented RetinaNet for Dense Object Detection
	Tan Guanghua,Guo Zijun,Xiao Yi
	Computer Science and Technology 29 March 2019

	Abstract: Object detection methods can be divided into two categories: two-stage methods, with higher accuracy but lower speed, and one-stage methods, with lower accuracy but higher speed. To inherit the advantages of both approaches, this paper proposes a novel dense object detector called Path Augmented RetinaNet (PA-RetinaNet). It not only achieves better accuracy than two-stage methods but also maintains the efficiency of one-stage methods. Specifically, we introduce a bottom-up path augmentation module to enhance the feature extraction hierarchy, which shortens the information path between lower feature layers and the topmost layers. Furthermore, we address the class imbalance problem by introducing a Class-Imbalance loss, in which the loss of each training sample is weighted by a function of its predicted probability, so that the trained model focuses more on hard examples. To evaluate the effectiveness of PA-RetinaNet, we conducted a number of experiments on the MS COCO dataset. The results show that our method is 4.3% higher than the existing two-stage method, while its speed is similar to that of state-of-the-art one-stage methods.
	To cite this article: Tan Guanghua,Guo Zijun,Xiao Yi. PA-RetinaNet: Path Augmented RetinaNet for Dense Object Detection[OL].[29 March 2019] http://en.paper.edu.cn/en_releasepaper/content/4748127
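
The abstract does not give the form of its Class-Imbalance loss. As an illustration of the general idea of weighting each sample's loss by a function of its predicted probability, here is a sketch of the well-known focal loss used by the original RetinaNet, which is a different, published loss and is shown only as a stand-in. The default values of `gamma` and `alpha` are the commonly quoted ones and are assumptions here.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class, in [0, 1].
    y: ground-truth label, 1 for positive and 0 for negative.
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) for extremely confident wrong predictions.
    p_t = min(max(p_t, 1e-12), 1.0)
    # The (1 - p_t) ** gamma factor shrinks the loss of easy examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which makes it easy to see that the extra factor is what shifts the emphasis toward hard examples.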


	9. Enhanced Style Transfer in Real-time with Histogram-matched Instance Normalization
	ZhongRuiZhu,ManManPeng
	Computer Science and Technology 28 March 2019

	Abstract: Building on the use of neural networks to extract information from images, Gatys et al. found that the content and style of images can be separated and recombined into a new image, a process called Style Transfer. Many feed-forward neural networks have since been proposed to speed up the original method and make Style Transfer practical. However, this comes at a price: these feed-forward networks are inflexible because of their fixed parameters, which means that only a single style, rather than arbitrary styles, can be transferred in real time. Several approaches have been proposed to relieve this dilemma, such as a style-swap layer and an adaptive instance normalization layer (AdaIN). Notably, we observed that the AdaIN layer only aligns the mean and variance of the content feature maps with those of the style feature maps. Our method aims to enable arbitrary style transfer in real time while preserving more statistical information through histogram matching, providing more reliable texture clarity and more user-friendly control. We achieve better performance than existing approaches without adding computational complexity, at a speed comparable to the fastest Style Transfer method. Our method provides more flexible user control along with trustworthy quality and stability.
	To cite this article: ZhongRuiZhu,ManManPeng. Enhanced Style Transfer in Real-time with Histogram-matched Instance Normalization[OL].[28 March 2019] http://en.paper.edu.cn/en_releasepaper/content/4748129
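
The observation that AdaIN aligns only the mean and variance can be seen in a small sketch of the standard AdaIN operation, applied per channel. This illustrates the baseline layer the abstract refers to, not the histogram-matched method the paper proposes. Representing each channel as a flat list of activations, and the `eps` value, are simplifying assumptions.

```python
import math


def channel_mean_std(values, eps=1e-5):
    """Mean and (eps-stabilised) standard deviation of one channel."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance + eps)


def adain(content, style, eps=1e-5):
    """Adaptive instance normalization.

    content, style: lists of channels, each channel a flat list of floats.
    Each content channel is normalised to zero mean and unit variance,
    then rescaled to the mean and standard deviation of the style channel.
    """
    output = []
    for content_channel, style_channel in zip(content, style):
        c_mean, c_std = channel_mean_std(content_channel, eps)
        s_mean, s_std = channel_mean_std(style_channel, eps)
        output.append(
            [s_std * (v - c_mean) / c_std + s_mean for v in content_channel]
        )
    return output
```

Only two statistics per channel are transferred, so any two style channels with the same mean and variance but different histograms produce the same output. That is the gap that matching full histograms is meant to close.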


	10. Dense Deep Crossing Network for Recommender System
	Lin Wanying,Wang Yulong
	Computer Science and Technology 15 December 2018

	Abstract: Manual feature engineering has been the key to the success of many predictive tasks in web applications. However, with the exponential increase in the variety and volume of features, manual feature engineering comes at a high cost. Factorization Machines (FM) can automatically learn second-order feature interactions, but FM models capture the non-linear structure of real-world data insufficiently. Recent work has shown that DNNs are able to learn higher-order interactions based on existing ones. In this paper, we propose a Deep Dense Crossing Network (DDCN) for recommender systems. DDCN keeps the benefits of a DNN model and proposes a novel dense crossing structure that connects each layer to every other layer in a feed-forward fashion. DDCN has several advantages: it strengthens feature propagation, encourages feature reuse, and implicitly learns feature crossing in an efficient way. We evaluate the model on two datasets, for hotel recommendation and clothes recommendation, and our experimental results demonstrate its superiority over state-of-the-art algorithms in terms of model accuracy.
	To cite this article: Lin Wanying,Wang Yulong. Dense Deep Crossing Network for Recommender System[OL].[15 December 2018] http://en.paper.edu.cn/en_releasepaper/content/4746727
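
For readers unfamiliar with the second-order interactions that Factorization Machines learn, the sketch below computes the standard FM pairwise term in two ways: directly over all feature pairs, and with the usual linear-time reformulation. It illustrates the FM baseline mentioned in the abstract, not DDCN itself. The function names and the list-of-lists layout for the latent factors `v` are assumptions.

```python
def fm_pairwise_naive(x, v):
    """Sum over i < j of <v[i], v[j]> * x[i] * x[j], in O(n^2 * k) time."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity in O(n * k) time, using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    total = 0.0
    for f in range(len(v[0])):
        weighted = [v[i][f] * x[i] for i in range(len(x))]
        total += sum(weighted) ** 2 - sum(w * w for w in weighted)
    return 0.5 * total
```

For `x = [1, 2, 3]` and `v = [[1, 0], [0, 1], [1, 1]]` both functions return 9. The term is a fixed bilinear form in the inputs, which is consistent with the abstract's point that FM captures non-linear structure only to a limited degree.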
