


	1. Research on Single-channel Target Speech Extraction Using μ-Law Algorithm
	Linhui Qiu,Jing Wang
	Computer Science and Technology 17 January 2022

	Abstract: When listening to mixed audio, people are often interested in the voice of one person rather than all the voices. Speaker extraction addresses precisely this situation: it extracts the voice of the target speaker in a multi-speaker environment, imitating human selective auditory attention. In previous models, extraction is generally performed in the frequency domain, and the time-domain signal is reconstructed from the extracted amplitude spectrum and an estimated phase spectrum. The reconstructed time-domain signal is therefore affected by errors in the phase estimation. With reference to Conv-TasNet and SpEx, this paper uses a speaker extraction network (F3S) that operates in the time domain. The network converts the mixed speech into embedding coefficients instead of decomposing the speech signal into amplitude and phase spectra, thereby avoiding phase estimation problems. On this basis, this paper uses the μ-law compression and expansion algorithm to process the data, aiming to improve the training speed and extraction speed. The experimental results show that the F3S network can effectively improve the training speed and extraction speed while maintaining a good scale-invariant SDR (SI-SDR).
	TO cite this article:Linhui Qiu,Jing Wang. Research on Single-channel Target Speech Extraction Using μ-Law Algorithm[OL].[17 January 2022] http://en.paper.edu.cn/en_releasepaper/content/4756081
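The μ-law compression and expansion named in the abstract of entry 1, and the SI-SDR metric it reports, can be illustrated with a minimal Python sketch. This is the textbook form of both formulas, not the authors' code: the value `MU = 255` and the function names are assumptions, and the abstract does not say at which stage of F3S the companding is applied.

```python
import math

MU = 255.0  # assumed value; the abstract does not state which mu is used


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to [-1, 1] with logarithmic compression."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB between two equal-length sample lists.

    Assumes the target is not all zeros and the estimate is not an exact
    scaled copy of the target (which would give zero residual energy).
    """
    dot = sum(e * t for e, t in zip(estimate, target))
    energy = sum(t * t for t in target)
    scale = dot / energy
    projection = [scale * t for t in target]        # part of the estimate aligned with the target
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_power = sum(p * p for p in projection)
    residual_power = sum(r * r for r in residual)
    return 10.0 * math.log10(signal_power / residual_power)
```

Compression boosts small amplitudes relative to large ones, and `mu_law_expand(mu_law_compress(x))` returns `x` up to floating-point error. Rescaling the estimate leaves `si_sdr` unchanged, which is what makes the metric scale-invariant.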


	2. Adaptive speech enhancement based on SNR perception in non-stationary noise scenarios
	Chen Zhishuai,Wang Jing
	Computer Science and Technology 14 January 2022

	Abstract: Speech enhancement has long been a hot topic in the field of speech processing. In real life, interference signals are usually non-stationary or even bursty, so studying speech enhancement in non-stationary noise scenes is of great significance for solving practical problems. This paper addresses two problems. First, most current studies on speech enhancement assume stationary noise, but in real life the environment changes rapidly and noise cannot be assumed to remain unchanged. Second, because non-stationary noise is hard to predict, it is difficult to keep signal distortion under control after speech enhancement at low SNR. To address these problems, this paper proposes an adaptive speech enhancement method based on SNR perception in non-stationary noise scenes, to improve the adaptability of speech enhancement in complex non-stationary scenes. In addition, a neural network is designed to calculate the signal-to-noise ratio and improve the processing capability of the model.
	TO cite this article:Chen Zhishuai,Wang Jing. Adaptive speech enhancement based on SNR perception in non-stationary noise scenarios[OL].[14 January 2022] http://en.paper.edu.cn/en_releasepaper/content/4756093


	3. Ship Detection via Multi-scale Graph Convolutional Network
	Muyan Feng,Ming Wu,Xin Jiang,Chuang Zhang
	Computer Science and Technology 26 March 2021

	Abstract: In existing high-resolution optical remote sensing images, objects are often located in complex environments, and most datasets suffer from serious imbalance in the number and size of samples, especially ship samples, which are located in a changeable marine environment. Our research mainly addresses the detection of objects of different scales in unbalanced datasets with complex backgrounds. Ship datasets typically have these characteristics, so we choose ship targets for our experiments. We introduce a Multi-scale Graph Convolutional Network (MGCN), formed by a multi-scale module and a GCN module, to obtain better performance on ship detection. For the multi-scale module, we add combinations of dilated convolutions with suitable dilation rates on feature maps of different sizes to obtain context and global information, which improves the detection accuracy of multi-scale objects. For the GCN module, we attempt to design a co-occurrence matrix as the input of the GCN to summarize relationships in the dataset as prior knowledge. By updating features from related objects, the module can enhance local representations to obtain more accurate results. MGCN outperforms existing methods on ship detection and provides a new baseline for the dataset. Experiments verify the effectiveness of our method, e.g. achieving around 16.7% on the ship detection dataset FGSD in terms of mAP. We also visualize ship detection results and show the improvement of our method. Our network is general and can be applied to different types of datasets.
	TO cite this article:Muyan Feng,Ming Wu,Xin Jiang, et al. Ship Detection via Multi-scale Graph Convolutional Network[OL].[26 March 2021] http://en.paper.edu.cn/en_releasepaper/content/4754159
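To make the co-occurrence prior mentioned in the abstract of entry 3 concrete, the following Python sketch counts how often pairs of classes appear in the same image and converts the counts to conditional frequencies. The abstract does not specify how MGCN builds or normalises its matrix, so the row normalisation and both function names are assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence_matrix(image_labels, classes):
    """Count how often each pair of classes appears in the same image."""
    index = {name: i for i, name in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for labels in image_labels:
        present = sorted({index[label] for label in labels})
        for i in present:
            counts[i][i] += 1  # diagonal: number of images containing class i
        for i, j in combinations(present, 2):
            counts[i][j] += 1
            counts[j][i] += 1
    return counts


def row_normalise(counts):
    """Assumed normalisation: entry [i][j] becomes the frequency of j given i."""
    result = []
    for i, row in enumerate(counts):
        total = row[i]  # images that contain class i
        result.append([c / total if total else 0.0 for c in row])
    return result
```

The normalised matrix is generally asymmetric: a class that almost always appears next to another one gets a high entry in its own row even when the reverse frequency is low.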


	4. MLTN: Meta-Learning Tower Network for Cold-Start Recommendation
	LOU Si-Yuan, WANG Yu-Long
	Computer Science and Technology 20 January 2021

	Abstract: Cold-start recommendation refers to the recommendation task for new users and items, and much work has been done to solve this problem. Model-agnostic meta-learning (MAML) is a recently popular paradigm for training models that can learn and generalize. The key idea underlying MAML is to train the model’s initial parameters such that the model has maximal performance on a new task after the parameters have been updated through one or more gradient steps computed with a small amount of data from that new task. Inspired by this idea, we regard cold-start recommendation as a few-shot meta-learning problem and propose the meta-learning tower network (MLTN). We then formalize the task for each user and train the model’s parameters in a meta-learning optimization manner. Extensive experiments on both industrial datasets and public datasets demonstrate the superiority of MLTN.
	TO cite this article:LOU Si-Yuan, WANG Yu-Long. MLTN: Meta-Learning Tower Network for Cold-Start Recommendation[OL].[20 January 2021] http://en.paper.edu.cn/en_releasepaper/content/4753511
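The MAML idea described in the abstract of entry 4 (learn an initial parameter that performs well after one gradient step on a new task) can be shown on a toy problem. The sketch below is not MLTN: it assumes a one-parameter model `y = w * x`, tasks of the form `y = slope * x`, and hypothetical learning rates, so that the gradient through the inner step can be written by hand.

```python
def mean_sq(xs):
    """Mean of x squared over a list of inputs."""
    return sum(x * x for x in xs) / len(xs)


def loss(w, slope, xs):
    """Mean squared error of the model y = w * x on the task y = slope * x."""
    return (w - slope) ** 2 * mean_sq(xs)


def grad(w, slope, xs):
    """Derivative of loss with respect to w."""
    return 2.0 * (w - slope) * mean_sq(xs)


def adapt(w, slope, support, inner_lr):
    """One inner-loop gradient step on the task's support set."""
    return w - inner_lr * grad(w, slope, support)


def meta_gradient(w, slope, support, query, inner_lr):
    """Gradient of the post-adaptation query loss with respect to the initial w."""
    w_adapted = adapt(w, slope, support, inner_lr)
    # Chain rule through the inner step: d(w_adapted)/d(w).
    d_adapted_d_w = 1.0 - 2.0 * inner_lr * mean_sq(support)
    return grad(w_adapted, slope, query) * d_adapted_d_w


def meta_train(tasks, w=0.0, inner_lr=0.1, outer_lr=0.05, steps=200):
    """Outer loop: average the meta-gradient over tasks and update the initial w."""
    for _ in range(steps):
        g = sum(meta_gradient(w, s, sup, qry, inner_lr) for s, sup, qry in tasks)
        w -= outer_lr * g / len(tasks)
    return w
```

Each task is a tuple `(slope, support_inputs, query_inputs)`. In a cold-start setting one would treat each user as a task, with the user's few known interactions as the support set; that mapping is the abstract's framing, while everything else here is a simplification.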


	5. Research on Osteoporosis Risk Assessment Based on Semi-supervised Machine Learning
	LEI Lu, LUO Tao
	Computer Science and Technology 12 May 2020

	Abstract: This paper proposes a semi-supervised machine learning method for osteoporosis risk assessment. Existing osteoporosis risk assessment models suffer from low accuracy and cannot utilize large amounts of unlabeled data. To improve the accuracy of diagnosis, the method jointly considers osteoporosis-related questionnaire data and bone image data, and fuses the multi-modal features extracted from them. Feature engineering and Word2vec are used to extract numerical and text features from questionnaires, respectively. A CNN is used to extract image features from BMD images. Considering the difficulty of obtaining labeled medical data, this paper builds a self-training semi-supervised model based on XGBoost to classify and evaluate osteoporosis, which uses both labeled and unlabeled data to obtain better generalization. In addition, because the questionnaire data contains many outliers and missing values, this paper removes outliers with a DBSCAN algorithm and proposes an improved PKNN algorithm to impute the missing data. Experimental results show that the proposed semi-supervised method achieves an accuracy of 0.78 in osteoporosis risk assessment and has clear advantages over other methods.
	TO cite this article:LEI Lu, LUO Tao. Research on Osteoporosis Risk Assessment Based on Semi-supervised Machine Learning[OL].[12 May 2020] http://en.paper.edu.cn/en_releasepaper/content/4752070
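The self-training loop mentioned in the abstract of entry 5 can be outlined as follows. To stay dependency-free, this sketch swaps the paper's XGBoost classifier for a one-dimensional nearest-centroid classifier; that substitution, the confidence rule, the `threshold` value, and all function names are assumptions for illustration only.

```python
def fit_centroids(points, labels):
    """Return the mean feature value for each class label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(centroids, x):
    """Return (label, confidence) for x; needs at least two classes.

    Confidence is an assumed heuristic in [0.5, 1]: the further the
    second-nearest centroid is relative to the nearest, the higher it is.
    """
    ranked = sorted(centroids.items(), key=lambda item: abs(x - item[1]))
    (best_label, best_c), (_, second_c) = ranked[0], ranked[1]
    d_best, d_second = abs(x - best_c), abs(x - second_c)
    total = d_best + d_second
    confidence = 1.0 if total == 0 else d_second / total
    return best_label, confidence


def self_train(points, labels, unlabeled, threshold=0.8, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    points, labels, pool = list(points), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)
        remaining = []
        for x in pool:
            label, confidence = predict_with_confidence(centroids, x)
            if confidence >= threshold:
                points.append(x)      # accept the pseudo-label
                labels.append(label)
            else:
                remaining.append(x)   # keep for a later round
        if len(remaining) == len(pool):
            break  # nothing confident enough was added this round
        pool = remaining
    return fit_centroids(points, labels), pool
```

`self_train` returns the final model together with the points that never reached the confidence threshold, which makes it easy to see how much of the unlabeled pool was actually used.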


	6. 2D to 3D Depth Map Prediction Based on Image Segmentation
	QIAN Zhixuan,WANG Chensheng,YANG Guang,LI Yangguang,JING Xueliang,LI Yanjiang
	Computer Science and Technology 26 April 2020

	Abstract: This paper proposes an algorithm to convert 2D road video to 3D video. In this kind of video, the foreground is the part of greatest interest, and accurately extracting the foreground object from the background is the key to obtaining the depth map. In this paper, a graph cutting algorithm based on machine learning is used to obtain the foreground, and the background depth model is constructed according to the scene structure to obtain the background depth map. Based on the background depth map, the depth of the foreground object is assigned according to the distance relationship between the foreground and the lens. Then, the background depth map and foreground depth map are combined to obtain a complete depth map.
	TO cite this article:QIAN Zhixuan,WANG Chensheng,YANG Guang, et al. 2D to 3D Depth Map Prediction Based on Image Segmentation[OL].[26 April 2020] http://en.paper.edu.cn/en_releasepaper/content/4751786
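The last two steps in the abstract of entry 6 (assign a depth to the foreground object from the background depth map, then combine the two maps) might look like the Python sketch below. The abstract does not give the assignment rule, so the rule used here, taking the background depth at the object's lowest masked row as its ground-contact depth, is an assumption, as are the function names and the nested-list image representation.

```python
def foreground_depth_from_contact(background, mask):
    """Assumed rule: the object takes the background depth at its lowest masked row."""
    rows = [r for r, m_row in enumerate(mask) if any(m_row)]
    if not rows:
        return None  # no foreground pixels in this frame
    bottom = max(rows)
    cols = [c for c, m in enumerate(mask[bottom]) if m]
    return sum(background[bottom][c] for c in cols) / len(cols)


def merge_depth_maps(background, mask, object_depth):
    """Use the object depth where the mask is set and the background depth elsewhere."""
    return [
        [object_depth if m else b for b, m in zip(b_row, m_row)]
        for b_row, m_row in zip(background, mask)
    ]
```

Here `background` is a list of rows of depth values and `mask` is a same-shaped grid whose truthy entries mark foreground pixels, for example the output of the segmentation step.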


	7. A Category-Based Calibration Approach with Fault Tolerance for Air Monitoring Sensors
	WANG Rao,LI Qing-Yong,YU Hao-Min,CHEN Ze-Chuan,ZHANG Ying-Jun,ZHANG Ling,CUI Hou-Xin,ZHANG Ke
	Computer Science and Technology 20 March 2020

	Abstract: Air pollution monitoring has attracted much attention in recent years because of the deterioration of the environment. Standard stations, installed by governments at high cost, can provide reliable air quality information, whereas the large number of low-cost portable air monitoring sensors in wide use output less precise results. In this paper, we propose a category-based calibration approach (CCA) using machine learning algorithms for such portable sensors. Compared with traditional methods that often learn a single regression model, CCA includes multiple regression models according to pollutant concentration categories, and builds a more accurate mapping from sensor readings to reference values. Furthermore, CCA introduces two fault-tolerance modules: classification tolerance and sample tolerance. The former mitigates the impact of misclassifying the concentration category, and the latter improves the robustness of the individual regression models. Our approach is evaluated on carbon monoxide (CO) and ozone (O$_{3}$) from two cities in China. The experimental results show that CCA performs better than traditional calibration models in both accuracy and robustness.
	TO cite this article:WANG Rao,LI Qing-Yong,YU Hao-Min, et al. A Category-Based Calibration Approach with Fault Tolerance for Air Monitoring Sensors[J].
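The core idea in the abstract of entry 7, one regression model per concentration category instead of a single global model, can be sketched in a few lines of Python. This is a simplification under stated assumptions: categories are picked by thresholding the raw sensor reading, each category uses ordinary least squares on a single feature, and the two fault-tolerance modules are omitted. The function names and the `boundaries` parameter are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (needs at least two distinct xs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def category_of(reading, boundaries):
    """Index of the concentration category, given ascending boundaries."""
    return sum(reading >= b for b in boundaries)


def fit_cca(readings, references, boundaries):
    """Fit one line per category from paired sensor and reference values."""
    groups = {}
    for x, y in zip(readings, references):
        xs, ys = groups.setdefault(category_of(x, boundaries), ([], []))
        xs.append(x)
        ys.append(y)
    return {cat: fit_line(xs, ys) for cat, (xs, ys) in groups.items()}


def calibrate(reading, models, boundaries):
    """Map a sensor reading to a calibrated value using its category's model."""
    a, b = models[category_of(reading, boundaries)]
    return a * reading + b
```

When the sensor's response differs between low and high concentrations, the per-category lines can follow each regime separately, which a single global line cannot do.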


	8. Research on Recommendation Based on DeepFM and Graph Embedding
	Yang Zhixiang,Liu Xiaohong
	Computer Science and Technology 10 March 2020

	Abstract: Traditional recommendation systems usually focus on the coupling of feature information between users and items, but fail to effectively investigate the complex networks of users and items. At the same time, graph algorithms are often used to analyze the point-edge relationships in networks, and graph machine learning makes it possible to combine as many network node features as possible. To this end, this paper combines a graph algorithm with a recommendation algorithm and makes predictions from embedded information. First, we employ DeepFM and GNNs to mine explicit and implicit features from the feature information and the heterogeneous structure network. Then, we combine the features of the two embedding layers to construct the final embedding vector. Finally, we use a multi-layer fully connected network with activation functions to predict the results. Two standard datasets are used in our experiments. The results show that the new model achieves the best performance in the recommendation task.
	TO cite this article:Yang Zhixiang,Liu Xiaohong. Research on Recommendation Based on DeepFM and Graph Embedding[OL].[10 March 2020] http://en.paper.edu.cn/en_releasepaper/content/4751142


	9. Formal Verification of Calculus without Limits in Coq
	Guo Liquan,Yu Wensheng
	Computer Science and Technology 12 February 2020

	Abstract: Artificial intelligence is one of China's current major science and technology development strategies. Mathematical formalization, as an important theoretical basis for artificial intelligence, is of great significance to the development of science and technology. Based on the proof assistant Coq, this paper realizes the formal verification of the theory of calculus without limits, including Coq descriptions of Uniform Continuity, Uniformly Derivable, Strongly Derivable and the Integral System. This paper then proves some properties of Uniformly Derivable and the Valuation Theorem with Coq; all formalization processes have been verified by Coq. The formalization demonstrates that Coq-based mechanized proofs of mathematical theorems are readable and interactive.
	TO cite this article:Guo Liquan,Yu Wensheng. Formal Verification of Calculus without Limits in Coq[OL].[12 February 2020] http://en.paper.edu.cn/en_releasepaper/content/4750736


	10. Multi-emotional single-track music generating model based on LSTM
	WANG Xicheng,LI Wei
	Computer Science and Technology 11 February 2020

	Abstract: With the popularity of short video platforms, it has become very common for users to create videos for sharing. As an integral part of short videos, background music plays an important role in emotional expression. However, the background music currently available on short video platforms lacks variety and also raises copyright issues. In this paper, a multi-emotional single-track music generation model is proposed by improving an existing music generation model. By analyzing the advantages and disadvantages of the original network and the lookback mechanism, and taking the actual application scenario into account, the LB-Attention model is proposed. Note positioning information, music emotional information, and an attention mechanism are introduced into the model to meet the requirements of the application scenario. Comparing the generated results and performance indicators of the original model and the proposed model shows that the proposed model generates music well. The performance of the LB-Attention model is similar to that of the original model and can basically meet the needs of the application scenario.
	TO cite this article:WANG Xicheng,LI Wei. Multi-emotional single-track music generating model based on LSTM[OL].[11 February 2020] http://en.paper.edu.cn/en_releasepaper/content/4750718
