Recent studies have shown that convolutional neural networks (CNNs) perform well at modeling the long-range dependencies of speech sequences in the time domain. Stacked dilated convolutions are commonly used to enlarge the receptive field of the network efficiently. However, as the dilation rate grows in higher layers, the distance between the feature points mapped back to the previous layer also grows, which easily causes the short-range information between adjacent feature points to be neglected. This paper proposes a plug-and-play inverted residual and linear bottleneck module, called the combined convolution (CB-Conv) module, which aims to recover this short-range information between feature points. The main body of the CB-Conv module consists of two parallel convolution blocks: a standard dilated convolution block and an aggregation convolution block. The latter aggregates the details lost between adjacent points through a pooling layer and fuses its output with that of the dilated convolution block to complete the feature extraction. Experimental results on the TIMIT dataset show that, within the TasNet framework and with the same number of stacked main modules, the proposed module achieves a 1.04 dB SI-SNR gain over the Conv-TasNet baseline.
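The receptive-field arithmetic motivating the module can be sketched in plain Python. With stride 1, each dilated layer of kernel size k and dilation d widens the receptive field by (k - 1) * d, while adjacent kernel taps skip d - 1 intermediate samples, which is the short-range information the aggregation branch is meant to recover. The kernel size and dilation schedule below are illustrative assumptions, not values taken from the paper:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated 1-D convolutions, stride 1.

    Each layer with kernel size k and dilation d adds (k - 1) * d samples.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def tap_gap(dilation):
    """Samples skipped between adjacent kernel taps at a given dilation."""
    return dilation - 1


# Illustrative Conv-TasNet-style exponential dilation schedule: 1, 2, 4, ..., 128.
dilations = [2 ** i for i in range(8)]
print(receptive_field(3, dilations))  # 1 + 2 * 255 = 511
print(tap_gap(dilations[-1]))         # 127 samples skipped in the top layer
```

The growing gap between taps in the top layers is what makes a parallel short-range (e.g. pooling-based) branch attractive.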
Keywords: speech enhancement; combined convolution; monaural