|
The HiFi-GAN model synthesizes speech efficiently and with high fidelity, but its large parameter count and computational cost make it difficult to store, deploy, and run in real time on resource-constrained devices. Few studies address the compression and acceleration of HiFi-GAN, even though real-time speech synthesis is widely needed on mobile and edge devices. This paper studies lightweighting methods for the HiFi-GAN model in order to compress the network and deploy it on a hardware platform. The main research and results of the paper are as follows: To address the high compute and storage consumption and the complex structure of HiFi-GAN in the inference stage, this paper proposes a compression method, tailored to the network structure, that combines knowledge distillation and architecture search. In this method, the convolutions in the resBlock layers of HiFi-GAN are decomposed to obtain a relatively compact student model, which is then trained by distillation using a purpose-designed training objective. This student network then serves as a "one-shot" network for architecture search, which yields the final compact, optimal sub-student model. Experiments show that this method significantly compresses the HiFi-GAN model and reduces its computation, while achieving good speech quality, as measured by PESQ, on both the single-speaker LJSpeech dataset and the unseen-speaker VCTK dataset. |
|
Keywords: artificial intelligence; HiFi-GAN; knowledge distillation; architecture search |
|
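To make the effect of convolutional decomposition concrete, the following minimal sketch compares the parameter count of a standard 1-D convolution with that of a depthwise-separable decomposition. The abstract does not state which decomposition is applied to the resBlock layers, so the depthwise-separable form, the function names, and the channel and kernel sizes used here are illustrative assumptions rather than the paper's configuration.

```python
# Illustrative parameter-count comparison (assumption: depthwise-separable
# decomposition; the abstract does not specify the decomposition used).

def conv1d_params(in_ch, out_ch, kernel, bias=True):
    """Parameters of a standard 1-D convolution."""
    return in_ch * out_ch * kernel + (out_ch if bias else 0)


def separable_conv1d_params(in_ch, out_ch, kernel, bias=True):
    """Parameters of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = in_ch * kernel + (in_ch if bias else 0)   # one filter per channel
    pointwise = in_ch * out_ch + (out_ch if bias else 0)  # 1x1 channel mixing
    return depthwise + pointwise


# Hypothetical layer shape chosen only for illustration.
channels, kernel = 512, 3

standard = conv1d_params(channels, channels, kernel)
decomposed = separable_conv1d_params(channels, channels, kernel)
ratio = standard / decomposed

print(f"standard:   {standard} parameters")
print(f"decomposed: {decomposed} parameters")
print(f"reduction:  {ratio:.2f}x")
```

For this hypothetical layer the decomposed form needs roughly a third of the parameters. The saving grows with kernel size, since the standard layer scales with the product of channels and kernel width while the decomposed layer scales with their sum.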