Research on Lightweight Based on HiFi-GAN Model

Gao Bin, Bie Hong-Xia *

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876

*Corresponding author

Funding: none

Opened online: 15 April 2024

Accepted by: none

Citation: Gao Bin, Bie Hong-Xia. Research on Lightweight Based on HiFi-GAN Model [OL]. [15 April 2024]. http://en.paper.edu.cn/en_releasepaper/content/4763318

The HiFi-GAN model synthesizes speech audio with high efficiency and high fidelity, but its large parameter count and computational cost make it difficult to store, deploy, and run in real time on resource-limited devices. Relatively few studies address the compression and acceleration of HiFi-GAN, even though demand for real-time speech synthesis on mobile and edge devices is widespread. This paper studies lightweighting methods for the HiFi-GAN model in order to compress the network and deploy it on a hardware platform. The main work and results are as follows. To address the high compute and storage consumption and the complex structure of HiFi-GAN at inference time, this paper proposes a compression method, tailored to the network structure, that combines knowledge distillation with architecture search. In this method, the convolutions in the resBlock layers of HiFi-GAN are decomposed to obtain a relatively compact student model, which is then trained by distillation using a purpose-designed training objective. This student network is then used as a "one-time" (one-shot) network for architecture search, which yields the final compact, optimal sub-student model. Experiments show that the method significantly compresses the HiFi-GAN model and reduces its computation, while achieving good PESQ speech-quality scores on both the single-speaker LJSpeech dataset and the unseen-speaker VCTK dataset.

Keywords: artificial intelligence; HiFi-GAN; knowledge distillation; architecture search
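
The abstract does not say which convolutional decomposition is applied to the resBlock layers. As a rough illustration only, the sketch below assumes a depthwise-separable decomposition (a depthwise convolution followed by a 1x1 pointwise convolution) and compares its parameter count with that of a standard 1D convolution. The function names and the 512-channel, kernel-size-3 layer are hypothetical and are not taken from the paper.

```python
# Illustrative only: parameter counts for a standard 1D convolution versus
# an ASSUMED depthwise-separable decomposition. The paper's abstract does
# not specify the decomposition; names and layer sizes here are hypothetical.


def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Parameters of a standard 1D convolution layer."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, kernel: int,
                               bias: bool = True) -> int:
    """Parameters of a depthwise conv (one filter per input channel)
    followed by a 1x1 pointwise conv that mixes channels."""
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 512 input channels, 512 output channels, kernel 3.
    standard = conv1d_params(512, 512, 3)
    separable = depthwise_separable_params(512, 512, 3)
    print(f"standard conv params:  {standard}")
    print(f"separable conv params: {separable}")
    print(f"reduction factor:      {standard / separable:.2f}x")
```

For this hypothetical layer the decomposed version needs roughly a third of the parameters. The figure describes this toy layer only and says nothing about the compression ratios reported in the paper.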
