Highly Restricted Keyword Selection Based on Sparse Analysis for Uyghur Text Categorization

Dong Wang (1); Askar Humdulla (2); Rayilam Parhat (3); Javier Tejedor (4)


1. CSLT, RIIT, Tsinghua University, Beijing, 100084

2. Xinjiang University, Wulumuqi, 830049

3. Xinjiang University, Wulumuqi, 830049

4. University of Alcala, Madrid, Spain

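Funding: Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (new faculty category, No. 20130002120011); Program for New Century Excellent Talents in University (No. NCET-10-0969)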

Published online: 9 December 2016

Accepted by: none

Citation: Dong Wang, Askar Humdulla, Rayilam Parhat, et al. Highly Restricted Keyword Selection Based on Sparse Analysis for Uyghur Text Categorization [OL]. [9 December 2016]. http://en.paper.edu.cn/en_releasepaper/content/4712789

Text categorization (TC) has achieved significant success in recent years; however, when the text is poorly represented, TC performance usually drops substantially. A typical example of such a scenario is the content-aware public telephone network (PTN), where the input speech can be only partially transcribed because of privacy-protection concerns and computational cost. One therefore needs an effective approach to selecting a highly restricted group of keywords (fewer than 100) that represents the spoken content well enough for most of the TC performance to be retained. Conventional keyword selection approaches rely on a carefully designed intermediate score and select each keyword independently according to that score, which often leads to suboptimal performance. This paper proposes a novel sparsity-based approach to highly restricted keyword selection for TC. The idea is to formulate keyword selection as an $l_1$-regularized linear optimization problem. The $l_1$ term drives the less important dimensions of the model coefficients to zero, so the corresponding words are nullified, leaving only the promising keywords. With this approach, the objective function of keyword selection is more consistent with the one used in TC; more importantly, the keywords are selected jointly as a group, which yields a group-optimized selection. Experiments conducted on a Uyghur TC task demonstrate that the proposed approach is highly effective.

Keywords: natural language processing, text categorization, sparse analysis, Uyghur
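The abstract describes the selection mechanism but not the exact objective, loss, or solver. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a least-squares loss between a bag-of-words count matrix $X$ (documents by vocabulary) and one-hot class targets $Y$, i.e. $\min_W \frac{1}{2n}\lVert Y - XW\rVert_F^2 + \lambda \sum_{j,c} |W_{jc}|$ with mean-centred columns, and solves it with ISTA (proximal gradient with soft-thresholding), a solver chosen here only for brevity. A word is kept when any entry of its coefficient row is non-zero, and $\lambda$ is bisected until at most `budget` words survive. Because every coefficient is fitted in the presence of all the others, words are kept or nullified jointly rather than scored one at a time. The function names, the bisection search, and the synthetic data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1: shrink every entry toward zero by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def l1_linear_fit(X, Y, lam, n_iter=500):
    """ISTA for min_W (1/2n) ||Yc - Xc W||_F^2 + lam * sum_ij |W_ij|.

    Assumption: least-squares loss on one-hot targets; the paper's exact
    loss is not given in the abstract. X: (n_docs, vocab) term counts,
    Y: (n_docs, n_classes) one-hot labels. Columns are mean-centred so
    no intercept term is needed.
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    step = n / np.linalg.norm(Xc, ord=2) ** 2  # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ W - Yc) / n
        W = soft_threshold(W - step * grad, step * lam)
    return W


def select_keywords(X, Y, budget, n_bisect=20):
    """Return at most `budget` word indices whose coefficient rows are non-zero,
    ordered by the L1 norm of their rows (largest first).

    A larger lam gives a sparser W, so lam is bisected until the surviving
    vocabulary fits the budget. The lasso path is not strictly monotone,
    so this is a heuristic search (hypothetical, not from the paper).
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    # For lam >= lam_hi the optimality conditions hold at W = 0 (no word survives).
    lam_lo, lam_hi = 0.0, np.max(np.abs(Xc.T @ Yc)) / X.shape[0]
    best = np.array([], dtype=int)
    for _ in range(n_bisect):
        lam = 0.5 * (lam_lo + lam_hi)
        W = l1_linear_fit(X, Y, lam)
        row_mass = np.abs(W).sum(axis=1)
        active = np.flatnonzero(row_mass > 0)
        if len(active) <= budget:
            best = active[np.argsort(-row_mass[active])]
            lam_hi = lam  # within budget: try a weaker penalty to admit more words
        else:
            lam_lo = lam  # too many words: strengthen the penalty
    return best


# Hypothetical synthetic demo: words 0-2 are frequent in class 0, words 3-5
# in class 1, and words 6-29 are class-independent noise.
rng = np.random.default_rng(0)
n_docs, vocab = 400, 30
labels = rng.integers(0, 2, size=n_docs)
rates = np.full((n_docs, vocab), 0.2)
rates[:, 0:3] = np.where(labels[:, None] == 0, 1.0, 0.1)
rates[:, 3:6] = np.where(labels[:, None] == 1, 1.0, 0.1)
X = rng.poisson(rates).astype(float)
Y = np.eye(2)[labels]

keywords = select_keywords(X, Y, budget=3)
print("selected keywords:", keywords)
```

On this synthetic data the selected indices are expected to come from the six class-dependent words (0 to 5). In the scenario of the abstract, `X` would instead be a document-term matrix built from Uyghur transcripts and `budget` would be below 100; for a large vocabulary, an established solver such as scikit-learn's `Lasso` would be a more practical choice than plain ISTA.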
