Leverage Network Structure for Incremental Document Clustering
QIAN Tieyun 1,*,#, SI Jianfeng 2, LI Qing 2
1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072
2. Dept. of Comp. Sci., City Univ. of Hong Kong
*Corresponding author
#Submitted by
Subject:
Funding: Specialized Research Fund for the Doctoral Program of Higher Education, China (No.20090141120050)
Opened online: 29 December 2011
Accepted by: none
Citation: QIAN Tieyun, SI Jianfeng, LI Qing. Leverage Network Structure for Incremental Document Clustering [OL]. [29 December 2011] http://en.paper.edu.cn/en_releasepaper/content/4457165
 
 
Recent studies have shown that link-based clustering methods can significantly outperform content-based clustering. However, most previous algorithms were developed for fixed data sets and are not applicable to dynamic environments such as data warehouses and online digital libraries. In this paper, we introduce a novel approach that leverages network structure for incremental clustering. Under this framework, both link and content information are used to determine the host cluster of a new document. Combining the two types of information yields promising clustering performance. Furthermore, the status of core members is used to quickly determine whether to split or merge a new cluster. This filtering step eliminates unnecessary and time-consuming textual similarity checks against the whole corpus, and thus greatly speeds up the entire procedure. We evaluate the proposed approach on several real-world publication data sets and compare it extensively with both classic content-based and recent link-based algorithms. The experimental results demonstrate the effectiveness and efficiency of our method.
Keywords: Computer software; Document clustering; Incremental algorithm; Network structure
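
The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general idea of assigning a new document to a host cluster from both link and content information. Everything in it is an assumption made for illustration, not the authors' method: the class name `IncrementalClusterer`, the weight `alpha`, the threshold, the link score (share of already-clustered neighbours in a cluster), and the content score (cosine similarity to summed term counts). Only clusters containing a linked neighbour are scored, which mimics avoiding similarity checks against the whole corpus. The core-member split/merge step is not modelled.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Hypothetical illustration; alpha and threshold are arbitrary choices."""

    def __init__(self, alpha=0.5, threshold=0.3):
        self.alpha = alpha            # weight of link evidence vs. content evidence
        self.threshold = threshold    # minimum combined score to join a cluster
        self.assignment = {}          # doc_id -> cluster_id
        self.centroids = defaultdict(Counter)  # cluster_id -> summed term counts
        self.next_cluster = 0

    def add(self, doc_id, text, links):
        """Assign a new document to a host cluster, or open a new cluster."""
        terms = Counter(text.lower().split())
        # Candidate clusters: only those holding an already-clustered neighbour.
        neighbour_clusters = Counter(
            self.assignment[n] for n in links if n in self.assignment
        )
        total = sum(neighbour_clusters.values())
        best, best_score = None, 0.0
        for cid, count in neighbour_clusters.items():
            link_score = count / total
            content_score = cosine(terms, self.centroids[cid])
            score = self.alpha * link_score + (1 - self.alpha) * content_score
            if score > best_score:
                best, best_score = cid, score
        if best is None or best_score < self.threshold:
            best = self.next_cluster  # no suitable host: start a new cluster
            self.next_cluster += 1
        self.assignment[doc_id] = best
        self.centroids[best].update(terms)
        return best


clusterer = IncrementalClusterer()
first = clusterer.add("d1", "graph clustering network", [])
second = clusterer.add("d2", "graph network links", ["d1"])
third = clusterer.add("d3", "protein folding biology", [])
```

In this sketch a document with no clustered neighbours always opens a new cluster; a real system would need a fallback content-only search and a rule for revisiting earlier assignments.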
 
 
 
