
	1. Chinese-Japanese News Comparable Corpus Construction Using Event Extraction
	Yang Jian, Xu Jinan
	Computer Science and Technology 12 May 2016

	Abstract: A parallel bilingual corpus provides rich matching information between two languages, but acquiring a high-quality, large-scale parallel corpus is difficult. In this paper, we propose a method for constructing a Chinese-Japanese comparable news corpus using event extraction. First, we collect Chinese and Japanese news with a web crawler. We then extract news feature sets using event extraction combined with a Japanese-Chinese dictionary, a named-entity dictionary, and a Hanzi-Kanji character mapping table. By calculating the similarity of the extracted news events, we detect similar Japanese-Chinese news and generate bilingual document alignment results. Finally, we use these results to train a classifier that identifies aligned Japanese-Chinese news documents. Experimental results show that our method is effective and overcomes the shortcomings of traditional methods.
	To cite this article: Yang Jian, Xu Jinan. Chinese-Japanese News Comparable Corpus Construction Using Event Extraction[OL].[12 May 2016] http://en.paper.edu.cn/en_releasepaper/content/4686479
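
	The similarity step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each news item has already been reduced to a set of event feature terms, and it swaps in plain Jaccard overlap as the similarity measure. The names `normalize`, `jaccard`, `align`, the `ja_to_zh` mapping, and the `threshold` value are all hypothetical.

```python
def normalize(features, ja_to_zh):
    """Map Japanese feature terms to Chinese using a lookup table.

    ja_to_zh is a hypothetical stand-in for the dictionary and
    Hanzi-Kanji mapping resources mentioned in the abstract.
    Terms without an entry are kept unchanged.
    """
    return {ja_to_zh.get(term, term) for term in features}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def align(zh_docs, ja_docs, ja_to_zh, threshold=0.5):
    """Pair each Japanese document with its most similar Chinese document.

    zh_docs and ja_docs map document ids to sets of feature terms.
    Pairs whose best score falls below the (assumed) threshold are dropped.
    """
    pairs = []
    for ja_id, ja_feats in ja_docs.items():
        mapped = normalize(ja_feats, ja_to_zh)
        best_id, best_score = None, 0.0
        for zh_id, zh_feats in zh_docs.items():
            score = jaccard(mapped, set(zh_feats))
            if score > best_score:
                best_id, best_score = zh_id, score
        if best_id is not None and best_score >= threshold:
            pairs.append((ja_id, best_id, best_score))
    return pairs
```

	Pairs produced this way could serve as candidate training examples for a classifier, which is the role the abstract gives to the extraction results.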


	2. PredicTV: A Behavior-Oriented Real Time Recommender for TV Programs
	Wenjing Fang, Zhiyuan Cai, Xiaodong Wang, Kenny Q. Zhu
	Computer Science and Technology 20 December 2015

	Abstract: With 112 channels and about 30,000 programs to watch every week, television viewers in China are overwhelmed with choices. Finding out what to watch can be a time-consuming and frustrating process. In this paper, we present a system that leverages individuals' viewing behaviors and information about TV programs to make real-time, dynamic program recommendations to viewers. The system builds a vector-space-based preference model for each user by combining the user's viewing patterns with the contents of the programs the user has viewed. Future programs are recommended by selecting the set of programs that best matches the user's viewing model.
	To cite this article: Wenjing Fang, Zhiyuan Cai, Xiaodong Wang, et al. PredicTV: A Behavior-Oriented Real Time Recommender for TV Programs[OL].[20 December 2015] http://en.paper.edu.cn/en_releasepaper/content/4666173
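
	A vector-space preference model of the kind this abstract describes can be sketched as follows. This is an illustration under stated assumptions, not the PredicTV system: program content is assumed to be a dictionary of term weights, viewing behavior is reduced to the fraction of each program watched, and matching uses cosine similarity. The functions `build_profile`, `cosine`, and `recommend` are hypothetical names.

```python
import math
from collections import defaultdict


def build_profile(history):
    """Build a user preference vector from viewing history.

    history is a list of (term_weights, watched_fraction) pairs, where
    term_weights maps content terms to weights. Weighting each program
    by the fraction watched is an assumption made for this sketch.
    """
    profile = defaultdict(float)
    for terms, watched in history:
        for term, weight in terms.items():
            profile[term] += weight * watched
    return dict(profile)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(profile, upcoming, k=3):
    """Return the names of the k upcoming programs closest to the profile."""
    ranked = sorted(
        upcoming.items(),
        key=lambda item: cosine(profile, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

	For example, a user who mostly watched sports programs would see an upcoming program with `sports` terms ranked above one described only by `news` terms.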


	3. Article Errors Correction Based on Imbalanced Data Learning
	Chen Liangyu, Zhou Deyu
	Computer Science and Technology 24 December 2013

	Abstract: To correct article usage errors in English text, this paper proposes a novel classification-based approach to article error correction. However, only a small quantity of labeled data is available, while a large quantity of data is available without annotations. To fully employ both types of data, we use the balance-cascade algorithm to overcome the imbalanced data problem. Experiments were conducted on the NUS Corpus of Learner English, and the results show that the proposed method achieves high precision.
	To cite this article: Chen Liangyu, Zhou Deyu. Article Errors Correction Based on Imbalanced Data Learning[OL].[24 December 2013] http://en.paper.edu.cn/en_releasepaper/content/4577940


	4. Named Entity Disambiguation with multiple features
	YANG Xue, TAN Yongmei
	Computer Science and Technology 09 November 2013

	Abstract: An algorithm based on multiple features is proposed to reduce the ambiguity of named entities. The proposed algorithm is divided into two parts: entity linking and entity clustering. The experimental results show that the proposed algorithm is effective.
	To cite this article: YANG Xue, TAN Yongmei. Named Entity Disambiguation with multiple features[OL].[9 November 2013] http://en.paper.edu.cn/en_releasepaper/content/4568446


	5. HLDA BASED SENTENCE SCORING FOR MULTI-DOCUMENT SUMMARY
	LI Lei, YU Jia
	Computer Science and Technology 22 October 2013

	Abstract: In recent years, multi-document summarization has received increasing attention in natural language processing. However, the relationships among topics and their level information are rarely considered, and sentence scoring remains an important and difficult task in the multi-document summarization process. hLDA (hierarchical Latent Dirichlet Allocation) has been widely validated for hierarchical topic modeling. This paper therefore focuses on the nodes of the hLDA model, studies sentence scoring methods based on hLDA and semantics, and presents seven algorithms that provide a strong basis for multi-document summarization.
	To cite this article: LI Lei, YU Jia. HLDA BASED SENTENCE SCORING FOR MULTI-DOCUMENT SUMMARY[OL].[22 October 2013] http://en.paper.edu.cn/en_releasepaper/content/4565453


	6. An Approach for Unstructured Temporal Query Processing
	Deng Hailong, Wang Xiaojie
	Computer Science and Technology 14 December 2012

	Abstract: Time is an important aspect of data, and retrieving data that satisfies a specified time constraint is a common everyday task. There has been much research on structured temporal queries (e.g., SQL, structured queries in information retrieval systems, or graphical interfaces such as calendars) and on temporal expressions in documents, but little research focuses on unstructured temporal query processing. Because unstructured temporal queries are usually short and lack semantic information, this paper proposes a novel approach to processing them. In this approach, several roles are defined for the words in an unstructured temporal query, and the query is processed and computed into a structured temporal value based on these roles. Finally, we create a test data set by extracting temporal expressions from a corpus, and the experimental results on this data set show that our approach performs well.
	To cite this article: Deng Hailong, Wang Xiaojie. An Approach for Unstructured Temporal Query Processing[OL].[14 December 2012] http://en.paper.edu.cn/en_releasepaper/content/4502993


	7. Research on Fine-grained Text Similarity Detection for Research Papers via Rhetorical Structure Theory
	XU Fan, ZHU Qiaoming, LI Peifeng
	Computer Science and Technology 31 May 2012

	Abstract: Text similarity detection is important in NLP, yet only the coarse-grained perspective has been investigated so far. To the best of our knowledge, this is the first paper to propose using fine-grained analysis and discourse tree technology to detect similarity between research papers. Specifically, we present a two-stage text similarity detection framework. In the first stage, we automatically classify the type of each sentence in a text using machine learning; in our 10-fold cross-validation experiment, accuracy and F1 measure are significantly improved. In the second stage, we build a discourse tree for research papers using Rhetorical Structure Theory (RST). We employ a "Failure tree" data structure to represent the final similarity coefficient for the tree and verify its effectiveness experimentally.
	To cite this article: XU Fan, ZHU Qiaoming, LI Peifeng. Research on Fine-grained Text Similarity Detection for Research Papers via Rhetorical Structure Theory[OL].[31 May 2012] http://en.paper.edu.cn/en_releasepaper/content/4480267


	8. Sentiment Analysis of Complex Sentences for Chinese Document
	WU Xiaoyin
	Computer Science and Technology 03 January 2012

	Abstract: Sentiment analysis, one of the most widely studied sub-problems of opinion mining, is receiving more and more attention. At the document level, most existing methods ignore complex sentences with particular sentence patterns, such as conditional, transitional, and comparative sentences. This paper studies sentiment analysis of complex sentences; its aim is to determine whether the opinion expressed in a complex sentence is positive or negative. The approach has three steps: first, pre-processing the news documents; second, distinguishing complex sentences from simple ones; and finally, analyzing the sentiment of the extracted complex sentences based on a sentiment lexicon.
	To cite this article: WU Xiaoyin. Sentiment Analysis of Complex Sentences for Chinese Document[OL].[3 January 2012] http://en.paper.edu.cn/en_releasepaper/content/4459480


	9. Location Query System Based On Google Map
	Sheng Yadong, Wang Xiaojie
	Computer Science and Technology 12 December 2011

	Abstract: With the popularity of GPS, location-based services have developed widely and are applied in many fields, such as location query, point-of-interest search, and self-funded travel services. To differentiate among many similar locations, we introduce sentence similarity, an important research topic in NLP that is widely used in areas such as text classification and information processing. In recent years, many methods have been proposed to measure sentence similarity, but most are derived from approaches designed for long text documents and are not suitable for some applications. This paper therefore focuses on similarity computation for very short sentences, especially between Chinese and English addresses. The similarity is calculated using both structural and semantic information. Experiments on similarity calculation show that the proposed method has higher accuracy.
	To cite this article: Sheng Yadong, Wang Xiaojie. Location Query System Based On Google Map[OL].[12 December 2011] http://en.paper.edu.cn/en_releasepaper/content/4455267


	10. A Bayesian Network for Automatic Term Recognition
	GUI Yaocheng, GAO Zhiqiang
	Computer Science and Technology 30 November 2011

	Abstract: Terms with explicit meanings are used in the academic semantic search system to represent specific research domains. Most work on Automatic Term Recognition (ATR) focuses on measuring the relationship between a term and a paper as the feature of the term. The academic semantic search system does not provide full papers, and the short-text corpus constructed from paper titles and abstracts reduces the influence of this feature. This paper proposes a novel ATR approach. First, new types of features are obtained by measuring the relationships between a term and other entities. Second, based on the relations among the features of a term, the TRBN (Term Recognition Bayesian Network) model, which is represented by a Bayesian network, is proposed to integrate the features. Experiments on a corpus containing 7,750,000 titles and 4,500,000 abstracts from the domains of telecommunication and computer science show that the new approach performs well, outperforming the baseline method by 10 percent in precision.
	To cite this article: GUI Yaocheng, GAO Zhiqiang. A Bayesian Network for Automatic Term Recognition[OL].[30 November 2011] http://en.paper.edu.cn/en_releasepaper/content/4452995
