|
In this paper, we propose a machine learning method based on the Naïve Bayes classifier for predicting piRNA. First, piRNA and non-piRNA sequences from five model species (human, rat, mouse, fruit fly and nematode) serve as the training set. Then, sequence features, including k-mer frequencies, standardized word frequencies under a (k-2)-order Markov model and different functions of the four nucleotides, are extracted from each sequence. Finally, the integrated features are fed into the Naïve Bayes classifier to perform the prediction, where the conditional probability of a word in each class is estimated by a histogram technique. Our approach achieves an overall accuracy of 82% under 5-fold cross-validation. Owing to the simplicity of its probability model, the Naïve Bayes classifier trains and predicts quickly and remains efficient on large datasets. |
|
Keywords: Naïve Bayes classifier; feature extraction; cross-validation. |
|
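The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the k-mer frequency features (not the Markov-standardized word frequencies or the nucleotide functions), and the equal-width binning, the bin count `n_bins`, the smoothing constant `alpha`, and the names `kmer_frequencies` and `HistogramNaiveBayes` are all assumptions introduced here for illustration.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def kmer_frequencies(seq, k):
    """Relative frequency of every k-mer over ACGU, in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


class HistogramNaiveBayes:
    """Naive Bayes whose per-feature class-conditional probabilities are
    estimated from equal-width histograms over [0, 1] (an assumed scheme)."""

    def __init__(self, n_bins=10, alpha=1.0):
        self.n_bins = n_bins  # assumed number of histogram bins
        self.alpha = alpha    # assumed additive smoothing per bin

    def _bin(self, value):
        # Map a frequency in [0, 1] to a bin index; 1.0 falls in the last bin.
        return min(int(value * self.n_bins), self.n_bins - 1)

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        n_features = len(X[0])
        self.log_prior_ = {}
        self.log_like_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(X))
            table = []
            for j in range(n_features):
                hist = [self.alpha] * self.n_bins
                for x in rows:
                    hist[self._bin(x[j])] += 1
                total = sum(hist)
                table.append([math.log(h / total) for h in hist])
            self.log_like_[c] = table
        return self

    def predict(self, X):
        labels = []
        for x in X:
            scores = {
                c: self.log_prior_[c]
                + sum(self.log_like_[c][j][self._bin(v)] for j, v in enumerate(x))
                for c in self.classes_
            }
            labels.append(max(scores, key=scores.get))
        return labels


# Toy usage with made-up sequences (not data from the paper).
train_seqs = ["AAAAAAAU", "AAAAAAUA", "GGGGGGGC", "GGGGGGCG"]
train_labels = ["piRNA", "piRNA", "non-piRNA", "non-piRNA"]
model = HistogramNaiveBayes().fit(
    [kmer_frequencies(s, 1) for s in train_seqs], train_labels
)
predictions = model.predict(
    [kmer_frequencies(s, 1) for s in ["AAAAAUAA", "GGGCGGGG"]]
)
```

Because training reduces to filling one histogram per feature and class, and prediction to summing looked-up log-probabilities, both steps are linear in the number of sequences and features, which is consistent with the speed claim in the abstract.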