Clear identification of a tumor's primary site is of great importance for subsequent site-specific targeted treatment and could effectively improve patients' overall survival. Although many classifiers based on gene expression have been proposed to predict the tumor primary site, only a few studies have used DNA methylation profiles to develop classifiers, and none has compared the performance of classifiers based on different profiles.
We introduced novel selection strategies to identify highly tissue-specific CpG sites and then used the random forest approach to construct classifiers that predict the origin of tumors. We also compared prediction performance by applying a similar strategy to miRNA expression profiles. Our analysis indicated that these classifiers had an accuracy of 96.05% (Maximum-Relevance-Maximum-Distance: 90.02%-99.99%) or 95.31% (Principal component analysis: 79.82%-99.91%) on independent DNA methylation data sets, and an overall accuracy of 91.30% (range: 79.33%-98.74%) on independent miRNA test sets for predicting tumor origin. This suggests that our feature selection methods are effective at identifying tissue-specific biomarkers and that the classifiers we developed can efficiently predict the origin of tumors. We also developed a user-friendly webserver that lets users predict tumor origin by uploading miRNA expression or DNA methylation profiles of interest.
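To make the idea of selecting tissue-specific CpG sites concrete, the following is a minimal sketch and not the selection strategy used in this work: it ranks sites by a simple ratio of between-tissue to within-tissue variance, swapped in here as a stand-in for Maximum-Relevance-Maximum-Distance or principal component analysis. The function names, site identifiers and beta values are all hypothetical. The top-ranked sites could then be passed to a random forest (for example scikit-learn's RandomForestClassifier), which is not shown.

```python
from statistics import mean, pvariance


def tissue_specificity(values, labels):
    """Between-tissue variance of group means over mean within-tissue variance.

    This simple ratio is an assumption made for illustration; it is not the
    scoring function described in the paper.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    group_means = [mean(g) for g in groups.values()]
    between = pvariance(group_means)
    within = mean([pvariance(g) for g in groups.values()])
    return between / (within + 1e-9)  # small constant avoids division by zero


def select_top_sites(profiles, labels, k):
    """Return the k site ids with the highest tissue-specificity score.

    profiles maps a site id to a list of methylation values, one per sample,
    in the same order as labels.
    """
    ranked = sorted(
        profiles,
        key=lambda site: tissue_specificity(profiles[site], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data: hypothetical site ids and beta values, not taken from the paper.
labels = ["lung", "lung", "liver", "liver", "colon", "colon"]
profiles = {
    "cg_a": [0.90, 0.88, 0.10, 0.12, 0.11, 0.09],  # high only in lung
    "cg_b": [0.50, 0.52, 0.49, 0.51, 0.50, 0.48],  # flat across tissues
    "cg_c": [0.20, 0.22, 0.85, 0.80, 0.21, 0.19],  # high only in liver
}

selected = select_top_sites(profiles, labels, k=2)
print(selected)
```

On this toy input the flat site cg_b scores lowest and is dropped, while the two sites that separate one tissue from the others are kept.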
The webserver, related data and code are accessible at http://server.malab.cn/MMCOP/.
Supplementary data are available at Bioinformatics online.