Ph.D. candidate in Biostatistics, USC

M.S. in Computer Science, USC, 2015-2016

B.S. in Biology, Central South University, 2009-2012

Undergraduate education in Information Technology and Sciences, 2008-2009


Research Area

MetaSVM and MetaLR: ensemble scoring methods for deleterious missense mutations

iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing cancer driver genes in personal genomes

Identification of driver genes for malignant meningioma 


Research Summary

MetaSVM and MetaLR

Accurate deleteriousness prediction for nonsynonymous variants is crucial for distinguishing pathogenic mutations from the large number of background polymorphisms in Whole Exome Sequencing (WES) studies. Although many deleteriousness prediction methods have been developed, their predictions are sometimes inconsistent with each other, and their relative merits in practical applications remain unclear. To address these issues, we comprehensively evaluated the predictive performance of eighteen current deleteriousness-scoring methods: eleven function prediction scores (PolyPhen-2, SIFT, MutationTaster, Mutation Assessor, FATHMM, LRT, PANTHER, PhD-SNP, SNAP, SNPs&GO, and MutPred), three conservation scores (GERP++, SiPhy and PhyloP), and four ensemble scores (CADD, PON-P, KGGSeq and CONDEL). We then developed two ensemble scores, MetaSVM and MetaLR, which use Support Vector Machine (SVM) and Logistic Regression (LR), respectively, to integrate nine deleteriousness prediction scores and the maximum minor allele frequency for a more accurate and comprehensive evaluation of the deleteriousness of missense mutations. MetaSVM and MetaLR achieved higher discriminative power than all eighteen existing deleteriousness prediction scores, demonstrating the value of combining information from multiple orthogonal approaches. Finally, we made whole-exome prediction scores from our ensemble methods and from all other deleteriousness prediction tools publicly available through the ANNOVAR software and the dbNSFP database to facilitate variant prioritization in WES studies.
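As a rough illustration of the ensemble idea behind MetaLR, the sketch below fits a logistic regression over nine component scores plus a maximum-MAF-like feature and compares its AUC with that of each single score. It is a minimal sketch, not the published implementation: the data are synthetic, and the helper names (`make_synthetic`, `fit_logistic`, `auc`), the feature distributions and the training settings are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Predicted probability that a variant is deleterious.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic(n, n_scores=9, seed=0):
    """Synthetic stand-in data: nine noisy component scores plus a max-MAF-like feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        scores = [rng.gauss(0.6 * label, 1.0) for _ in range(n_scores)]
        # Illustrative assumption: deleterious variants tend to be rarer.
        maf = rng.uniform(0.0, 0.05) if label else rng.uniform(0.0, 0.5)
        X.append(scores + [maf])
        y.append(label)
    return X, y

X_train, y_train = make_synthetic(400, seed=1)
X_test, y_test = make_synthetic(400, seed=2)
w, b = fit_logistic(X_train, y_train)

ensemble_auc = auc([predict(w, b, x) for x in X_test], y_test)
single_aucs = [auc([x[j] for x in X_test], y_test) for j in range(9)]
print(f"ensemble AUC: {ensemble_auc:.3f}")
print(f"best single-score AUC: {max(single_aucs):.3f}")
```

Because each synthetic component score carries independent noise, the combined model separates the two classes better than any single score. This mirrors, in a toy setting, the motivation for combining multiple approaches.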

[Figure: MetaSVM ROC]



iCAGES

Cancer is caused by the accumulation of somatic driver mutations harbored in driver genes. Despite the development of various computational methods for pinpointing these driver events, and despite a growing global effort to accumulate knowledge of cancer biology networks and principles, robust computational tools that exploit such prior knowledge to aid driver gene discovery are still underdeveloped. To address this issue, we devised the integrated CAncer GEnome Score (iCAGES), a statistical framework that prioritizes potential cancer driver genes by virtue of their prior biological associations and their ensemble cancer-driving potential. iCAGES is implemented with a radial-kernel Support Vector Machine (SVM) trained on somatic non-synonymous Single Nucleotide Variations (nsSNVs) from the COSMIC and UniProt databases, followed by a two-step ranking process that incorporates related prior biological knowledge, downstream gene-gene interaction information and the genomic complexity of the cancer driver event. Using individual protein-altering point mutations, iCAGES demonstrates its accuracy in prioritizing cancer driver genes in three distinct scenarios. In summary, iCAGES computationally leverages personal genomic driver events, with the aid of prior biological knowledge, to shed light on cancer driver gene identification, personalized drug discovery and cancer treatment. iCAGES is available both as a downloadable command-line tool and through a user-friendly interface.
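For readers unfamiliar with radial SVMs, the following minimal sketch shows how a radial basis function (RBF) kernel and the resulting SVM decision value are computed. It illustrates only the generic kernel machinery, not the iCAGES implementation: the two-dimensional feature vectors, support vectors, coefficients and `gamma` value are hypothetical toy values, and the two-step ranking process is not shown.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """SVM decision function: sum_i (alpha_i * y_i) * K(sv_i, x) + bias."""
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias

# Hypothetical toy model: one "driver-like" and one "passenger-like" support
# vector in a made-up two-feature space (not real iCAGES features or weights).
support_vectors = [[0.9, 0.8], [0.1, 0.2]]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i for each support vector
bias = 0.0

driver_like = decision_value([0.85, 0.9], support_vectors, dual_coefs, bias)
passenger_like = decision_value([0.15, 0.1], support_vectors, dual_coefs, bias)
print(driver_like > 0, passenger_like < 0)
```

A point close to the positively weighted support vector receives a positive decision value, and a point close to the negatively weighted one receives a negative value. This is how a trained radial SVM separates classes that are not linearly separable in the original feature space.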


[Figure: iCAGES correlation]

Meningioma driver gene identification

Meningiomas are tumors originating from the membranous layers surrounding the central nervous system and are generally regarded as “benign” tumors of the brain. Malignant meningiomas, although rare, are typically associated with a higher risk of local tumor recurrence and a poorer prognosis (median survival time <2 years). Previous genome-wide association studies and exome sequencing studies have identified genes that play a role in meningioma susceptibility or progression, but these studies rarely focused on malignant tumors. In our current study, we combined exome sequencing, targeted region sequencing and Sanger sequencing to identify the driver genes underlying this rare and deadly cancer subtype.




Dong C, Guo Y, Yang H, He Z, Liu X, Wang K: iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing cancer driver genes in personal genomes. Genome Medicine 8:135, 2016.

Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K, Liu X: Comparison and integration of deleteriousness prediction methods of nonsynonymous SNPs in whole exome sequencing studies. Human Molecular Genetics 24(8): 2125-2137, 2015. 

Qi H, Dong C, Chung W, Wang K, Shen Y: Deep genetic connection between cancer and developmental disorders. Human Mutation 37:1042-1050, 2016.

Shi L, Guo Y, Dong C, Huddleston J, Yang H, Han X, Fu A, Li N, Gong S, Lintner KE, Ding Q, Wang Z, Hu J, Wang D, Wang F, Wang L, Lyon G, Guan Y, Shen Y, Evgrafov O, Knowles J, Yu CY, Zhou L, Eichler EE, So K, Wang K: Long-read sequencing and de novo assembly of a Chinese genome. Nature Communications 7:12056, 2016.

Li Z, Huang Y, Li H, Hu J, Liu X, Jiang T, Sun G, Tang A, Sun X, Qian W, Zeng Y, Xie J, Zhao W, Xu Y, He T, Dong C, Liu Q, Mou L, Lu J, Lin Z, Wu S, Gao S, Guo G, Feng Q, Li Y, Zhang X, Wang J, Yang H, Wang J, Xiong C, Cai Z, Gui Y: Excess of rare variants in genes that are key epigenetic regulators of spermatogenesis in the patients with non-obstructive azoospermia. Scientific Reports 5:8785, 2015.

Zhang X, Jia H, Lu Y, Dong C, Hou J, Wang F, Zhong H, Wang L, Wang K: Exome sequencing on malignant meningiomas identified mutations in NF2 and MN1. Discovery Medicine 18(101): 301-311, 2014.