PhD Candidate at Nanjing University of Aeronautics and Astronautics, 2013-

MS in Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, 2011-2013

BS in Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, 2007-2011

Current Position

Graduate student at Nanjing University of Aeronautics and Astronautics

Research Area

Application of machine learning in bioinformatics


lncRNA prediction

RNA-Seq-based transcriptome assembly has been widely used to identify novel lncRNAs. However, identifying lncRNAs among the assembled transcripts remains a challenge, particularly for partial-length ones, since partial-length protein-coding transcripts are more likely to be classified as lncRNAs due to their incomplete CDS. Furthermore, potential sequencing or assembly errors that introduce or abolish stop codons also complicate ORF-based prediction of lncRNAs. Here, we present a novel alignment-free tool, lncScore, which uses a logistic regression model with 11 carefully selected features derived from the open reading frame, exon, and the maximum coding subsequences. lncScore distinguishes lncRNAs from mRNAs more accurately than other state-of-the-art alignment-free tools (e.g. CPAT, CNCI, and PLEK), especially on partial-length transcripts from the human and mouse datasets. lncScore also performed well on transcripts from five other species (zebrafish, fly, C. elegans, rat, and sheep), using models trained on human and mouse datasets. To speed up prediction, lncScore implements multithreading: with 12 threads, it took only 2 minutes to classify 64,756 transcripts and 54 seconds to train a new model with 21,000 transcripts, which is much faster than other tools. lncScore is written in Python and can be accessed at
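
To illustrate the general idea of ORF-derived features feeding a logistic model, here is a minimal sketch. It is not the lncScore implementation: it computes only two illustrative features (longest ORF length and ORF coverage) rather than the 11 used by the tool, it scans only the three forward reading frames, and the function names, weights, and bias are hypothetical placeholders rather than trained values.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF in the three forward frames.

    `end` is exclusive and includes the stop codon; (0, 0) means no complete ORF.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq):
    """Two illustrative features: ORF length (nt) and fraction of transcript covered."""
    start, end = longest_orf(seq)
    orf_len = end - start
    coverage = orf_len / len(seq) if seq else 0.0
    return [float(orf_len), coverage]


def coding_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of features (numerically stable form)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical weights and bias, for illustration only (not fitted to any data).
example_weights = [0.01, 2.0]
example_bias = -3.0
p_coding = coding_probability(
    orf_features("CCATGAAATTTGGGTAACC"), example_weights, example_bias
)
```

A transcript would then be called an lncRNA when its coding probability falls below a chosen cutoff. A partial-length mRNA whose stop codon is missing yields no complete ORF in this sketch, which is exactly the failure mode that motivates features beyond the plain ORF.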




  1. Zhao J, Song X, Wang K. lncScore: alignment-free identification of long noncoding RNA from assembled novel transcripts. Scientific Reports, in press, 2016