Ph.D., Nanyang Technological University, Singapore, 2008-2012

M.S., University of Macau, Macau, 2004-2007

B.S., Nankai University, Tianjin, China, 2000-2004


Research Area

Hidden Markov models for analyzing repeat elements in the genome

Development of novel algorithms for long-read sequencing data


Research Summary


RepeatHMM is a novel computational tool that detects trinucleotide repeats and trinucleotide repeat disorders (TRDs) from the long reads of a subject of interest. Evaluations on both simulated and real data show that it accurately estimates expansion counts. It is user-friendly and easy to install and use.


RepeatHMM takes long reads from a subject as input and uses a novel unsymmetrical sequence alignment (UnsymSeqAlg) to map all reads to a specific gene of interest in a reference genome (hg38 here). It then optionally applies UnsymSeqAlg to correct errors in the repeat regions. Next, it uses a hidden Markov model (HMM) to estimate the repeat count for each of the long reads with higher coverage. Finally, it detects one or two peaks in the expansion counts for the subject of interest.
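
The last two steps, per-read repeat counting followed by peak detection, can be illustrated with a minimal Python sketch. This is not RepeatHMM's implementation: exact string matching stands in for the HMM, a simple histogram ranking stands in for the peak detection, and the function names, the CAG repeat unit, and the thresholds (min_support, min_gap) are assumptions chosen only for illustration.

```python
from collections import Counter


def count_repeats(read: str, unit: str = "CAG") -> int:
    """Length of the longest run of exact, back-to-back copies of `unit`.

    Illustrative stand-in for the HMM step: it tolerates no sequencing
    errors, which real long reads would contain.
    """
    best = 0
    k = len(unit)
    for start in range(len(read)):
        n = 0
        pos = start
        while read[pos:pos + k] == unit:
            n += 1
            pos += k
        best = max(best, n)
    return best


def top_two_peaks(counts, min_support=2, min_gap=2):
    """Return up to two well-separated, well-supported repeat counts.

    Hypothetical heuristic: rank counts by how many reads support them,
    then keep at most two that are at least `min_gap` apart.
    """
    hist = Counter(counts)
    ranked = sorted(hist.items(), key=lambda kv: (-kv[1], kv[0]))
    peaks = []
    for value, support in ranked:
        if support < min_support:
            break
        if all(abs(value - p) >= min_gap for p in peaks):
            peaks.append(value)
        if len(peaks) == 2:
            break
    return sorted(peaks)


# Toy reads: three support 5 repeats, two support 12, one outlier has 6.
reads = (
    ["TT" + "CAG" * 5 + "AA"] * 3
    + ["G" + "CAG" * 12 + "T"] * 2
    + ["CAG" * 6]
)
per_read_counts = [count_repeats(r) for r in reads]
peaks = top_two_peaks(per_read_counts)
```

In this toy input the two returned peaks correspond to the two alleles of a heterozygous subject, and the single outlier read is ignored because it lacks support from other reads.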


RepeatHMM is evaluated on simulated data, consisting of repeat counts produced in silico and the long reads simulated from them, and on a real dataset of the ATXN3 gene for SCA3. The results demonstrate that the tool accurately estimates expansion counts from long reads.





Journal papers
Conference papers
