2012-present PhD Candidate, Neuroscience Graduate Program, University of Southern California

2008-2012 B.S. in Life Science, Peking University, Beijing, China

2007-2008 Computer Science and Electrical Engineering, Peking University, Beijing, China

Research Areas

Gene Prioritization

Gene prioritization ranks genes based on their properties, usually from a large candidate list or from the whole genome. My focus is on prioritizing genes based on phenotypes or diseases. I have developed an algorithm and a tool named Phenolyzer, which reasons the way a biologist would in order to expedite disease gene discovery. The logic is intuitive: first, Phenolyzer interprets each term provided by the user as a set of specific diseases; next, it maps these diseases to genes and assigns each gene a normalized score; finally, it grows the seed genes through several gene-gene relation databases and integrates all of this information to rank every gene.
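The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Phenolyzer's actual implementation: the term-to-disease, disease-to-gene and gene-gene tables are tiny hypothetical dictionaries standing in for real ontologies and databases, and the one-hop propagation with a `damping` factor is an assumed, simplified way to "grow" the seed genes.

```python
from collections import defaultdict

# Hypothetical toy knowledge bases standing in for real ontologies and databases.
TERM_TO_DISEASES = {
    "seizure": ["epilepsy_A", "epilepsy_B"],
    "intellectual disability": ["syndrome_C", "epilepsy_A"],
}
DISEASE_TO_GENES = {
    "epilepsy_A": {"GENE1": 1.0, "GENE2": 0.5},
    "epilepsy_B": {"GENE2": 1.0},
    "syndrome_C": {"GENE3": 0.8, "GENE1": 0.4},
}
GENE_GENE_LINKS = {
    "GENE1": {"GENE4": 0.6},
    "GENE3": {"GENE5": 0.3},
}


def expand_terms(terms):
    """Step 1: interpret each user term as a set of specific diseases."""
    diseases = set()
    for term in terms:
        diseases.update(TERM_TO_DISEASES.get(term.lower(), []))
    return diseases


def seed_scores(diseases):
    """Step 2: map diseases to genes and normalize so the top seed gene scores 1."""
    scores = defaultdict(float)
    for disease in diseases:
        for gene, score in DISEASE_TO_GENES.get(disease, {}).items():
            scores[gene] += score
    top = max(scores.values(), default=0.0)
    return {gene: s / top for gene, s in scores.items()} if top else {}


def grow_seeds(seeds, damping=0.5):
    """Step 3: propagate seed scores one hop through gene-gene relations (assumed rule)."""
    grown = defaultdict(float, seeds)
    for gene, score in seeds.items():
        for neighbor, weight in GENE_GENE_LINKS.get(gene, {}).items():
            grown[neighbor] += damping * score * weight
    return dict(grown)


def prioritize(terms):
    """Step 4: integrate everything into a list of (gene, score), best first."""
    final = grow_seeds(seed_scores(expand_terms(terms)))
    return sorted(final.items(), key=lambda kv: kv[1], reverse=True)


# Ranking order: GENE2, GENE1, GENE3, GENE4, GENE5
print(prioritize(["seizure", "intellectual disability"]))
```

Genes reached only through gene-gene links (GENE4, GENE5 here) end up below the seed genes because their propagated scores are damped; the real tool's scoring and weighting are more involved.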

The most distinctive part of Phenolyzer is its data integration model: unlike traditional gene prioritization tools, it treats different sources of information differently, learning a weight for each source with logistic regression. Moreover, Phenolyzer can output every detail of how each gene receives its score, and it automatically generates a network visualization of the top 50 prioritized genes and the interactions among them. The network is fully interactive; for example, the user can double-click a gene to see only the genes related to it.
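A minimal sketch of learning per-source weights with logistic regression, assuming each candidate gene is summarized by one score per evidence source and labeled as a known disease gene (1) or not (0). The source names, the toy training data and the plain batch gradient-descent trainer are all hypothetical; the sketch does not reproduce Phenolyzer's actual features, training set or code.

```python
import math

# Hypothetical evidence sources; each gene is described by one score per source.
SOURCES = ["disease_db", "ppi", "pathway", "coexpression"]

# Hypothetical toy training set: rows are genes, label 1 = known disease gene.
TRAIN_X = [
    [0.9, 0.2, 0.1, 0.5],
    [0.8, 0.6, 0.3, 0.4],
    [0.7, 0.1, 0.5, 0.6],
    [0.1, 0.3, 0.2, 0.5],
    [0.2, 0.7, 0.1, 0.4],
    [0.0, 0.2, 0.4, 0.6],
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit one weight per source plus a bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this gene under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def integrated_score(x, w, b):
    """Combine one gene's per-source scores into a single probability-like score."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


weights, bias = train_logistic(TRAIN_X, TRAIN_Y)
for name, weight in zip(SOURCES, weights):
    print(f"{name}: {weight:.2f}")
```

After training, the entries of `weights` act as learned source weights: a source whose scores separate known disease genes from other genes tends to receive a larger weight than an uninformative one, instead of every source contributing equally.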

CNV and SV detection 

CNV and SV stand for Copy Number Variation and Structural Variants, two types of genomic variants that are still not fully understood. Compared with SNPs (Single Nucleotide Polymorphisms), they usually span a large number of base pairs and are harder to identify. However, they contribute to many diseases, including psychiatric disorders and cancer. With new algorithms and tools in the new sequencing era, more of the secrets of these variants will be unveiled.

Neurogenetics and data mining

I am also interested in using bioinformatics methods to analyze large-scale data from neurogenetics and neuroscience, such as CNV data in schizophrenia and complex proteins that function in neurodevelopment.

Publications

  • Hui Yang, Peter N. Robinson, and Kai Wang. "Phenolyzer: phenotype-based prioritization of candidate genes for human diseases." Nature Methods 12, no. 9 (2015): 841-843.
  • Hui Yang and Kai Wang. "Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR." Nature Protocols 10, no. 10 (2015): 1556-1566.
  • Kai Liu, Lianggong Ding, Yuhong Li, Hui Yang, Chunyue Zhao, Ye Lei, Shuting Han, et al. "Neuronal necrosis is regulated by a conserved chromatin-modifying cascade." Proceedings of the National Academy of Sciences 111, no. 38 (2014): 13960-13965.
  • Lingling Shi, Yunfei Guo, Chengliang Dong, John Huddleston, Hui Yang, Xiaolu Han, Aisi Fu, et al. "Long-read sequencing and de novo assembly of a Chinese genome." Nature Communications 7 (2016).
  • Chao Ling, Lin Wang, Zheng Wang, Luming Xu, Lifang Sun, Hui Yang, Wei-Dong Li, and Kai Wang. "A pathway-centric survey of somatic mutations in Chinese patients with colorectal carcinomas." PLoS ONE 10, no. 1 (2015): e0116753.