We share our research through GitHub, scientific journals and web servers.

We developed a number of software tools for genomic data analysis. Check them out at WGLab GitHub.

  • The Google Citation report for the PI can be accessed here (total citations: >55,000; H-index: 80). In 2022, citations of the ANNOVAR (2010) paper reached 10,000, and citations of the InterVar (2017) paper reached 600.
  1. Zhang Y, Ahsan MU, Wang K. Noncoding de novo mutations in SCN2A are associated with autism spectrum disorders. medRxiv, doi: https://doi.org/10.1101/2024.05.05.24306908
  2. Kim J, Yang J, Wang K, Weng C, Liu C. Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis. arXiv, arXiv:2403.14801 [q-bio.QM]
  3. Xu Z, Qu HQ, Kao C, Hakonarson H, Wang K. Single-Cell Omics for Transcriptome CHaracterization (SCOTCH): isoform-level characterization of gene expression through long-read single-cell RNA sequencing. bioRxiv, doi: https://doi.org/10.1101/2024.04.29.590597
  4. Wu D, Yang J, Liu C, Hsieh TC, Marchi E, Blair J, Krawitz P, Weng C, Chung W, Lyon GJ, Krantz ID, Kalish JM, Wang K. Multimodal Machine Learning Combining Facial Images and Clinical Texts Improves Diagnosis of Rare Genetic Diseases. arXiv, arXiv:2312.15320 [q-bio.QM]
  5. Chen F, Ahimaz P, Wang K, Chung WK, Ta C, Weng C, Liu C. Phenotype-Driven Molecular Genetic Test Recommendation for Diagnosing Pediatric Rare Disorders. Res Sq, doi: 10.21203/rs.3.rs-3593490/v1.
  6. Wu D, Yang J, Wang K. Not All Large Language Models (LLMs) Succumb to the "Reversal Curse": A Comparative Study of Deductive Logical Reasoning in BERT and GPT Models. arXiv, arXiv:2312.03633 [cs.CL]
  7. Fang L, Chen Q, Wei CH, Lu Z, Wang K. Bioformer: an efficient transformer language model for biomedical text mining. arXiv, arXiv:2302.01588 [cs.CL]
  1. Lai W, Zhao Y, Chen Y, et al. Autism patient-derived SHANK2B Y29X mutation affects the development of ALDH1A1 negative dopamine neuron. Molecular Psychiatry, doi: 10.1038/s41380-024-02578-6, 2024
  2. Gracia-Diaz C, Perdomo JE, Khan ME, Roule T, Disanza BL, Cajka GG, Lei S, Gagne AL, Maguire JA, Shalem O, Bhoj EJ, Ahrens-Nicklas RC, French DL, Goldberg EM, Wang K, Glessner JT, Akizu N. KOLF2.1J iPSCs carry CNVs associated with neurodevelopmental disorders. Cell Stem Cell, 31(3):288-289, 2024
  3. Ahsan MU, Gouru A, Chan J, Zhou W, Wang K. A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing. Nature Communications, 15(1):1448, 2024
  4. Gargano MA, Matentzoglu N, Coleman B, Addo-Lartey EB, Anagnostopoulos AV, et al. The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic Acids Research, 52(D1):D1333-D1346, 2024
  5. Murali H, Wang P, Liao EC, Wang K. Genetic variant classification by predicted protein structure: A case study on IRF6. Computational and Structural Biotechnology Journal, 23:892-904, 2024
  6. Rybacki K, Xia M, Ahsan MU, Xing J*, Wang K*. Assessing the Expression of Long INterspersed Elements (LINEs) via Long-Read Sequencing in Diverse Human Tissues and Cell Lines. Genes, 14(10):1893, 2023
  7. Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nature Communications, 14(1):7805, 2023
  8. Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns, 5(1):100887, 2023
  9. Jiang T, Fang L, Wang K. Deciphering the Language of Nature: A transformer-based language model for deleterious mutations in proteins. Innovation, 4:100487, 2023
  10. Wu D, Yang J, Ahsan MU, Wang K. Classification of integers based on residue classes via modern deep learning algorithms. Patterns, 4(12):100860, 2023
  11. Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nature Methods, 20:1143–1158, 2023
  12. Wang X, Ahsan MU, Zhou Y, Wang K. Transformer-based DNA methylation detection on ionic signals from Oxford Nanopore sequencing data. Quantitative Biology, 11(3):287-296, 2023
  13. Lyon GJ, Vedaie M, Besheim T, Park A, Marchi E, et al. Expanding the Phenotypic spectrum of Ogden syndrome (NAA10-related neurodevelopmental syndrome) and NAA15-related neurodevelopmental syndrome. European Journal of Human Genetics, 31:824–833, 2023
  14. Fang L#, Mas Monteys A#, Dürr A, Keiser M, Cheng C, Harapanahalli A, Gonzalez-Alegre P, Davidson BL*, Wang K*. Haplotyping SNPs for allele-specific gene editing of the expanded huntingtin allele using long-read sequencing. HGG Advances, 4:100146, 2023
  15. Ren Z, Li Q, Cao K, Li MM, Zhou Y, Wang K. Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data. BMC Bioinformatics, 2023
  16. Li MM, Cottrell CE, Pullambhatla M, Roy S, Temple-Smolkin RL, Turner SA, Wang K, Zhou Y, Vnencak-Jones CL. Assessments of Somatic Variant Classification Using the Association for Molecular Pathology/American Society of Clinical Oncology/College of American Pathologists Guidelines: A Report from the Association for Molecular Pathology. Journal of Molecular Diagnostics, doi: 10.1016/j.jmoldx.2022.11.002, 2022
  17. Scott SA, Wang K, Spinner NB. Human Mutation special issue on innovations in genomic diagnostics. Human Mutation, 43(11):1493-1494, 2022
  18. Nixon A, Fang L, Havrilla JM, Wang K. Termviewer - A Web Application for Streamlined Human Phenotype Ontology (HPO) Tagging and Document Annotation. Chemistry and Biodiversity, 19:e202200805, 2022
  19. Li C, Zhi D, Wang K, Liu X. MetaRNN: Differentiating Rare Pathogenic and Rare Benign Missense SNVs and InDels Using Deep Learning. Genome Medicine, 14:115, 2022
  20. Chen Q, Allot A, Leaman R, Doğan RI, Du J, et al. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database, 2022:baac069, 2022
  21. Doostparast Torshizi A, Wang K. Tissue-wide cell-specific proteogenomic modeling reveals novel candidate risk genes in autism spectrum disorders. npj Systems Biology and Applications, 8:31, 2022
  22. Liu C, Ta CN, Havrilla JM, Nestor JG, Spotnitz ME, Geneslaw AS, Hu Y, Chung WK, Wang K, Weng C. OARD: Open annotations for rare diseases and their phenotypes based on real-world data. American Journal of Human Genetics, 109:1591-1604, 2022
  23. Guo L, Park J, Yi E, Marchi E, Hsieh TC, Kibalnyk Y, Moreno-Sáez Y, Biskup S, Puk O, Beger C, Li Q, Wang K, Voronova A, Krawitz PM, Lyon GJ. KBG syndrome: videoconferencing and use of artificial intelligence driven facial phenotyping in 25 new patients. European Journal of Human Genetics, 30:1244–1254, 2022
  24. Havrilla JM, Singaravelu A, Driscoll DM, Minkovsky L, Helbig I, Medne L, Wang K, Krantz I, Desai BR. PheNominal: an EHR-integrated web application for structured deep phenotyping at the point of care. BMC Medical Informatics and Decision Making, 22:198, 2022
  25. Fang L#, Liu Q#, Monteys AM, Gonzalez-Alegre P, Davidson BL, Wang K. DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing. Genome Biology, 23:128, 2022
  26. Fang L, Wang K. Polishing high-quality genome assemblies. Nature Methods, doi:10.1038/s41592-022-01515-1, 2022
  27. Olson ND, Wagner J, McDaniel J, et al. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. Cell Genomics, 2:100129, 2022.
  28. Zhao M, Havrilla J, Peng J, Drye M, Fecher M, Whitney Guthrie W, Tunc B, Schultz R, Wang K, Zhou Y. Development of a phenotype ontology for autism spectrum disorder by natural language processing on electronic health records. Journal of Neurodevelopmental Disorders, 14:32, 2022
  29. Wedemeyer MA, Muskens I, Strickland BA, Aurelio O, Martirosian V, Wiemels JL, Weisenberger DJ, Wang K, Mukerjee D, Rhie SK, Zada G. Epigenetic dysregulation in meningiomas. Neuro-Oncology Advances, 4:vdac084, 2022
  30. Peng J, Xu D, Lee R, Xu S, Zhou Y, Wang K. Expediting knowledge acquisition by a web framework for Knowledge Graph Exploration and Visualization (KGEV): case studies on COVID-19 and Human Phenotype Ontology. BMC Medical Informatics and Decision Making, 22:147, 2022
  31. Li Q, Ren Z, Cao K, Li MM, Wang K*, Zhou Y*. CancerVar: An artificial intelligence–empowered platform for clinical interpretation of somatic mutations in cancer. Science Advances, 8(18), 2022.
  32. Ahsan MU, Liu Q, Fang L, Wang K. NanoCaller for accurate detection of SNPs and small indels from long-read sequencing by deep neural networks. Genome Biology, 22(1):261, 2021.
  33. Havrilla J, Zhao M, Liu C, Weng C, Helbig I, Bhoj E, Wang K. Clinical Phenotypic Spectrum of 4095 Individuals with Down Syndrome from Text Mining of Electronic Health Records. Genes, 12(8):1159, 2021.
  34. Hu Y, Fang L, Chen X, Zhong JF, Li M, Wang K. LIQA: Long-read Isoform Quantification and Analysis. Genome Biology, 22:182, 2021.
  35. Havrilla J, Liu C, Dong X, Weng C, Wang K. PhenCards: a data resource linking human phenotype information to biomedical knowledge. Genome Medicine, 13(1):91, 2021.
  36. Doostparast Torshizi A, Duan J, Wang K. A computational tool for direct inference of cell-specific expression profiles and cellular composition from bulk-tissue RNA-seq in brain disorders. NAR Genomics and Bioinformatics, 3(2):lqab056, 2021.
  37. Chen C, Yu W, Alikarami F, et al. Single-cell multiomics reveals increased plasticity, resistant populations and stem-cell-like blasts in KMT2A-rearranged leukemia. Blood, 2021.
  38. Ding X, Guo Y, Ye J, Wu X, Lin S, Chen F, Zhu L, Huang L, Song X, Zhang Y, Dai L, Xi X, Huang J, Wang K, Fan B, Li DW. Population differentiation and epidemic tracking of Bursaphelenchus xylophilus in China based on chromosome-level assembly and whole-genome sequencing data. Pest Management Science, doi:10.1002/ps.6738, 2021
  39. Huang H, Fang L, Liu Q, Doostparast Torshizi A, Wang K. Integrated analysis on transcriptome and behaviors defines HTT repeat-dependent network modules in Huntington's disease. Genes & Diseases, 2021.
  40. Doostparast Torshizi A, Duan J, Wang K. Cell type-specific proteogenomic signal diffusion for integrating multi-omics data predicts novel schizophrenia risk genes. Patterns, 1:100091, 2020.
  41. Liu Q, Hu Y, Stucky A, Fang L, Zhong JF, Wang K. LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing. BMC Genomics, 21:793, 2020.
  42. Hu Y, Fang L, Nicholson C, Wang K. Implications of error-prone long-read whole-genome shotgun sequencing on characterizing reference microbiomes. iScience, 23:101223, 2020.
  43. Doostparast Torshizi A, Ionita-Laza I, Wang K. Cell Type-Specific Annotation and Fine Mapping of Variants Associated With Brain Disorders. Frontiers in Genetics, 11:575928, 2020
  44. Yang H, Luo Y, Liu T, et al. A map of cis-regulatory elements and 3D genome structures in zebrafish. Nature, 588:337–343, 2020.
  45. Huang D, Yi X, Zhou Y, Yao H, Xu H, Wang J, Zhang S, Nong W, Wang P, Shi L, Xuan C, Li M, Wang J, Li W, Kwan HS, Sham PC, Wang K, Li MJ. Ultrafast and scalable variant annotation and prioritization with big functional genomics data. Genome Research, 30:1789-1801, 2020.
  46. Zhao M, Havrilla JM, Fang L, Chen Y, Peng J, Liu C, Wu C, Sarmady M, Botas P, Isla J, Lyon GJ, Weng C*, Wang K*. Phen2Gene: Rapid Phenotype-Driven Gene Prioritization for Rare Diseases. NAR Genomics and Bioinformatics, 2:lqaa032, 2020.
  47. Georgieva D, Liu Q, Wang K*, Egli D*. Detection of Base Analogs Incorporated During DNA Replication by Nanopore Sequencing. Nucleic Acids Research, 48:e88, 2020
  48. Hu Y, Wang K, Li M. Detecting differential alternative splicing events in scRNA-seq with or without Unique Molecular Identifiers. PLoS Computational Biology, 16:e1007925, 2020
  49. Liu Q, Tong Y, Wang K. Genome-wide detection of short tandem repeat expansions by long-read sequencing. BMC Bioinformatics, 21:542, 2020.
  50. Evgrafov OV, Armoskus C, Wrobel BB, Spitsyna VN, Souaiaia T, Herstein JS, Walker CP, Nguyen JD, Camarena A, Weitz JR, Kim JM, Duarte EL, Wang K, Simpson GM, Sobell JL, Medeiros H, Pato MT, Pato CN, Knowles JA. Gene Expression in Patient-Derived Neural Progenitors Implicates WNT5A Signaling in the Etiology of Schizophrenia. Biological Psychiatry, 88:236-247, 2020
  51. Wang L, Wang Q, Bai H, Liu C, Liu W, Zhang Y, Jiang L, Xu H, Wang K*, Zhou Y*. EHR2Vec: Representation Learning of Medical Concepts From Temporal Patterns of Clinical Notes Based on Self-Attention Mechanism. Frontiers in Genetics, 11:630, 2020.
  52. Peng J, Zhao M, Havrilla J, Liu C, Weng C, Guthrie W, Schultz R, Wang K*, Zhou Y*. Natural Language Processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder. BMC Medical Informatics and Decision Making, 20:322, 2020
  53. Ling C, Dai Y, Fang L, Yao F, Liu Z, Qiu Z, Cui L, Xia F, Zhao C, Zhang S, Wang K*, Zhang X*. Exonic rearrangements in DMD in Chinese Han individuals affected with Duchenne and Becker muscular dystrophies. Human Mutation, 41:668-677, 2020.
  54. Wu J, Li Y, Wang C, Cui Y, Xu T, Wang C, Wang X, Sha J, Jiang B, Wang K, Hu Z, Guo X, Song X. CircAST: Full-length Assembly and Quantification of Alternatively Spliced CircRNA Isoforms. Genomics Proteomics Bioinformatics, 17(5):522-534, 2020
  55. Dai Y, Li P, Wang Z, Liang F, Yang F, Fang L, Huang Y, Huang S, Zhou J, Wang D, Cui L, Wang K. Single-molecule optical mapping enables quantitative measurement of D4Z4 repeats in facioscapulohumeral muscular dystrophy (FSHD). Journal of Medical Genetics, 57:109-120, 2020.
  56. Fang L, Kao C, Gonzalez MV, Mafra FA, Pellegrino da Silva R, Li M, Wenzel S, Wimmer K, Hakonarson H, Wang K. LinkedSV: Detection of mosaic structural variants from linked-read exome and genome sequencing data. Nature Communications, 10:5585, 2019.
  57. Doostparast Torshizi A, Armoskus C, Zhang H, Forrest MP, Zhang S, Souaiaia T, Evgrafov OV, Knowles JA, Duan J*, Wang K*. Deconvolution of Transcriptional Networks Identified TCF4 as a Master Regulator in Schizophrenia. Science Advances, 5:eaau4139, 2019.
  58. Liu C, Peres Kury FS, Li Z, Ta C, Wang K*, Weng C*. Doc2Hpo: a web application for efficient and accurate HPO concept curation. Nucleic Acids Research, 47:W566-W570, 2019
  59. He MM, Li Q, Yan M, Cao H, Hu Y, He KY, Cao K, Li MM, Wang K. Variant Interpretation for Cancer (VIC): a computational tool for assessing clinical impacts of somatic variants. Genome Medicine, 11:53, 2019
  60. Liu Q, Fang L, Yu G, Wang D, Xiao CL*, Wang K*. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nature Communications, 10:2449, 2019
  61. Xie G, Dong C, Kong Y, Zhong JF, Li M, Wang K. GDP: Group lasso regularized Deep learning for cancer Prognosis from multi-omics and clinical features. Genes, 10:240, 2019.
  62. Khan A, Liu Q, Chen X, Zeng Y, Stucky A, Sedghizadeh PP, Adelpour D, Zhang X, Wang K*, Zhong JF*. Detection of human papillomavirus in cases of head and neck squamous cell carcinoma by RNA-seq and VirTect. Molecular Oncology, 13:829-839, 2019
  63. Zeng S, Zhang MY, Wang XJ, Hu ZM, Li JC, Li N, Wang JL, Liang F, Yang Q, Liu Q, Fang L, Hao JW, Shi FD, Ding XB, Teng JF, Yin XM, Jiang H, Liao WP, Liu JY, Wang K*, Xia K*, Tang BS*. Long-read sequencing identified intronic repeat expansions in SAMD12 from Chinese pedigrees affected with Familial Cortical Myoclonic Tremor with Epilepsy. Journal of Medical Genetics, 56:265-270, 2019
  64. Borgmann-Winter KE, Wang K, Bandyopadhyay S, Doostparast Torshizi A, Blair I, Hahn CY. The proteome and its dynamics: A missing piece for integrative multi-omics in schizophrenia. Schizophrenia Research, doi: 10.1016/j.schres.2019.07.025, 2019
  65. Paine I, Posey JE, Grochowski CM, Jhangiani SN, Rosenheck S, et al. Paralog Studies Augment Gene Discovery: DDX and DHX Genes. American Journal of Human Genetics, 105(2):302-316, 2019
  66. Lyon GJ, Marchi E, Ekstein J, Meiner V, Hirsch Y, Scher S, Yang E, De Vivo DC, Madrid R, Li Q, Wang K, Haworth A, Chilton I, Chung WK, Velinov M. VAC14 syndrome in two siblings with retinitis pigmentosa and neurodegeneration with brain iron accumulation. Cold Spring Harbor Molecular Case Studies, 5(6):a003715, 2019
  67. Liu Q, Shi L, Wang K. Ethnicity-Specific Reference Genome Assembly by Long-Read Sequencing. J Mol Genet Med, 12:385, 2018
  68. Liu Q, Georgieva DC, Egli DM, Wang K. NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data. BMC Genomics, 20:78, 2018
  69. Chen Y, Millstein J, Liu Y, Chen GY, Chen X, Stucky A, Qu C, Fan JB, Chang X, Soleimany A, Wang K, Zhong J, Liu J, Gilliland FD, Li Z, Zhang X, Zhong JF. Single-Cell Digital Lysates Generated by Phase-Switch Microfluidic Device Reveal Transcriptome Perturbation of Cell Cycle. ACS Nano, 12:4687-4694, 2018
  70. Khan A, Liu Q, Wang K. iMEGES: integrated Mental-disorder GEnome score for prioritizing the susceptibility genes for mental disorders in personal genomes. BMC Bioinformatics, 19:501, 2018
  71. He Z, Liu L, Wang K, Ionita-Laza I. A semi-supervised approach for predicting cell type specific functional consequences of non-coding variation using MPRAs. Nature Communications, 9:5199, 2018
  72. Xiao CL, Zhu S, He M-H, Chen Y, Yu GL, Chen D, Xie SQ, Luo F, Liang Z, Wang DP, Bo XC*, Gu XF*, Wang K*, Yan GR*. N6-methyladenine DNA modification in human genome. Molecular Cell, 71:306-318, 2018
  73. Hoon Son J, Xie G, Yuan C, Ena L, Li Z, Goldstein A, Huang L, Wang L, Shen F, Liu H, Mehl K, Groopman EE, Marasa M, Kiryluk K, Gharavi AG, Chung WK, Hripcsak G, Friedman C, Weng C*, Wang K*. Deep phenotyping on electronic health records facilitates genetic diagnosis by clinical exomes. American Journal of Human Genetics, 103:58-73, 2018
  74. Doostparast Torshizi A, Wang K. Next-generation sequencing in drug development: target identification and genetically stratified clinical trials. Drug Discovery Today, 23:1776-1783, 2018
  75. Fang L, Hu J, Wang D, Wang K. NextSV: a computational pipeline for structural variation analysis from low-coverage long-read sequencing. BMC Bioinformatics, 19:180, 2018
  76. Miao H, Zhou J, Yang Q, Liang F, Wang D, Ma N, Gao B, Du J, Lin G, Wang K*, Zhang Q*. Long-read sequencing identified a causal structural variant in an exome-negative case and enabled preimplantation genetic diagnosis. Hereditas, 155:32, 2018
  77. Khan A, Wang K. A deep learning based scoring system for prioritizing susceptibility variants for mental disorders. IEEE International Conference on Bioinformatics and Biomedicine. Page: 1698-1705, DOI: 10.1109/BIBM.2017.8217916, 2017
  78. Li Q, Wang K. InterVar: Clinical interpretation of genetic variants by ACMG/AMP 2015 guidelines. American Journal of Human Genetics, 100:267-280, 2017
  79. Liu Q, Zhang P, Wang D, Gu W, Wang K. Interrogating the "unsequenceable" genomic trinucleotide repeat disorders by long-read sequencing. Genome Medicine, 9:65, 2017
  80. Li J, Zhang W, Yang H, Howrigan DP, Wilkinson B, Souaiaia T, Evgrafov OV, Genovese G, Clementel VA, Tudor JC, Abel T, Knowles JA, Neale BM, Wang K, Sun F, Coba MP. Spatiotemporal profile of postsynaptic interactomes integrates components of complex brain disorders. Nature Neuroscience, 20:1150-1161, 2017
  81. de Araújo Lima LA, Wang K. PennCNV in whole-genome sequencing data. BMC Bioinformatics, 18:383, 2017
  82. Dong C, Guo Y, Yang H, He Z, Liu X, Wang K. iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing driver genes in personal cancer genomes. Genome Medicine, 8:135, 2016
  83. Shi L, Guo Y, Dong C, Huddleston J, Yang H, Han X, Fu A, Li Q, Li N, Gong S, Lintner KE, Ding Q, Wang Z, Hu J, Wang D, Wang F, Wang L, Lyon GJ, Guan Y, Shen Y, Evgrafov OV, Knowles JA, Thibaud-Nissen F, Schneider V, Yu CY, Zhou L, Eichler EE, So KF, Wang K. Long read sequencing and de novo assembly of a Chinese genome. Nature Communications, 7:12065, 2016
  84. Cai M, Gao F, Lu W, Wang K. w4CSeq: software and web application to analyze 4C-Seq data. Bioinformatics, 32:3333-3335, 2016
  85. Song X, Zhang N, Han P, Lai RK*, Wang K*, Lu W*. Circular RNA Profile in Gliomas Revealed by Identification Tool UROBORUS. Nucleic Acids Research, 44:e87, 2016
  86. He KY, Zhao Y, McPherson EW, Li Q, Xia F, Weng C, Wang K*, He MM*. Pathogenic Mutations in Cancer-Predisposing Genes: A Survey of 300 Patients with Whole-Genome Sequencing and Lifetime Electronic Health Records. PLoS One, 11:e0167847, 2016
  87. Akbarian S, Liu C, Knowles JA, Vaccarino FM, Farnham PJ, et al. The PsychENCODE Project. Nature Neuroscience, 18:1707-1712, 2015
  88. Guo Y, Ding X, Shen Y, Lyon GJ, Wang K. SeqMule: automated analysis pipeline for analysis of human exome/genome sequencing data. Scientific Reports, 5:14283, 2015
  89. Yang H, Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nature Protocols, 10:1556-1566, 2015
  90. Yang H, Robinson PN, Wang K. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nature Methods, 12:841-843, 2015
  91. He M, Person TN, Hebbring SJ, Heinzen E, Ye Z, Schrodi SJ, McPherson EW, Lin SM, Peissig PL, Brilliant MH, O'Rawe J, Robison RJ, Lyon GJ, Wang K. SeqHBase: a big data toolset for family-based sequencing data analysis. Journal of Medical Genetics, 52:282-288, 2015
  92. Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K*, Liu X*. Comparison and integration of deleteriousness prediction methods of nonsynonymous SNPs in whole exome sequencing studies. Human Molecular Genetics, 24:2125-2137, 2015
  93. Guo Y, Conti DV, Wang K. Enlight: web-based integration of GWAS results with biological annotations. Bioinformatics, 31:275-276, 2015
  94. Gao F, Wang K. Ligation-anchored PCR unveils immune repertoire of TCR-beta from whole blood. BMC Biotechnology, 15:39, 2015
  95. Jia H, Guo Y, Zhao W, Wang K. Long-range PCR in next-generation sequencing: comparison of six enzymes and evaluation on the MiSeq sequencer. Scientific Reports, 4:5737, 2014
  96. Shi L, Li B, Huang YL, Ling XY, Liu T, Lyon GJ, Xu A, Wang K. "Genotype-first" approaches on a curious case of idiopathic progressive cognitive decline. BMC Medical Genomics, 7:66, 2014
  97. Gao F, Wei Z, Lu W, Wang K. Comparative analysis of 4C-Seq data generated from enzyme-based and sonication-based methods. BMC Genomics, 14:345, 2013
  98. Wei Z, Gao F, Kim S, Yang H, Wang K, Lu W. Klf4 Organizes Long-Range Chromosomal Interactions with the Oct4 Locus in Reprogramming and Pluripotency. Cell Stem Cell, 13:36-47, 2013
  99. Chen G, Chang X, Curtis C, Wang K. Precise inference of copy number alterations from SNP arrays. Bioinformatics, 29:2964-2970, 2013
  100. Wang K*, Kim C, Bradfield J, Guo Y, Toskala E, Otieno FG, Hou C, Thomas K, Cardinale C, Lyon GJ, Golhar R, Hakonarson H*. Whole-genome DNA/RNA sequencing on a novel Mendelian disease with neuromuscular and cardiac involvement. Genome Medicine, 5:67, 2013
  101. Shi L, Chang X, Zhang P, Coba M, Lu W, Wang K. The functional genetic link of NLGN4X knockdown and neurodevelopment in neural stem cells. Human Molecular Genetics, 22:3749-3760, 2013
  102. Gao F, Ling C, Shi L, Commins D, Zada G, Mack W, Wang K. Inversion-mediated gene fusion involving NAB2-STAT6 in an unusual malignant meningioma. British Journal of Cancer, 109:1051-1055, 2013
  103. Gao F, Shi L, Russin J, Zeng L, Chang X, He S, Chen TC, Giannotta SL, Weisenberger DJ, Zada G, Mack WJ, Wang K. DNA methylation in the malignant transformation of meningiomas. PLoS ONE, 8:e54114, 2013
  104. Chang X, Xu T, Li Y, Wang K. Dynamic modular architecture of protein-protein interaction networks beyond the dichotomy of 'date' and 'party' hubs. Scientific Reports, 3:1691, 2013
  105. Qiu S, Luo S, Evgrafov O, Li R, Schroth GP, Levitt P, Knowles JA*, Wang K*. Single-neuron RNA-Seq: technical feasibility and reproducibility. Frontiers in Genetics, 3:124, 2012
  106. Lyon GJ, Wang K. Identifying disease mutations in genomic medicine settings: current challenges and how to accelerate progress. Genome Medicine, 4:58, 2012
  107. Chang X, Wang K. wANNOVAR: annotating genetic variants for personal genomes via the web. Journal of Medical Genetics, 49:433-436, 2012
  108. Lyon GJ, Jiang T, Van Wijk R, Wang W, Bodily P, Xing J, Tian L, Robison R, Clement M, Yang L, Zhang P, Liu Y, Moore B, Glessner J, Elia J, Reimherr F, van Solinge W, Yandell M, Hakonarson H, Wang J, Johnson WE, Wei Z, Wang K. Exome Sequencing and Unrelated Findings in the context of Complex Disease Research: Ethical and Clinical Implications. Discovery Medicine, 12:41-55, 2011
  109. Wang K*, Diskin SJ*, Zhang H*, Attiyeh EF, Winter C, Hou C, Schnepp RW, Diamond M, Bosse K, Mayes PA, Glessner J, Kim C, Frackelton E, Garris M, Wang Q, Glaberson W, Chiavacci R, Nguyen L, Jagannathan J, Saeki N, Sasaki H, Grant SF, Iolascon A, Mosse YP, Cole KA, Li H, Devoto M, McGrady PW, London WB, Capasso M, Rahman N, Hakonarson H, Maris JM. Integrative genomics identifies LMO1 as a neuroblastoma oncogene. Nature, 469:216-220, 2011
  110. Wang K, Zhang H, Bloss CT, Duvvuri V, Kaye W, Schork NJ, Berrettini W, Hakonarson H, the Price Foundation Collaborative Group. A genome-wide association study on common SNPs and rare CNVs in anorexia nervosa. Molecular Psychiatry, 16:949-959, 2011
  111. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38:e164, 2010
  112. Wang K, Li M, Hakonarson H. Analysing biological pathways from genome-wide association studies. Nature Reviews Genetics, 11:843-854, 2010
  113. Wang K, Bucan M, Grant SF, Schellenberg G, Hakonarson H. Strategies for genetic studies of complex diseases. Cell, 142:351-353, 2010
  114. Wang K, Baldassano R, Zhang H, et al. Comparative genetic analysis of inflammatory bowel disease and type 1 diabetes implicates multiple loci with opposite effects. Human Molecular Genetics, 19:2059-2067, 2010
  115. Wang K*, Zhang H*, Ma D*, Bucan M, Glessner JT, Abrahams BS, Salyakina D, Imielinski M, Bradfield JP, Sleiman PM, Kim CE, Hou C, Frackelton E, Chiavacci R, Takahashi N, Sakurai T, Rappaport E, Lajonchere CM, Munson J, Estes A, Korvatska O, Piven J, Sonnenblick LI, Alvarez Retuerto AI, Herman EI, Dong H, Hutman T, Sigman M, Ozonoff S, Klin A, Owley T, Sweeney JA, Brune CW, Cantor RM, Bernier R, Gilbert JR, Cuccaro ML, McMahon WM, Miller J, State MW, Wassink TH, Coon H, Levy SE, Schultz RT, Nurnberger JI, Haines JL, Sutcliffe JS, Cook EH, Minshew NJ, Buxbaum JD, Dawson G, Grant SF, Geschwind DH, Pericak-Vance MA, Schellenberg GD, Hakonarson H. Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature, 459:528-533, 2009
  116. Wang K, Horst JA, Cheng G, Nickle DC, Samudrala R. Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Computational Biology, 4:e1000181, 2008
  117. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant S, Hakonarson H, Bucan M. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research, 17:1665-1674, 2007
  118. Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genome-wide association studies. American Journal of Human Genetics, 81:1278-1283, 2007
  119. Wang K, Mittler JE, Samudrala R. Comment on "Evidence for positive epistasis in HIV-1". Science, 312:848, 2006
  120. Wang K. Gene-function wiki would let biologists pool worldwide resources. Nature, 439:534, 2006
  121. Wang K, Samudrala R. Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics, 7:385, 2006
  122. Wang K, Samudrala R. FSSA: a novel method for identifying functional signatures from structural alignments. Bioinformatics, 21:2969-2977, 2005
  1. Guan Y, Wang K. Whole-genome multi-SNP analysis. In: Statistical Bioinformatics. Edited by Do KA, Qin Z, Vannucci M. Cambridge University Press, 2013
  2. Wang K. Epistasis. In: Encyclopedia of Autism Spectrum Disorders. Edited by Volkmar FR. Springer, 2013
  3. Fang L, Wang K. Identification of Copy Number Variants from SNP Arrays Using PennCNV. In: Methods in Molecular Biology. Edited by Derek Bickhart. Springer, vol. 1833, 2018
  • PhenCards (https://phencards.org): a web server linking human phenotype information to biomedical knowledge. Users can query for relevant information with human phenotype terms, disease names, or clinical free text.
  • Phen2Gene (https://phen2gene.wglab.org): a web server to prioritize candidate genes for Mendelian diseases given a list of Human Phenotype Ontology terms or a paragraph of clinical text
  • COVID19 Knowledge Graph (http://covid19nlp.wglab.org): a knowledge-graph web server that lets users dynamically query COVID-19-related biomedical knowledge through natural language questions, drawing on large-scale free text from scientific papers, including abstracts and full text
  • DeepMod (https://github.com/WGLab/DeepMod): a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications
  • LinkedSV (https://github.com/WGLab/LinkedSV): a structural variant caller for 10X Genomics (linked-read) sequencing data. It detects deletions, duplications, inversions and translocations using evidence from the barcoded reads
  • NanoMod (https://github.com/WGLab/NanoMod): a computational tool for the detection of DNA modifications using Nanopore long-read sequencing data
  • NanoCaller (https://github.com/WGLab/NanoCaller): a deep-learning based tool for SNP and indel detection using long-read sequencing
  • DeepRepeat (https://github.com/WGLab/DeepRepeat): a deep neural network to identify simple repeats directly from signal intensity patterns of long-read sequencing without base calling
  • EHR-Phenolyzer (https://github.com/WGLab/EHR-Phenolyzer): a Python pipeline to automatically translate raw clinical notes into meaningfully ranked candidate causal genes. It may greatly shorten the time needed to identify and discover disease-causal genes
  • Phenolyzer (http://phenolyzer.wglab.org): a tool focusing on discovering genes based on user-specific disease/phenotype terms
  • wInterVar (http://wintervar.wglab.org): a bioinformatics software tool for clinical interpretation of genetic variants by the ACMG/AMP 2015 guidelines
  • RepeatHMM (https://github.com/WGLab/RepeatHMM): a bioinformatics software tool for estimation of repeat counts on microsatellites from long-read sequencing data
  • wANNOVAR (http://wannovar.wglab.org): a rapid, efficient tool to annotate functional consequences of genetic variation from high-throughput sequencing data. wANNOVAR provides easy and intuitive web-based access to the most popular functionalities of the ANNOVAR software
  • PennCNV (http://penncnv.openbioinformatics.org): a rapid, free software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays