We perform exome/genome sequencing and data analysis to identify disease genes for rare undiagnosed diseases.


A significant number of rare Mendelian diseases, especially those involving neuronal phenotypes, remain undiagnosed or misdiagnosed. Accurate genetic diagnosis may benefit disease management or treatment. We have developed multiple collaborations with researchers working on rare undiagnosed diseases to apply exome/genome sequencing to disease gene discovery.

Genome/exome sequencing can quickly identify disease genes for known diseases, or identify novel candidate genes for novel syndromes. Several examples are given below:

1. Idiopathic hemolytic anemia (known disease)

In collaboration with Dr. Gholson Lyon, we sequenced a pedigree segregating both a complex disease (ADHD) and a Mendelian disease (hemolytic anemia). While we did identify some rare variants that might predispose to ADHD, we have not yet proven causality for any of them. However, over the course of the study, one subject was discovered to have idiopathic hemolytic anemia (IHA), which was suspected to be genetic in origin. Analysis of this subject’s exome readily identified two rare non-synonymous mutations in the PKLR gene as the most likely cause of the IHA. We further confirmed the deficiency by functional biochemical testing, consistent with a diagnosis of red blood cell pyruvate kinase deficiency. Details of the study were published in Discovery Medicine.

2. MPS3B (known disease)

We encountered a case in China where two siblings both began to develop idiopathic progressive cognitive decline starting from age six, and were suspected to have an undiagnosed neurological disease. In collaboration with Dr. Shi, exome sequencing identified NAGLU as the most likely candidate gene, with compound heterozygous mutations. Sanger sequencing confirmed the recessive pattern of inheritance, leading to a genetic diagnosis of Sanfilippo syndrome (mucopolysaccharidosis IIIB). Biochemical tests confirmed the complete loss of activity of alpha-N-acetylglucosaminidase (encoded by NAGLU) in blood, as well as significantly elevated dermatan sulfate and heparan sulfate in urine. Structure modeling revealed how the two variants affect protein structural stability. Details of the study can be found here.
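The compound heterozygous pattern described above can be checked with a small segregation test once parental genotypes are available. The following is a minimal sketch, not the analysis actually used in the study: the genotype coding (number of alternate alleles per person) and the function name `is_compound_het` are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical genotype coding: number of alternate alleles (0, 1, or 2)
# for each family member at one variant site.
Genotype = Dict[str, int]  # keys: "child", "mother", "father"


def parental_origin(variant: Genotype) -> Optional[str]:
    """Return which parent transmitted a heterozygous variant, if unambiguous."""
    if variant["child"] != 1:
        return None
    if variant["mother"] == 1 and variant["father"] == 0:
        return "mother"
    if variant["father"] == 1 and variant["mother"] == 0:
        return "father"
    return None  # both or neither parent carries it: origin is ambiguous


def is_compound_het(var_a: Genotype, var_b: Genotype) -> bool:
    """True if two heterozygous variants in one gene come from different parents."""
    origin_a = parental_origin(var_a)
    origin_b = parental_origin(var_b)
    return origin_a is not None and origin_b is not None and origin_a != origin_b


# Illustrative genotypes only; these are not the study's data.
maternal_variant = {"child": 1, "mother": 1, "father": 0}
paternal_variant = {"child": 1, "mother": 0, "father": 1}
print(is_compound_het(maternal_variant, paternal_variant))
```

A pair of variants that both come from the same parent sits on one haplotype and leaves the other copy of the gene intact, which is why the check requires different parental origins.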

3. Ogden syndrome (novel syndrome)

One of the early examples of identifying a gene for a "novel syndrome" was Ogden syndrome, a previously unreported infantile lethal disorder involving a mutation in NAA10. Together with Dr. Lyon, we identified the disease gene by X-chromosome exon capture and next-generation sequencing. Details can be found here.

4. RBCK1 deficiency syndrome (novel syndrome)

A more recent example is a novel genetic disease which we refer to as "Bookman syndrome", a pediatric-onset disease with neuromuscular and cardiac involvement and with clinical features similar to Glycogen Storage Disease Type IV. Although exome sequencing failed to identify the cause of the disease for technical reasons, we applied whole-genome sequencing and transcriptome sequencing and identified disease-contributory mutations in RBCK1. This is an early example illustrating the limitations of exome sequencing in finding disease-causing mutations.

5. TAF1 deficiency syndrome (novel syndrome)

In collaboration with Dr. Lyon, we analyzed an extended three-generation family, sequencing its members by Illumina WGS and Complete Genomics WGS. Two affected individuals in the third generation both have severe intellectual disability, autistic behaviors, ADHD, and very distinctive facial features. Family-based analysis pinpointed TAF1 as the most likely candidate gene. Interestingly, the X-linked non-synonymous mutation in TAF1 was detected as a de novo mutation arising in the mother of the two affected individuals. Details can be found here.

6. Age-related macular degeneration (known disease)

In collaboration with Dr. Huang, we are analyzing over 100 families with various types of eye diseases, especially those with dry/wet type of age-related macular degeneration, to identify new genes responsible for these rare genetic syndromes. The wet/neovascular type affects approximately 10-15% of individuals with age-related macular degeneration, but accounts for approximately 90% of all cases of severe vision loss from the disease.

7. Facioscapulohumeral muscular dystrophy (known disease)

Facioscapulohumeral muscular dystrophy (FSHD) is a common adult muscular dystrophy in which the muscles of the face, shoulder blades and upper arms are among the most affected. FSHD is the only disease in which "junk" DNA is reactivated to cause disease, and the only known repeat array-related disease where fewer repeats cause disease. More than 95% of FSHD cases are associated with copy number loss of a 3.3kb tandem repeat (D4Z4 repeat) at the subtelomeric chromosomal region 4q35, where the pathogenic allele contains fewer than 10 repeats and has a specific genomic configuration called 4qA. Currently, genetic diagnosis of FSHD requires pulsed-field gel electrophoresis followed by Southern blot, which is labor-intensive, semi-quantitative, and has a long turnaround time. We developed a novel approach for genetic diagnosis of FSHD by leveraging the Bionano Saphyr single-molecule optical mapping platform. Using a bioinformatics pipeline developed for this assay, we found that the method gives a direct quantitative measurement of repeat numbers, can differentiate the 4q35 and the highly paralogous 10q26 regions, can determine the 4qA/4qB allelic configuration, and can quantitate levels of post-zygotic mosaicism. Details can be found here.
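The arithmetic behind a repeat-number estimate can be sketched as follows. This is an illustration only and does not reproduce the bioinformatics pipeline mentioned above: it assumes the length of the D4Z4 array on a single molecule has already been measured, and the function names and example lengths are hypothetical. The 3.3 kb unit size and the fewer-than-10-repeats threshold on a 4qA allele come from the text.

```python
D4Z4_UNIT_BP = 3300          # size of one D4Z4 repeat unit (3.3 kb, from the text)
PATHOGENIC_REPEAT_LIMIT = 10  # pathogenic alleles carry fewer than 10 repeats


def estimate_repeat_count(array_length_bp: float) -> int:
    """Estimate the number of D4Z4 units from a measured array length."""
    return round(array_length_bp / D4Z4_UNIT_BP)


def is_fshd_associated(array_length_bp: float, haplotype: str) -> bool:
    """True if the allele is contracted AND sits on the 4qA configuration."""
    contracted = estimate_repeat_count(array_length_bp) < PATHOGENIC_REPEAT_LIMIT
    return contracted and haplotype == "4qA"


# Hypothetical measurements, not patient data.
for length_bp, haplotype in [(20_000, "4qA"), (20_000, "4qB"), (100_000, "4qA")]:
    repeats = estimate_repeat_count(length_bp)
    print(length_bp, haplotype, repeats, is_fshd_associated(length_bp, haplotype))
```

Both conditions are required because a contracted array on a 4qB allele, or on the paralogous 10q26 region, does not meet the pathogenic definition given above.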

8. Nanopore sequencing in Preimplantation genetic diagnosis (diagnostic methods for known diseases)

For a proportion of individuals judged clinically to have a recessive Mendelian disease, only one pathogenic variant can be found from clinical whole exome sequencing (WES), posing a challenge to genetic diagnosis and genetic counseling. Here we describe a case study where WES identified only one pathogenic variant in an individual suspected to have glycogen storage disease type Ia (GSD-Ia), an autosomal recessive disease caused by bi-allelic mutations in the G6PC gene. Through Nanopore long-read whole-genome sequencing, we identified a 7kb deletion covering two exons on the other allele, suggesting that complex structural variants (SVs) may explain a fraction of recessive disease cases in which the second pathogenic allele is missed by WES. Both breakpoints of the deletion are within Alu elements, and we designed Sanger sequencing and quantitative PCR assays based on the breakpoints for preimplantation genetic diagnosis (PGD), as the family was planning another child. Four embryos were obtained after in vitro fertilization (IVF), and an embryo without the deletion in G6PC was transferred after PGD; the result was confirmed by prenatal diagnosis, postnatal diagnosis, and the subsequent lack of disease symptoms after birth. In summary, we present one of the first examples of using long-read sequencing to identify causal yet complex SVs in exome-negative patients, which subsequently enabled successful personalized PGD. Details can be found here.

9. Nanopore sequencing in familial cortical myoclonic tremor with epilepsy (known disease)

The locus for familial cortical myoclonic tremor with epilepsy (FCMTE) has long been mapped to 8q24 in linkage studies, but the causative mutations remained unclear. Recently, expansions of intronic TTTCA and TTTTA repeat motifs within SAMD12 were found to be involved in the pathogenesis of FCMTE1 in Japanese pedigrees. In collaboration with Xiangya Hospital, we performed genetic linkage analysis using microsatellite markers in a five-generation Chinese pedigree with 55 members, and used low-coverage (~10X) long-read genome sequencing (LRS) on the PacBio Sequel and Oxford Nanopore platforms to identify the causative mutations as intronic TTTCA and TTTTA repeat expansions in SAMD12, thus corroborating the recently published results in Japanese pedigrees. This was the first report of Chinese FCMTE pedigrees with SAMD12 intronic repeat expansions. Our study also suggested that Nanopore sequencing is an effective tool for molecular diagnosis of genetic disorders, especially for neurological diseases that cannot be positively diagnosed by conventional clinical microarray and NGS technologies. Details can be found here.
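To make the idea of counting repeat motifs in a long read concrete, here is a minimal sketch that finds the longest uninterrupted run of a motif in a read sequence. It is an illustration under simplifying assumptions, not the method used in the study: it matches motifs exactly, whereas real long reads contain sequencing errors that usually call for error-tolerant repeat callers. The function name and the toy read are hypothetical.

```python
import re


def longest_motif_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back exact copies of `motif`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)


# Toy read built for illustration: 5 copies of TTTTA followed by 3 of TTTCA.
toy_read = "ACGT" + "TTTTA" * 5 + "TTTCA" * 3 + "GGC"
for motif in ("TTTTA", "TTTCA"):
    print(motif, longest_motif_run(toy_read, motif))
```

Reporting the two motifs separately matters here because the text describes expansions of both TTTCA and TTTTA within the same intronic region.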

10. Nanopore sequencing in balanced translocations in Preimplantation genetic diagnosis (diagnostic methods for known diseases)

The precise detection of balanced translocation breakpoints at the nucleotide level, together with phasing information, is important in assisted reproduction technology (ART) and preimplantation genetic diagnosis (PGD). It has been demonstrated that third-generation sequencing by Oxford Nanopore Technologies (ONT) detects structural abnormalities more directly and efficiently than traditional methods. In collaboration with CITIC Xiangya Hospital, we applied this technology to detect the breakpoints in 7 translocation carriers in whom potentially pathogenic SVs had initially been detected by karyotyping at the chromosome level. The results showed that all the balanced translocations sequenced by ONT at 10X coverage were completely detected and were consistent with the previous karyotyping results. Details can be found here.

11. Structural variants in Duchenne and Becker muscular dystrophies (DMD/BMD) (known disease)

Structural variants within DMD are known to account for over 80 percent of pathogenic mutations. However, few studies have reported complete profiling of structural variations in DMD and the potential mechanisms responsible for inducing breaks in different fragile regions. In collaboration with a former student at Peking Union Hospital, we analyzed 896 male probands with a diagnosis of DMD/BMD. We observed that the deletion of exons 48-50 was the most frequent deletion in the DMD/BMD patients, while exon 2 duplication was the most frequent duplication. Surrounding the breakpoints, we discovered two long motifs, non-consensus microhomologies, low copy repeats, palindromic sequences, and micro-indels, which could predispose DMD to microhomology-mediated replication-dependent recombination and non-homologous end joining.

These examples illustrate the power of genome sequencing in uncovering the genetic basis of rare undiagnosed diseases and in facilitating their genetic diagnosis.