We are developing integrated approaches for whole-genome characterization of cancer patients to improve personalized therapy.

Cancer is caused by the accumulation of somatic driver mutations in driver genes. For an individual patient, the central challenge for molecular diagnosis and treatment is to use that patient's genomic information to identify, rapidly and accurately, the driver mutations and genes in the tumor cells against a large background of passenger mutations.

Second-generation sequencing allows researchers to rapidly identify a patient's somatic mutations by comparing the sequence of the tumor with that of healthy tissue. Accordingly, tools have been developed to help identify these causal mutations from the personal cancer genomic data that second-generation sequencing readily produces. Although many individual computational methods have been established, and a growing global effort has accumulated knowledge of cancer biology networks and principles, robust computational tools that exploit such prior knowledge to aid driver gene discovery remain underdeveloped.

To address this issue, we devised the integrated CAncer GEnome Score (iCAGES), a novel statistical framework that infers driver variants by integrating contributions from coding, non-coding, and structural variants, identifies driver genes by combining genomic information with prior biological knowledge, and then generates a prioritized list of drug treatments. In an analysis of The Cancer Genome Atlas (TCGA) data, iCAGES predicted whether patients responded to drug treatment (P=0.006, Fisher's exact test) and their long-term survival (P=0.003, Cox regression).
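To make the idea of combining several kinds of evidence into one gene-level score concrete, the following Python sketch merges per-gene evidence with a logistic function and ranks genes by the result. It is an illustration under stated assumptions, not the iCAGES implementation: the feature names, the weights, the bias, and the `gene_score` and `prioritize` helpers are all hypothetical, and the real framework's model and fitted coefficients are not reproduced here.

```python
import math

# Hypothetical feature weights and bias, chosen for illustration only;
# these are NOT fitted iCAGES coefficients.
WEIGHTS = {
    "coding": 2.0,           # evidence from coding variants
    "noncoding": 1.0,        # evidence from non-coding variants
    "structural": 1.5,       # evidence from structural variants
    "prior_knowledge": 1.0,  # prior biological knowledge about the gene
}
BIAS = -3.0


def logistic(x: float) -> float:
    """Map a real-valued linear score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gene_score(features: dict[str, float]) -> float:
    """Combine per-gene evidence into one score (missing features count as 0)."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return logistic(z)


def prioritize(genes: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (gene, score) pairs sorted from most to least likely driver."""
    scored = [(gene, gene_score(features)) for gene, features in genes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Usage with made-up placeholder genes and evidence values.
example = {
    "GENE_A": {"coding": 0.9, "noncoding": 0.1, "prior_knowledge": 0.8},
    "GENE_B": {"coding": 0.1, "noncoding": 0.2, "prior_knowledge": 0.1},
}
for gene, score in prioritize(example):
    print(f"{gene}\t{score:.3f}")
```

In this sketch, prior knowledge enters as just another weighted feature, and the logistic form keeps every gene's score on a common 0 to 1 scale so that genes can be ranked against each other.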

More recently, we developed GDNNP, a Group lasso-based Deep Neural Network that integrates multiple sources of molecular, demographic, and clinical information to predict survival from personal cancer genomes. Integrating patients' high-dimensional molecular profiles with demographic and clinical information to better understand the molecular mechanisms of cancer remains a challenge because of the complexity and size of the data. Several approaches have tried to tackle this challenge, but they are limited by the number of participating patients and the dimensionality of their data.

In both simulation studies and real patient data, GDNNP achieved higher prediction accuracy than models that used only genomic variants to predict patient prognosis. The results also indicated that a deep learning regularization approach improves prediction accuracy over traditional regularized models. Our study suggests that deep learning regularization approaches can integrate patients' molecular, demographic, and clinical profiles to infer cancer patients' prognosis accurately.
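The role of a group lasso can be illustrated with its standard penalty, lambda * sum_g sqrt(p_g) * ||W_g||_2, where W_g is the block of weights belonging to group g and p_g is the number of members in that group. The NumPy sketch below assumes, for illustration only, that groups are sets of input features (rows of a first-layer weight matrix), so that the penalty can switch off whole groups of inputs at once. The actual GDNNP architecture, grouping, and survival loss are not reproduced, and the helper names are hypothetical.

```python
import numpy as np


def group_lasso_penalty(W: np.ndarray, groups: list[list[int]], lam: float) -> float:
    """lam * sum over groups of sqrt(group size) * L2 norm of that group's weights.

    W has shape (n_inputs, n_hidden); each group lists the input rows it owns.
    """
    total = 0.0
    for idx in groups:
        # The Frobenius norm of the block equals the L2 norm of its flattened weights.
        total += np.sqrt(len(idx)) * np.linalg.norm(W[idx, :])
    return lam * total


def group_soft_threshold(W: np.ndarray, groups: list[list[int]], step: float) -> np.ndarray:
    """Proximal step for the penalty, where step = learning rate * lam.

    Each group is shrunk toward zero and set exactly to zero when its norm
    falls below step * sqrt(group size).
    """
    W = W.copy()
    for idx in groups:
        block = W[idx, :]
        norm = np.linalg.norm(block)
        threshold = step * np.sqrt(len(idx))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        W[idx, :] = scale * block
    return W


# Toy first-layer weights: rows 0-1 form one hypothetical input group
# (e.g. molecular features), rows 2-3 another (e.g. clinical features).
W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
groups = [[0, 1], [2, 3]]
print(group_lasso_penalty(W, groups, lam=0.1))
print(group_soft_threshold(W, groups, step=1.0))
```

Because the penalty uses the unsquared norm of each group, its proximal step can set an entire group to zero (here, the second group), which is what lets a group lasso discard whole blocks of inputs rather than individual weights.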

Finally, we are developing CancerVar, a clinical interpretation system for variants in cancer, based on the AMP/ASCO/CAP guideline published in 2017. Drawing on professional surveys, a literature review, and the consensus of the Working Group's subject matter experts, the guideline proposes a four-tiered system that categorizes somatic sequence variants by clinical significance: tier I, variants with strong clinical significance; tier II, variants with potential clinical significance; tier III, variants of unknown clinical significance; and tier IV, variants deemed benign or likely benign.
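The four tiers map naturally onto a small enumeration, for example when tallying tier calls for a report. This Python sketch encodes only the tier definitions given above. The `SomaticTier` name, the `has_clinical_significance` and `tally` helpers, and the variant identifiers are hypothetical and are not CancerVar's interface, and no evidence-to-tier scoring rules are implemented.

```python
from collections import Counter
from enum import Enum


class SomaticTier(Enum):
    """Four-tier categories for somatic sequence variants, as listed above."""
    TIER_I = "strong clinical significance"
    TIER_II = "potential clinical significance"
    TIER_III = "unknown clinical significance"
    TIER_IV = "benign or likely benign"


def has_clinical_significance(tier: SomaticTier) -> bool:
    """Hypothetical helper: True for tiers with strong or potential significance."""
    return tier in (SomaticTier.TIER_I, SomaticTier.TIER_II)


def tally(calls: dict[str, SomaticTier]) -> Counter:
    """Count how many variants fall into each tier."""
    return Counter(calls.values())


# Usage with made-up variant identifiers.
calls = {
    "variant_1": SomaticTier.TIER_I,
    "variant_2": SomaticTier.TIER_III,
    "variant_3": SomaticTier.TIER_III,
}
counts = tally(calls)
flagged = [v for v, t in calls.items() if has_clinical_significance(t)]
print(counts[SomaticTier.TIER_III], flagged)
```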

Together, tools such as iCAGES, GDNNP, and CancerVar will facilitate the implementation of precision oncology and improve outcomes for patients with cancer.