Identification of cancer driver genes/variants and optimal drug treatments from sequencing, expression and structural variant data.

 

Introduction

Cancer is caused by the accumulation of somatic driver mutations in driver genes. Various individual computational methods have been established for pinpointing these driver events, and knowledge of cancer biology networks and principles is accumulating through a growing global effort. Even so, robust computational tools that exploit such prior knowledge to aid driver gene discovery remain underdeveloped. To address this issue, we devised the integrated CAncer GEnome Score (iCAGES), a statistical framework that prioritizes potential cancer driver genes by their prior biological association with cancer and their ensemble cancer-driving potential.

Features

1. Personalized cancer driver gene profiling. Designed with future real-life cancer diagnostics in mind, iCAGES can prioritize cancer driver genes for an individual patient without sacrificing performance.

2. Machine learning modeling. Using a radial SVM as its first layer, iCAGES can capture non-linear relationships among predictors when separating cancer driver mutations from neutral ones, which improves its performance.

3. Integration of prior knowledge. In its second layer, iCAGES employs a two-step ranking process that integrates related biological prior knowledge, downstream gene-gene interaction information and the genomic complexity of the cancer driver event, which helps to evaluate cancer driver genes more accurately and comprehensively.

4. Precision medicine prediction. In its third layer, iCAGES searches for candidate drugs that interact with the predicted cancer drivers and weights them using each drug's activity, its targets' cancer-driving potential and the connectivity between its direct target and the patient's cancer driver genes.

 

Workflow

General: input somatic mutations and output cancer driver genes

iCAGES accepts input in two forms: ANNOVAR input format or VCF files. Using these data, iCAGES runs its three-layer pipeline and, after approximately 20-25 minutes, generates output summarizing the cancer driver genes it nominated. It can also notify you by email when the run is complete.
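To illustrate how the two input forms relate, here is a minimal sketch that turns single-nucleotide records from a VCF file into the five leading columns of ANNOVAR input format (chromosome, start, end, reference allele, alternate allele). The function names and the example records are hypothetical and are not part of iCAGES. The sketch handles single-base substitutions only and skips indels, which need extra coordinate handling.

```python
def vcf_snv_to_annovar(vcf_line):
    """Convert one VCF data line into ANNOVAR-style rows (SNVs only)."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
    rows = []
    # A VCF record may list several alternate alleles separated by commas.
    for alt in alts.split(","):
        # Keep single-base substitutions; indels are skipped in this sketch.
        if len(ref) == 1 and len(alt) == 1 and alt != ".":
            # For an SNV, start and end are both the VCF position.
            rows.append((chrom, pos, pos, ref, alt))
    return rows


def convert(vcf_lines):
    """Convert an iterable of VCF lines, ignoring headers and blank lines."""
    rows = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        rows.extend(vcf_snv_to_annovar(line))
    return rows


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "7\t1000\t.\tA\tT\t.\tPASS\t.",
    "17\t2000\t.\tG\tA,C\t.\tPASS\t.",
    "1\t3000\t.\tAT\tA\t.\tPASS\t.",  # deletion, skipped by this sketch
]

for row in convert(example):
    print("\t".join(str(value) for value in row))
```

Each printed row is one tab-separated variant. A multi-allelic VCF record becomes one row per alternate allele.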

First layer of iCAGES: from somatic mutation calls to prioritized list of candidate cancer driver mutations.

To comprehensively evaluate the effect of rare and common mutations on cancer pathogenesis, we used a radial Support Vector Machine (SVM) trained on somatic non-synonymous Single Nucleotide Variations (nsSNVs) from the COSMIC and UniProt databases as the first layer of iCAGES. Taking somatic mutations as input, iCAGES calculates the corresponding radial SVM predicted score for each mutation, which evaluates that mutation's cancer driver potential.
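As a rough illustration of what a radial (RBF-kernel) SVM score is, the sketch below evaluates the decision function of an already-trained model: a weighted sum of radial kernel similarities between the query mutation's feature vector and the support vectors, plus an intercept. All values here (support vectors, coefficients, intercept, gamma, and the two-dimensional feature space) are hypothetical placeholders, not the trained iCAGES model or its actual predictors.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, intercept, gamma):
    """Decision value of a trained RBF SVM for feature vector x.

    dual_coefs holds one signed coefficient per support vector
    (positive for the driver class, negative for the neutral class).
    """
    weighted = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return weighted + intercept


# Hypothetical toy model with one support vector per class.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
dual_coefs = [1.0, -1.0]
intercept = 0.0
gamma = 1.0

score_near_driver = svm_decision((0.0, 0.0), support_vectors, dual_coefs, intercept, gamma)
score_near_neutral = svm_decision((1.0, 1.0), support_vectors, dual_coefs, intercept, gamma)
print(score_near_driver, score_near_neutral)
```

A query close to the driver-class support vector gets a positive value and one close to the neutral-class support vector gets a negative value. Because the kernel depends on distance rather than on a linear combination of features, the resulting decision boundary can be non-linear.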

 

[Figure: first layer of iCAGES]

 

Second layer of iCAGES: from candidate driver mutations to cancer driver genes.

To incorporate valuable knowledge generated from decades of research, we added a second layer on top of the radial SVM layer. This layer weights each mutation by its corresponding gene's Phenolyzer score, which evaluates the gene-phenotype association based on previous knowledge. It then filters for genes that harbor sufficiently deleterious mutations and ranks these genes by their total weighted score, the iCAGES score. Finally, it gives a binary prediction of whether or not each gene is likely to be a driver, classifying genes with iCAGES scores in the top 20% as probable drivers. Note that this cutoff is currently arbitrary and may change in the future.
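The weighting, filtering and ranking steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual iCAGES scoring formula: it assumes that weighting means multiplying a mutation's SVM score by the gene's Phenolyzer score, that the deleteriousness filter is a fixed SVM score threshold (the hypothetical `min_svm`), and that a gene's total is the sum over its retained mutations. Gene names and all scores are made up.

```python
import math
from collections import defaultdict


def gene_scores_from_mutations(mutations, phenolyzer, min_svm=0.5):
    """Sum Phenolyzer-weighted SVM scores per gene.

    mutations: list of (gene, svm_score) pairs.
    phenolyzer: dict mapping gene -> prior association score.
    min_svm: hypothetical cutoff for "deleterious enough" mutations.
    """
    totals = defaultdict(float)
    for gene, svm_score in mutations:
        if svm_score < min_svm:
            continue  # filter out mutations that look neutral
        totals[gene] += svm_score * phenolyzer.get(gene, 0.0)
    return dict(totals)


def call_drivers(gene_scores, top_fraction=0.2):
    """Rank genes by score and flag the top fraction as probable drivers."""
    ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
    n_top = math.ceil(len(ranked) * top_fraction) if ranked else 0
    return [(gene, score, rank < n_top) for rank, (gene, score) in enumerate(ranked)]


# Made-up inputs, for illustration only.
mutations = [
    ("GENE_A", 0.9), ("GENE_A", 0.7), ("GENE_B", 0.8), ("GENE_C", 0.6),
    ("GENE_D", 0.95), ("GENE_E", 0.55), ("GENE_E", 0.2),
]
phenolyzer = {"GENE_A": 0.8, "GENE_B": 0.5, "GENE_C": 0.1, "GENE_D": 0.3, "GENE_E": 0.4}

calls = call_drivers(gene_scores_from_mutations(mutations, phenolyzer))
for gene, score, is_driver in calls:
    print(gene, round(score, 3), is_driver)
```

With five genes and a 20% cutoff, only the top-ranked gene is flagged. The fraction is a parameter here because the cutoff is described as arbitrary and subject to change.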

 

Third layer of iCAGES: from candidate driver genes to candidate drugs

To better help researchers and clinicians search for potential personalized treatments, we added a third layer to iCAGES, which gives a prioritized list of drugs associated with each cancer driver gene. This layer searches for candidate drugs that interact with the predicted cancer driver genes and weights them using the corresponding target gene's iCAGES score and the activity score retrieved from the PubChem database.
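A minimal sketch of this drug prioritization step is shown below. It assumes, purely for illustration, that a drug's weight is the product of its target gene's iCAGES score and its activity score, and that a drug with several targets keeps its best-scoring target. The real combination rule may differ, and target connectivity is ignored here. Drug names, gene names and scores are all hypothetical.

```python
def rank_drugs(interactions, gene_scores):
    """Rank drugs by (target gene score * activity score).

    interactions: list of (drug, target_gene, activity_score) triples.
    gene_scores: dict mapping predicted driver gene -> gene-level score.
    Returns a list of (drug, best_target, score), highest score first.
    """
    best = {}
    for drug, target, activity in interactions:
        if target not in gene_scores:
            continue  # drug does not hit any predicted driver gene
        score = gene_scores[target] * activity
        if drug not in best or score > best[drug][1]:
            best[drug] = (target, score)
    ranked = [(drug, target, score) for drug, (target, score) in best.items()]
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Made-up inputs, for illustration only.
gene_scores = {"GENE_A": 1.28, "GENE_B": 0.4}
interactions = [
    ("drug_x", "GENE_A", 0.5),
    ("drug_y", "GENE_B", 0.9),
    ("drug_x", "GENE_B", 1.0),
    ("drug_z", "GENE_Q", 1.0),  # target is not a predicted driver
]

drug_ranking = rank_drugs(interactions, gene_scores)
for drug, target, score in drug_ranking:
    print(drug, target, round(score, 3))
```

Drugs whose targets are not among the predicted driver genes drop out of the list entirely, which matches the idea of prioritizing treatments for one patient's own drivers.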

Availability

iCAGES is available at http://icages.usc.edu/

Reference

Dong C, Yang H, He Z, Liu X, Wang K. iCAGES: integrated CAncer GEnome Score for understanding personal cancer genomes. bioRxiv doi: http://dx.doi.org/10.1101/015008