ESR01 – Aravind Sankar – based in Cambridge

Project Title: The prioritisation and functional validation of genetic variants that contribute to melanomagenesis.
PhD Award Expected: Spring 2020

As a computer science student in school, I was interested in studying the applications of computers in addressing real-world problems, especially biological ones. This led me to pursue my Bachelor’s degree in Bioinformatics at SASTRA University, India. During this period, I got the opportunity to complete my final year project at the Karlsruhe Institute of Technology, Germany, where I worked on the development of web-based platforms used for storing and visualizing experimental microscopy assays. To further refine my understanding of the field, I did my Master’s degree in Bioinformatics at the University of Helsinki, Finland. Concurrently, I worked for a year and a half at the Helsinki Institute for Information Technology as a research assistant. My work involved the design, development and evaluation of a bioinformatics method used to identify bacterial strains in a mixed sample and estimate their individual abundances. The experience I got during this project, which eventually became my thesis, reaffirmed my interest in pursuing a career in bioinformatics research, with a graduate degree being the natural next step in the process.

When I am not working, I like to relax by reading books, playing football and quizzing. I also like to travel extensively whenever possible.


Research Summary

Since the discovery of CDKN2A as the primary driver gene in familial melanoma, several other driver genes have been identified, including BAP1, TERT and POT1. However, the causal germline mutations remain unknown for more than half of the individuals affected by familial melanoma globally. This project aimed to identify novel germline variants that predispose the individuals and families carrying them to familial melanoma. A total of 308 individuals belonging to 133 families of European descent were identified from 9 different institutions across the world. These individuals were sequenced through a combination of exome and whole genome sequencing. Several different procedures were implemented for the analysis of these families, with a brief summary of each approach given below.

The samples that were selected to be studied as part of this project were split into 4 different datasets. The criteria for the sample selection were different for each of these datasets, with different sequencing methodologies applied for each set as well. Once the samples had been sequenced, sequence alignment and joint variant calling were performed uniformly across all 4 datasets.

Once the samples had been sequenced and variant calling had been performed, the next step was to identify rare variants within the dataset. Several quality control filters were applied to eliminate low quality variants. In order to effectively define candidate driver genes, a strategy was developed to determine genes with an increased burden of mutations. Individuals from gnomAD were chosen as a control dataset. Variants in the cases and the controls were filtered using the same workflow. An association analysis was performed on the filtered variants to identify genes with an increased burden of mutations in the cases. In addition to BAP1, candidate genes including MUC4, UBR5, ITGAV and EPHA7 were discovered. To account for the family structure of the samples in the dataset, a joint association-linkage analysis using pVAAST was also implemented. LOD scores were combined with the association scores to generate a CLRT score for every gene, and these scores were used to rank the genes, resulting in more novel candidate genes.
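The gene-level burden comparison described above can be illustrated with a minimal sketch. The summary does not state which statistical test was used, so the one-sided Fisher's exact test here is an assumption, and the gene names and carrier counts are hypothetical placeholders rather than results from the project.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test (assumed test, not confirmed by the source).

    Returns the probability of seeing at least `case_carriers` carriers among
    the cases if carrier status were independent of case/control status.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    upper = min(carriers, case_total)
    # Sum the hypergeometric probabilities of every table at least as extreme.
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


def rank_genes_by_burden(carrier_counts, case_total, control_total):
    """Rank genes by ascending p-value.

    `carrier_counts` maps gene -> (carriers among cases, carriers among controls).
    """
    results = [
        (gene, fisher_exact_greater(n_case, case_total, n_control, control_total))
        for gene, (n_case, n_control) in carrier_counts.items()
    ]
    return sorted(results, key=lambda item: item[1])


# Hypothetical counts for illustration only.
example_counts = {"GENE_A": (6, 2), "GENE_B": (1, 2)}
for gene, p_value in rank_genes_by_burden(example_counts, case_total=20, control_total=40):
    print(gene, round(p_value, 4))
```

A real analysis would also need to handle relatedness among cases, which is what the joint association-linkage step with pVAAST addresses; this sketch treats all samples as independent.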

Variants in previously identified melanoma driver genes were checked to ensure that the families in the dataset did not carry a nonsense variant in such genes. A mixture of known and novel variants was identified in 8 families in genes including BAP1, BRCA2, CDKN2A, POT1 and MITF. In a second method, the proportion of samples in a pedigree carrying a specific variant was estimated for all variants. Loss-of-function mutations in ATR, TP53AIP1 and EXO5 were found in 10 pedigrees. All of these genes have previously been associated with cancer development and, in the case of TP53AIP1 and EXO5, specifically with melanoma development. The third and final secondary analysis of the exonic region variants focussed on determining the presence of known pathogenic variants within the cases. ClinVar, a curated database of variants, their estimated pathogenicity and the associated disorders, was utilised for this purpose. Pathogenic variants in genes associated with oculocutaneous albinism and hereditary cancer syndrome were identified.
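The second method above, estimating the proportion of sequenced pedigree members that carry each variant, could be sketched as follows. The data layout, sample identifiers and variant names are assumptions made for illustration; the summary does not describe how the proportions were actually computed or thresholded.

```python
from collections import defaultdict


def carrier_proportions(pedigree_of, carriers_of):
    """Fraction of sequenced members of each pedigree that carry each variant.

    pedigree_of: sample id -> pedigree id (hypothetical layout)
    carriers_of: variant id -> set of sample ids carrying that variant
    Returns {(variant, pedigree): proportion} for pedigrees with >= 1 carrier.
    """
    pedigree_sizes = defaultdict(int)
    for pedigree in pedigree_of.values():
        pedigree_sizes[pedigree] += 1

    proportions = {}
    for variant, samples in carriers_of.items():
        carriers_per_pedigree = defaultdict(int)
        for sample in samples:
            carriers_per_pedigree[pedigree_of[sample]] += 1
        for pedigree, n_carriers in carriers_per_pedigree.items():
            proportions[(variant, pedigree)] = n_carriers / pedigree_sizes[pedigree]
    return proportions


# Hypothetical example: three sequenced members in F1, one in F2.
pedigrees = {"s1": "F1", "s2": "F1", "s3": "F1", "s4": "F2"}
carriers = {"var1": {"s1", "s2"}, "var2": {"s4"}}
for key, proportion in sorted(carrier_proportions(pedigrees, carriers).items()):
    print(key, round(proportion, 2))
```

Variants carried by a high proportion of affected members in a pedigree are natural candidates for follow-up, though small pedigrees give proportions that are easy to reach by chance.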

A subset of the individuals selected for the project were whole genome sequenced. The availability of variant information across the entire genome allowed for the analysis of both small and large non-coding changes and their potential impact on melanoma oncogenesis; this is an aspect of familial melanoma research that has been relatively unexplored. Two complementary analyses were identified for this purpose. The first approach focussed on variation in the regions of the genome that contained transcription factor binding motifs. Known transcription factor binding motif sites were identified from Ensembl. An association analysis was performed to identify genes with an increased burden of transcription factor binding motif variants. VAV1, SKI and SRC were identified as potential candidates. The second approach centred on the impact of large-scale structural variation on melanoma onset. Insertions, deletions, translocations and duplications were identified in the 123 whole-genome-sequenced individuals belonging to the pilot whole-genome dataset. Novel structural variants were determined by estimating large overlapping variations that were present in all sequenced members of a pedigree. This led to the discovery of a 233,780 base pair deletion upstream of the transcription start site of CDKN2A. This deletion was observed in 10/11 members of a pedigree from Sydney.
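The idea of finding large overlapping variations shared by all sequenced members of a pedigree can be sketched with a reciprocal-overlap comparison of intervals. This is an illustrative assumption rather than the project's actual procedure: the 50% overlap threshold, the (chromosome, start, end) representation and the coordinates below are all hypothetical, and the structural variant type is ignored for brevity.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection.

    a, b: (chromosome, start, end) tuples with half-open coordinates.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def shared_structural_variants(calls_by_member, min_overlap=0.5):
    """Calls from the first member that have a matching call in every other member.

    calls_by_member: member id -> list of (chromosome, start, end) calls.
    min_overlap: assumed reciprocal-overlap threshold for calling two calls a match.
    """
    members = list(calls_by_member)
    if not members:
        return []
    first, others = members[0], members[1:]
    shared = []
    for call in calls_by_member[first]:
        if all(
            any(reciprocal_overlap(call, other) >= min_overlap
                for other in calls_by_member[member])
            for member in others
        ):
            shared.append(call)
    return shared


# Hypothetical calls for a three-member pedigree.
example_calls = {
    "m1": [("9", 1000, 5000), ("1", 100, 200)],
    "m2": [("9", 1200, 5200)],
    "m3": [("9", 900, 4800)],
}
print(shared_structural_variants(example_calls))
```

Requiring a match in every member is strict; since the reported deletion was seen in 10 of 11 members of its pedigree, a practical version would likely accept a match in most members rather than all.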

In summary, a multipronged approach was utilised to determine novel germline variants in familial melanoma patients, affecting both the coding and the non-coding regions of the genome, in order to identify candidate melanoma driver genes. Several candidate genes, each affecting a smaller number of families, were discovered across all applied methods.

Publications

Artomov M, Stratigos AJ, Kim I, Kumar R, Lauss M, Reddy BY, Miao B, Robles-Espinoza CD, Sankar A, Njauw C-N, Shannon K, Gragoudas ES, Lane AM, Iyer V, Newton-Bishop JA, Bishop DT, Holland EA, Mann GJ, Singh T, Daly MJ, Tsao H. Rare Variant, Gene-Based Association Study of Hereditary Melanoma Using Whole-Exome Sequencing. J Natl Cancer Inst. 2017 Dec 1;109(12). Available at: https://academic.oup.com/jnci/article/109/12/djx083/3861235