Publications

  • Amlie-Wolf et al., INFERNO - INFERring the molecular mechanisms of NOncoding genetic variants. bioRxiv. 2017 Oct. doi:10.1101/211599. Epub 2017 Oct 30.

Abstract: The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, where they affect regulatory elements including transcriptional enhancers. We propose INFERNO (INFERring the molecular mechanisms of NOncoding genetic variants), a novel method which integrates hundreds of diverse functional genomics data sources with GWAS summary statistics to identify putatively causal noncoding variants underlying association signals. INFERNO comprehensively infers the relevant tissue contexts, target genes, and downstream biological processes affected by causal variants. We apply INFERNO to schizophrenia GWAS data, recapitulating known schizophrenia-associated genes including CACNA1C and discovering novel signals related to transmembrane cellular processes.

  • Schellenberg et al., Large-Scale DNA sequence analysis and Alzheimer's disease genetics. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar. 

Abstract: A substantial amount of the heritability of Alzheimer’s disease (AD) remains to be explained. Early family studies led to identification of rare mutations in 3 genes (APP, PSEN1 and PSEN2), and common variants in APOE. Subsequent work using high-density genotyping arrays identified over 30 common variant loci. The advent of low-cost high-throughput DNA sequencing makes it possible to identify additional rare single nucleotide variants (SNVs) in risk and protective genes. Study designs include studies of multiplex late-onset AD kindreds and larger case-control samples. The Alzheimer’s Disease Sequencing Project (ADSP) generated whole genome sequence (WGS) from 1,000 family members and exome sequence data from 6,000 AD cases and 5,000 elderly normal controls. Other efforts are generating additional WGS and whole exome sequence (WES) data relevant to AD. While analyses of these large data sets may yield AD risk genes, identifying rare variants of modest effect size will require much larger data sets. While WGS/WES costs are declining, obtaining genetic data from very large samples will require use of high-density imputation panels to follow up candidates from sequencing experiments. In addition to SNVs, it is now possible to derive high-quality structural variants (SVs; indels, insertions, deletions, copy number variation, and chromosome-level alterations) from sequence data. SVs, particularly short indels and variants <5,000 bp, have not been tested in most disease studies, including AD. SVs account for a large part of genetic variation in humans. By using multiple analytic approaches and technologies for detecting genetic variation, we hope to resolve the genetics of AD more completely.

  • Wang, Li-San. Role and resources of National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site and Genome Center for Alzheimer's Disease. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar. 

Abstract: The National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) is a national genetics repository created by NIA to facilitate access by qualified investigators to genotypic data for the study of genetics of late-onset Alzheimer's disease. The Genome Center for Alzheimer's Disease (GCAD) coordinates the integration and meta-analysis of all available Alzheimer’s disease (AD) relevant genetic data with the goal of identifying AD risk/causative/protective genetic variants and eventual therapeutic targets. NIAGADS and GCAD support the Alzheimer's Disease Sequencing Project (ADSP) by collecting and harmonizing genomics data, and developing databases and portals for genomic information retrieval. In this talk I will introduce both initiatives, their roles in ADSP, and how we can help investigators access ADSP data and resources.

  • Haines J et al., In Silico functional annotation of genomic variants and multi-gene analyses in late onset Alzheimer's disease. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar. 

Abstract: Objectives: A major difficulty in working with any sequencing data is providing the potential functional consequences of the identified variants.  Our goal is to provide consistent genomic annotations for Alzheimer disease (AD) sequencing data integrating data from a large set of diverse databases of functional impacts.  In addition, these data will be used to focus multi-gene analyses of the detected variants.

Methods: We employed a strategy of integrating in silico functional information and applying it to large scale whole-exome and whole-genome sequencing efforts. We developed a workflow to provide investigators with predicted functional impact (from the Ensembl Variant Effect Predictor), variant allele frequencies observed in other studies (from the Kaviar database and the Wellderly Cohort), predicted loss-of-function status (from SnpEff), and multiple scoring metrics for assessing deleteriousness (including CADD, CATO, and SPIDEX scores).

Results: We annotated over 28 million variants identified from >12,000 AD cases and controls. Of these, approximately 5 million are novel events not reported in multiple reference databases. Because the incredible depth of available data makes annotation of non-coding regions especially challenging, we developed approaches to collapse and combine gene regulatory annotations and (when possible) to assign them to downstream genes. Multi-gene analyses using these annotations are currently underway.

Conclusions: We constructed computational pipelines to generate detailed functional annotations for both coding and non-coding variants that enable hypothesis-driven analyses and ultimately provide new insights into the pathogenesis of AD.

  • Pericak-Vance M et al., Family-based analyses of whole genome sequencing in white, non-Hispanic populations. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar. 

Abstract: Objectives:  The Alzheimer’s Disease Sequencing Project (ADSP) is an initiative to identify genetic variation influencing risk in late-onset AD (LOAD) with whole-genome sequencing (WGS) on 229 subjects from 42 non-Hispanic White (NHW) extended AD families. We analyzed these data to identify putative risk variants co-segregating with disease.

Methods:  Standard bioinformatics protocols were applied, with multiple genotype callers used to develop consensus. Variants were annotated for function, frequency, segregation with disease, and with enhancer and expression QTL data. We examined segregation under consensus and family-specific linkage peaks, as well as within known AD candidate genes.

Results: Within the two consensus linkage regions we identified 32 rare (MAF<0.01) SNVs that segregated with disease, were absent from all cognitively normal individuals, and were putatively functional (CADD score > 10). Within the family-specific linkage regions we identified 12 segregating, putatively functional SNVs, including missense SNVs in TTC3 (CADD=32) and FSIP2 (CADD=25). We also identified 26 variants that were within known candidate genes, co-segregated in >75% of AD patients, were rare (MAF<5%), and were putatively functional (CADD score > 15). The candidate loci include APP, PICALM, PSEN1, GRN, MS4A6A, and MEF2C. Analysis of enhancer data identified multiple enhancer SNVs that segregate with disease and may influence gene expression.

Conclusions:  This study shows the power of segregation-based family designs in WGS studies of complex diseases like AD, and suggests TTC3 and FSIP2 as AD risk genes. In addition, rare variation in previously identified candidate genes may play a role in familial LOAD risk. 

  • Farrer L et al., Novel genetic variants and loci influencing risk of Alzheimer's disease identified by whole exome sequencing using an enriched case-control design. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar. 

Abstract: Objectives: To test the hypothesis that AD cases who have close relatives also affected by the disease (“enriched cases”) are more likely than other AD cases to have AD risk variants, we evaluated the association of AD with variants identified by whole exome sequencing (WES) in samples of unrelated non-Hispanic whites (507 enriched cases, 4,917 controls) and Caribbean Hispanics (172 enriched cases, 177 controls) included in the Alzheimer’s Disease Sequencing Project.

Methods: WES data were submitted to a bioinformatics pipeline that included consensus genotype calling of single nucleotide variants (SNVs) and small indels using GATK and ATLAS protocols, and evaluation of cryptic relatedness and differential missingness. Associations were tested using the score test for individual variants and SKAT-O for gene-based tests, adjusting for age, sex, and principal components of ancestry.

Results:  We identified significant association with three SNVs near APOE, the previously established AD TREM2-R47H variant, and SNVs in PRSS1, SORBS1, NUFIP1, WDR59 and PKD1L2. Significant associations were also observed with small indels in SHKBP1, ZNF718, ZNF595, and TUBB4Q. Gene-based tests considering only highly deleterious variants revealed significant associations with CD22, PHTF1, PRSS1, SLC38A10, and TMEM82. Novel gene-based associations with DNPH1, FOXD4L1, IGHV3-64, PLCL2, and RPL19 were observed in tests considering high or moderate-effect variants.

Conclusions: This study identified significant AD associations with several novel genes. These findings suggest that persons in families with multiple cases are likely to harbor rare highly penetrant AD risk variants and that studies of enriched cases will help delineate mechanisms leading to AD. 

  • Mayeux et al., Whole exome and whole genome sequencing in Caribbean Hispanic families. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar.

Abstract: While common variants at the APOE locus can influence the risk of late onset Alzheimer’s disease (LOAD), rare coding variants may also alter risk. Families multiply affected by LOAD from inbred, island populations can be enriched for such variants. We have investigated Caribbean Hispanic families from the Dominican Republic multiply affected by LOAD to identify novel coding variants. We used two experimental approaches: 1) to detect rare coding variants underlying loci detected by genome-wide association studies (GWAS), we conducted targeted sequencing of ABCA7, BIN1, CD2AP, CLU, CR1, EPHA1, MS4A4A/MS4A6A, SORL1 and PICALM in three independent LOAD cohorts; 2) whole exome sequencing was also completed in 31 Caribbean Hispanic families without known mutations (e.g. APP, PSEN1 or 2) or APOEε4 homozygous carriers. In the first experiment, a statistically significant 3.1-fold enrichment of nonsynonymous mutations was found among LOAD cases compared to controls, with no difference in synonymous variants. Mutations were identified in ABCA7, CD2AP, EPHA1, SORL1 and BIN1. The EPHA1 variant segregated completely in an extended Caribbean Hispanic family. In the second experiment, rare missense mutations in the Snf2-related CREB binding protein activator, SRCAP, were found in eight unrelated families. In both experiments the frequency of these variants was significantly greater in the affected than in the unaffected family members and significantly different from the frequency found in the Exome Aggregation Exchange for the Latino population. High throughput sequencing of an inbred, island population can reveal an excess burden of deleterious coding mutations in LOAD. Identifying coding variants in LOAD will facilitate the creation of tractable models for investigation of disease-related mechanisms and potential therapies.

  • Kunkle BW et al., Genome-wide linkage analyses of non-Hispanic white families identify novel loci for familial late-onset Alzheimer's disease. Alzheimers Dement. 2016 Jan;12(1):2-10. doi: 10.1016/j.jalz.2015.05.020. Epub 2015 Sep 11. PMID: 26365416.

Abstract: INTRODUCTION: Few high penetrance variants that explain risk in late-onset Alzheimer's disease (LOAD) families have been found. METHODS: We performed genome-wide linkage and identity-by-descent (IBD) analyses on 41 non-Hispanic white families exhibiting likely dominant inheritance of LOAD, and having no mutations at known familial Alzheimer's disease (AD) loci, and a low burden of APOE ε4 alleles. RESULTS: Two-point parametric linkage analysis identified 14 significantly linked regions, including three novel linkage regions for LOAD (5q32, 11q12.2-11q14.1, and 14q13.3), one of which replicates a genome-wide association LOAD locus, the MS4A6A-MS4A4E gene cluster at 11q12.2. Five of the 14 regions (3q25.31, 4q34.1, 8q22.3, 11q12.2-14.1, and 19q13.41) are supported by strong multipoint results (logarithm of odds [LOD*] ≥1.5). Nonparametric multipoint analyses produced an additional significant locus at 14q32.2 (LOD* = 4.18). The 1-LOD confidence interval for this region contains one gene, C14orf177, and the microRNA Mir_320, whereas IBD analyses implicate an additional gene, BCL11B, a regulator of brain-derived neurotrophic signaling, a pathway associated with pathogenesis of several neurodegenerative diseases. DISCUSSION: Examination of these regions after whole-genome sequencing may identify highly penetrant variants for familial LOAD.

  • Barral S et al., Linkage analyses in Caribbean Hispanic families identify novel loci associated with familial late-onset Alzheimer's disease. Alzheimers Dement. 2015 Dec;11(12):1397-406. doi: 10.1016/j.jalz.2015.07.487. Epub 2015 Oct 1. PMID: 26433351.

Abstract: INTRODUCTION: We performed linkage analyses in Caribbean Hispanic families with multiple late-onset Alzheimer's disease (LOAD) cases to identify regions that may contain disease causative variants. METHODS: We selected 67 LOAD families to perform a genome-wide linkage scan. Analysis of the linked regions was repeated using the entire sample of 282 families. Validated chromosomal regions were analyzed using joint linkage and association. RESULTS: We identified 26 regions linked to LOAD (HLOD ≥3.6). We validated 13 of the regions (HLOD ≥2.5) using the entire family sample. The strongest signal was at 11q12.3 (rs2232932: HLODmax = 4.7, Pjoint = 6.6 × 10⁻⁶), a locus located ∼2 Mb upstream of the membrane-spanning 4A gene cluster. We additionally identified a locus at 7p14.3 (rs10255835: HLODmax = 4.9, Pjoint = 1.2 × 10⁻⁵), a region harboring genes associated with the nervous system (GARS, GHRHR, and NEUROD6). DISCUSSION: Future sequencing efforts should focus on these regions because they may harbor familial LOAD causative mutations.