
Alzheimer's Data Science 2022

Notebooks and other assets in support of the Alzheimer's Data Science Summer Bootcamp, 2022.

Overview

This repository houses R and Seurat-based data analysis workflows aiming to explore and visualize a large corpus of scRNAseq data from organoid models of frontotemporal dementia and isogenic controls. We include data processing workflows for extracting scRNAseq data directly from Seurat objects into workable R matrices.

The scRNAseq data set analyzed here was originally generated in the experiments presented in Bowles et al. (2021) and is stored in FinalMergedData.rds. It contains gene expression levels and cell attributes for 370,000 individual cells, comprising cells with a mutation in the MAPT gene that is implicated in frontotemporal dementia (FTD) and the equivalent CRISPR-corrected control cells.

We are interested in analyzing subsets of this data based on various combinations of cell attributes, in order to further characterize genetic perturbations associated with frontotemporal dementia in mutant cells versus controls, how these changes manifest in each neuronal cell type, and how these changes evolve over time.
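
As a rough illustration of subsetting by combinations of cell attributes, the sketch below uses a toy genes-by-cells matrix and a per-cell metadata table in base R. The column names (`genotype`, `celltype`, `timepoint`), the placeholder gene names, and the helper `subset_cells` are assumptions for illustration only; the real metadata fields in the Seurat object may be named differently.

```r
# Toy stand-in for data extracted from the Seurat object: a genes x cells
# expression matrix plus one metadata row per cell.
# All column names and values below are hypothetical.
set.seed(1)
n_cells <- 12
meta <- data.frame(
  cell      = paste0("cell", seq_len(n_cells)),
  genotype  = rep(c("V337V", "V337M"), times = 6),
  celltype  = rep(c("Ast", "ExDp1", "ExN"), each = 4),
  timepoint = rep(c(2, 2, 4, 6), times = 3),
  stringsAsFactors = FALSE
)
expr <- matrix(
  rpois(3 * n_cells, lambda = 5),
  nrow = 3,
  dimnames = list(c("MAPT", "GENE_A", "GENE_B"), meta$cell)
)

# Select cells matching a combination of attributes, then slice the matrix.
subset_cells <- function(expr, meta, genotype, celltype, timepoints) {
  keep <- meta$genotype == genotype &
    meta$celltype == celltype &
    meta$timepoint %in% timepoints
  expr[, meta$cell[keep], drop = FALSE]
}

# Mutant astrocytes across all three timepoints.
mutant_ast <- subset_cells(expr, meta, "V337M", "Ast", c(2, 4, 6))
dim(mutant_ast)
```

Keeping `drop = FALSE` means the result stays a matrix even when only one cell matches, which avoids surprises in downstream code that expects two dimensions.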

Guide to relevant files in this directory:

Data:

  • Data dependencies for notebooks in this repo are now housed in a supporting repo at https://github.rpi.edu/DataINCITE/AlzheimersDSData. For RPI-internal users, data is accessible in a dedicated folder on the IDEA cluster.

  • Known_CellType_Marker_Genes.csv - top marker genes associated with each of the 16 primary cell types represented in the scRNAseq dataset we analyze. Marker genes are drawn from the original resources used to annotate cell types in the Bowles paper.

  • top_DE_genes_all_categories.csv - genes identified by differential expression testing with Seurat as being significantly differentially expressed in V337M (FTD-tau mutant) cells versus controls, calculated separately for each timepoint and each cell type. For the workflow used to produce this dataset, see dev-notebooks/overall-DE-analysis-eachtimepoint-allcelltypes.Rmd.
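
A table of differential expression results like this one is typically filtered by adjusted p-value and effect size. The sketch below works on a toy data frame; the columns `avg_log2FC` and `p_val_adj` follow the names Seurat's `FindMarkers()` produces, but the `gene`, `celltype`, and `timepoint` columns, the gene names, and the cutoffs are assumptions, so check the actual header of the CSV before adapting it.

```r
# Toy table shaped like Seurat FindMarkers() output with extra grouping
# columns. The real CSV's column names may differ; treat these as assumptions.
# To try this on the real file, something like the following may work:
# de <- read.csv("top_DE_genes_all_categories.csv")
de <- data.frame(
  gene       = c("G1", "G2", "G3", "G4", "G5"),
  celltype   = c("Ast", "Ast", "ExN", "ExN", "ExN"),
  timepoint  = c(2, 4, 2, 2, 6),
  avg_log2FC = c(1.2, -0.1, -0.8, 0.3, 2.0),
  p_val_adj  = c(0.001, 0.5, 0.01, 0.04, 0.2),
  stringsAsFactors = FALSE
)

# Keep genes passing illustrative significance and effect-size cutoffs.
sig <- de[de$p_val_adj < 0.05 & abs(de$avg_log2FC) >= 0.5, ]

# Count significant genes per cell type and timepoint.
table(sig$celltype, sig$timepoint)
```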

Notebooks:

  • Teaching Notebooks:

    • Notebook1-Intro-To-The-Data.Rmd
    • Notebook2-Intro-To-Seurat.Rmd
    • Notebook3-Data-Analysis.Rmd
    • Notebook4-Data-Access-and-Extraction.Rmd
    • Notebook5-Drawing-Biological-Connections.Rmd

    Descriptions are given in the bootcamp syllabus.

  • Developer Notebooks:

    • data-preprocessing.Rmd - preliminary exercises in inspection of Seurat object contents, data access and extraction from Seurat objects, functionalized workflow for manual data subset generation
    • seurat-data-analysis.Rmd - Seurat-based data access, manipulation and visualization examples
    • data-analysis.Rmd - exercises in manual data manipulation/aggregation using created data subsets, manually constructed (non-Seurat-based) data visualizations
    • PCA_clustering_analysis_one_celltype.Rmd - An example of how you could do a PCA and clustering analysis at the CellType level (for one CellType); here data is organized SWOT-clock-style (timeseries) but could be easily modified to use different data structures as input to clustering. Contains code for running basic Gene Ontology enrichment analysis on a subset of genes of interest (for example, the set of genes enriched in one cluster of cells)
    • find-celltype-marker-genes.Rmd - Notebook to perform cluster biomarker identification for a given cell type, using the differential expression testing model and parameters used in Bowles et al. 2021
    • profile-genes-for-biological-functions.Rmd - Notebook to perform Gene Ontology Enrichment and Revigo GO summarization on an input gene list
    • run-differential-expression-testing.Rmd - Notebook implementing DE testing to identify genes differentially expressed between diseased and control groups of cells within one celltype. Analysis is replicated across each of our 3 experimental timepoints.
    • ReactomeGSA.Rmd - guide to using the ReactomeGSA package to perform a pathway analysis of cell clusters in single-cell RNA-sequencing data.
    • STRINGdb-Analysis.Rmd - guide to using the STRING database to identify protein-protein interaction networks within an input list of genes
    • overall-DE-analysis-eachtimepoint-allcelltypes.Rmd - workflow and statistical methods used to create the Bowles dataset differential gene expression analyzer tool available at https://whiter9.shinyapps.io/DEtableDemo/.
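
The per-timepoint comparison of mutant and control cells described for run-differential-expression-testing.Rmd can be pictured with a small base R stand-in. This is not the Seurat model or the parameters used in the notebooks: it simulates one gene in one cell type, runs a Wilcoxon rank-sum test at each of the three timepoints, and applies a Bonferroni correction. All data, column names, and the helper `test_timepoint` are hypothetical.

```r
set.seed(42)
# Toy expression values for one gene in one cell type (hypothetical data).
cells <- expand.grid(
  rep = 1:20,
  genotype = c("V337V", "V337M"),
  timepoint = c(2, 4, 6),
  stringsAsFactors = FALSE
)
# Simulate a mutant-specific shift that grows with time.
cells$expr <- rnorm(nrow(cells), mean = 5) +
  ifelse(cells$genotype == "V337M", 0.5 * cells$timepoint, 0)

# Run one mutant-vs-control test for a single timepoint.
test_timepoint <- function(tp) {
  d <- cells[cells$timepoint == tp, ]
  mut <- d$expr[d$genotype == "V337M"]
  ctl <- d$expr[d$genotype == "V337V"]
  data.frame(
    timepoint = tp,
    mean_diff = mean(mut) - mean(ctl),
    p_val = wilcox.test(mut, ctl)$p.value
  )
}

# Replicate the test across the three experimental timepoints.
res <- do.call(rbind, lapply(c(2, 4, 6), test_timepoint))

# Adjust for the number of tests performed.
res$p_val_adj <- p.adjust(res$p_val, method = "bonferroni")
res
```

Running the same function once per timepoint keeps the comparisons independent, which mirrors the "replicated across timepoints" structure while leaving the choice of statistical test easy to swap.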

Important metadata definitions associated with this dataset:

Cell lines represented

Single-cell data was originally collected from organoids derived from 3 main cell lines (ND, GIH6, GIH7) from the Tau Consortium in conjunction with the Neural Stem Cell Institute. Each line encompasses an FTD mutant version and the corresponding CRISPR-corrected isogenic control. These iPSC lines were differentiated into cerebral organoids and subjected to single-cell RNA sequencing at 2, 4, and 6 months.

We primarily analyze cell lines ND and GIH6, as experimental groups within line GIH7 are less well-defined.

Cell Line Key

| Line | Wildtype/mutant |
|------|-----------------|
| ND-B06 | Wildtype |
| ND-B09 | Mutant |
| GIH6-E11 | Wildtype |
| GIH6-A02 | Mutant |
| GIH7-F02 | Wildtype |
| GIH7-B12 | Wildtype |
| GIH7-A01 | Mutant |

In the dataset, "wildtype" and "mutant" are denoted by "V337V" and "V337M", respectively.
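
For use in code, the cell line key can be held in a small lookup table that also carries the dataset's genotype labels. The sketch below is one possible encoding; the column names (`line`, `status`, `parent`, `genotype`) are illustrative choices, not fields guaranteed to exist in the Seurat object.

```r
# Cell line key from the table above, with the dataset's genotype labels.
line_key <- data.frame(
  line   = c("ND-B06", "ND-B09", "GIH6-E11", "GIH6-A02",
             "GIH7-F02", "GIH7-B12", "GIH7-A01"),
  status = c("Wildtype", "Mutant", "Wildtype", "Mutant",
             "Wildtype", "Wildtype", "Mutant"),
  stringsAsFactors = FALSE
)

# Parent line is the part before the hyphen (e.g. "ND", "GIH6").
line_key$parent <- sub("-.*$", "", line_key$line)

# "V337V" marks wildtype and "V337M" marks mutant in the dataset.
line_key$genotype <- ifelse(line_key$status == "Wildtype", "V337V", "V337M")

# Restrict to the lines primarily analyzed (ND and GIH6).
primary <- line_key[line_key$parent %in% c("ND", "GIH6"), ]
primary
```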

Cell types represented

Bowles et al. clustered the single cells of this data set into 16 primary neuronal cell type clusters. Cell types were identified by both automated and expert cell type annotation. The identified categories are represented as follows:

Cell Type Key

  • Ast - astrocytes
  • ExDp1 - excitatory deep layer 1
  • ExDp2 - excitatory deep layer 2
  • ExM - maturing excitatory
  • ExM-U - maturing excitatory upper enriched
  • ExN - newborn excitatory
  • Glia - unspecified glia/non-neuronal cells
  • InCGE - interneurons caudal ganglionic eminence
  • InMGE - interneurons medial ganglionic eminence
  • IP - intermediate progenitors
  • OPC - oligodendrocyte precursor cells
  • oRG - outer radial glia
  • PgG2M - cycling progenitors (G2/M phase)
  • PgS - cycling progenitors (S phase)
  • UN - unspecified neurons
  • vRG - ventricular radial glia
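
When labelling plots or tables, it can help to expand these abbreviations programmatically. A minimal sketch, assuming cell type labels in the data match the abbreviations in the key exactly; the helper `expand_celltype` is hypothetical and leaves unknown labels unchanged.

```r
# Cell type key as a named character vector (abbreviation -> description).
celltype_key <- c(
  "Ast"   = "astrocytes",
  "ExDp1" = "excitatory deep layer 1",
  "ExDp2" = "excitatory deep layer 2",
  "ExM"   = "maturing excitatory",
  "ExM-U" = "maturing excitatory upper enriched",
  "ExN"   = "newborn excitatory",
  "Glia"  = "unspecified glia/non-neuronal cells",
  "InCGE" = "interneurons caudal ganglionic eminence",
  "InMGE" = "interneurons medial ganglionic eminence",
  "IP"    = "intermediate progenitors",
  "OPC"   = "oligodendrocyte precursor cells",
  "oRG"   = "outer radial glia",
  "PgG2M" = "cycling progenitors (G2/M phase)",
  "PgS"   = "cycling progenitors (S phase)",
  "UN"    = "unspecified neurons",
  "vRG"   = "ventricular radial glia"
)

# Translate abbreviations to descriptions; unknown labels pass through as-is.
expand_celltype <- function(abbrev) {
  full <- unname(celltype_key[abbrev])
  ifelse(is.na(full), abbrev, full)
}

expand_celltype(c("Ast", "oRG", "XYZ"))
```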

References

Major software and analysis tools featured

* `Seurat`, an `R` toolkit for single-cell genomics: https://satijalab.org/seurat/
* `g:Profiler` for Gene Ontology enrichment analysis
* `rrvgo` for Revigo Gene Ontology semantic summarization and visualization
* `ReactomeGSA` for performing gene set analysis using the Reactome Analysis System
* `STRINGdb` for mapping protein-protein interactions within gene sets
