Alzheimer's Data Science 2022
Notebooks and other assets supporting the Alzheimer's Data Science Summer Bootcamp (2022).
Overview
This repository houses `R`- and `Seurat`-based data analysis workflows for exploring and visualizing a large corpus of scRNAseq data from organoid models of frontotemporal dementia and isogenic controls. It also includes data processing workflows for extracting scRNAseq data directly from `Seurat` objects into workable `R` matrices.
The scRNAseq data set analyzed here was originally generated in the experiments presented in Bowles et al. (2021) and is stored in `FinalMergedData.rds`. It contains gene expression levels and cell attributes for 370,000 individual cells. These comprise mutant cells carrying a mutation in the MAPT gene that is implicated in frontotemporal dementia (FTD), together with the corresponding CRISPR-corrected control cells.
We analyze subsets of this data defined by various combinations of cell attributes. The goal is to characterize the gene expression changes associated with frontotemporal dementia in mutant cells versus controls, how these changes manifest in each cell type, and how they evolve over time.
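Attribute-based subsetting can be sketched in Python. The repository's own workflows use `R`/`Seurat`, so this is only a minimal illustration. It assumes per-cell metadata has already been exported as records. The field names `line`, `genotype`, `celltype`, and `timepoint`, the example values, and the helper `subset_cells` are hypothetical and do not reflect the actual metadata columns of `FinalMergedData.rds`.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical per-cell metadata; field names and values are illustrative only.
CELLS: List[Dict[str, str]] = [
    {"cell": "c1", "line": "ND", "genotype": "V337M", "celltype": "ExM", "timepoint": "2mo"},
    {"cell": "c2", "line": "ND", "genotype": "V337V", "celltype": "ExM", "timepoint": "2mo"},
    {"cell": "c3", "line": "GIH6", "genotype": "V337M", "celltype": "ExN", "timepoint": "4mo"},
    {"cell": "c4", "line": "GIH7", "genotype": "V337M", "celltype": "Ast", "timepoint": "6mo"},
    {"cell": "c5", "line": "GIH6", "genotype": "V337V", "celltype": "ExN", "timepoint": "4mo"},
]


def _accepted(value: Any) -> set:
    """Treat a set/list/tuple as several accepted values, anything else as one."""
    if isinstance(value, (set, frozenset, list, tuple)):
        return set(value)
    return {value}


def subset_cells(records: Iterable[Dict[str, str]], **criteria: Any) -> List[Dict[str, str]]:
    """Hypothetical helper: keep records whose fields match every criterion."""
    wanted = {field: _accepted(value) for field, value in criteria.items()}
    return [
        rec for rec in records
        if all(rec.get(field) in allowed for field, allowed in wanted.items())
    ]


# Mutant maturing or newborn excitatory cells, any line or timepoint.
mutant_excitatory = subset_cells(CELLS, genotype="V337M", celltype={"ExM", "ExN"})
# Cells from the ND and GIH6 lines at the 4-month timepoint.
primary_4mo = subset_cells(CELLS, line=("ND", "GIH6"), timepoint="4mo")
```

Passing a collection for a field (for example `celltype={"ExM", "ExN"}`) accepts any of those values, so several attribute levels can be combined in one subset.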
Guide to relevant files in this directory:
Data:
- Data dependencies for notebooks in this repo are now housed in a supporting repo at https://github.rpi.edu/DataINCITE/AlzheimersDSData. For RPI-internal users, the data is also accessible in a dedicated folder on the IDEA cluster.
- `Known_CellType_Marker_Genes.csv` - top marker genes associated with each of the 16 primary cell types represented in the scRNAseq dataset we analyze. Marker genes are drawn from the original resources used to annotate cell types in the Bowles paper.
  - Resource for astrocytes: https://www.sciencedirect.com/science/article/pii/S0896627315010193
  - Resource for all other cell types: https://www.sciencedirect.com/science/article/pii/S0896627319305616#fig1
- `top_DE_genes_all_categories.csv` - genes identified by differential expression testing with `Seurat` as significantly differentially expressed in V337M (FTD-tau mutant) cells versus controls, calculated separately for each timepoint and each cell type. For the workflow used to produce this dataset, see `dev-notebooks/overall-DE-analysis-eachtimepoint-allcelltypes.Rmd`.
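The Python sketch below shows one way a marker-gene table like `Known_CellType_Marker_Genes.csv` might be grouped by cell type. It assumes a long-format CSV with one row per (cell type, gene) pair. The column names `CellType` and `Gene` and the example rows are assumptions for illustration, so check the actual file's header before adapting it.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def markers_by_celltype(handle: TextIO, celltype_col: str = "CellType",
                        gene_col: str = "Gene") -> Dict[str, List[str]]:
    """Group genes by cell type from a long-format CSV (column names are assumptions)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[celltype_col]].append(row[gene_col])
    return dict(groups)


# Made-up rows standing in for the real file; not its actual contents.
example_csv = io.StringIO("CellType,Gene\nAst,GFAP\nAst,AQP4\nOPC,PDGFRA\n")
markers = markers_by_celltype(example_csv)
```

To use it on the real file, open it with `open("Known_CellType_Marker_Genes.csv", newline="")` and pass the handle, overriding `celltype_col` and `gene_col` if the header differs.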
Notebooks:
- Teaching Notebooks (descriptions are in the bootcamp syllabus):
  - `Notebook1-Intro-To-The-Data.Rmd`
  - `Notebook2-Intro-To-Seurat.Rmd`
  - `Notebook3-Data-Analysis.Rmd`
  - `Notebook4-Data-Access-and-Extraction.Rmd`
  - `Notebook5-Drawing-Biological-Connections.Rmd`
- Developer Notebooks:
  - `data-preprocessing.Rmd` - preliminary exercises in inspecting `Seurat` object contents and in accessing and extracting data from `Seurat` objects, plus a functionalized workflow for manually generating data subsets.
  - `seurat-data-analysis.Rmd` - `Seurat`-based data access, manipulation, and visualization examples.
  - `data-analysis.Rmd` - exercises in manual data manipulation/aggregation using the created data subsets, and manually constructed (non-`Seurat`-based) data visualizations.
  - `PCA_clustering_analysis_one_celltype.Rmd` - an example of PCA and clustering analysis at the `CellType` level (for one cell type). Here the data is organized SWOT-clock-style (as a time series), but the notebook could easily be modified to use different data structures as input to clustering. It also contains code for running basic Gene Ontology enrichment analysis on a subset of genes of interest (for example, the set of genes enriched in one cluster of cells).
  - `find-celltype-marker-genes.Rmd` - performs cluster biomarker identification for a given cell type, using the differential expression testing model and parameters from Bowles et al. (2021).
  - `profile-genes-for-biological-functions.Rmd` - performs Gene Ontology enrichment and Revigo GO summarization on an input gene list.
  - `run-differential-expression-testing.Rmd` - implements DE testing to identify genes differentially expressed between diseased and control cells within one cell type. The analysis is replicated across each of our 3 experimental timepoints.
  - `ReactomeGSA.Rmd` - a guide to using the `ReactomeGSA` package to perform pathway analysis of cell clusters in single-cell RNA-sequencing data.
  - `STRINGdb-Analysis.Rmd` - a guide to using the STRING database to identify protein-protein interaction networks within an input list of genes.
  - `overall-DE-analysis-eachtimepoint-allcelltypes.Rmd` - the workflow and statistical methods used to create the Bowles dataset differential gene expression analyzer tool, available at https://whiter9.shinyapps.io/DEtableDemo/.
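A simplified Python sketch can illustrate the per-timepoint, per-cell-type structure of the DE notebooks. It only computes a log2 fold change of mean expression between V337M and V337V cells for one gene in each (cell type, timepoint) group. This is not the statistical test used in the notebooks or by `Seurat`. The record layout, the pseudocount, and the function name `log2_fold_changes` are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (celltype, timepoint, genotype, expression of one gene); a hypothetical layout.
Record = Tuple[str, str, str, float]


def log2_fold_changes(records: Iterable[Record],
                      pseudocount: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Log2 ratio of mean mutant (V337M) to mean control (V337V) expression per group."""
    groups: Dict[Tuple[str, str], Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for celltype, timepoint, genotype, value in records:
        groups[(celltype, timepoint)][genotype].append(value)

    result: Dict[Tuple[str, str], float] = {}
    for key, by_genotype in groups.items():
        mutant = by_genotype.get("V337M")
        control = by_genotype.get("V337V")
        if not mutant or not control:
            continue  # a comparison needs cells from both genotypes
        # The pseudocount avoids division by zero when mean control expression is 0.
        result[key] = math.log2((mean(mutant) + pseudocount) / (mean(control) + pseudocount))
    return result


# Illustrative values for a single gene, not real data.
records: List[Record] = [
    ("ExM", "2mo", "V337M", 6.0),
    ("ExM", "2mo", "V337M", 8.0),
    ("ExM", "2mo", "V337V", 3.0),
    ("ExM", "2mo", "V337V", 3.0),
    ("Ast", "4mo", "V337M", 2.0),  # no control cells in this group, so it is skipped
]
lfc = log2_fold_changes(records)
```

Grouping first and comparing within each group keeps cell type and timepoint effects from mixing. A real analysis would add a per-gene statistical test and multiple-testing correction on top of this.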
Important metadata definitions associated with this dataset:
Cell lines represented
Single-cell data was originally collected from organoids derived from 3 main cell lines (`ND`, `GIH6`, `GIH7`) from the Tau Consortium in conjunction with the Neural Stem Cell Institute. Each line encompasses an FTD mutant version and the corresponding CRISPR-corrected isogenic control. These iPSC lines were differentiated into cerebral organoids and subjected to single-cell RNA sequencing at 2, 4, and 6 months.
We primarily analyze cell lines `ND` and `GIH6`, as experimental groups within line `GIH7` are less well-defined.
Cell Line Key
Line | Wildtype/mutant |
---|---|
ND-B06 | Wildtype |
ND-B09 | Mutant |
GIH6-E11 | Wildtype |
GIH6-A02 | Mutant |
GIH7-F02 | Wildtype |
GIH7-B12 | Wildtype |
GIH7-A01 | Mutant |
In the dataset, "wildtype" and "mutant" are denoted by "V337V" and "V337M", respectively.
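The cell line key and the dataset's genotype labels can be encoded as a lookup table. This Python sketch transcribes the table above. It assumes cell line identifiers appear in the metadata exactly as written there (for example `ND-B06`), which should be checked against the data. `genotype_for` and `is_primary_line` are hypothetical helpers.

```python
from typing import Dict

# Transcribed from the Cell Line Key; "V337V" = wildtype, "V337M" = mutant.
CELL_LINE_GENOTYPE: Dict[str, str] = {
    "ND-B06": "V337V",
    "ND-B09": "V337M",
    "GIH6-E11": "V337V",
    "GIH6-A02": "V337M",
    "GIH7-F02": "V337V",
    "GIH7-B12": "V337V",
    "GIH7-A01": "V337M",
}

# Lines analyzed primarily; GIH7 experimental groups are less well-defined.
PRIMARY_LINES = {"ND", "GIH6"}


def genotype_for(line_id: str) -> str:
    """Return the dataset label for a cell line ID; raises KeyError if unknown."""
    return CELL_LINE_GENOTYPE[line_id]


def is_primary_line(line_id: str) -> bool:
    """True if the ID's prefix (text before the first '-') is ND or GIH6."""
    return line_id.split("-", 1)[0] in PRIMARY_LINES


primary_ids = sorted(line for line in CELL_LINE_GENOTYPE if is_primary_line(line))
```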
Cell types represented
Bowles et al. clustered the single cells of this data set into 16 primary cell type clusters, identified through both automated and expert cell type annotation. The identified categories are:
Cell Type Key
- Ast - astrocytes
- ExDp1 - excitatory deep layer 1
- ExDp2 - excitatory deep layer 2
- ExM - maturing excitatory
- ExM-U - maturing excitatory upper enriched
- ExN - newborn excitatory
- Glia - unspecified glia/non-neuronal cells
- InCGE - interneurons caudal ganglionic eminence
- InMGE - interneurons medial ganglionic eminence
- IP - intermediate progenitors
- OPC - oligodendrocyte precursor cells
- oRG - outer radial glia
- PgG2M - cycling progenitors (G2/M phase)
- PgS - cycling progenitors (S phase)
- UN - unspecified neurons
- vRG - ventricular radial glia
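A lookup for the cell type key can help when labeling plots or checking that every cell type label in a subset is recognized. The Python sketch below transcribes the key. `unknown_celltypes` is a hypothetical helper, and it assumes the metadata uses exactly these abbreviations.

```python
from typing import Dict, Iterable, List

# Transcribed from the Cell Type Key.
CELL_TYPES: Dict[str, str] = {
    "Ast": "astrocytes",
    "ExDp1": "excitatory deep layer 1",
    "ExDp2": "excitatory deep layer 2",
    "ExM": "maturing excitatory",
    "ExM-U": "maturing excitatory upper enriched",
    "ExN": "newborn excitatory",
    "Glia": "unspecified glia/non-neuronal cells",
    "InCGE": "interneurons caudal ganglionic eminence",
    "InMGE": "interneurons medial ganglionic eminence",
    "IP": "intermediate progenitors",
    "OPC": "oligodendrocyte precursor cells",
    "oRG": "outer radial glia",
    "PgG2M": "cycling progenitors (G2/M phase)",
    "PgS": "cycling progenitors (S phase)",
    "UN": "unspecified neurons",
    "vRG": "ventricular radial glia",
}


def unknown_celltypes(labels: Iterable[str]) -> List[str]:
    """Return labels (sorted, de-duplicated) that are not in the cell type key."""
    return sorted(set(labels).difference(CELL_TYPES))
```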
References
- Bowles, K. R., Silva, M. C., Whitney, K., Bertucci, T., Berlind, J. E., Lai, J. D., ... & Temple, S. (2021). ELAVL4, splicing, and glutamatergic dysfunction precede neuron loss in MAPT mutation cerebral organoids. Cell, 184(17), 4547-4563. https://doi.org/10.1016/j.cell.2021.07.003
- Neural Stem Cell Institute Tau Consortium: https://www.neuralsci.org/tau
- Frontotemporal dementia information: https://www.hopkinsmedicine.org/health/conditions-and-diseases/dementia/frontotemporal-dementia
Major software and analysis tools featured
* `Seurat`, an `R` toolkit for single-cell genomics: https://satijalab.org/seurat/
* `g:Profiler` for Gene Ontology enrichment analysis
* `rrvgo` for Revigo Gene Ontology semantic summarization and visualization
* `ReactomeGSA` for performing gene set analysis using the Reactome Analysis System
* `STRINGdb` for mapping protein-protein interactions within gene sets
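Over-representation tools such as `g:Profiler` commonly test whether an input gene list shares more genes with a GO term than chance would predict. The Python function below sketches that core idea with a one-sided hypergeometric p-value. It is not `g:Profiler`'s exact implementation, which also applies its own multiple-testing correction. `hypergeom_enrichment_p` is a hypothetical name, and the counts in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_term: int, n_query: int, n_overlap: int) -> float:
    """P(X >= n_overlap) where X ~ Hypergeometric(n_background, n_term, n_query).

    n_background: genes in the background/universe
    n_term: background genes annotated to the GO term
    n_query: genes in the input list (drawn from the background)
    n_overlap: input genes annotated to the term
    """
    if not (0 <= n_term <= n_background and 0 <= n_query <= n_background):
        raise ValueError("term and query sizes must lie within the background size")
    if not 0 <= n_overlap <= min(n_term, n_query):
        raise ValueError("overlap cannot exceed the term or query size")
    # Sum exact integer counts for every overlap at least as large as observed,
    # then divide once by the number of possible query sets of that size.
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_query - i)
        for i in range(n_overlap, min(n_term, n_query) + 1)
    )
    return tail / comb(n_background, n_query)


# Invented counts: 100 query genes, 10 of which fall in a 200-gene term
# out of a 20,000-gene background (about 1 overlap expected by chance).
p_value = hypergeom_enrichment_p(20000, 200, 100, 10)
```

Keeping the counts as exact integers until the final division avoids floating-point drift in the tail sum.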