diff --git a/StudentData/v1_Data_Introduction.md b/StudentData/v1_Data_Introduction.md
index 757ffab..1923fcc 100644
--- a/StudentData/v1_Data_Introduction.md
+++ b/StudentData/v1_Data_Introduction.md
@@ -12,6 +12,8 @@ There is both meta data and feature data included in LIBS, since their are no ot
 - *Sol*: Numeric, integers > 0. The Mars day (since start of mission) that the rover took the LIBS point was taken.
 - *Lat*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken.
 - *Lon*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken.
+- *Type*: Character. The kind of target that was scanned (e.g., Earth calibration, sample, or other).
+- *Earth_Sample*: Binary, indicating whether the target is an Earth sample (1) or not (0).
 
 **Feature data** (numeric) is the same as PIXL, concentration of elemental compounds, though without a few of the elemental compounds PIXL includes.
 
diff --git a/StudentData/v1_consistent_data_naming.Rmd b/StudentData/v1_consistent_data_naming.Rmd
index bb295b2..f44d749 100644
--- a/StudentData/v1_consistent_data_naming.Rmd
+++ b/StudentData/v1_consistent_data_naming.Rmd
@@ -21,6 +21,7 @@ pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
 
 # Importing LIBS
 libs.df <- readRDS("~/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+libs_type.df <- readRDS("~/DAR-Mars-F24/StudentData/libs_typed.Rds")
 
 # Importing Lithology
 lithology.df<- readRDS("~/DAR-Mars-F24/Data/mineral_data_static.Rds")
@@ -52,9 +53,11 @@ colnames(pixl.df) <- c("Lat","Lon","Sol","Sample",
                        "Na2O","MgO","Al2O3","SiO2","P2O5","SO3","Cl","K2O","CaO","TiO2","Cr2O3","MnO","FeOT",
                        "Name","Type","Campaign","Location","Abrasion")
 # Renaming LIBS
+libs.df <- cbind(libs.df,libs_type.df$"type",libs_type.df$"earthsample?")
 colnames(libs.df) <- c("Sol","Lat","Lon","Target","Point",
"SiO2","SiO2_stdev","TiO2","TiO2_stdev","Al2O3","Al2O3-stdev","FeOT","FeOT_stdev","MgO","MgO_stdev","CaO","CaO_stdev","Na2O","Na2O_stdev","K2O","K2O_stdev", - "Total","distance_mm","Tot.Em.") + "Total","distance_mm","Tot.Em.", + "Type","Earth_Sample") # Renaming Lithology colnames(lithology.df) <- c("Sample","Name","SampleType","Campaign","Abrasion", "Feldspar","Plagioclase","Pyroxene","Olivine","Quartz", @@ -100,7 +103,7 @@ pixl.df <- pixl.df[,c("Sample", "P2O5","SO3","Cl","Cr2O3","MnO" #These ones don't show up in LIBS )] # Resorting LIBS columns -libs.df <- libs.df[,c("Target","Point","Sol","Lat","Lon", +libs.df <- libs.df[,c("Target","Point","Sol","Lat","Lon","Type","Earth_Sample", "SiO2","SiO2_stdev","TiO2","TiO2_stdev","Al2O3","Al2O3-stdev","FeOT","FeOT_stdev","MgO","MgO_stdev","CaO","CaO_stdev","Na2O","Na2O_stdev","K2O","K2O_stdev", "Total","distance_mm","Tot.Em.")] # Resorting Lithology columns diff --git a/StudentData/v1_libs.Rds b/StudentData/v1_libs.Rds index 64657a5..1cdf67e 100644 Binary files a/StudentData/v1_libs.Rds and b/StudentData/v1_libs.Rds differ diff --git a/StudentNotebooks/Assignment05/roberd10_assignment05.Rmd b/StudentNotebooks/Assignment05/roberd10_assignment05.Rmd new file mode 100644 index 0000000..6efc19d --- /dev/null +++ b/StudentNotebooks/Assignment05/roberd10_assignment05.Rmd @@ -0,0 +1,258 @@ +--- +title: "DAR F24 Assignment 5 Notebook" +author: "Doña Roberts" +date: "`r Sys.Date()`" +output: + pdf_document: + toc: yes + html_document: + toc: yes +subtitle: "MARS" +--- +```{r setup, include=FALSE} + +# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!! +# This section install packages if they are not already installed. +# This block will not be shown in the knit file. 
+knitr::opts_chunk$set(echo = TRUE)
+
+# Set the default CRAN repository
+local({r <- getOption("repos")
+       r["CRAN"] <- "http://cran.r-project.org"
+       options(repos=r)
+})
+
+if (!require("pandoc")) {
+  install.packages("pandoc")
+  library(pandoc)
+}
+
+# Required packages for M20 LIBS analysis
+if (!require("rmarkdown")) {
+  install.packages("rmarkdown")
+  library(rmarkdown)
+}
+if (!require("tidyverse")) {
+  install.packages("tidyverse")
+  library(tidyverse)
+}
+if (!require("stringr")) {
+  install.packages("stringr")
+  library(stringr)
+}
+
+if (!require("ggbiplot")) {
+  install.packages("ggbiplot")
+  library(ggbiplot)
+}
+
+if (!require("pheatmap")) {
+  install.packages("pheatmap")
+  library(pheatmap)
+}
+
+if (!require("knitr")) {
+  install.packages("knitr")
+  library(knitr)
+}
+
+```
+
+## Weekly Work Summary
+
+* RCS ID: Roberd10
+* Project Name: MARS
+* Summary of work since last week
+
+    * Creating an Rmd file that creates an Rds file containing the minerals and various aspects of them
+        * Turns out David made a different version of this with more information. Once he's merged that, we should use it instead.
+    * Creating an Rmd file that compares samples that have been separated by whether or not a type of mineral is present
+    * Creating an Rmd file that creates Rds for PIXL, LIBS, LITHOLOGY, & SHERLOC with a consistent naming scheme
+    * Starting to create a wireframe
+
+* Summary of GitHub issues added and worked
+
+    * Make the data frames have the same naming format (#138)
+        * Issue created and completed by me
+    * Forgot to make issues for the rest...
+
+* Summary of GitHub commits
+
+    * roberd10
+        * Student Data merge 1
+            * v1_consistent_data_naming.Rmd : "Created file to make Rds with consistent naming schemes" (10/16/24)
+            * v1_libs.Rds : "Libs data: reordered and renamed" (10/16/24)
+            * v1_lithology.Rds : "Lithology data: reordered, renamed, and without meta data" (10/16/24)
+            * v1_pixl.Rds : "Pixl data: reordered, renamed, and without meta data" (10/16/24)
+            * v1_sample_meta.Rds : "Samples meta data Rds created" (10/16/24)
+            * v1_sherloc.Rds : "Sherloc data: reordered, renamed, and without meta data" (10/16/24)
+            * README.md : "Added descriptions of v1 rds files" (10/16/24)
+            * v1_Data_Introduction.md : "Created a markdown that explains what the v1 Rds files are" (10/17/24)
+        * Student Data merge 2
+            * v1_consistent_data_naming.Rmd : "Fixed columns classes" (10/22/24)
+            * v1_libs.Rds : "Fixed columns classes" (10/22/24)
+            * v1_lithology.Rds : "Fixed columns classes" (10/22/24)
+            * v1_pixl.Rds : "Fixed columns classes" (10/22/24)
+            * v1_sample_meta.Rds : "Fixed columns classes" (10/22/24)
+            * v1_sherloc.Rds : "Fixed columns classes" (10/22/24)
+            * v1_Data_Introduction.md : "Fixed columns classes" (10/22/24)
+        * Student Data merge 3
+            * README.md : "Fixed error: v1 was written as v2" (10/23/24)
+            * v1_consistent_data_naming.Rmd : "Added pixl and libs connection data frame to v1" (10/23/24)
+            * v1_Data_Introduction.md : "Added description of libs_to_pixl.Rds" (10/23/24)
+            * v1_libs_to_sample.Rds : "Created v1 for PIXL_LIBS_Combined.Rds" (10/23/24)
+            * v1_consistent_data_naming.Rmd : "Fixed error where lithology sample numbers were shuffled" (10/23/24)
+            * v1_lithology.Rds : "Fixed error where lithology sample numbers were shuffled" (10/23/24)
+        * Mineral Classes merge 4
+            * mineral_classes.Rmd : "Creating Rmd that creates Rds containing minerals and their features" (11/5/24)
+            * v1_mineral_classes.Rds : "Creating Rds that contains minerals and their features" (11/5/24)
+            * README.md : "Adding description of mineral_classes" (11/5/24)
+        * Assignment 5 merge 5
+            * roberd10_assignment05.Rmd : "Assignment 5" (11/6/24)
+            * roberd10_assignment05.pdf : "Assignment 5" (11/6/24)
+
+## Personal Contribution
+
+* Creating an Rmd file that creates Rds for PIXL, LIBS, LITHOLOGY, & SHERLOC with a consistent naming scheme
+* Creating an Rmd file that compares samples that have been selected based on the type of minerals present
+* Creating an Rmd file that creates Rds with mineral names and classification
+* Starting on a potential wireframe for an "about" section
+
+## Analysis: Mineral Classes
+
+### Question being asked
+
+Which minerals are grouped together by various features (Carbonates, Oxides, Aqueous, etc.)?
+How would we analyze the samples based on this?
+
+(All the code in this section is also in an Rmd file called "Mineral_Selection.Rmd" that I'll push to GitHub once I know which folder it should go in.)
+
+### Data Preparation
+
+Created an Rmd file in [StudentData](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentData) ("mineral_classes.Rmd") that creates an Rds file ("v1_mineral_classes.Rds") with the mineral names and a selectable (and extendable!) list of their features.
+
+The following two sections (Importing and Selecting) are taken from "Mineral_Selection.Rmd".
+
+#### Importing
+First, we import the relevant data. Note that we are importing the new dataset called "mineral_classes", which connects minerals based on various features. It turns out that David made his own version of this that includes Aqueous information, so once he has pushed it, we should replace my "mineral_classes.df" with his version (whatever it is named).
+
+```{r}
+# Importing
+sample_meta.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_sample_meta.Rds")
+sample_meta.df <- sample_meta.df[1:16,]
+pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds")
+# Scale every concentration column (all but Sample) to mean 0 and sd 1
+pixl.df[,-1] <- as.data.frame(scale(pixl.df[,-1]))
+lithology.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_lithology.Rds")
+lithology.df <- lithology.df[1:16,]
+mineral_classes.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_mineral_classes.Rds")
+```
+
+#### Selecting
+In this section, I select a feature to split the minerals by. In this case, I've chosen "Oxide" vs. "Not Oxide". To select by something else, just change what `Selector` is set to.
+```{r}
+# Select by
+Selector <- "Oxide"
+# Possible things to select by at this time:
+# Ates: "Sulfate", "Perchlorate", "Silicate", "Phosphate", "Carbonate"
+# Ites: "Apatite", "Halite", "Chlorite", "Kaolinite", "Ilmenite"
+# Other: "Oxide"
+```
+
+Once I've decided *what* to select by, I actually *select* by it.
+```{r}
+# TRUE for each mineral whose Type matches Selector
+minerals <- mineral_classes.df$Type == Selector
+```
+
+Now that I have the list of minerals that have this feature, I select only those columns in Lithology.
+
+```{r}
+# Prepend TRUE to keep the Sample column; this assumes the rows of mineral_classes.df
+# are in the same order as the mineral columns of lithology.df.
+# drop = FALSE keeps a data frame even if no mineral matches Selector
+minerals <- append(TRUE,minerals)
+lithology.df <- lithology.df[, minerals, drop = FALSE]
+```
+
+With this limited version of Lithology, I sum each sample's row and keep only the samples whose sum is greater than 0. That is to say, I take all the samples that have at least one mineral with this feature present. This is my "Present" cluster of samples. The rest of the samples are put in the "Absent" cluster.
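+
+To see the Present/Absent rule on something small, here is a toy sketch with made-up minerals and amounts (not the real `v1_lithology.Rds` data). Unlike the positional `append(TRUE, minerals)` step above, it picks the selected mineral columns by name, so it does not rely on the two tables listing minerals in the same order; the `Mineral` column in `toy_classes` is an assumption, and the real `v1_mineral_classes.Rds` may name that column differently. Every object is prefixed with `toy_` so running this chunk does not overwrite `Selector`, `quantity`, or `clustering`.
+
+```{r}
+# Toy data (hypothetical samples and amounts, not from the real Rds files)
+toy_lithology <- data.frame(
+  Sample    = c("S1", "S2", "S3"),
+  Hematite  = c(0, 2, 0),  # an Oxide
+  Magnetite = c(0, 0, 1),  # an Oxide
+  Olivine   = c(5, 3, 4),  # a Silicate, not an Oxide
+  stringsAsFactors = FALSE
+)
+toy_classes <- data.frame(
+  Mineral = c("Hematite", "Magnetite", "Olivine"),  # assumed column name
+  Type    = c("Oxide", "Oxide", "Silicate"),
+  stringsAsFactors = FALSE
+)
+toy_selector <- "Oxide"
+
+# Names of the minerals whose Type matches the selector
+toy_selected <- toy_classes$Mineral[toy_classes$Type == toy_selector]
+
+# Sum the selected minerals for each sample; drop = FALSE keeps a data frame
+# even when only one mineral matches
+toy_quantity <- rowSums(toy_lithology[, toy_selected, drop = FALSE])
+toy_presence <- factor(ifelse(toy_quantity > 0, "Present", "Absent"),
+                       levels = c("Absent", "Present"))
+data.frame(Sample = toy_lithology$Sample, toy_quantity, toy_presence)
+```
+
+Here S1 has no Oxide and comes out "Absent", while S2 and S3 each contain one Oxide and come out "Present", even though all three contain Olivine.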
+
+```{r}
+# Resulting Samples
+# Convert the mineral columns to numbers (factor -> character -> numeric);
+# drop = FALSE keeps a data frame even when only one mineral column is selected
+lithology.df[,-1] <- lapply(lithology.df[, -1, drop = FALSE], as.character)
+lithology.df[,-1] <- lapply(lithology.df[, -1, drop = FALSE], as.numeric)
+# A sample is "Present" if any selected mineral is present (row sum > 0)
+quantity <- rowSums(lithology.df[, -1, drop = FALSE])
+clustering <- as.integer(quantity > 0)
+Presence <- recode_factor(as.factor(clustering), "0" = "Absent", "1" = "Present")
+```
+
+### Analysis: Methods and results
+
+Below is the analysis I've already put in "Mineral_Selection.Rmd". I will continue adding analysis to it; this is just the bare-bones version I've done so far.
+
+#### Mapping
+First of all, we map the samples by their coordinates and color them by whether (in this example) Oxides are present or absent.
+
+Note that the title includes the variable `Selector`, so if you change what you are selecting by, the title changes too.
+
+Also note that the samples are labeled by their *Abrasion*, because samples from the same abrasion have identical locations and thus overlap. Luckily, at least when selecting by Oxide, samples from the same abrasion fall into the same group (Present or Absent). I will need to figure out what to do when that's not the case.
+
+```{r}
+map_title <- paste("Map of samples based on location and colored by presence of",Selector)
+ggplot(data = sample_meta.df, aes(Lon,Lat)) +
+  ggtitle(label = map_title) +
+  geom_point(aes(colour = Presence)) +
+  geom_text(aes(label = Abrasion), nudge_x = 0.004, size = 2)
+```
+For Oxides, there doesn't appear to be an obvious locational clustering, but that might be different when you select by other things.
+
+#### Elemental Compound Concentration Difference
+
+Next, we look at how the mean concentration of each elemental compound differs when minerals with our selected feature (in this example, Oxides) are present or absent.
+
+First we calculate the mean elemental compound concentrations for each group. Then we plot them on a heatmap.
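+
+A minimal sketch of the group-mean step on a toy, already-scaled data frame (made-up values, not real PIXL data). It shows that using a 0/1 integer vector as a row index repeats row 1 instead of picking a group, while comparing that vector to 1 and 0 gives logical indices that select the intended rows. Objects are prefixed with `toy_` so the real `clustering`, `Present`, and `Absent` are left untouched.
+
+```{r}
+# Toy scaled concentrations (hypothetical values)
+toy_pixl <- data.frame(
+  Sample = c("S1", "S2", "S3", "S4"),
+  MgO    = c( 1.0, -1.0,  0.5, -0.5),
+  CaO    = c(-1.0,  1.0, -0.5,  0.5),
+  stringsAsFactors = FALSE
+)
+toy_clustering <- c(1L, 0L, 1L, 0L)  # 1 = Present, 0 = Absent
+
+# Integer indexing: an index of 0 selects nothing and 1 selects row 1,
+# so this is row 1 repeated once per Present sample
+toy_pixl[toy_clustering, "MgO"]
+#> [1] 1 1
+
+# Logical indexing selects the Present and Absent rows
+toy_present <- colMeans(toy_pixl[toy_clustering == 1, -1])
+toy_absent  <- colMeans(toy_pixl[toy_clustering == 0, -1])
+rbind(toy_present, toy_absent)
+```
+
+With logical indexing, the Present means are 0.75 (MgO) and -0.75 (CaO), and the Absent means are the reverse.
+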
+Note that we scaled the PIXL data by column when we imported it. This is because we only want to see how each elemental compound's concentration varies between samples with our selected feature and samples without it, *not* how the concentrations of different elemental compounds compare within a sample.
+
+```{r}
+# clustering is a 0/1 integer vector, so compare it to 1 and 0 to get logical row indices.
+# This assumes pixl.df has one row per sample, in the same order as lithology.df.
+Present <- colMeans(pixl.df[clustering == 1, -1])
+Absent <- colMeans(pixl.df[clustering == 0, -1])
+means <- rbind(Present,Absent)
+heat_title <- paste("Mean concentrations when grouped by presence of",Selector)
+pheatmap(means,cluster_rows = FALSE, cluster_cols = FALSE, main = heat_title)
+```
+
+For Oxides, the heatmap first appeared to show that samples with Oxides present had much higher concentrations of Na2O and CaO, and much lower concentrations of MgO, than samples without Oxides.
+
+However, that heatmap looked exactly the same no matter what I selected by, and its "Absent" row was all about 0. Both come from the row indexing: `clustering` is a 0/1 integer vector, so `pixl.df[clustering,-1]` selected row 1 once for every "Present" sample (an index of 0 selects nothing), and `pixl.df[-clustering,-1]` dropped only row 1. The "Present" row was therefore just the first sample, and the "Absent" row was the mean of nearly all of the column-centered data, which is close to 0. The chunk above indexes with `clustering == 1` and `clustering == 0` instead, so the Oxide comparison needs to be re-checked against its output.
+
+### Discussion of results
+
+This is just the start of the analysis that could be done after splitting sample sites into those that contain a type of mineral and those that don't. The hope is to include some of the more in-depth analysis done by the others here as well, but I haven't gotten to that yet. Essentially, this is a work in progress.
+
+## Fix column names of the data frames
+
+Created an Rmd file in StudentData ("v1_consistent_data_naming.Rmd") that creates a new Rds file for each of the four data sets, as well as an Rds file of the meta data for the samples.
+
+Created a markdown file ("v1_Data_Introduction.md") that describes what the new Rds files contain.
+
+These Rds files contain the same data as the original data sets, but reorganized, renamed, and properly classed (numeric/factor/character) so that they are consistent and accurate.
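+
+As a small illustration of the renaming and reclassing step, here is a toy sketch with made-up columns (not the real PIXL, LIBS, Lithology, or SHERLOC data, and not the actual code in "v1_consistent_data_naming.Rmd"). It renames columns with `colnames()` and converts a factor of numbers to numeric by going through `as.character()` first, the same pattern used on the Lithology columns above, because `as.numeric()` on a factor returns its level codes rather than its values.
+
+```{r}
+# Toy raw data with inconsistent names and classes (hypothetical)
+toy_raw <- data.frame(
+  sol  = c("12", "15"),             # numbers stored as character
+  samp = factor(c("A", "B")),       # IDs stored as a factor
+  sio2 = factor(c("45.1", "47.3")), # numbers stored as a factor
+  stringsAsFactors = FALSE
+)
+
+# Rename to a consistent naming scheme
+colnames(toy_raw) <- c("Sol", "Sample", "SiO2")
+
+# as.numeric() on the factor itself gives the level codes, not 45.1 and 47.3
+as.numeric(toy_raw$SiO2)
+#> [1] 1 2
+
+# Fix classes: character -> numeric, factor -> character,
+# and factor -> character -> numeric
+toy_raw$Sol    <- as.numeric(toy_raw$Sol)
+toy_raw$Sample <- as.character(toy_raw$Sample)
+toy_raw$SiO2   <- as.numeric(as.character(toy_raw$SiO2))
+str(toy_raw)
+```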
+
+I pushed multiple iterations to GitHub as I added the data sets made by Margo and Charlotte and fixed issues brought up during discussion.
+
+Link to [GitHub StudentData](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentData)
+
+## Creating a Wireframe
+
+### Question being asked
+
+How should we display information about what the sample and target data are?
+
+### Wireframe
+
+Created a possible wireframe for the description of the data. My attention shifted to the presentation for Karen Rogers and to building a way to compare samples grouped by the presence of types of minerals, so the wireframe is incomplete (about half done). Additionally, it was brought up that since the Mission Minder is for people who already have the background knowledge, providing that background isn't a priority.
+
+[*Wireframe Slides*](https://docs.google.com/presentation/d/1QVX61d2LxmP8Fj_M0L-ZLxQBjvYRjSS_rnh4Fkt6z2g/edit#slide=id.p)
+
+## Summary and next steps
+
+Most of my work was on data cleaning and organizing, and a little on the Mission Minder.
+
+My attention will be shifting toward creating pages for the Mission Minder, as instructed during recent meetings.
diff --git a/StudentNotebooks/Assignment05/roberd10_assignment05.pdf b/StudentNotebooks/Assignment05/roberd10_assignment05.pdf
new file mode 100644
index 0000000..e22c5a1
Binary files /dev/null and b/StudentNotebooks/Assignment05/roberd10_assignment05.pdf differ