diff --git a/StudentData/LIBS_calibration_targets.Rds b/StudentData/LIBS_calibration_targets.Rds new file mode 100644 index 0000000..df90460 Binary files /dev/null and b/StudentData/LIBS_calibration_targets.Rds differ diff --git a/StudentData/PIXL_LIBS_Combined.Rds b/StudentData/PIXL_LIBS_Combined.Rds index 753cb0b..17bdf04 100644 Binary files a/StudentData/PIXL_LIBS_Combined.Rds and b/StudentData/PIXL_LIBS_Combined.Rds differ diff --git a/StudentData/README.md b/StudentData/README.md index 567a233..aadeef6 100644 --- a/StudentData/README.md +++ b/StudentData/README.md @@ -16,14 +16,16 @@ _You may need to use a more creative path name to read an Rds in the StudentData # Rds file introductions -**pixl_sol_coordinates.Rds** has all of the data in samples_pixl_wide.Rds with the latitude, longitude and sol found from the analysts notebook. Note that the Salette and Bearwallow samples have zeros as their latitude and longitude because the analysts notebook was giving errors for these sites. Following the conventions of this data set, though, most likely Salette has the same coordinates as Coulettes, and Bearwallow has the same coordinates as Hazeltop. +**pixl_sol_coordinates.Rds** has all of the data in samples_pixl_wide.Rds with the latitude, longitude and sol found from the analysts notebook. -**libs_v2** all the libs data with the columns renamed (meta data capitalized) so that they match with the other datasets and reordered to match with other data sets. +**libs_v1** all the libs data with the columns renamed (meta data capitalized) so that they match with the other datasets and reordered to match with other data sets. -**pixl_v2** all the pixl data with the columns renamed and reordered to match libs. +**pixl_v1** all the pixl data with the columns renamed and reordered to match libs. -**sherloc_v2** all the sherloc data, but after it's been turned into a data frame in the same format as sherloc. 
+**sherloc_v1** all the sherloc data, but after it's been turned into a data frame in the same format as sherloc. -**lithology_v2** all the lithology data with the columns renamed and reordered to match sherloc. +**lithology_v1** all the lithology data with the columns renamed and reordered to match sherloc. **sample_meta** all the meta data for the samples. This can be appended to pixl, sherloc, and lithology. + +**libs_typed.Rds** the libs data with a "type" column added that contains descriptors of the scct samples, as well as other sample descriptors from the analysts notebook diff --git a/StudentData/libs_typed.Rds b/StudentData/libs_typed.Rds new file mode 100644 index 0000000..eb09ee2 Binary files /dev/null and b/StudentData/libs_typed.Rds differ diff --git a/StudentData/v1_Data_Introduction.md b/StudentData/v1_Data_Introduction.md index 1f0acec..757ffab 100644 --- a/StudentData/v1_Data_Introduction.md +++ b/StudentData/v1_Data_Introduction.md @@ -2,18 +2,18 @@ All data sets are reordered such that meta data is capitalized and at the front of the data frame and ____ data is at the end. Additionally, the order and names of elemental compound in PIXL/LIBS and minerals in Lithology/Sherloc have been made to match. -# Libs -There is both meta data and feature data included in Libs, since their are no other data sets that use match the targets and thus no point in seperating it out. +# LIBS +There is both meta data and feature data included in LIBS, since there are no other data sets that match the targets and thus no point in separating it out. **Meta data**: -- *Target*: Factor, name of location the libs was taken. -- *Point*: Numeric, intergers 1 through 28. The supercam is a semi-grid (28 dots, or "points", in rows of 6,5,6,5,6), and so for each target there are 28 "points" taken. -- *Sol*: Numeric, integers > 0. The Mars day (since start of mission) that the rover took the libs point was taken. -- *Lat*: Numeric. Part of the location data. 
Where the *rover* was when the libs point was taken. -- *Lon*: Numeric. Part of the location data. Where the *rover* was when the libs point was taken. +- *Target*: Character, name of the location where the LIBS was taken. +- *Point*: Factor, integers 1 through 28. The supercam is a semi-grid (28 dots, or "points", in rows of 6,5,6,5,6), and so for each target there are 28 "points" taken. +- *Sol*: Numeric, integers > 0. The Mars day (since start of mission) on which the rover took the LIBS point. +- *Lat*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken. +- *Lon*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken. -**Feature data** is the same as PIXL, concentration of elemental compounds, though without a few of the elemental compounds PIXL includes. +**Feature data** (numeric) is the same as PIXL, concentration of elemental compounds, though without a few of the elemental compounds PIXL includes. Main change made to the LIBS data set is that the elemental compounds originally had "o" (lowercase letter o) to represent "Oxygen" has been changed to "O" (uppercase letter O). @@ -23,7 +23,9 @@ Sample data is the data for the, currently 16, samples. There are four data fram ## Sample Meta A data frame containing all the meta data for the samples. This can be appended to Pixl, Lithology, or Sherloc based on "Sample". -**Sample**: Numeric, integers 1 through 24. The sample #, 1 through 16. +**Sample**: Integer, 1 through 24. The sample #, 1 through 16. 
Note that this sample number does *not* match the sample numbers in the analyst notebook and initial reports, because they also count witness blank (control) samples in their count (ex: our "16" is their "18" because they have two witness blank samples before it). + +**Name**: Character, unique. Just what they decided to refer to the sample as. Also used in analyst notebook and initial reports. **Sol**: Numeric, integers > 0. The Mars day (since start of mission) that the sample was taken @@ -31,8 +33,6 @@ A data frame containing all the meta data for the samples. This can be appended **Lon**: Numeric. Part of the location data. Where the smaple was taken. -**Name**: Factor, unique. Just what they decided to refer to the sample as. Also used in analyst notebook and initial reports. - **Abrasion**: Factor, some duplication. This is the name they give to the bit of rock they abraided, used to indicate when two samples came from the same spot. @@ -48,11 +48,24 @@ Starts with the sample number. Following is all numeric elemental compound conce One change made to the PIXL data set is that the elemental compounds originally had "0" (zero) to represent "Oxygen" has been changed to "O" (uppercase letter O). ## Lithology -Starts with the sample number. Following columns are all binary data (0 or 1) representing the presence or absence of minerals. +Starts with the sample number. Following columns are all factor data (binary 0 or 1) representing the presence or absence of minerals. Main change to Lithology data set was the renaming and ordering of the minerals to match the sherloc data set. ## Sherloc -Starts with the sample number. Following columns are all "numeric" data (0 through 1) representing the certainty in the presence of minerals. This is discrete data though, with the only options being "0" (No detection), "0.25" (Possible but not confirmed), "0.5" (Almost certain, working on confirming), "0.75" (?), and "1" (confirmed). +Starts with the sample number. 
Following columns are all factor data representing the certainty in the presence of minerals. This is discrete data though, with the only options being "0" (No detection), "0.25" (Possible but not confirmed), "0.5" (Almost certain, working on confirming), "0.75" (?), and "1" (confirmed). Main change made to the SHERLOC data set is that in v1 it has already been converted into the same data frame format as the other data sets. + +# LIBS to PIXL data set +A data set connecting libs points and pixl samples based on lat/lon. + +Appended ".libs" to end of the LIBS meta data column names and ".pixl" to end of the PIXL meta data column names so that it is, at a glance, transparent which refers to which. + +Contains: + +**Target.libs**, **Point.libs**, **Sol.libs**, **Lat.libs**, **Lon.libs**: Exactly the same as the ones from LIBS + +**Sample.pixl**, **Sol.pixl**, **Lon.pixl**, **Abrasion.pixl**, **Campaign.pixl**, **Type.pixl**: "NA" if there is not a correlated pixl sample for the LIBS point. If a pixl Sample *does* correlate (based on lat/lon), then it's the same as the data in v1_sample_meta.Rds for that sample number. + +Note that "Lat.pixl" is missing. This is not intentional, and should hopefully be fixed soon. 
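
The join described above can be sketched as follows. This is a minimal illustration on made-up data, assuming the match is an exact equality on latitude and longitude (the source only says "based on lat/lon"); the data frames `libs` and `samples` and all of their values are hypothetical, not the contents of the Rds files.

```r
# Toy stand-ins for the LIBS points and the sample meta data (values are made up).
libs <- data.frame(
  Target = c("A", "A", "B"),
  Point  = c(1, 2, 1),
  Sol    = c(100, 100, 250),
  Lat    = c(18.40, 18.40, 18.45),
  Lon    = c(77.40, 77.40, 77.45)
)
samples <- data.frame(
  Sample = 1L,
  Sol    = 120,
  Lat    = 18.40,
  Lon    = 77.40,
  Type   = "Igneous"
)

# Tag every column with its source so the origin is visible at a glance.
names(libs)    <- paste0(names(libs), ".libs")
names(samples) <- paste0(names(samples), ".pixl")

# Left join: keep every LIBS point; PIXL columns become NA when nothing matches.
# Assumption: matching is exact equality on both coordinates.
libs_to_sample <- merge(
  libs, samples,
  by.x  = c("Lat.libs", "Lon.libs"),
  by.y  = c("Lat.pixl", "Lon.pixl"),
  all.x = TRUE
)

print(libs_to_sample)
```

Note that `merge()` keeps only the `by.x` key columns in its result, so in this sketch the `.pixl` coordinate columns do not appear in the output; they would have to be copied under another name before the join if both sets of coordinates are wanted.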
\ No newline at end of file diff --git a/StudentData/v1_consistent_data_naming.Rmd b/StudentData/v1_consistent_data_naming.Rmd index ad329fa..bb295b2 100644 --- a/StudentData/v1_consistent_data_naming.Rmd +++ b/StudentData/v1_consistent_data_naming.Rmd @@ -40,6 +40,9 @@ sherloc.df$Name <- as.factor(sherloc.df$Name) ## Make it a matrix sherloc.matrix <- sherloc.df %>% pivot_wider(names_from = Mineral, values_from = Presence) sherloc.df <- cbind(sherloc.matrix,pixl.df[,"sample"]) + +# pixl and libs combined data frame +pixl_libs.df <- readRDS("PIXL_LIBS_Combined.Rds") ``` # Renaming Columns @@ -72,6 +75,9 @@ colnames(sherloc.df) <- c("Name", "Organic matter","Sulfate+Organic matter","Other hydrated phases","Phyllosilicates", "Chlorite","Kaolinite (hydrous Al-clay)","Chromite","Ilmenite", "Zircon/Baddeleyite","Fe-Mg-clay minerals","Spinels","Sample") + +# Renaming Pixl and Libs combined data set +colnames(pixl_libs.df) <- c("Sol.libs","Lat.libs","Lon.libs","Target.libs","Point.libs","Lon.pixl","Sol.pixl","Sample.pixl","Name.pixl","Type.pixl","Campaign.pixl","Location.pixl","Abrasion.pixl") ``` # Creating Sample metadata data frame @@ -80,7 +86,7 @@ colnames(sherloc.df) <- c("Name", sample_meta.df <- qpcR:::cbind.na(pixl.df[,c("Sol","Lat","Lon","Type","Campaign","Abrasion","Name","Location")], lithology.df[,c("Sample","SampleType")]) # Reordering it -sample_meta.df <- sample_meta.df[,c("Sample","Sol","Lat","Lon","Name","Abrasion","Campaign","Type","SampleType")] +sample_meta.df <- sample_meta.df[,c("Sample","Name","Sol","Lat","Lon","Abrasion","Campaign","Type","SampleType")] # Changing atmospherics type from "N/A" to "Atmospheric" sample_meta.df[1,"Type"] <- "Atmospheric" @@ -119,8 +125,48 @@ sherloc.df <- sherloc.df[,c("Sample", "Organic matter","Sulfate+Organic matter","Other hydrated phases","Phyllosilicates", "Chlorite","Kaolinite (hydrous Al-clay)","Chromite","Ilmenite", "Zircon/Baddeleyite","Fe-Mg-clay minerals","Spinels")] + +# Resorting Pixl and Libs combined 
data set +pixl_libs.df <- pixl_libs.df[,c("Target.libs","Point.libs","Sol.libs","Lat.libs","Lon.libs", + "Sample.pixl","Name.pixl","Sol.pixl","Lon.pixl","Abrasion.pixl","Campaign.pixl","Type.pixl")] + +``` + +Check types and fix them (ex Sample, Sol, Lat, Lon -> numeric, Name -> character, Abrasion, Campaign, Type, SampleType -> Factor) +# Fixing data types +```{r} +# Pixl +## Already good! +## Sample is integer and concentrations are numeric! + +# Libs +libs.df$Point <- as.factor(libs.df$Point) # Was originally "character" + +# Lithology +lithology.df[,2:36] <- lapply(lithology.df[,2:36],as.factor) # Was originally "character" +lithology.df$Sample <- as.integer(lithology.df$Sample) #To match Pixl + +# Sherloc +sherloc.df[] <- data.frame(lapply(sherloc.df[],as.factor)) # Was originally "character" +sherloc.df$Sample <- as.integer(sherloc.df$Sample) # Back to original, since prior line changed it + +# Sample Meta +sample_meta.df$Sample <- as.integer(sample_meta.df$Sample) +# sample_meta.df$Name <- as.character(sample_meta.df$Name) # Already in the format! +sample_meta.df$Sol <- as.numeric(sample_meta.df$Sol) +sample_meta.df$Lat <- as.numeric(sample_meta.df$Lat) +sample_meta.df$Lon <- as.numeric(sample_meta.df$Lon) +sample_meta.df$Abrasion <- as.factor(sample_meta.df$Abrasion) +sample_meta.df$Campaign <- as.factor(sample_meta.df$Campaign) +sample_meta.df$Type <- as.factor(sample_meta.df$Type) +sample_meta.df$SampleType <- as.factor(sample_meta.df$SampleType) + +# Pixl and Libs combined +## Already good! 
+ ``` + # Saving New data frames ```{r} saveRDS(sample_meta.df, "v1_sample_meta.Rds") @@ -128,4 +174,5 @@ saveRDS(libs.df, "v1_libs.Rds") saveRDS(lithology.df, "v1_lithology.Rds") saveRDS(sherloc.df, "v1_sherloc.Rds") saveRDS(pixl.df, "v1_pixl.Rds") +saveRDS(pixl_libs.df, "v1_libs_to_sample.Rds") ``` \ No newline at end of file diff --git a/StudentData/v1_libs.Rds b/StudentData/v1_libs.Rds index c90d68b..64657a5 100644 Binary files a/StudentData/v1_libs.Rds and b/StudentData/v1_libs.Rds differ diff --git a/StudentData/v1_libs_to_sample.Rds b/StudentData/v1_libs_to_sample.Rds new file mode 100644 index 0000000..711ce9d Binary files /dev/null and b/StudentData/v1_libs_to_sample.Rds differ diff --git a/StudentData/v1_lithology.Rds b/StudentData/v1_lithology.Rds index dd22b91..04f1928 100644 Binary files a/StudentData/v1_lithology.Rds and b/StudentData/v1_lithology.Rds differ diff --git a/StudentData/v1_sample_meta.Rds b/StudentData/v1_sample_meta.Rds index 219183b..de902e4 100644 Binary files a/StudentData/v1_sample_meta.Rds and b/StudentData/v1_sample_meta.Rds differ diff --git a/StudentData/v1_sherloc.Rds b/StudentData/v1_sherloc.Rds index 4e3f395..5d3382d 100644 Binary files a/StudentData/v1_sherloc.Rds and b/StudentData/v1_sherloc.Rds differ diff --git a/StudentNotebooks/Assignment03/walczd3_assignment03.Rmd b/StudentNotebooks/Assignment03/walczd3_assignment03.Rmd new file mode 100644 index 0000000..e15cfcd --- /dev/null +++ b/StudentNotebooks/Assignment03/walczd3_assignment03.Rmd @@ -0,0 +1,628 @@ +--- +title: "DAR F24 Assignment 3 Notebook Template" +author: "David Walczyk" +date: "`r Sys.Date()`" +output: + html_document: + toc: yes + pdf_document: + toc: yes +subtitle: "DAR Project Name: Mars" +--- + + +## BiWeekly Work Summary + +**NOTE:** Follow an outline format; use bullets to express individual points. + +* RCS ID: **Always** include this! +* Project Name: **Always** include this! 
+* Summary of work since last week + + * Describe the important aspects of what you worked on and accomplished + + +* Summary of github commits + + * include branch name(s) + * include browsable links to all external files on github + * Include links to shared Shiny apps + +* List of presentations, papers, or other outputs + + * Include browsable links + +* List of references (if necessary) +* Indicate any use of group shared code base +* Indicate which parts of your described work were done by you or as part of joint efforts + +* **Required:** Provide illustrating figures and/or tables + +## Personal Contribution + +* Clearly defined, unique contribution(s) done by you: code, ideas, writing... +* Include github issues you've addressed if any + +_PACKAGES_ + +```{r} + +# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!! +# This section install packages if they are not already installed. +# This block will not be shown in the knit file. +knitr::opts_chunk$set(echo = TRUE) + +# Set the default CRAN repository +local({r <- getOption("repos") + r["CRAN"] <- "http://cran.r-project.org" + options(repos=r) +}) + +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} + +if (!require("ggplotify")) { + install.packages("ggplotify") + library(ggplotify) +} + +if (!require("car")) { + install.packages("car") + library(car) +} + +# Required packages for M20 LIBS analysis +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} +if (!require("apcluster")) { + install.packages("apcluster") + library(apcluster) +} +if (!require("vegan")) { + 
install.packages("vegan") + library(vegan) +} +if (!require("ape")) { + install.packages("ape") + library(ape) +} +if (!require("Matrix")) { + install.packages("Matrix") + library(Matrix) +} + +if (!require("gridExtra")) { + install.packages("gridExtra") + library(gridExtra) +} + +if (!require("umap")) { + install.packages("umap") + library(umap) +} + +if (!require("ggtern")) { + install.packages("ggtern") + library(ggtern) +} +``` + +_LOAD IN DATA_ + +```{r, result01_data} + +#-------------LIBS------------------- +# Load the saved LIBS data with locations added +libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") +libs.std_dev <- libs.df %>% + select((c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) +libs.df <- libs.df %>% + select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) + +# Convert the points to numeric +libs.df$point <- as.numeric(libs.df$point) + +# Review what we have +summary(libs.df) + +#----------PIXL---------------------- +# Load the saved PIXL data with locations added +pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds") + +pixl.df +# Convert to factors +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) + +# Review our dataframe +summary(pixl.df) + +#----------SHERLOC---------------------- +# Read in data as provided. 
+sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds") + +# Clean up data types +sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral) +sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], + as.numeric) +# Transform NA's to 0 +sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0) + +# Reformat data so that rows are "abrasions" and columns list the presence of minerals. +# Do this by "pivoting" to a long format, and then back to the desired wide format. + +sherloc_long <- sherloc_abrasion_raw %>% + pivot_longer(!Mineral, names_to = "Name", values_to = "Presence") + +# Make abrasion a factor +sherloc_long$Name <- as.factor(sherloc_long$Name) + +# Make it a matrix +sherloc.matrix <- sherloc_long %>% + pivot_wider(names_from = Mineral, values_from = Presence) + +# Get sample information from PIXL and add to measurements -- assumes order is the same + +sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix) + +# Review what we have +summary(sherloc.df) + + +# Load the saved lithology data with locations added +lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds") + +# Cast samples as numbers +lithology.df$sample <- as.numeric(lithology.df$sample) + +# Convert rest into factors +lithology.df[sapply(lithology.df, is.character)] <- + lapply(lithology.df[sapply(lithology.df, is.character)], + as.factor) + +# Keep only first 16 samples because the data for the rest of the samples is not available yet +lithology.df<-lithology.df[1:16,] + +# Create a matrix containing only the numeric measurements. The remaining features are metadata about the sample. 
+lithology.matrix <- sapply(lithology.df[,6:40],as.numeric)-1 + +# Review the structure of our matrix +str(lithology.matrix) +``` + + + + +## Analysis: Question 1 (Provide short name) + +### Question being asked + +_Are there any similarities or unusual trends in our LIBS data that we can compare to rover collected PIXL samples?_ + +### Data Preparation + +* For analysis #1 I will be analyzing the differences in LIBS and PIXL data exclusively. LIBS data contains only 8 numerical features, all of which are contained in the PIXL data. 4 features are unique to the PIXL samples; P2O5, Cl, Cr203, and MnO. Therefore, aside from metadata like campaign (which will be included later) or rock type, I am preparing the data to analyze shared features. + +* The first step will be applying a variety of clustering algorithms including Affinity Propagation (AP), k-means, & Uniform Manifold Approximation and Projection (UMAP). I apply these because I'm interested to see if under different supervised and unsupervised clustering methods, do similar features or clusters arise. After clustering, to see where shared oxide features are most correlated I'll plot a PCA with the most prominent clustering technique on LIBS data and compare the eigenvectors to that of the PIXL PCA. + +* LIBS & PIXL datasets exclusively for this question. + + +### Analysis: Methods and results + +_Applying clustering algorithms to detect distinct clusters._ + + +```{r, result01_analysis} +#https://books.google.com/books?hl=en&lr=&id=spQ7FWsRX30C&oi=fnd&pg=PA3&dq=sedimentary+rocks&ots=T0fThFnYqm&sig=GbZZW_JuHjm9VmebYKaP1IRzSD8#v=onepage&q=sedimentary%20rocks&f=false +#https://www.osti.gov/servlets/purl/1409785 - LIBS +# - LIBS data is data not directly sampled by the Rover and its abrasion tool. Instead it is found by a projected laser from the SUPERCAM instrument that points at a specific rock and is able to distinguish the specified Polyatomic ions by wavelength intensity. 
+ + +#PIXL samples lat and lon +samples <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples.Rds") +sample_coord <- samples[which(samples$name %in% pixl.df$name),c(3:5)] + +shared_pixl <- pixl.df[,c(5, 11, 4, 14, 3, 10, 2, 9, 15, 17)] +names(shared_pixl)[1:8] <- names(libs.df[,6:13]) +shared_libs <- cbind(libs.df[,6:13], lat = libs.df[,2], lon = libs.df[,3], point = libs.df[,5]) #features, lat, long, point(?) + +#AP Clustering on shared data +set.seed(4) +get_ap <- function( data) { + ap <- apcluster(negDistMat(r = 2), data, q = 0.001) + clusters <- ap@clusters + ap_clusters <- c(1:16) + for (i in seq(length(clusters))) { + num <- i + for (j in seq(length(clusters[[i]]))) { + ap_clusters[clusters[[i]][[j]]] = num + } + } + return (ap_clusters) +} + +ap_clusters.pixl <- get_ap(shared_pixl[,1:8]) +ap_clusters.libs <- get_ap(shared_libs[,1:8]) + +unique(ap_clusters.pixl) #k = 3 +unique(ap_clusters.libs) #k = 13 + + +#Find k-means cluster +wssplot <- function(data, df, nc=8, seed=4){ + wss <- data.frame(cluster=1:nc, quality=c(0)) + for (i in 1:nc){ + set.seed(seed) + wss[i,2] <- kmeans(data, centers=i)$tot.withinss + } + ggplot(data=wss,aes(x=cluster,y=quality)) + + geom_line() + + ggtitle(paste("Quality of k-means by Cluster -", df)) +} + +wssplot(shared_pixl[,1:8], "PIXL") #k = 3,4 +wssplot(shared_libs[,1:8], "LIBS") #k = 3,4,6 + +umapplot <- function(data, i, df) { + custom.config <- umap.defaults + custom.config$n_neighbors = i + UMAP <- umap(data, config = custom.config) + plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = paste("UMAP[",df,"] on nn = ",i, sep = "")) +} + +#Apply UMAP +#find optimal kNN +#PIXL, with all features under UMAP showed unrecognizable features. Shared features showed 2,4,6 possible clusters but did not really converge as nn rose + +#LIBS +#nn <- seq(5,25, 5) +#for (i in nn) { +# umapplot(shared_libs[,1:8],i, "LIBS") +#} +#nn = 25 for just LIBS data found 4 distinct clusters. 
One of the clusters right off the bat showed some form of separation at lower n_neighbors parameters but evidently started to come together as nn increased +custom.config <- umap.defaults +custom.config$n_neighbors = 25 +UMAP <- umap(shared_libs[,1:8], config = custom.config ) +UMAP.data <- UMAP$layout +plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = "UMAP on nn = 25") +abline( a =0, b = -3, h = 0,col = "blue") + +#use line: y = -3x to find which data points are to the left and right of the cluster line. +# y = -3x --> x = y / -3 +x <- UMAP.data[,2] / -3 +plot(x = x, y = UMAP.data[,2]) +umap_clusters <- rep(NA,nrow(UMAP.data)) +umap_clusters[which( (UMAP.data[,1] >= 2) & (UMAP.data[,2] > 0) )] = 1 +umap_clusters[which( (UMAP.data[,1] < 2) & (UMAP.data[,2] > 0) )] = 2 +umap_clusters[which( UMAP.data[,1] > 9)] = 3 #unique cluster isolated +umap_clusters[which( (UMAP.data[,1] < x) & (UMAP.data[,2] < 0)) ] = 4 +umap_clusters[ which( ( (UMAP.data[,1] > x ) & (UMAP.data[,1] < 9) ) & (UMAP.data[,2] < 0))] = 5 +ggplot(data.frame(UMAP.data), aes(x = X1, y = X2, color = as.factor(umap_clusters))) + + geom_point() + + labs(title = "UMAP cluster check: nn = 25") + +#run PCA with umap clustering on libs data + campaign type for pixl of shared features +pca.libs <- prcomp(shared_libs[,1:8], retx = T, center = T) +umap.biplot <- ggbiplot::ggbiplot(pca.libs, + groups = as.factor(umap_clusters), + circle = T, + varname.size = 2, + varname.color = "red", labels.size =8 + ) + theme_bw() + labs(title = "PCA with UMAP clusters") +#total PIXL +pca.pixl <- prcomp(pixl.df[,2:14], retx = T, center = T) +pixl.biplot <- ggbiplot::ggbiplot(pca.pixl, + groups = pixl.df$campaign, + circle = T, + varname.size = 2, + varname.color = "red", labels.size =8 + ) + theme_bw() + labs(title = "PIXL PCA by campaign") + + + + +#plot heatmaps +umap.centers <- data.frame() +unq <- unique(umap_clusters) +for (i in unq) { + arr <- colMeans(shared_libs[which(umap_clusters == i),1:8]) + umap.centers <- 
rbind(umap.centers, arr) +} +names(umap.centers) <- colnames(shared_libs[,1:8]) + +pixl.centers <- data.frame() +for (i in unique(shared_pixl$campaign)) { + arr <- colMeans(pixl.df[which(shared_pixl$campaign == i), 2:14]) + pixl.centers <- rbind(pixl.centers, arr) +} +names(pixl.centers) <- names(pixl.df)[2:14] + +pixl.centers +u.heatmap <- pheatmap(umap.centers, scale = "none", main = "UMAP heatmap") +pxl.heatmap <- pheatmap(pixl.centers, scale = "none", main = "PIXL-Campaign heatmap", labels_row = unique(shared_pixl$campaign)) + +grid.arrange(pixl.biplot, umap.biplot, ncol = 2) +grid.arrange(as.ggplot(u.heatmap), as.ggplot(pxl.heatmap), ncol = 2) +#look at the pattern differences between high SiO2 concentrations with FeO-T + MgO & Al2O3 + CaO + +``` + +### Discussion of results + +* Campaign type is a powerful indicator of the type of chemical makeup that a sample will have. LIBS data is kind of erratic and while we are getting information that can help classify how each cluster identified by UMAP is related to campaign type, it is hard to say whether or not this information is reliable. In the next analysis I will compare SHERLOC data to my found clusters, especially that of delta front campaigns and the unusually high CaO cluster found. + + +## Analysis: Question 2 (Provide short name) + +### Question being asked + +_According to papers in 1985 and 2010 regarding amelioration (glacial to inter glacial stages) of lake basins and analysis of river basins and its trapped sediments, respectively, an index called the chemical index of alteration (CIA) was used to measure the level of weathering that rocks underwent as a result of chemical reactions (i.e reactions with the water and other dissolved substances). 
Can we use this index to get accurate information about how silicate rocks underwent some lasting form of chemical weathering that is indicative of the last stages of water on Mars and its effect on basin and delta front rocks?_ + +### Data Preparation + +* For analysis #2 I will be utilizing the lithology and PIXL datasets to calculate CIA for our samples, considering we know the definite campaign and sampling location they were at. + +* The immediate code below is just another way of plotting the concentration (in this case density) of all our features on different clusters. This is arbitrary and was just used as another way of plotting. In my analysis I first plotted all lithology points to find which of the silicates specified in Dr. Roger's lecture 2 were present. I copied these column names so that I could find the relative abundance of silicates in each sample; however, this was just a check for PIXL data and as of now has no direct meaning other than associating > 0 or < 0, meaning that silicates with CaO are available. Regardless, for PIXL and LIBS I found the distributions of CIA values for all samples and clusters, respectively. I then looked at the difference between genuine weathering criteria (CIA > 70 indicates some form of weathering occurred; there are specific ranges that differ in the literature so I just wanted to see as a whole whether samples > or < 70 differed). Finally, I plotted a ternary plot of all the values greater than 70 with the respective cation axes. + +* I will be re-using shared_libs and shared_pixl. 
To find silicate containing rocks I will be using the lithology.df +```{r, result02_data} +#for each cluster facet_wrap the features but color the cluster in +shared_libs$cluster <- umap_clusters +shared_libs.long <- shared_libs %>% + pivot_longer(cols = names(shared_libs)[1:8], names_to = "Variable", values_to = "Values") +ggplot(shared_libs.long, aes(x = Values, color = as.factor(cluster))) + + geom_density() + + facet_wrap(~Variable, scales = "free") + + labs(title = "Density plots of all variables") +#lat long map +sample_coord$name <- "PIXL" +coords <- rbind(data.frame(name = rep("LIBS", nrow(libs.df)), lat = libs.df$lat, lon = libs.df$lon), sample_coord) +ggplot(data = coords, aes(x = lat, y = lon, color = name)) + + geom_point() + +``` + +### Analysis: Methods and Results + +* I want to see if their is any clear form of chemical weathering between UMAP identified clusters. + + +```{r, result02_analysis} +#Lithology heatmap - so note CaO references only the CaO that is only avaliable in silicate rocks. For this since, we aren't exactly sure of the chemical composition of SHERLOC or lithology data, we just check to see if the samples contain silicates (rowSums[which(silicates)] > 0) +#silicates are found in accordance +pheatmap(lithology.matrix, scale = "none", main = "Lithology heatmap", labels_row = lithology.df$campaign) + +delta.minerals <- c( "Kaolinite", "Hydrated_Mg_Fe_Sulfate", "Fe_Mg_clay", "Mg_sulfate", "Spinels", "Zircon/Baddeleyite", "Chromite", "Ilmenite", "Fe_Mg_carbonate") #Isolated delta minerals; notice the abundance of Mg_Fe minerals +silicates <- c("quartz", "feldspar", "pyroxene", "olivine", "Fe_Mg_clay") +rowSums(lithology.matrix[,silicates]) #all samples satisfied + +#PIXL +CIA.pixl <- (pixl.df$Al203 / (pixl.df$Al203 + pixl.df$Na20 + pixl.df$K20 + pixl.df$Cao)) * 100 +hist(CIA.pixl) + +#LIBS +#do these values differ for each cluster? 
+CIA.libs <- data.frame() +for (i in unq) { + arr <- shared_libs[which(umap_clusters == i), 1:8] + CIA <- cbind((arr$Al2O3 / (arr$Al2O3 + arr$Na2O + arr$K2O + arr$CaO)) * 100, cluster = i, index = as.numeric(rownames(arr))) + CIA.libs <- rbind(CIA.libs, CIA) +} +names(CIA.libs) <- c("CIA", "cluster","index") + +ggplot(data = CIA.libs, aes(x = CIA)) + + geom_histogram() + + facet_wrap(~as.factor(cluster), scales = "free") + +#cluster 1 - 5 +# (1) unimodal peak around 45-50 = virtually no weathering. Higher values > 60 should be looked at more in depth +# (2) same as (1) but another smaller peak around 55. again no weathering, but higher values should be kept +# (3) virtually no weathering. I'm wondering if these LIBS samples are benign or what. They have high Ca values and about nothing else +# (4) Unimodal at 50 +# (5) Left skewed-unimodal around 50, with some higher values + +clusters.gt70 <-CIA.libs[which(CIA.libs$CIA >= 70),] +clusters.gt70.centers <- data.frame() +for (i in unique(clusters.gt70$cluster)) { #no cluster 3 due to its low values + arr <- clusters.gt70[which(clusters.gt70$cluster == i), ] + means <- colMeans(shared_libs[arr$index,1:8]) + clusters.gt70.centers <- rbind(clusters.gt70.centers, means) +} + +names(clusters.gt70.centers) <- names(shared_libs)[1:8] +gt.70.heatmap <- pheatmap(clusters.gt70.centers, scale = "none", main = "CIA >= 70") + +clusters.lt70 <- CIA.libs[which(CIA.libs$CIA < 70),] +clusters.lt70.centers <- data.frame() +for (i in unique(clusters.lt70$cluster)) { + arr <- clusters.lt70[which(clusters.lt70$cluster == i), ] + means <- colMeans(shared_libs[arr$index,1:8]) + clusters.lt70.centers <- rbind(clusters.lt70.centers, means) +} + +names(clusters.lt70.centers) <- names(shared_libs)[1:8] +lt.70.heatmap <- pheatmap(clusters.lt70.centers, scale = "none", main = "CIA < 70") + +grid.arrange(as.ggplot(gt.70.heatmap), as.ggplot(lt.70.heatmap), ncol = 2) + +#really just alumina (Al2O3) is in higher quantities. 
This is an indicator of stronger weathering. + +libs_ternary <- shared_libs %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="LIBS", CIA = NA) %>% + select(c(13:17)) + +libs_ternary[clusters.gt70$index, 5] <- "CIA>70" +libs_ternary[clusters.lt70$index, 5] <- "CIA<70" + + +pixl_ternary <- shared_pixl %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="PIXL", CIA = "CIA<70") %>% + select(c(11:15)) + + + +ggtern(libs_ternary, ggtern::aes(x = x, y = y, z = z, color = CIA, shape = value)) + + geom_point() + + geom_point(data = pixl_ternary, aes(x = x, y = y, z = z, color = "PIXL", shape = value)) + + theme_rgbw() + + labs(x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K", + title = 'LIBS and PIXL ternary clustered by CIA') + +#All PIXL data points are < 70 + +ggtern(libs_ternary, ggtern::aes(x = x, y= y, z = z, color = as.factor(umap_clusters))) + + geom_point() + + theme_rgbw() + + labs(x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K", + title = "UMAP Clustered on Major Cations") + +``` + +### Discussion of results + +* There isn't a huge difference in cation abundances between weathering conditions aside from Al2O3, but Al2O3 is the numerator of the CIA formula, so it makes sense that it would be the lead indicator of weathering purely based on the formula. However, although Ca, Na & K also enter the calculation, the results point towards Fe + Mg concentrations as a factor in how weathering differs. In all our heatmaps Fe and Mg varied either in unison or with only slight differences. One quick question would be: if our alumina values differ, what is the relationship between Fe and Mg (in terms of %), and does this matter? + +* Also, a quick thought: the points in the middle of the ternary plot have relatively high concentrations of all three cation groups; what does that say about their mineralogy?
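
The CIA split used in this analysis can be sketched as a small helper. This is a minimal sketch rather than the notebook's own code: `cia_wt()` and the data frame `ox` are hypothetical names, and the values are made up for illustration. It follows the notebook's simplification of using oxide wt% directly; the standard definition of CIA uses molar proportions and only the CaO held in silicates, which this sketch does not attempt.

```r
# Hypothetical helper: CIA from oxide wt% columns (same simplification as the notebook)
cia_wt <- function(df) {
  denom <- df$Al2O3 + df$CaO + df$Na2O + df$K2O
  # A zero denominator would give NaN, so return NA for those rows instead
  ifelse(denom > 0, 100 * df$Al2O3 / denom, NA_real_)
}

# Made-up example values, for illustration only (not mission data)
ox <- data.frame(
  Al2O3 = c(10, 30, 0),
  CaO   = c(5, 5, 0),
  Na2O  = c(3, 3, 0),
  K2O   = c(2, 2, 0)
)

ox$CIA <- cia_wt(ox)
# Same threshold as the CIA >= 70 / CIA < 70 split
ox$weathered <- ox$CIA >= 70
print(ox)
```

Returning `NA` for an all-zero row keeps such points out of both the `CIA >= 70` and `CIA < 70` groups instead of silently assigning them to one side.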
+ + +## Analysis: Question 3 (Provide short name) + +### Question being asked + +_Does the abundance of certain minerals provide any insight on ternary plots of major cations?_ + +### Data Preparation + +* In order to check the differences in mineral abundance, I will be utilizing the lithology dataframe to see if there are any clear-cut differences in mineral abundance between data points, especially between campaign types and rock types. + +* The steps for this analysis are very simple. I'm going to find the sizes (i.e. the total sum of each sample's row of minerals) of each sample, compare them on ternary graphs of major cations and of under-studied cations, and then see if I can find some correlation between the total wt% of samples and their abundance of minerals. Finding out that there isn't a strong one (as seen at the bottom) shows that something else is going on. + +_lithology.df, shared_libs, shared_pixl_ + +```{r, result03_data} +# Include all data processing code (if necessary), clearly commented + +``` + +### Analysis methods used + +* For this last analysis I just wanted to look at unusual elements compared to their relative abundance of total minerals. There is no direct reason for the way I split cation nodes, but I am aware that SO3 and Cl are associated with lake ecosystems and that Ti and Cr2O3 are very unreactive based on their wt%. + +* We don't see any outstanding results from this analysis, but we can see that abundance is only weakly related to total wt%. Our r^2 with the regular Pearson coefficient is ~27% while Kendall's non-parametric test gives ~20%. This is an extreme oversimplification and I believe that comparing total wt% to certain mineral types might actually prove to be substantial and might aid in prediction tasks later on.
With regard to our ternary plots, while there aren't many outstanding results, we see that Roubion had the largest abundance of minerals (13 in total) but that sedimentary samples (which gravitated towards SO3 & Cl concentrations) actually had the lowest abundance of total minerals. If we were to create some form of specialized metric that accounted for certain groups of minerals and then compared that to total wt% and/or cation groups this might prove interesting. + +```{r, result03_analysis} +#size of points for Lithology is dependent on the abundance of each mineral within the 16 samples ON a ternary plot. Hover over, can we see which + +#characterize size of points by their total abundance per sample at first. +sizes <- rowSums(lithology.matrix) + +ggtern(pixl_ternary, ggtern::aes(x = x, y= y, z = z)) + + geom_point(aes(size = sizes)) + + theme_rgbw() + + labs(x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K", + title = "Ternary Plot with major Cations, sized by mineral abundance") +#no real defined pattern, but in our last plot the point that tends inward supports the idea that samples with higher wt% likely have a larger variety of minerals.
+#I'm going to alter the Ca+Na+K just to see if there is an effect on other features + + + +#look at unusual elements (the PIXL column names are spelled with zeros, e.g. S03, P205) +revised_pixl_ternary <- pixl.df %>% + mutate(x=(S03+Cl)/100,y=(P205+Mno)/100,z=(Ti02+Cr203 )/100) %>% + select(c(20:22)) %>% + mutate(sizes = sizes) + +ggtern(revised_pixl_ternary, ggtern::aes(x = x, y= y, z = z)) + + geom_point(aes(size = sizes, color = pixl.df$type)) + + theme_rgbw() + + labs(x="SO3+Cl", + y="P2O5+MnO", + z="TiO2+Cr2O3", + title = "Ternary Plot of usually undocumented Cations") + + +hist(rowSums(pixl.df[,2:14])) +hist(sizes) +plot(x = rowSums(pixl.df[,2:14]), y = sizes) +cor.test(x = rowSums(pixl.df[,2:14]), y = sizes, method = 'kendall') #non-parametric cor.test + + +``` + + +### Discussion of results + +* Overall, our results show an abundance of distinct clusters, especially those with high CaO concentration, MgO + FeO-T patterns and SiO2 patterns, along with slight differences in other metrics that might affect classification of LIBS data, in particular where it corresponds to sampled PIXL values. Cation abundance in clusters is a powerful classifier and can be an indicator of rock morphology. I think clustering the LIBS data points and plotting the PIXL samples on top is an important first step towards connecting more data to classify the surface of Mars. LIBS is the most important and most detail-heavy dataset, as it gives us insight into abnormal values as well as where on the surface the mineralogy changes. Going forward, further clustering of LIBS seems arbitrary and redundant; instead I think the focus should shift to how Lithology connects to PIXL and how ultimately the oxide concentrations of LIBS can explain what minerals might be present. In addition, comparing elemental chemistry with the chemical formulas of minerals should also prove useful, since it shows how the mineral formulas differ from one another.
This might provide insight into how the rocks studied ended up how they are, and might serve as a proxy for potential life. This is not as realistic, but as the rover moves to different spots we might see clear-cut results that are indicative of life. Finally, finding direct correlations between certain compounds and the abundance, and thus variance, of minerals within each sample might give us insight into how we can predict what exactly LIBS is analyzing. + + + +## Summary and next steps + +* The main next step I think is to find these direct correlations and really dive into the literature on how rock types differ in mineralogy and how elemental composition can differ. In addition, does this vary per campaign type, depth (z) within the crater and the landscape (geomorphology) that surrounds the rover's course? Personally, I think if there is any life it is underground. The surface of Mars is far too cold to support life now, and its atmosphere is far too inhospitable to life. Taking a deep look at the poles might be interesting as well, as there is pre-existing knowledge (https://marsed.asu.edu/mep/ice/polar-caps) that they might contain atmospheric water vapor frozen together with carbon dioxide. That being said, given the rover's RIMFAX ground-penetrating radar, which images underground stratification, I wonder if life could survive under the ground (Wierzchos et al. 2012)? UVB and UVC radiation is detrimental to organisms at the surface because Mars lacks a developed ozone layer; thus, life might be able to thrive in endolithic colonies within the fissures of the underground. + +* Finally, creation of a metric that can help us understand SHERLOC data a little better would be very beneficial. A prototype metric will compare the minerals' chemical formulas to the concentrations of minerals (or just oxides) from the PIXL and LIBS datasets.
It will find the molecular weight of each element or compound times however many atoms/ions are present, then compare that mass to the total mass of the chemical formula ([mass / total mass] x 100). This value will be compared to the weight % of the PIXL feature. Summing all these up for every compound/element and finding the difference from 1 will be a good indicator of how much of that mineral is explained by our data. That being said it will be difficult to account for oxides and individual elements, as well as for 'other' compounds. + +$$1 - \sum_{i=1}^{n} \left| m_i - p_i \right|$$ + +where $m_i$ is the mineral element/compound % and $p_i$ is the PIXL element/compound % for the same element/compound $i$. + + + + + + + + + +## Notes + +- see which lat and long correspond to similar values of our samples. Maybe we will see more. Apply clustering to data. Using AP clustering find clusters and see the overlap of prior PCA plots (Specifically campaign) and heatmaps to analyze concentration of clusters + +- SHERLOC has been used recently at Cheyava Falls. What signs in both our samples, and in our LIBS data show that chemical reactions possible for life have occurred? +#https://www.nasa.gov/missions/mars-2020-perseverance/perseverance-rover/nasas-perseverance-rover-scientists-find-intriguing-mars-rock/ + +- I think retrieving the chemical formulas for pure types of minerals and comparing the weight % of all different elemental features will show to what extent these minerals differ from their pure counterparts. +- Iron and P were found on these splotchy rocks at Cheyava Falls. Can we track the amount of iron and P, or chemically stimulating mutual elements that will also provide evidence of life? + +- what differentiates these between abiotic and biotic (coming from animals, decomposed matter?) + +- what is the importance of Sulfur trioxide on river beds? + +- silicon dioxide? + +- why are Ferric(is) oxides and Magnesium oxide compatible?
What about their relationship exposes their variance in the same direction? + +- clusters both exhibit small proportions of Na2O, TiO2, K2O, Al2O3 & CaO +- SiO2, ferric and magnesium oxide are of interest. SiO2 always seems to have a specific cluster with a very high amount of SiO2 + +- can we get access to RIMFAX data? (Wierzchos et al. 2012) +- fissures might show promise of life. endolithic colonies usually go underground to escape solar radiation (however, lots of UVB and UVC) diff --git a/StudentNotebooks/Assignment03/walczd3_assignment03.html b/StudentNotebooks/Assignment03/walczd3_assignment03.html new file mode 100644 index 0000000..96fb678 --- /dev/null +++ b/StudentNotebooks/Assignment03/walczd3_assignment03.html @@ -0,0 +1,1464 @@ + + + + + + + + + + + + + + + +DAR F24 Assignment 3 Notebook Template + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + +
+ +
+ +
+

BiWeekly Work Summary

+

NOTE: Follow an outline format; use bullets to +express individual points.

+ +
+
+

Personal Contribution

+ +

PACKAGES

+
# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!!
+# This section installs packages if they are not already installed. 
+# This block will not be shown in the knit file.
+knitr::opts_chunk$set(echo = TRUE)
+
+# Set the default CRAN repository
+local({r <- getOption("repos")
+       r["CRAN"] <- "http://cran.r-project.org" 
+       options(repos=r)
+})
+
+if (!require("pandoc")) {
+  install.packages("pandoc")
+  library(pandoc)
+}
+
## Loading required package: pandoc
+
if (!require("ggplotify")) {
+  install.packages("ggplotify")
+  library(ggplotify)
+}
+
## Loading required package: ggplotify
+
if (!require("car")) {
+  install.packages("car")
+  library(car)
+}
+
## Loading required package: car
+
## Loading required package: carData
+
# Required packages for M20 LIBS analysis
+if (!require("rmarkdown")) {
+  install.packages("rmarkdown")
+  library(rmarkdown)
+}
+
## Loading required package: rmarkdown
+
## 
+## Attaching package: 'rmarkdown'
+
## The following objects are masked from 'package:pandoc':
+## 
+##     pandoc_available, pandoc_convert, pandoc_version
+
if (!require("tidyverse")) {
+  install.packages("tidyverse")
+  library(tidyverse)
+}
+
## Loading required package: tidyverse
+
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
+## ✔ dplyr     1.1.4     ✔ readr     2.1.5
+## ✔ forcats   1.0.0     ✔ stringr   1.5.1
+## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
+## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
+## ✔ purrr     1.0.2     
+## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
+## ✖ dplyr::filter() masks stats::filter()
+## ✖ dplyr::lag()    masks stats::lag()
+## ✖ dplyr::recode() masks car::recode()
+## ✖ purrr::some()   masks car::some()
+## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
+
if (!require("stringr")) {
+  install.packages("stringr")
+  library(stringr)
+}
+
+if (!require("ggbiplot")) {
+  install.packages("ggbiplot")
+  library(ggbiplot)
+}
+
## Loading required package: ggbiplot
+
if (!require("pheatmap")) {
+  install.packages("pheatmap")
+  library(pheatmap)
+}
+
## Loading required package: pheatmap
+
if (!require("apcluster")) {
+  install.packages("apcluster")
+  library(apcluster)
+}
+
## Loading required package: apcluster
+## 
+## Attaching package: 'apcluster'
+## 
+## The following object is masked from 'package:stats':
+## 
+##     heatmap
+
if (!require("vegan")) {
+  install.packages("vegan")
+  library(vegan)
+}
+
## Loading required package: vegan
+## Loading required package: permute
+## Loading required package: lattice
+## This is vegan 2.6-8
+
if (!require("ape")) {
+  install.packages("ape")
+  library(ape)
+}
+
## Loading required package: ape
+## 
+## Attaching package: 'ape'
+## 
+## The following object is masked from 'package:dplyr':
+## 
+##     where
+
if (!require("Matrix")) {
+  install.packages("Matrix")
+  library(Matrix)
+}
+
## Loading required package: Matrix
+## 
+## Attaching package: 'Matrix'
+## 
+## The following objects are masked from 'package:tidyr':
+## 
+##     expand, pack, unpack
+
if (!require("gridExtra")) {
+  install.packages("gridExtra")
+  library(gridExtra)
+}
+
## Loading required package: gridExtra
+## 
+## Attaching package: 'gridExtra'
+## 
+## The following object is masked from 'package:dplyr':
+## 
+##     combine
+
if (!require("umap")) {
+  install.packages("umap")
+  library(umap)
+}
+
## Loading required package: umap
+
if (!require("ggtern")) {
+  install.packages("ggtern")
+  library(ggtern)
+}
+
## Loading required package: ggtern
+## Registered S3 methods overwritten by 'ggtern':
+##   method           from   
+##   grid.draw.ggplot ggplot2
+##   plot.ggplot      ggplot2
+##   print.ggplot     ggplot2
+## --
+## Remember to cite, run citation(package = 'ggtern') for further info.
+## --
+## 
+## Attaching package: 'ggtern'
+## 
+## The following objects are masked from 'package:gridExtra':
+## 
+##     arrangeGrob, grid.arrange
+## 
+## The following objects are masked from 'package:ggplot2':
+## 
+##     aes, annotate, ggplot, ggplot_build, ggplot_gtable, ggplotGrob,
+##     ggsave, layer_data, theme_bw, theme_classic, theme_dark,
+##     theme_gray, theme_light, theme_linedraw, theme_minimal, theme_void
+

LOAD IN DATA

+
#-------------LIBS-------------------
+# Load the saved LIBS data with locations added
+libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+libs.std_dev <- libs.df %>%
+    select((c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+libs.df <- libs.df %>%  
+  select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+
+# Convert the points to numeric
+libs.df$point <- as.numeric(libs.df$point)
+
+# Review what we have
+summary(libs.df)
+
##       sol              lat             lon           target         
+##  Min.   :  15.0   Min.   :18.43   Min.   :77.34   Length:1932       
+##  1st Qu.: 281.0   1st Qu.:18.44   1st Qu.:77.36   Class :character  
+##  Median : 557.0   Median :18.46   Median :77.40   Mode  :character  
+##  Mean   : 565.1   Mean   :18.46   Mean   :77.40                     
+##  3rd Qu.: 872.0   3rd Qu.:18.48   3rd Qu.:77.44                     
+##  Max.   :1019.0   Max.   :18.50   Max.   :77.45                     
+##      point             SiO2            TiO2            Al2O3       
+##  Min.   : 1.000   Min.   : 0.00   Min.   :0.0000   Min.   : 0.000  
+##  1st Qu.: 3.000   1st Qu.:42.04   1st Qu.:0.0300   1st Qu.: 3.080  
+##  Median : 5.000   Median :45.80   Median :0.3200   Median : 4.925  
+##  Mean   : 5.776   Mean   :43.47   Mean   :0.3719   Mean   : 6.246  
+##  3rd Qu.: 8.000   3rd Qu.:49.23   3rd Qu.:0.6400   3rd Qu.: 8.533  
+##  Max.   :28.000   Max.   :76.12   Max.   :2.4000   Max.   :38.350  
+##       FeOT            MgO             CaO              Na2O       
+##  Min.   : 0.29   Min.   : 0.29   Min.   : 0.080   Min.   :0.0000  
+##  1st Qu.:13.27   1st Qu.: 5.72   1st Qu.: 1.830   1st Qu.:0.9775  
+##  Median :20.21   Median :12.78   Median : 3.625   Median :1.5200  
+##  Mean   :20.07   Mean   :16.47   Mean   : 4.726   Mean   :1.7600  
+##  3rd Qu.:25.45   3rd Qu.:27.83   3rd Qu.: 4.622   3rd Qu.:2.4000  
+##  Max.   :82.68   Max.   :45.21   Max.   :52.130   Max.   :7.5200  
+##       K2O         
+##  Min.   : 0.0000  
+##  1st Qu.: 0.0000  
+##  Median : 0.3000  
+##  Mean   : 0.5909  
+##  3rd Qu.: 0.7800  
+##  Max.   :34.8700
+
#----------PIXL----------------------
+# Load the saved PIXL data with locations added
+pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds")
+
+pixl.df
+
## # A tibble: 16 × 19
+##    sample  Na20   Mgo Al203  Si02  P205   S03    Cl   K20   Cao  Ti02 Cr203
+##     <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+##  1      1  5.55  2.64  7.56  38.3  1.65  2.69  3.4   0.75  7.77  1.47  0.03
+##  2      2  4.67  2.21  6.97  43.8  2.76  3.21  1.48  1.06  7.62  2.49  0.01
+##  3      3  1.93 19.2   2.42  39.4  0.48  0.78  0.66  0.18  2.94  0.37  0.26
+##  4      4  1.87 12.8   2.36  40.3  0.28  1.66  0.94  0.2   2.94  0.99  0.29
+##  5      5  4.5   0.73 11.6   57.1  0.84  1     2.08  1.9   4.31  0.59  0   
+##  6      6  1.87 12.8   2.36  40.3  0.28  1.66  0.94  0.2   2.94  0.99  0.29
+##  7      7  1.87 12.8   2.36  40.3  0.28  1.66  0.94  0.2   2.94  0.99  0.29
+##  8      8  4.5   0.73 11.6   57.1  0.84  1     2.08  1.9   4.31  0.59  0   
+##  9      9  4.5   0.73 11.6   57.1  0.84  1     2.08  1.9   4.31  0.59  0   
+## 10     10  1.8  22.7   1.7   22.6  0.1   2.6   4.5   0.3   1.8   0.2   0.2 
+## 11     11  1.8  22.7   1.7   22.6  0.1   2.6   4.5   0.3   1.8   0.2   0.2 
+## 12     12  1.9  13.1   5     32.5  0.6  20     0.4   0.1   1.5   0.8   0.1 
+## 13     13  1.9  13.1   5     32.5  0.6  20     0.4   0.1   1.5   0.8   0.1 
+## 14     14  1    19.1   1.8   30.8  0.1   3.8   2     0     3.3   0.7   1.9 
+## 15     15  1    19.1   1.8   30.8  0.1   3.8   2     0     3.3   0.7   1.9 
+## 16     16  2.1  12.4   5.32  31.4  0.57 21.5   1.14  0.19  5.72  0.64  0.11
+## # ℹ 7 more variables: Mno <dbl>, `FeO-T` <dbl>, name <chr>, type <chr>,
+## #   campaign <chr>, location <chr>, abrasion <chr>
+
# Convert to factors
+pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], 
+                                       as.factor)
+
+# Review our dataframe
+summary(pixl.df)
+
##      sample           Na20            Mgo             Al203       
+##  Min.   : 1.00   Min.   :1.000   Min.   : 0.730   Min.   : 1.700  
+##  1st Qu.: 4.75   1st Qu.:1.853   1st Qu.: 2.533   1st Qu.: 2.220  
+##  Median : 8.50   Median :1.900   Median :12.800   Median : 3.710  
+##  Mean   : 8.50   Mean   :2.672   Mean   :11.682   Mean   : 5.072  
+##  3rd Qu.:12.25   3rd Qu.:4.500   3rd Qu.:19.100   3rd Qu.: 7.117  
+##  Max.   :16.00   Max.   :5.550   Max.   :22.700   Max.   :11.600  
+##                                                                   
+##       Si02            P205             S03               Cl       
+##  Min.   :22.60   Min.   :0.1000   Min.   : 0.780   Min.   :0.400  
+##  1st Qu.:31.22   1st Qu.:0.2350   1st Qu.: 1.495   1st Qu.:0.940  
+##  Median :38.85   Median :0.5250   Median : 2.600   Median :1.740  
+##  Mean   :38.55   Mean   :0.6512   Mean   : 5.562   Mean   :1.846  
+##  3rd Qu.:41.17   3rd Qu.:0.8400   3rd Qu.: 3.800   3rd Qu.:2.080  
+##  Max.   :57.10   Max.   :2.7600   Max.   :21.530   Max.   :4.500  
+##                                                                   
+##       K20              Cao             Ti02            Cr203      
+##  Min.   :0.0000   Min.   :1.500   Min.   :0.2000   Min.   :0.000  
+##  1st Qu.:0.1600   1st Qu.:2.655   1st Qu.:0.5900   1st Qu.:0.025  
+##  Median :0.2000   Median :3.120   Median :0.7000   Median :0.155  
+##  Mean   :0.5800   Mean   :3.688   Mean   :0.8194   Mean   :0.355  
+##  3rd Qu.:0.8275   3rd Qu.:4.310   3rd Qu.:0.9900   3rd Qu.:0.290  
+##  Max.   :1.9000   Max.   :7.770   Max.   :2.4900   Max.   :1.900  
+##                                                                   
+##       Mno             FeO-T               name             type  
+##  Min.   :0.1000   Min.   :13.24   Atsah     : 1   Igneous    :8  
+##  1st Qu.:0.2800   1st Qu.:16.71   Bearwallow: 1   N/A        :1  
+##  Median :0.4000   Median :23.86   Coulettes : 1   Sedimentary:7  
+##  Mean   :0.3812   Mean   :21.45   Hahonih   : 1                  
+##  3rd Qu.:0.4900   3rd Qu.:25.70   Hazeltop  : 1                  
+##  Max.   :0.6900   Max.   :30.05   Kukaklek  : 1                  
+##                                   (Other)   :10                  
+##          campaign    location          abrasion
+##  Crater Floor:9   01     : 1   Alfalfa     :2  
+##  Delta Front :7   02     : 1   Bellegrade  :2  
+##                   03     : 1   Berry Hollow:2  
+##                   04     : 1   Dourbes     :2  
+##                   05     : 1   Novarupta   :2  
+##                   06     : 1   Quartier    :2  
+##                   (Other):10   (Other)     :4
+
#----------SHERLOC----------------------
+# Read in data as provided.  
+sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds")
+
+# Clean up data types
+sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral)
+sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], 
+                                       as.numeric)
+# Transform NA's to 0
+sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0)
+
+# Reformat data so that rows are "abrasions" and columns list the presence of minerals. 
+# Do this by "pivoting" to a long format, and then back to the desired wide format.  
+
+sherloc_long <- sherloc_abrasion_raw %>%
+  pivot_longer(!Mineral, names_to = "Name", values_to = "Presence")
+
+# Make abrasion a factor 
+sherloc_long$Name <- as.factor(sherloc_long$Name)
+
+# Make it a matrix
+sherloc.matrix <- sherloc_long %>%
+  pivot_wider(names_from = Mineral, values_from = Presence)
+
+# Get sample information from PIXL and add to measurements -- assumes order is the same
+
+sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix)
+
+# Review what we have
+summary(sherloc.df)
+
##      sample               type           campaign         abrasion
+##  Min.   : 1.00   Igneous    :8   Crater Floor:9   Alfalfa     :2  
+##  1st Qu.: 4.75   N/A        :1   Delta Front :7   Bellegrade  :2  
+##  Median : 8.50   Sedimentary:7                    Berry Hollow:2  
+##  Mean   : 8.50                                    Dourbes     :2  
+##  3rd Qu.:12.25                                    Novarupta   :2  
+##  Max.   :16.00                                    Quartier    :2  
+##                                                   (Other)     :4  
+##          Name     Plagioclase        Sulfate         Ca-sulfate    
+##  Atsah     : 1   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
+##  Bearwallow: 1   1st Qu.:0.0000   1st Qu.:0.1875   1st Qu.:0.0000  
+##  Coulettes : 1   Median :0.0000   Median :1.0000   Median :0.0000  
+##  Hahonih   : 1   Mean   :0.1875   Mean   :0.6562   Mean   :0.3438  
+##  Hazeltop  : 1   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
+##  Kukaklek  : 1   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
+##  (Other)   :10                                                     
+##  Hydrated Ca-sulfate   Mg-sulfate     Hydrated Sulfates Hydrated Mg-Fe sulfate
+##  Min.   :0.000       Min.   :0.0000   Min.   :0.000     Min.   :0.0000        
+##  1st Qu.:0.000       1st Qu.:0.0000   1st Qu.:0.000     1st Qu.:0.0000        
+##  Median :0.000       Median :0.0000   Median :0.000     Median :0.0000        
+##  Mean   :0.125       Mean   :0.1875   Mean   :0.125     Mean   :0.1875        
+##  3rd Qu.:0.000       3rd Qu.:0.0000   3rd Qu.:0.000     3rd Qu.:0.0000        
+##  Max.   :1.000       Max.   :1.0000   Max.   :1.000     Max.   :1.0000        
+##                                                                               
+##   Perchlorates    Na-perchlorate    Amorphous Silicate   Phosphate     
+##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000     Min.   :0.0000  
+##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000     1st Qu.:0.0000  
+##  Median :0.0000   Median :0.00000   Median :0.0000     Median :0.0000  
+##  Mean   :0.0625   Mean   :0.03125   Mean   :0.1406     Mean   :0.2031  
+##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.2500     3rd Qu.:0.3125  
+##  Max.   :1.0000   Max.   :0.50000   Max.   :0.5000     Max.   :1.0000  
+##                                                                        
+##     Pyroxene         Olivine         Carbonate      Fe-Mg carbonate
+##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
+##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.4375   1st Qu.:0.000  
+##  Median :1.0000   Median :0.6250   Median :1.0000   Median :0.000  
+##  Mean   :0.6875   Mean   :0.5312   Mean   :0.7344   Mean   :0.125  
+##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.000  
+##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
+##                                                                    
+##  Hydrated Carbonates Disordered Silicates    Feldspar         Quartz       
+##  Min.   :0           Min.   :0.000        Min.   :0.000   Min.   :0.00000  
+##  1st Qu.:0           1st Qu.:0.000        1st Qu.:0.000   1st Qu.:0.00000  
+##  Median :0           Median :0.000        Median :0.000   Median :0.00000  
+##  Mean   :0           Mean   :0.125        Mean   :0.125   Mean   :0.03125  
+##  3rd Qu.:0           3rd Qu.:0.000        3rd Qu.:0.000   3rd Qu.:0.00000  
+##  Max.   :0           Max.   :1.000        Max.   :1.000   Max.   :0.25000  
+##                                                                            
+##     Apatite        FeTi oxides         Halite          Iron oxide    
+##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
+##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
+##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000  
+##  Mean   :0.1406   Mean   :0.1406   Mean   :0.04688   Mean   :0.2812  
+##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.5000  
+##  Max.   :1.0000   Max.   :1.0000   Max.   :0.25000   Max.   :1.0000  
+##                                                                      
+##  Hydrated Iron oxide Organic matter   Sulfate+Organic matter
+##  Min.   :0.00000     Min.   :0.0000   Min.   :0.0000        
+##  1st Qu.:0.00000     1st Qu.:0.0000   1st Qu.:0.0000        
+##  Median :0.00000     Median :1.0000   Median :0.0000        
+##  Mean   :0.01562     Mean   :0.5938   Mean   :0.2188        
+##  3rd Qu.:0.00000     3rd Qu.:1.0000   3rd Qu.:0.2500        
+##  Max.   :0.25000     Max.   :1.0000   Max.   :1.0000        
+##                                                             
+##  Other hydrated phases Phyllosilicates      Chlorite     
+##  Min.   :0.0000        Min.   :0.00000   Min.   :0.0000  
+##  1st Qu.:0.0000        1st Qu.:0.00000   1st Qu.:0.0000  
+##  Median :0.2500        Median :0.00000   Median :0.0000  
+##  Mean   :0.4375        Mean   :0.09375   Mean   :0.0625  
+##  3rd Qu.:1.0000        3rd Qu.:0.06250   3rd Qu.:0.0000  
+##  Max.   :1.0000        Max.   :0.50000   Max.   :0.5000  
+##                                                          
+##  Kaolinite (hydrous Al-clay)    Chromite        Ilmenite     Zircon/Baddeleyite
+##  Min.   :0.0000              Min.   :0.000   Min.   :0.000   Min.   :0.000     
+##  1st Qu.:0.0000              1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000     
+##  Median :0.0000              Median :0.000   Median :0.000   Median :0.000     
+##  Mean   :0.1875              Mean   :0.125   Mean   :0.125   Mean   :0.125     
+##  3rd Qu.:0.0000              3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:0.000     
+##  Max.   :1.0000              Max.   :1.000   Max.   :1.000   Max.   :1.000     
+##                                                                                
+##  Fe-Mg-clay minerals    Spinels      
+##  Min.   :0.0000      Min.   :0.0000  
+##  1st Qu.:0.0000      1st Qu.:0.0000  
+##  Median :0.0000      Median :0.0000  
+##  Mean   :0.1875      Mean   :0.0625  
+##  3rd Qu.:0.0000      3rd Qu.:0.0000  
+##  Max.   :1.0000      Max.   :0.5000  
+## 
+
# Load the saved lithology data with locations added
+lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds")
+
+# Cast samples as numbers
+lithology.df$sample <- as.numeric(lithology.df$sample)
+
+# Convert rest into factors
+lithology.df[sapply(lithology.df, is.character)] <-
+  lapply(lithology.df[sapply(lithology.df, is.character)], 
+                                       as.factor)
+
+# Keep only first 16 samples because the data for the rest of the samples is not available yet
+lithology.df<-lithology.df[1:16,]
+
+# Create a matrix containing only the numeric measurements.  The remaining features are metadata about the sample. 
+lithology.matrix <- sapply(lithology.df[,6:40],as.numeric)-1            
+
+# Review the structure of our matrix
+str(lithology.matrix)
+
##  num [1:16, 1:35] 0 0 0 0 0 0 0 1 1 0 ...
+##  - attr(*, "dimnames")=List of 2
+##   ..$ : NULL
+##   ..$ : chr [1:35] "feldspar" "plagioclase" "pyroxene" "olivine" ...
+
+
+

Analysis: Question 1 (Provide short name)

+
+

Question being asked

+

Are there any similarities or unusual trends in our LIBS data +that we can compare to rover-collected PIXL samples?

+
+
+

Data Preparation

+
    +
  • For analysis #1 I will be analyzing the differences in LIBS and +PIXL data exclusively. LIBS data contains only 8 numerical features, all +of which are contained in the PIXL data. 5 features are unique to the +PIXL samples: P2O5, SO3, Cl, Cr2O3, and MnO. Therefore, aside from metadata +like campaign (which will be included later) or rock type, I am +preparing the data to analyze shared features.

  • +
  • The first step will be applying a variety of unsupervised methods: Affinity Propagation (AP) and k-means for clustering, and Uniform Manifold Approximation and Projection (UMAP) as a low-dimensional embedding in which clusters can be picked out. I apply these because I’m interested to see whether similar features or clusters arise under different methods. After clustering, to see where shared oxide features are most correlated, I’ll plot a PCA of the LIBS data colored by the most prominent clustering and compare its eigenvectors to those of the PIXL PCA.

  • +
  • LIBS & PIXL datasets exclusively for this question.

  • +
+
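The shared-feature selection described above can also be sketched by column name rather than by position. This is a minimal sketch on made-up numbers, assuming the PIXL columns carry the spellings printed later in this notebook (`Na20`, `Mgo`, `Al203`, `Si02`, `K20`, `Cao`, `Ti02`, `FeO-T`) and the LIBS columns use standard oxide names; `rename_map` and the two toy data frames are hypothetical.

```r
# Hypothetical lookup: standard oxide name -> spelling used in the PIXL table
rename_map <- c(SiO2 = "Si02", TiO2 = "Ti02", Al2O3 = "Al203", FeOT = "FeO-T",
                MgO = "Mgo", CaO = "Cao", Na2O = "Na20", K2O = "K20")

# Toy stand-ins for pixl.df and libs.df (values are made up)
toy_pixl <- data.frame(Na20 = c(3.4, 1.6), Mgo = c(7.2, 17.5), Al203 = c(6.5, 3.2),
                       Si02 = c(46.0, 29.0), P205 = c(0.9, 0.3), S03 = c(1.6, 10.6),
                       Cl = c(1.6, 2.1), K20 = c(0.9, 0.1), Cao = c(4.5, 2.7),
                       Ti02 = c(1.0, 0.6), Cr203 = c(0.1, 0.6), Mno = c(0.5, 0.3),
                       `FeO-T` = c(21.0, 22.1), check.names = FALSE)
toy_libs <- data.frame(SiO2 = 45, TiO2 = 0.5, Al2O3 = 8, FeOT = 18,
                       MgO = 10, CaO = 5, Na2O = 2.5, K2O = 0.4)

# Select the shared oxides by name and give them the LIBS spellings
shared_toy_pixl <- toy_pixl[, unname(rename_map)]
names(shared_toy_pixl) <- names(rename_map)

# Whatever is left over is unique to PIXL
pixl_only <- setdiff(names(toy_pixl), rename_map)

stopifnot(identical(names(shared_toy_pixl), names(toy_libs)))
print(pixl_only)
```

Selecting by name keeps the pairing of oxides explicit, so a reordered or extended source table would fail loudly instead of silently comparing the wrong columns.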
+
+

Analysis: Methods and results

+

Applying clustering algorithms to detect distinct +clusters.

+
#https://books.google.com/books?hl=en&lr=&id=spQ7FWsRX30C&oi=fnd&pg=PA3&dq=sedimentary+rocks&ots=T0fThFnYqm&sig=GbZZW_JuHjm9VmebYKaP1IRzSD8#v=onepage&q=sedimentary%20rocks&f=false
+#https://www.osti.gov/servlets/purl/1409785 - LIBS
+# - LIBS data is data not directly sampled by the Rover and its abrasion tool. Instead it is found by a projected laser from the SUPERCAM instrument that points at a specific rock and is able to distinguish the specified Polyatomic ions by wavelength intensity. 
+
+
+#PIXL samples lat and lon
+samples <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples.Rds")
+sample_coord <- samples[which(samples$name %in% pixl.df$name),c(3:5)]
+
+shared_pixl <- pixl.df[,c(5, 11, 4, 14, 3, 10, 2, 9, 15, 17)]
+names(shared_pixl)[1:8] <- names(libs.df[,6:13])
+shared_libs <- cbind(libs.df[,6:13], lat = libs.df[,2], lon = libs.df[,3], point = libs.df[,5]) #features, lat, long, point(?) 
+
+#AP Clustering on shared data 
+set.seed(4)
+get_ap <- function( data) {
+  ap <- apcluster(negDistMat(r = 2), data, q = 0.001)
+  clusters <- ap@clusters
+  ap_clusters <- rep(NA_integer_, nrow(data)) # one label per row instead of a hard-coded 16
+  for (i in seq(length(clusters))) {
+    num <- i
+    for (j in seq(length(clusters[[i]]))) {
+      ap_clusters[clusters[[i]][[j]]] = num
+    }
+  }
+  return (ap_clusters)
+}
+
+ap_clusters.pixl <- get_ap(shared_pixl[,1:8])
+ap_clusters.libs <- get_ap(shared_libs[,1:8])
+
+unique(ap_clusters.pixl) #k = 3
+
## [1] 3 1 2
+
unique(ap_clusters.libs) #k = 13
+
##  [1]  7  1  2  3 12 13  4  9  8 10  5  6 11
+
#Find k-means cluster
+wssplot <- function(data, df, nc=8, seed=4){
+  wss <- data.frame(cluster=1:nc, quality=c(0))
+  for (i in 1:nc){
+    set.seed(seed)
+    wss[i,2] <- kmeans(data, centers=i)$tot.withinss
+  }
+  ggplot(data=wss,aes(x=cluster,y=quality)) +
+    geom_line() +
+    ggtitle(paste("Quality of k-means by Cluster -", df))
+}
+
+wssplot(shared_pixl[,1:8], "PIXL") #k = 3,4
+

+
wssplot(shared_libs[,1:8], "LIBS") #k = 3,4,6
+

+
umapplot <- function(data, i, df) {
+  custom.config <- umap.defaults
+  custom.config$n_neighbors = i
+  UMAP <- umap(data, config = custom.config)
+  plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = paste("UMAP[",df,"] on nn = ",i, sep = ""))
+}
+
+#Apply UMAP
+#find optimal kNN
+#PIXL, with all features under UMAP showed unrecognizable features. Shared features showed 2,4,6 possible clusters but did not really converge as nn rose
+
+#LIBS
+#nn <- seq(5,25, 5)
+#for (i in nn) {
+#  umapplot(shared_libs[,1:8],i, "LIBS")
+#}
+#nn = 25 for just LIBS data found 4 distinct clusters. One of the clusters showed some separation right away at lower n_neighbors values, but it started to merge with the others as nn increased
+custom.config <- umap.defaults 
+custom.config$n_neighbors = 25 
+UMAP <- umap(shared_libs[,1:8], config = custom.config )
+UMAP.data <- UMAP$layout
+plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = "UMAP on nn = 25")
+abline( a =0, b = -3, h = 0,col = "blue")
+

+
#use line: y = -3x to find which data points are to the left and right of the cluster line. 
+# y = -3x --> x = y / -3
+x <- UMAP.data[,2] / -3
+plot(x = x, y = UMAP.data[,2])
+

+
umap_clusters <- rep(NA,nrow(UMAP.data)) 
+umap_clusters[which( (UMAP.data[,1] >= 2) & (UMAP.data[,2] > 0) )] = 1
+umap_clusters[which( (UMAP.data[,1] < 2) & (UMAP.data[,2] > 0) )] = 2
+umap_clusters[which( UMAP.data[,1] > 9)] = 3 #unique cluster isolated
+umap_clusters[which( (UMAP.data[,1] < x) & (UMAP.data[,2] < 0)) ] = 4
+umap_clusters[ which( ( (UMAP.data[,1] > x ) & (UMAP.data[,1] < 9) ) & (UMAP.data[,2] < 0))] = 5
+ggplot(data.frame(UMAP.data), aes(x = X1, y = X2, color = as.factor(umap_clusters))) + 
+  geom_point() + 
+  labs(title = "UMAP cluster check: nn = 25")
+

+
#run PCA with umap clustering on libs data + campaign type for pixl of shared features 
+pca.libs <- prcomp(shared_libs[,1:8], retx = T, center = T)
+umap.biplot <- ggbiplot::ggbiplot(pca.libs, 
+         groups = as.factor(umap_clusters),
+         circle = T, 
+         varname.size = 2, 
+         varname.color = "red", labels.size =8
+         ) + theme_bw() + labs(title = "PCA with UMAP clusters") 
+#total PIXL
+pca.pixl <- prcomp(pixl.df[,2:14], retx = T, center = T)
+pixl.biplot <- ggbiplot::ggbiplot(pca.pixl, 
+         groups = pixl.df$campaign,
+         circle = T, 
+         varname.size = 2, 
+         varname.color = "red", labels.size =8
+         ) + theme_bw() + labs(title = "PIXL PCA by campaign") 
+
+
+
+
+#plot heatmaps
+umap.centers <- data.frame()
+unq <- unique(umap_clusters)
+for (i in unq) {
+  arr <- colMeans(shared_libs[which(umap_clusters == i),1:8])
+  umap.centers <- rbind(umap.centers, arr)
+}
+names(umap.centers) <- colnames(shared_libs[,1:8])
+
+pixl.centers <- data.frame()
+for (i in unique(shared_pixl$campaign)) {
+  arr <- colMeans(pixl.df[which(shared_pixl$campaign == i), 2:14])
+  pixl.centers <- rbind(pixl.centers, arr)
+}
+names(pixl.centers) <- names(pixl.df)[2:14]
+
+pixl.centers
+
##       Na20       Mgo    Al203     Si02      P205       S03       Cl       K20
+## 1 3.473333  7.186667 6.536667 45.96667 0.9166667  1.628889 1.622222 0.9211111
+## 2 1.642857 17.462857 3.188571 29.02286 0.3100000 10.618571 2.134286 0.1414286
+##        Cao      Ti02     Cr203       Mno FeO-T
+## 1 4.453333 1.0077778 0.1300000 0.4633333 20.98
+## 2 2.702857 0.5771429 0.6442857 0.2757143 22.05
+
u.heatmap <- pheatmap(umap.centers, scale = "none", main = "UMAP heatmap")
+

+
pxl.heatmap <- pheatmap(pixl.centers, scale = "none", main = "PIXL-Campaign heatmap", labels_row =  unique(shared_pixl$campaign))
+

+
grid.arrange(pixl.biplot, umap.biplot, ncol = 2)
+

+
grid.arrange(as.ggplot(u.heatmap), as.ggplot(pxl.heatmap), ncol = 2)
+

+
#look at the pattern differences between high SiO2 concentrations with FeO-T + MgO & Al2O3 + CaO
+
+
+

Discussion of results

+
    +
  • Campaign type is a strong indicator of the chemical makeup a sample will have. The LIBS data are noisy, and while the UMAP clusters give some information about how each cluster relates to campaign type, it is hard to say whether that information is reliable. In the next analysis I will compare SHERLOC data to the clusters found here, especially the delta front campaign and the unusually high CaO cluster.
  • +
+
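The UMAP groups above were assigned with hand-picked thresholds on the layout coordinates. As an alternative, a minimal sketch (not the method used in this notebook) runs k-means directly on a 2-D layout matrix; `toy_layout` is a made-up stand-in for `UMAP$layout`, and the choice of 3 centers is an assumption that only fits the toy data.

```r
set.seed(4)
# Made-up 2-D layout with three well-separated blobs (stand-in for UMAP$layout)
toy_layout <- rbind(
  cbind(rnorm(20, mean = 0,  sd = 0.5), rnorm(20, mean = 5,  sd = 0.5)),
  cbind(rnorm(20, mean = 10, sd = 0.5), rnorm(20, mean = 0,  sd = 0.5)),
  cbind(rnorm(20, mean = -5, sd = 0.5), rnorm(20, mean = -5, sd = 0.5))
)

# k-means on the layout; nstart restarts reduce sensitivity to the initial centers
km <- kmeans(toy_layout, centers = 3, nstart = 10)
layout_clusters <- km$cluster

# Cluster sizes and per-cluster mean coordinates
print(table(layout_clusters))
print(aggregate(as.data.frame(toy_layout),
                by = list(cluster = layout_clusters), FUN = mean))
```

Distances in a UMAP layout are not guaranteed to reflect distances in the original oxide space, so labels obtained this way are probably best treated as exploratory and checked against the raw features.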
+
+
+

Analysis: Question 2 (Provide short name)

+
+

Question being asked

+

According to papers from 1985 and 2010 regarding the amelioration (glacial to interglacial stages) of lake basins and the analysis of river basins and their trapped sediments, respectively, an index called the chemical index of alteration (CIA) was used to measure how much weathering rocks underwent as a result of chemical reactions (i.e., reactions with water and other dissolved substances). Can we use this index to get accurate information about whether silicate rocks underwent some lasting form of chemical weathering that is indicative of the last stages of water on Mars and its effect on basin and delta front rocks?

+
+
+

Data Preparation

+
    +
  • For analysis #2 I will be using the lithology and PIXL datasets to calculate CIA for our samples, since we know the campaign and sampling location for each of them.

  • +
  • The code immediately below is just another way of plotting the distribution (in this case the density) of all our features across the different clusters; the choice is arbitrary. In my analysis I first plotted all lithology points to find which of the silicates specified in Dr. Roger’s lecture 2 were present. I copied these column names so that I could find the relative abundance of silicates in each sample; however, this was just a check for the PIXL data and as of now has no direct meaning other than checking whether the count is > 0, meaning that silicates with CaO are available. For PIXL and LIBS I then found the distributions of CIA values for all samples and clusters, respectively. Next I looked at the difference on either side of a weathering criterion (CIA > 70 indicates some form of weathering occurred; the specific ranges differ in the literature, so I just wanted to see as a whole whether samples above and below 70 differed). Finally, I plotted a ternary plot marking the values greater than 70 on the respective cation axes.

  • +
  • I will be re-using shared_libs and shared_pixl. To find silicate +containing rocks I will be using the lithology.df

  • +
+
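The CIA is conventionally defined on molar proportions, CIA = 100 × Al2O3 / (Al2O3 + CaO* + Na2O + K2O), where CaO* is the CaO held in silicates. A minimal sketch of that conversion, assuming the inputs are oxide weight percents; the function name `cia_molar` is hypothetical, and no correction is made here for CaO in carbonates or phosphates.

```r
# Molar masses in g/mol
molar_mass <- c(Al2O3 = 101.96, CaO = 56.08, Na2O = 61.98, K2O = 94.20)

# CIA from oxide weight percents, converted to molar proportions first.
# Assumption: all CaO is treated as silicate-bound (no carbonate/phosphate correction).
cia_molar <- function(Al2O3, CaO, Na2O, K2O) {
  al <- Al2O3 / molar_mass[["Al2O3"]]
  ca <- CaO   / molar_mass[["CaO"]]
  na <- Na2O  / molar_mass[["Na2O"]]
  k  <- K2O   / molar_mass[["K2O"]]
  100 * al / (al + ca + na + k)
}

# Toy check: ideal albite (NaAlSi3O8) has about 19.44 wt% Al2O3 and 11.82 wt% Na2O
albite_cia <- cia_molar(Al2O3 = 19.44, CaO = 0, Na2O = 11.82, K2O = 0)
print(round(albite_cia, 1))
```

Because the four oxides have different molar masses, the same sample can land on different sides of a cutoff such as 70 depending on whether weight percents or moles are used, so it may be worth computing both before interpreting the histograms.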
#for each cluster facet_wrap the features but color the cluster in
+shared_libs$cluster <- umap_clusters
+shared_libs.long <- shared_libs %>%
+  pivot_longer(cols = names(shared_libs)[1:8], names_to = "Variable", values_to = "Values")
+ggplot(shared_libs.long, aes(x = Values, color = as.factor(cluster))) + 
+  geom_density() + 
+  facet_wrap(~Variable, scales = "free") + 
+  labs(title = "Density plots of all variables")
+

+
#lat long map
+sample_coord$name <- "PIXL"
+coords <- rbind(data.frame(name = rep("LIBS", nrow(libs.df)), lat = libs.df$lat, lon = libs.df$lon), sample_coord)
+ggplot(data = coords, aes(x = lat, y = lon, color = name)) + 
+  geom_point()
+

+
+
+

Analysis: Methods and Results

+
    +
  • I want to see if there is any clear form of chemical weathering between UMAP identified clusters.
  • +
+
#Lithology heatmap - note that CaO in the CIA refers only to the CaO that is available in silicate rocks. Since we aren't exactly sure of the chemical composition of the SHERLOC or lithology data, we just check whether the samples contain silicates (rowSums[which(silicates)] > 0)
+#silicates are found in accordance
+pheatmap(lithology.matrix, scale = "none", main = "Lithology heatmap", labels_row = lithology.df$campaign)
+

+
delta.minerals <- c( "Kaolinite", "Hydrated_Mg_Fe_Sulfate", "Fe_Mg_clay", "Mg_sulfate", "Spinels", "Zircon/Baddeleyite", "Chromite", "Ilmenite", "Fe_Mg_carbonate") #Isolated delta minerals; notice the abundance of Mg_Fe minerals
+silicates <- c("quartz", "feldspar", "pyroxene", "olivine", "Fe_Mg_clay") 
+rowSums(lithology.matrix[,silicates]) #all samples satisfied
+
##  [1] 1 1 1 2 2 2 2 4 4 2 2 1 1 1 1 1
+
#PIXL
+CIA.pixl <- (pixl.df$Al203 / (pixl.df$Al203 + pixl.df$Na20 + pixl.df$K20 + pixl.df$Cao)) * 100
+hist(CIA.pixl)
+

+
#LIBS
+#do these values differ for each cluster?
+CIA.libs <- data.frame()
+for (i in unq) {
+  arr <- shared_libs[which(umap_clusters == i), 1:8]
+  CIA <- cbind((arr$Al2O3 / (arr$Al2O3 + arr$Na2O + arr$K2O + arr$CaO)) * 100, cluster = i, index = as.numeric(rownames(arr)))
+  CIA.libs <- rbind(CIA.libs, CIA)
+}
+names(CIA.libs) <- c("CIA", "cluster","index")
+
+ggplot(data = CIA.libs, aes(x = CIA)) + 
+  geom_histogram() + 
+  facet_wrap(~as.factor(cluster), scales = "free") 
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+

+
#cluster 1 - 5
+# (1) unimodal peak around 45-50 = virtually no weathering. Higher values > 60 should be looked at more in depth
+# (2) same as (1) but another smaller peak around 55. again no weathering, but higher values should be kept
+# (3) virtually no weathering. I'm wondering if these LIBS samples are benign or what. They have high Ca values and about nothing else
+# (4) Unimodal at 50
+# (5) Left skewed-unimodal around 50, with some higher values
+
+clusters.gt70 <-CIA.libs[which(CIA.libs$CIA >= 70),]
+clusters.gt70.centers <- data.frame()
+for (i in unique(clusters.gt70$cluster)) { #no cluster 3 due to its low values
+  arr <- clusters.gt70[which(clusters.gt70$cluster == i), ]
+  means <- colMeans(shared_libs[arr$index,1:8])
+  clusters.gt70.centers <- rbind(clusters.gt70.centers, means)
+}
+
+names(clusters.gt70.centers) <- names(shared_libs)[1:8]
+gt.70.heatmap <- pheatmap(clusters.gt70.centers, scale = "none", main = "CIA >= 70")
+

+
clusters.lt70 <- CIA.libs[which(CIA.libs$CIA < 70),]
+clusters.lt70.centers <- data.frame()
+for (i in unique(clusters.lt70$cluster)) { #every cluster with at least one CIA < 70 point
+  arr <- clusters.lt70[which(clusters.lt70$cluster == i), ]
+  means <- colMeans(shared_libs[arr$index,1:8])
+  clusters.lt70.centers <- rbind(clusters.lt70.centers, means)
+}
+
+names(clusters.lt70.centers) <- names(shared_libs)[1:8]
+lt.70.heatmap <- pheatmap(clusters.lt70.centers, scale = "none", main = "CIA < 70")
+

+
grid.arrange(as.ggplot(gt.70.heatmap), as.ggplot(lt.70.heatmap), ncol = 2)
+

+
#really just alumina (Al2O3) is in higher quantities. This is an indicator of more potent weathering.
+
+libs_ternary <- shared_libs %>%
+  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="LIBS", CIA = NA) %>%
+  select(c(13:17)) 
+
+libs_ternary[clusters.gt70$index, 5] <- "CIA>70" 
+libs_ternary[clusters.lt70$index, 5] <- "CIA<70" 
+
+
+pixl_ternary <- shared_pixl %>%
+  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="PIXL", CIA = "CIA<70") %>%
+  select(c(11:15)) 
+
+
+
+ggtern(libs_ternary, ggtern::aes(x = x, y = y, z = z, color = CIA, shape = value)) + 
+  geom_point() + 
+  geom_point(data = pixl_ternary, aes(x = x, y = y, z = z, color = "PIXL", shape = value)) + 
+   theme_rgbw() + 
+  labs(x="Si+Al",
+       y="Fe+Mg",
+       z="Ca+Na+K", 
+       title = 'LIBS and PIXL ternary clustered by CIA')
+
## Warning in geom_point(data = pixl_ternary, aes(x = x, y = y, z = z, color =
+## "PIXL", : Ignoring unknown aesthetics: z
+

+
#All PIXL data points are < 70
+
+ggtern(libs_ternary, ggtern::aes(x = x, y= y, z = z, color = as.factor(umap_clusters))) + 
+  geom_point() + 
+  theme_rgbw() + 
+  labs(x="Si+Al",
+       y="Fe+Mg",
+       z="Ca+Na+K", 
+      title = "UMAP Clustered on Major Cations")
+

+
+
+

Discussion of results

+
    +
  • There isn’t a large difference in cation densities between weathering conditions aside from Al2O3, but this is the main oxide in the CIA formula, so it makes sense that it would be the lead indicator of weathering conditions purely based on the formula. However, while Ca, Na & K also enter the calculation, the results point towards the Fe + Mg densities and how weathering might be affected by Fe + Mg concentrations. In all our heatmaps we saw Fe and Mg moving either in unison or only slightly differently. One quick question would be: if our alumina values differ, what is the relationship between Fe and Mg (in terms of %), and does this matter?

  • +
  • Also, a quick thought: the points in the middle of the ternary plot have relatively high concentrations of all cation combinations; what does that say about their mineralogy?

  • +
+
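A ternary plot shows only the relative shares of its three parts, so a point near the middle has comparable Si+Al, Fe+Mg, and Ca+Na+K shares after rescaling, whatever its raw totals were. A minimal base-R sketch of that rescaling on made-up values (the `toy` table is hypothetical):

```r
# Toy oxide table in wt% (values are made up)
toy <- data.frame(SiO2 = c(45, 30), Al2O3 = c(8, 3), FeOT = c(18, 22),
                  MgO = c(10, 17), CaO = c(5, 3), Na2O = c(2.5, 1.5), K2O = c(0.4, 0.1))

# Group the oxides into the three corners used in the plots above
parts <- with(toy, cbind(SiAl  = SiO2 + Al2O3,
                         FeMg  = FeOT + MgO,
                         CaNaK = CaO + Na2O + K2O))

# Close each row to 1 so the three parts are proportions of their own total
closed <- parts / rowSums(parts)
print(round(closed, 3))
```

Looking at the closed proportions next to the raw sums can help separate points that are central because all three groups are abundant from points that are central only after rescaling.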
+
+
+

Analysis: Question 3 (Provide short name)

+
+

Question being asked

+

Does the abundance of certain minerals provide any insight on ternary plots of major cations?

+
+
+

Data Preparation

+
    +
  • In order to check the differences in mineral abundance, I will be using the lithology dataframe to see if there are any clear-cut differences in mineral abundance between data points, especially between campaign types and rock types.

  • +
  • The steps for this analysis are very simple. I’m going to find the size of each sample (i.e. the sum across that sample’s row of minerals), use it to size the points on ternary graphs of the major cations and of less-studied cations, and then see if I can find some correlation between the total wt% of samples and their abundance of minerals. Finding that there isn’t one (as seen at the bottom) shows that something else is going on.

  • +
+

lithology.df, shared_libs, shared_pixl

+
# Include all data processing code (if necessary), clearly commented
+
+
+

Analysis methods used

+
    +
  • For this last analysis I just wanted to look at unusual elements compared to their relative abundance of total minerals. There is no direct reason for the way I split cation nodes, but I am aware that SO3 and Cl are associated with lake ecosystems and that Ti and Cr2O3 are very unreactive based on their wt%.

  • +
  • We don’t see any outstanding results from this analysis, and abundance is at most weakly related to total wt%. Our r^2 with the regular Pearson coefficient is ~27%, while Kendall’s non-parametric tau is ~0.20 (p ≈ 0.33, so not significant with only 16 samples). This is an extreme oversimplification, and I believe that comparing total wt% to certain mineral types might actually prove to be substantial and might aid in prediction tasks later on. With regard to our ternary plots, while there aren’t many outstanding results, we see that Roubion had the largest abundance of minerals (13 in total) but that sedimentary samples (which gravitated towards SO3 & Cl concentrations) actually had the lowest abundance of total minerals. If we were to create some form of specialized metric that accounted for certain groups of minerals and then compared that to total wt% and/or cation groups, this might prove interesting.

  • +
+
#size of points for Lithology is dependent on the abundance of each mineral within the 16 samples ON a ternary plot. Hover over, can we see which
+
+#characterize size of points by their total abundance per sample at first. 
+sizes <- rowSums(lithology.matrix)
+
+ggtern(pixl_ternary, ggtern::aes(x = x, y= y, z = z)) + 
+  geom_point(aes(size = sizes)) + 
+  theme_rgbw() +
+  labs(x="Si+Al",
+       y="Fe+Mg",
+       z="Ca+Na+K", 
+       title = "Ternary Plot with major Cations, sized by mineral abundance")
+

+
#no real defined pattern but from our last plot and the point that tends inward is just another reason behind the idea that higher wt% likely have larger variety of minerals. 
+#im going to alter the Ca+Na+K just to see if there is an effect on other features
+
+
+
+#look at unusual elements
+revised_pixl_ternary <- pixl.df %>%
+  mutate(x=(S03+Cl)/100,y=(P205+Mno)/100,z=(Ti02+Cr203 )/100) %>%
+  select(c(20:22)) %>%
+  mutate(sizes = sizes)
+
+ggtern(revised_pixl_ternary, ggtern::aes(x = x, y= y, z = z)) + 
+  geom_point(aes(size = sizes, color = pixl.df$type)) + 
+  theme_rgbw() +
+  labs(x="SO3+Cl",
+       y="P2O5+MnO",
+       z="TiO2+Cr2O3", 
+       title = "Ternary Plot of usually undocumented Cations")
+

+
hist(rowSums(pixl.df[,2:14]))
+

+
hist(sizes)
+

+
plot(x = rowSums(pixl.df[,2:14]), y = sizes)
+

+
cor.test(x = rowSums(pixl.df[,2:14]), y = sizes, method = 'kendall') #non-parametric cor.test
+
## Warning in cor.test.default(x = rowSums(pixl.df[, 2:14]), y = sizes, method =
+## "kendall"): Cannot compute exact p-value with ties
+
## 
+##  Kendall's rank correlation tau
+## 
+## data:  rowSums(pixl.df[, 2:14]) and sizes
+## z = 0.97775, p-value = 0.3282
+## alternative hypothesis: true tau is not equal to 0
+## sample estimates:
+##      tau 
+## 0.196399
+
+
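The two coefficients quoted in the text measure different things, so r, r^2, and tau are not directly comparable. A minimal sketch on made-up data; the vectors `total_wt` and `n_minerals` are hypothetical stand-ins for `rowSums(pixl.df[,2:14])` and `sizes`.

```r
set.seed(10)
# Made-up stand-ins: total wt% per sample and a count of minerals per sample
total_wt   <- runif(16, min = 85, max = 100)
n_minerals <- rpois(16, lambda = 5)

# Pearson measures linear association; squaring r gives the share of variance explained
pearson   <- cor.test(total_wt, n_minerals, method = "pearson")
r_squared <- unname(pearson$estimate)^2

# Kendall's tau is rank-based; exact = FALSE uses the normal approximation,
# which avoids the "cannot compute exact p-value with ties" warning for count data
kendall <- cor.test(total_wt, n_minerals, method = "kendall", exact = FALSE)

print(c(r = unname(pearson$estimate), r_squared = r_squared,
        tau = unname(kendall$estimate), p_kendall = kendall$p.value))
```

Reporting the p-value next to each coefficient makes it easier to see whether a weak association, such as the tau of about 0.2 above, can be distinguished from no association at this sample size.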
+

Discussion of results

+
    +
  • Overall, our results show several distinct clusters, especially one with high CaO concentration, along with MgO + FeO-T patterns, SiO2 patterns, and slight differences in other oxides that might affect how LIBS data are classified relative to the sampled PIXL values. Cation abundance in clusters is a powerful classifier and can be an indicator of rock morphology. I think clustering the LIBS data points and plotting the PIXL samples on top is an important first step towards connecting more data to classify the Mars surface. LIBS is the most important and most detailed dataset, as it gives us insight into abnormal values as well as where on the surface the mineralogy changes. Going forward, further clustering of LIBS seems arbitrary and redundant; instead I think the focus should shift to how Lithology connects to PIXL and, ultimately, how the oxide concentrations of LIBS can explain what minerals might be present. In addition, comparing elemental chemistry with the chemical formulas of minerals will also prove useful, as we can see how the mineral formulas differ. This might provide insight into how the rocks studied ended up as they are, and possibly a proxy for potential life. That is less realistic for now, but as the rover moves to different spots we might see clear-cut results that are indicative of life. Finally, finding direct correlations between certain compounds and the abundance, and thus variance, of minerals within each sample might give us insight into how we can predict what exactly LIBS is analyzing.
  • +
+
+
+
+

Summary and next steps

+ +
+
+

1 - Σ (n = 1 to last element) |mineral element/compound % - PIXL element/compound %| (the slashes mean "element or compound" and are not division signs)

+
+

Notes

+
    +
  • see which lat and long correspond to similar values of our +samples. Maybe we will see more. Apply clustering to data. Using AP +clustering find clusters and see the overlap of prior PCA plots +(Specifically campaign) and heatmaps to analyze concentration of +clusters

  • +
  • SHERLOC has been used recently at Cheyava Falls. What signs, both in our samples and in our LIBS data, show that chemical reactions possible for life have occurred? #https://www.nasa.gov/missions/mars-2020-perseverance/perseverance-rover/nasas-perseverance-rover-scientists-find-intriguing-mars-rock/

  • +
  • I think retrieving the chemical formulas for pure types of minerals, comparing the weight % of all the different elemental features, and seeing by how much these minerals differ from their pure counterparts would be useful.

  • +
  • Iron and P were found on these spotted rocks at Cheyava Falls. Can we track the amount of iron and P, or chemically stimulating mutual elements that will also provide evidence of life?

  • +
  • what differentiates these between abiotic and biotic (coming from +animals, decomposed matter?)

  • +
  • what is the importance of Sulfur trioxide on river beds?

  • +
  • silicon dioxide?

  • +
  • why are Ferric(is) oxides and Magnesium oxide compatible? What +about their relationship exposes their variance in the same +direction?

  • +
  • clusters both exhibit small proportions of Na2O, TiO2, K2O, Al2O3 +& CaO

  • +
  • SiO2, ferric and magnesium oxide are of interest. SiO2 always +seems to have a specific cluster with a very high amount of +SiO2

  • +
  • can we get access to RIMFAX data? (Wierzchos et +al. 2012)

  • +
  • fissures might show promise of life. endolithic colonies usually +go underground to escape solar radiation (however, lots of UVB and +UVC)

  • +
+
+
+ + + + +
+ + + + + + + + + + + + + + + diff --git a/StudentNotebooks/Assignment04/morawn_assignment4.Rmd b/StudentNotebooks/Assignment04/morawn_assignment4.Rmd new file mode 100644 index 0000000..d170f72 --- /dev/null +++ b/StudentNotebooks/Assignment04/morawn_assignment4.Rmd @@ -0,0 +1,287 @@ +--- +title: "DAR F24 Perserverance Analyses Notebook 4" +author: "Nicolas Morawski" +date: "`r Sys.Date()`" +output: + html_document: + toc: yes + pdf_document: + toc: yes +subtitle: "DAR Project Name: Mars" +--- +```{r setup, include=FALSE} + +# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!! +# This section install packages if they are not already installed. +# This block will not be shown in the knit file. +knitr::opts_chunk$set(echo = TRUE) + +# Set the default CRAN repository +local({r <- getOption("repos") + r["CRAN"] <- "http://cran.r-project.org" + options(repos=r) +}) + +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} + +# Required packages for M20 LIBS analysis +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} + +if (!require("caret")) { + install.packages("caret") + library(caret) +} + +if (!require("ggplot2")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("knitr")) { + install.packages("knitr") + library(knitr) +} + +if (!require("BBmisc")) { + install.packages("BBmisc") + library(BBmisc) +} + +if (!require("ggtern")) { + install.packages("ggtern") + library(ggtern) +} +``` +## Weekly Work Summary + +* RCS ID: morawn +* Project Name: Mars DAR Team +* Summary of work since last week + + * Revised previous 
analyses + * Started working with LIBS Earth calibration samples + * Worked extensively on Mars 2D App + +* Github issues assigned/addressing/complete: + + * Creating Wireframe for 2d App + * Create a Comprehensive LIBS notebook + * Find Earth references for Ternary plots (complete) + * Convert Campfire to 2D standalone app + * Slides for Dr. Rogers + +* List of presentations, papers, or other outputs + + * Figma link to 2D App Wireframe: https://www.figma.com/proto/x5MAqTC3OzjsZbNbaVxc17/Mars-Mission-Minder?node-id=0-1&t=OA3HE7B1A3SLeda5-1 + +## Personal Contribution + +* Collected/Worked with Earth references +* Worked on App Wireframe + +## Analysis: Question 1 (Reworking Previous LIBS Analyses) + +### Question being asked + +I have found my previous analyses to not be very helpful, but that was because I did not manipulate the data correctly. After making the necessary changes, are there any meaningful conclusions I can make? + +### Data Preparation + +Just like in my previous notebook, I am using the LIBS dataset, primarily the elemental compositions. After talking with Dr. Erickson, I am now clustering the data via the major cations, similarly to the ones depicted in the previous notebooks ternary plot. The three major dataframes depicted are described as: +* libs_ternary/libs_ternary_clustered: The LIBS data clustered into the different groupings of cations +* libs_loc_ternary: Same as the above, but includes location data, so the samples can be plotted by latitude and longitude +* pixl_ternary: The PIXL samples manipulated to have the same predictors as the LIBS dataframes + +I use k-means to cluster the LIBS data. 
+ +```{r, result01_data} +# Include all data processing code (if necessary), clearly commented +libs_data <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") +pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds") +libs_trim <- libs_data %>% select(c(SiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) +libs_loc <- libs_data %>% select(c(SiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O, lat, lon)) + +#PIXL data, with identically reflected compositions +new_pixl_trim <- pixl.df %>% + dplyr::select(c("Na20","Mgo","Al203","Si02", "K20","Cao","FeO-T", campaign, type)) %>% + rename("Na2O"="Na20","MgO"="Mgo","Al2O3"="Al203","SiO2"="Si02","K2O"="K20", + "CaO"="Cao","FeOT"="FeO-T") + +#Recommended +libs_ternary <- libs_trim %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>% + drop_na() + +libs_loc_ternary <- libs_loc %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>% + drop_na() + +pixl_ternary <- new_pixl_trim %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>% + drop_na() + +set.seed(10) +k <- 4 +tern.km <- kmeans(libs_ternary, k) + +libs_ternary_clustered <- cbind(libs_ternary, cluster=as.factor(tern.km$cluster)) +libs_loc_ternary <- cbind(libs_loc_ternary, cluster=as.factor(tern.km$cluster)) +tern_clusters <- libs_ternary_clustered$cluster +``` + +This little snippet is to aid in the creation of the PIXL data clustering/heatmaps. + +```{r} +new_pixl_ternary <- pixl_ternary %>% + select(-c(campaign, type)) +``` + +### Analysis: Methods and results + +The following code creates two plots; the first plot presents the clustered LIBS data in a ternary plot, along with the PIXL samples. The second plot is the correctly-oriented location data of all of the LIBS samples, along with the given clusters. 
+ +```{r, result01_analysis} +ggtern(data= libs_ternary_clustered, mapping=ggtern::aes(x=x,y=y,z=z)) + + geom_point(aes(color=tern_clusters)) + #tern_clusters + theme_rgbw() + + labs(title="Perserverance LIBS Ternary Plot, PIXL data included", + subtitle="PIXL data clustered by campaign", + x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K") + + geom_point(data=pixl_ternary, ggtern::aes(x=x,y=y,z=z, shape=campaign)) +# Clustered LIBS data, graphed by loc +ggplot(libs_loc_ternary, aes(x=lon, y=lat, colour=cluster)) + + geom_point() + + ggtitle("Clustered LIBS Data Graphed by Location") + +``` +I was originally going down the route of exploring more with PIXL samples, but have since moved on. Here, I was classifying the PIXL data by the clusters I found when clustering the LIBS data. the idea was to see if I could uncover any other hidden relationships between the samples. I also then graph the PIXL samples by their cation percentages. I do not include Ca+Na+K, as exhibited in the above ternary plot, all of the samples have a very low percentage of these cations. +```{r} +# Normalize/scale training/test data +scaler <- preProcess(libs_ternary, method = c("center", "scale")) +train <- predict(scaler, libs_ternary) +test.pixl <- predict(scaler, new_pixl_ternary) + +# KNN model +classtrain <- as.factor(libs_ternary_clustered$cluster) +train.df <- cbind(train,classtrain) +model<- knn3(classtrain ~ ., data = train.df, k = 40) +pixl.class <- predict(model,test.pixl, type="class") +pixl.predicted <- cbind(pixl.df,pixl.class) +#IMPORTANT: Use for heatmap +pixl.classified.scaled <- cbind(test.pixl, pixl.class) + +# PIXL K-means +set.seed(10) +k <- 4 +km2 <- kmeans(new_pixl_ternary,k) +cluster <- km2$cluster +pixl.kmean <- cbind(pixl_ternary,cluster) + +# Heatmaps. Plenty of room to change/fix/adjust. 
+ggplot(pixl.kmean, aes(x=x,y=y,colour=as.factor(cluster))) + + geom_point() + + labs(title="Perserverance PIXL Ternary Plot, Si+Al versus Fe+Mg", + x="Si+Al", + y="Fe+Mg") + +heatmap.data = data.frame(matrix(nrow = 0, ncol = ncol(km2$centers))) +colnames(heatmap.data) = colnames(km2$centers) +for (x in 1:4) { + test.df <- pixl.classified.scaled %>% filter(pixl.class == x) + if (dim(test.df)[1] != 0) { + test.df<- test.df[ , !(names(test.df) %in% c("pixl.class"))] + heatmap.data[nrow(heatmap.data)+1,] <- colMeans(test.df) + } +} + +#pheatmap(heatmap.data,scale="none", main="Clustered Heatmap from KNN Classification") +``` +### Discussion of results + +Of the four determined LIBS clusters, three of them show clear correlations (Clusters 1,3,4). These three clusters exhibit the inverse relationship between the concentration of the Iron + Magnesium cations vs. the Silicon + Aluminum cations. Cluster 2 is quite interesting though, as it is the only cluster that shows high concentrations in Calcium + Sodium + Potassium. When comparing these clusters to their geographic location, there is no set clustering. Cluster 3 is primarily located in the later half of Perseverance's journey, while Cluster 4 is mainly in the first half. Interestingly enough, again, Cluster 2 differs greatly from the rest of the clusters, being present mainly in the middle part of Perseverance's journey. It would be interesting to do more digging into the LIBS samples that make up Cluster 2 and see what other conclusions/analyses can be made. + +## Analysis: Question 2 (Earth Samples Data) + +### Question being asked + +We have discussed the importance of having an Earth benchmark for the LIBS data. What analyses/comparisons can we make, and how does that help further understand the mineralogy of Mars? + +### Data Preparation + +The source of this data is a pair of reports shared by Dr. Brenda Thomson. I spent the time to translate all of the data into a Rds file that everyone on the team can use. 
I filtered the dataset to only include the same predictors that I have been using in all of my analyses and plots. + +```{r, result02_data} +libs_target_data <- readRDS("~/DAR-Mars-F24/StudentData/LIBS_calibration_targets.Rds") +libs_target_trim <- libs_target_data %>% select(c(Description,Si, Al, Fe, Mg, Ca, Na, K)) + +#Recommended +libs_target_ternary <- libs_target_trim %>% + mutate(x=(Si+Al),y=(Fe+Mg),z=(Ca+Na+K)) %>% + select(-c(Si,Al,Fe,Mg,Ca,Na,K)) %>% + drop_na() +SampleNames <- libs_target_ternary$Description +``` + +### Analysis: Methods and Results + +This is just a ternary plot of the LIBS calibration targets, along with the name of each sample. + +```{r, result02_analysis} +ggtern(libs_target_ternary, ggtern::aes(x=x,y=y,z=z)) + + geom_point(aes(color=SampleNames)) + + theme_rgbw() + + labs(title="LIBS Earth Calibration Targets", + x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K") +``` + +### Discussion of results + +At the moment, I do not have any analyses beyond the plot I made. I could not figure out any good analyses to perform to compare to the LIBS data. In the future, I will work more on this topic so it can be presented to Dr. Rogers. + +## Analysis: Question 3 (2D App Mockup) + +### Question being asked + +How can we present all of the analyses and graphs we are making in a standalone 2D app? + +### Discussion of results + +This part of my report includes no R code; it covers my work on writing up wireframes for the Mars Mission Minder app. At the time of submission, I have received a lot of critiques and feedback, which will be implemented in the future. + +https://www.figma.com/proto/x5MAqTC3OzjsZbNbaVxc17/Mars-Mission-Minder?node-id=0-1&t=OA3HE7B1A3SLeda5-1 + +## Summary and next steps + +All in all, I spent a good bit of time revisiting and revising my old work, along with making some new discoveries.
The next steps for me are to continue working on the Mars Mission Minder App, and hopefully have most of the concept/UI fleshed out by Oct. 30. Other than that, I need to get on the same page as my other LIBS team members and work further on more LIBS analyses. + diff --git a/StudentNotebooks/Assignment04/morawn_assignment4.pdf b/StudentNotebooks/Assignment04/morawn_assignment4.pdf new file mode 100644 index 0000000..bddd8c2 Binary files /dev/null and b/StudentNotebooks/Assignment04/morawn_assignment4.pdf differ diff --git a/StudentNotebooks/Assignment04/walczd3-biweekly-10-08-2024.Rmd b/StudentNotebooks/Assignment04/walczd3-biweekly-10-08-2024.Rmd new file mode 100644 index 0000000..4f49ea0 --- /dev/null +++ b/StudentNotebooks/Assignment04/walczd3-biweekly-10-08-2024.Rmd @@ -0,0 +1,513 @@ +--- +title: "DAR F24 Biweekly 1" +author: "David Walczyk" +date: "`r Sys.Date()`" +output: + pdf_document: + toc: yes + html_document: + toc: yes +subtitle: "DAR Project Name: Mars" +--- + +## Packages Load In + +```{r} +# Set the default CRAN repository +local({r <- getOption("repos") + r["CRAN"] <- "http://cran.r-project.org" + options(repos=r) +}) + +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} + +if (!require("ggplotify")) { + install.packages("ggplotify") + library(ggplotify) +} + +if (!require("car")) { + install.packages("car") + library(car) +} +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +# Required packages for M20 LIBS analysis +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} +if (!require("ggtern")) { + install.packages("ggtern") +
library(ggtern) +} +if (!require("umap")) { + install.packages("umap") + library(umap) +} +if (!require("gridExtra")) { + install.packages("gridExtra") + library(gridExtra) +} +if (!require("stringdist")) { + install.packages("stringdist") + library(stringdist) +} +``` + +## Data Load In +```{r} + +#-------------LIBS------------------- +# Load the saved LIBS data with locations added +libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") +libs.std_dev <- libs.df %>% + select((c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) +libs.df <- libs.df %>% + select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) + +# Convert the points to numeric +libs.df$point <- as.numeric(libs.df$point) + +# Review what we have +summary(libs.df) + +#----------PIXL---------------------- +# Load the saved PIXL data with locations added +pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds") + +pixl.df +# Convert to factors +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) + +# Review our dataframe +summary(pixl.df) + +#----------SHERLOC---------------------- +# Read in data as provided. +sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds") + +# Clean up data types +sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral) +sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], + as.numeric) +# Transform NA's to 0 +sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0) + +# Reformat data so that rows are "abrasions" and columns list the presence of minerals. +# Do this by "pivoting" to a long format, and then back to the desired wide format. 
+ +sherloc_long <- sherloc_abrasion_raw %>% + pivot_longer(!Mineral, names_to = "Name", values_to = "Presence") + +# Make abrasion a factor +sherloc_long$Name <- as.factor(sherloc_long$Name) + +# Make it a matrix +sherloc.matrix <- sherloc_long %>% + pivot_wider(names_from = Mineral, values_from = Presence) + +# Get sample information from PIXL and add to measurements -- assumes order is the same + +sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix) + +# Review what we have +summary(sherloc.df) + + +# Load the saved lithology data with locations added +lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds") + +# Cast samples as numbers +lithology.df$sample <- as.numeric(lithology.df$sample) + +# Convert rest into factors +lithology.df[sapply(lithology.df, is.character)] <- + lapply(lithology.df[sapply(lithology.df, is.character)], + as.factor) + +# Keep only first 16 samples because the data for the rest of the samples is not available yet +lithology.df<-lithology.df[1:16,] + +# Create a matrix containing only the numeric measurements. The remaining features are metadata about the sample. +lithology.matrix <- sapply(lithology.df[,6:40],as.numeric)-1 + +# Review the structure of our matrix +str(lithology.matrix) +``` + + +## BiWeekly Work Summary + +**NOTE:** Follow an outline format; use bullets to express individual points. + +* RCS ID: **walczd3** +* Project Name: **Mars** +* Summary of work since last week + +_For the last couple of weeks I have been working on clustering LIBS, specifically using UMAP in place of K-means, and mapping those clusters to see whether certain PIXL samples align with them in PCA space. In addition, I looked into whether there is a way to get RIMFAX data, but without success, as the permittivity of rocks is estimated from the velocity of the RIMFAX radar.
This was very confusing and out of my skill range, so I moved towards finding a metric that measures the difference between observed element proportions and the expected elemental proportions of minerals._ + +* Summary of github commits + + * include branch name(s) + * include browsable links to all external files on github + * Include links to shared Shiny apps + +* List of presentations, papers, or other outputs + + * Include browsable links + +* List of references (if necessary) +* Indicate any use of group shared code base +* Indicate which parts of your described work were done by you or as part of joint efforts + +* **Required:** Provide illustrating figures and/or tables + +## Personal Contribution + +* Clearly defined, unique contribution(s) done by you: code, ideas, writing... +* Include github issues you've addressed if any + + + +## Analysis using ChatBS of aqueous minerals + +### Question being asked + +_Can ChatBS give us insight into whether or not minerals identified by the SUPERCAM are aqueous? Aqueous minerals are defined as minerals that are formed by chemical alteration of pre-existing minerals or from precipitation out of a solution (solution in this case likely means water, but we are unaware of the salinity)._ + + +* BRANCH #139 +```{r} +#0's indicate no aqueous or partial. Phosphate especially can come from the sediment or from rock weathering. Perchlorates are formed from oxidized Chlorates which come from naturally occurring Chlorides. Same with carbonate. +#amorphous silicates are not defined in their structure because sometimes volcanic cooling can lead to amorphous silicates. +#Organic matter is placed as zero but ask about this. (0) +#Other hydrated phases? (1) Implying aqueous alteration.
+ +#self made containing more collective groups of minerals (and formulas) +#e.g: SulfateS, CarbonateS, +#Ilmenite and Ulvospinel represent Fe-Ti Oxides +colnames(lithology.matrix) +minerals_aq <- read_csv("~/DAR-Mars-F24/Data/Minerals_AQ.csv") +minerals_aq <- minerals_aq %>% + mutate( Mineral = as.factor(Mineral), Formula = as.factor(Formula)) + +minerals_aq +aq_wide <- minerals_aq %>% + pivot_wider(names_from = Mineral, values_from = `Aq check`, values_fill = 0) + +pheatmap(aq_wide[,-1], cluster_rows=F, cluster_cols=F, main = "Heatmap of Aq (red) vs. Non-aq minerals") + + +#are their any patterns between aqueous and non-aqueous minerals and pixl data. +colnames(lithology.matrix) +#for lithology matrix +aqueous.lith <- c(0,0,0, + 0,.5,1, + 0,1,1, + 0,1,1, + 1,1,1, + 1,1,1, + 1,0,1, + 1,1,0, + 1,1,0, + 1,0,1, + 1,0,0, + 0,0) + +#for sherloc - column one which is name +colnames(sherloc.matrix)[-1] +aqueous.sher <- c(0,1, + 1,1, + 1,1, + 1,0, + 0,1, + 1,0, + 0,1, + 1,1, + 0,0, + .5,1, + 0,1, + 1,1, + 0,0, + 1,1, + 1,1, + 0,0, + 0,1, + 0) + +names <- sherloc.matrix[,1] +sherloc.matrix <- sherloc.matrix[,-1] #be wary of this +aq <- sherloc.matrix[,which(aqueous.sher > 0)] +non_aq <- sherloc.matrix[,which(aqueous.sher == 0)] +aq.plot <- pheatmap(aq,scale="none", main = "Aqueous SHERLOC minerals" ,cluster_rows=F, cluster_cols=F) +nonaq.plot <- pheatmap(non_aq, scale = "none", main = "Non-Aqueous SHERLOC minerals",cluster_rows=F, cluster_cols=F) + +grid.arrange(as.ggplot(aq.plot), as.ggplot(nonaq.plot), ncol = 2) + +heatmap_func <- function(data,title, scale = T, ROW) { + #cols means do you want to print col names. 
No for second row of aq + if (scale == F) { + return (pheatmap((data), scale = "none", main = title, cluster_rows = F, cluster_cols = F, labels_row = ROW, fontsize_col = 10)) #unname(data) to remove cols + } else { + return (pheatmap(data, scale = "column", main = title, cluster_rows = F, cluster_cols = F, labels_row = ROW)) + } +} + +#see similarities on PIXL data in aqueous minerals. Na20 is arbitrary, any column will do +pixl.first <- pixl.df[c(1:5,8,10,12,14,16),] #first occurrence - technically all different samples +pixl.dups <- pixl.df[5:15,] #duplicates +pixl.unique <- pixl.df[c(1:4,16),] #unique samples - no duplicates whatesoever + +pixl.first.plot <- heatmap_func(pixl.first[,2:14], "All first occurences (scaled cols)", ROW = pixl.first$name) +#pixl.dups.plot <- heatmap_func(pixl.dups[,2:14], "Duplicates", ROW = pixl.dups$name) +#pixl.unique.plot <- heatmap_func(pixl.unique[,2:14], "Unique", ROW = pixl.unique$name) + + +aq.first <- aq[pixl.first$sample, ] +aq.dups <- aq[pixl.dups$sample, ] +aq.unique <- aq[pixl.unique$sample,] + +aq.first.plot <- heatmap_func(aq.first, "SHERLOC (Aqueous)", scale = F, ROW = pixl.first$type) +#aq.dups.plot <- heatmap_func(aq.dups, "Duplicates", scale = F,ROW = pixl.dups$name) +#aq.unique.plot <- heatmap_func(aq.unique, "Unique", scale = F,ROW = pixl.unique$name) + +nonaq.first <- non_aq[pixl.first$sample, ] +nonaq.dups <- non_aq[pixl.dups$sample, ] +nonaq.unique <- non_aq[pixl.unique$sample,] + +nonaq.first.plot <- heatmap_func(nonaq.first, "SHERLOC (Non-Aqueous)", scale = F,ROW = pixl.first$type) +#nonaq.dups.plot <- heatmap_func(nonaq.dups, "Duplicates", scale = F,ROW = pixl.dups$name) +#nonaq.unique.plot <- heatmap_func(nonaq.unique, "Unique", scale = F,ROW = pixl.unique$name) + + +p <- grid.arrange(as.ggplot(pixl.first.plot), top = "PIXL w/out duplicates just First Occurences") +s <- grid.arrange(as.ggplot(aq.first.plot), as.ggplot(nonaq.first.plot), ncol = 2, top = "SHERLOC for aqueous & non-aqueous First Occurences") 
+ +#ggsave(file = "SHERLOC_AQ_NONAQ_campaign.png", s,width = 8, height = 8, dpi = 150, units = "in") +#ggsave(file = "PIXL_firstOccurence.png_campaign.png", p,width = 8, height = 8, dpi = 150, units = "in") + + +#Basic Findings +#Delta front shows the only concentrations of Clay minerals, apatite (aside from 1 .25 sample at Roubion), halite (.25 also at Roubion) + Mg & Mg-Fe Sulfate for aqueous minerals. For non-aqueous minerals sulfate-organic matter, chromite, ilmenite, zircon and spinels are shown. + +#High prominence of K2O in crater floor. Estimate is high molar fraction of K-spar. High amounts of SiO2 (quartz and amorphous silicates, might have to change amorphous to aq). Ilmenite, zircon and spinels are only classified as delta front while non-aqueous. Highest MgO is associated with Mg-Fe Carbonates. + +``` + + + +## Finding predictive mineral-oxide relationships + +### Question being asked + +_Is there a way to account for present minerals using just LIBS and PIXL collected oxides?_ + + +*BRANCH #135 - 'mineral_wts.csv' was found in python using the github repo: https://github.com/jbjacob94/min-formulas/tree/main. This repo accounted for more in depth details in calculating MOLAR FRACTIONS of different minerals of feldspar, olivine and pyroxene. These are found using oxides for every sample in LIBS and PIXL specified in 'mineral_wts' by an ID column. 
LIBS does not contain all the oxides that are present in PIXL and thus some values are not perfectly representative of their respective + +![Example Molar Composition Ternary Plot for Feldspar groups](../DAR-Mars-F24/Resources/Ternary-phase-diagram-of-feldspar-Endmember-and-solids-solution-not-necessarily-stable.png) + + +```{r} +set.seed(4) + +name_order <- c("Na2O", "SiO2", "MgO", "Al2O3", "P2O5", "SO3", "Cl", "K2O", "CaO", "TiO2", "Cr2O3", "MnO", "FeO-T") +pixl_oxides <- pixl.df[,2:14] +names(pixl_oxides) <- name_order + +libs_oxides <- cbind(libs.df[,c(12,6,10,8)], P2O5 = 0, SO3 = 0, Cl = 0, libs.df[,c(13,11,7)], Cr2O3 = 0, MnO = 0, `FeO=T`=libs.df[,9]) +names(libs_oxides) <- name_order + +#what are the oxide concentrations of our samples with respect to the libs total +libs_long <- libs_oxides[,-c(5:7, 11,12)] %>% + pivot_longer(cols = names(libs_oxides[,-c(5:7,11,12)]), names_to = "Oxide", values_to = "Values") +log_vals <- log(libs_long$Values) +log_vals[!is.finite(log_vals)] <- 0 +ggplot(libs_long, aes(x = log_vals)) + + geom_histogram() + + facet_wrap(~Oxide) + +#None pass +#df_check <- libs_oxides[,-c(5:7, 11,12)] +#for (i in seq(length(names(df_check)))) { +# print(names(df_check)[i]) +# print(shapiro.test(df_check[,i])) +#} + +#plot feldspar ternary + + +mineral_wts = read.csv("~/Mars work - walczd3/mineral_wts.csv") +mineral_wts["ID"] = c(rep("PIXL", 16), rep("LIBS", nrow(libs_oxides))) + + + +ggtern(mineral_wts[which(mineral_wts$ID == "PIXL"),], ggtern::aes(x = Xab, y = Xor, z = Xan, color = pixl.df$name)) + + geom_point() + + theme_rgbw() +sherloc.df[c(1,2,3,7,13,15,16),] #possible plagioclases + +ggtern(mineral_wts[which(mineral_wts$ID == "LIBS"),], ggtern::aes(x = Xab, y = Xor, z = Xan, color = ID)) + + geom_point(size = 2, pch = 21) + + geom_point(data = mineral_wts[which(mineral_wts$ID == "PIXL"),], ggtern::aes(x = Xab, y = Xor, z = Xan, color = pixl.df$type), size = 4 ) + + theme_rgbw() + + labs(x="(Ab)", #NaAlSi3O8 + y="(O)", #KAlSi3O8 + 
z="(An)", #CaAl2Si2O8 + title = "Feldspar Ternary Graph") + +ggtern(mineral_wts[which(mineral_wts$ID == "LIBS"),], ggtern::aes(x = Xfo, y = Xfa, z = Xteph, color = ID)) + + geom_point(size = 2, pch = 21) + + geom_point(data = mineral_wts[which(mineral_wts$ID == "PIXL"),], ggtern::aes(x = Xfo, y = Xfa, z = Xteph, color = pixl.df$type), size = 4 ) + + theme_rgbw() + + labs(x="(Fo)", + y="(Fa)", + z="(Teph)", + title = "Olivine Ternary Graph") + + +pheatmap(mineral_wts[c(1,2,3,7,13,15,16),1:3], scale = "none", main = "Feldspar Molar Fractions - PIXL", cluster_rows = F, labels_row = as.character(unlist(pixl.df[c(1,2,3,7,13,15,16), "type"]))) + +as.character(unlist(pixl.df[c(1,2,3,7,13,15,16), "type"])) + +#since only a few of these datapoints are possible we can infer that samples within the ranges specified in the graph above are candidates for feldspar. + +#what does the feldspar molar fraction tell us about samples, does it indicate feldspar being there? +#feldspar +#Three types of feldspar: Xan = Anorthite (CaAl2Si2O8), Xab = Albite (NaAlSi3O8), Xor = orthoclase (KAlSi3O8). Xor is the only type not classified as plagioclase +feldspar_check <- cbind(pixl.df, sherloc.matrix[,which(colnames(sherloc.matrix) %in% c("Plagioclase", "Feldspar"))], mineral_wts[1:16,1:3]) +feldspar_minerals <- pheatmap(feldspar_check[,20:21], scale= "none", cluster_rows = F) +feldspar_oxides <- pheatmap(feldspar_check[,c(2:14,22:24)], scale = "column", cluster_rows = F, cluster_cols = F, labels_row = pixl.df$type) +grid.arrange(as.ggplot(feldspar_minerals), as.ggplot(feldspar_oxides), ncol = 2, top = "Feldspar based clusters") + +#we see that there is no significant correlation between Albite (Ab) and Na2O, or between Anorthite (An) and CaO, but there is a positive correlation between K2O and Orthoclase (O).
+Ab.r2 <- cor.test(feldspar_check$Na20, feldspar_check$Xab) #r2 = .18, p =.4843 +An.r2 <- cor.test(feldspar_check$Cao, feldspar_check$Xan) #r2 = .21, p = .4219 +O.r2 <- cor.test(feldspar_check$K20, feldspar_check$Xor) #highly correlated +plot(feldspar_check$K20, feldspar_check$Xor, xlab = "K2O", ylab = "Xor", main = "K2O wt% vs. Orthoclase (X) (p < .05)") +text(.1, .11, paste("r =",round(O.r2[["estimate"]][["cor"]],3)), col = "red") +abline(lm(feldspar_check$Xor~feldspar_check$K20), col = "blue") + + +#olivine +#three olivines analyzed: Xfo = forsterite (Mg2SiO4), Xfa = fayalite (Fe2SiO4), Xteph = Tephroite (Mn2SiO4) +olivine_check <- cbind(pixl.df, sherloc.matrix[,13], mineral_wts[1:16,5:7]) +olivine_oxides <- pheatmap(cbind(as.data.frame(scale(olivine_check[,c(2:14)])), olivine_check[,21:23], olivine = olivine_check[,20]), scale = "none", cluster_rows = F, cluster_cols = F) + + +Fo.r2 <- cor.test(olivine_check$Mgo, olivine_check$Xfo) +Fa.r2 <- cor.test(olivine_check$`FeO-T`, olivine_check$Xfa) +Teph.r2 <- cor.test(olivine_check$Mno, olivine_check$Xteph) + +par(mfrow=c(1,2)) +plot(olivine_check$`FeO-T`, olivine_check$Xfa, xlab = "FeO-T", ylab = "Xfa", main = "FeO-T wt% vs. Fayalite (X) (p < .05)") +text(26, .8, paste("r =",round(Fa.r2[["estimate"]][["cor"]],3)), col = "red") +abline(lm(olivine_check$Xfa ~ olivine_check$`FeO-T`), col = "blue") + +plot(olivine_check$Mgo, olivine_check$Xfo, xlab = "MgO", ylab = "Xfo", main = "MgO wt% vs. Forsterite (X) (p < .05)") +text(5,.5, paste("r =",round(Fo.r2[["estimate"]][["cor"]],3)), col = "red") +abline(lm(olivine_check$Xfo~olivine_check$Mgo), col = "blue") + +#Olivine Iron oxides and Magnesium can indicate the type of olivine present but are they indicative of minerals that are classified as present.
+#use Welch's t-test +Olivine_present <- olivine_check[which(olivine_check$Olivine > 0),] +Olivine_not <- olivine_check[which(olivine_check$Olivine == 0),] + +t.test(Olivine_present$Mgo, Olivine_not$Mgo, alternative = "two.sided", var.equal = FALSE) #p > .05 + +t.test(Olivine_present$`FeO-T`, Olivine_not$`FeO-T`, alternative = "two.sided", var.equal = FALSE) #p > .05 + +#NOT INDICATIVE +#rip + + +#Potassium oxide (K2O, column K20) is a powerful indicator of orthoclase a.k.a k-spar, but can it indicate plagioclase or feldspar? +Plagioclase_present <- feldspar_check[which(feldspar_check$Plagioclase > 0),] +Plagioclase_not <- feldspar_check[which(feldspar_check$Plagioclase == 0),] + +t.test(Plagioclase_present$K20, Plagioclase_not$K20, alternative = "two.sided", var.equal = FALSE) #NOT INDICATIVE + +Feldspar_present <- feldspar_check[which(feldspar_check$Feldspar > 0),] +Feldspar_not <- feldspar_check[which(feldspar_check$Feldspar == 0),] +t.test(Feldspar_present$K20, Feldspar_not$K20,alternative = "two.sided", var.equal = FALSE) #INDICATIVE!! p < .05 +#K2O differs significantly between samples with and without Feldspar. + +names(feldspar_check) + +#pixl.df has no Feldspar column, so fit on feldspar_check (oxide columns plus Feldspar); with 16 samples and 13 predictors expect separation warnings +logistic_feldspar <- glm(as.factor(Feldspar)~., data = feldspar_check[, c(names(pixl.df)[2:14], "Feldspar")], family = binomial) +``` + + + +### Discussion of results + + +_Overall, my analyses compared aqueous and non-aqueous minerals against oxide concentrations, and looked at specific igneous minerals whose molar fractions were calculated (feldspar and olivine) to test whether certain oxides can predict their presence. This proved somewhat successful: we were able to confirm that potassium oxide (K2O) is an important indicator of feldspar presence. Molar fractions for other minerals will have to be calculated to look at the correlation of certain types of minerals with specific oxides.
Using both parametric and non-parametric statistical tests, however, we will be able to assess relationships between the different oxides and mineral presence._ + +_For now I will finish working on my SHERLOC metric, which I am having trouble connecting to mineral formulas, as well as finding all possible variants of mineral formulas, since they are often not rigorously fixed. Some others, like amorphous silicates and organic matter, do not have set formulas by nature and thus are unapproachable as of now. In addition, before we meet with Dr. Rogers, aside from the SHERLOC metric, I would like to connect all possible combinations of correlations and hopefully be able to get more molar fractions of minerals in order to predict mineral presence._ + + + + + + +***IGNORE THIS FOR NOW*** + +## Finding alternate metric for SHERLOC data using relationship between oxides and base composition of minerals +### Question being asked + +_What does this proposed alternate metric for SHERLOC tell us about the data?_ + + +* BRANCH #135 - for this analysis I will just explain the metric; then in the next notebook I will provide a working example. Further analysis above has shown me that more comparisons are needed to understand the relationship between elemental chemistry and mineral composition. However, below I explain what I am considering now and will actually implement. + +_(1) First, background information. Each mineral in SHERLOC/Lithology has a fixed chemical formula with a fixed molecular mass (g/mol), aside from organic matter and other hydrated phases. A mole is just a unit for an amount of substance. Its use is analogous to going to the grocery store and buying a dozen eggs: the atoms are the eggs in this scenario, and the mole plays the role of the dozen (a fixed count of items).
Within a mineral, there is a set amount of each element, and each element contributes its own fixed molar mass (g/mol) multiplied by however many moles of that atom are present (the subscript in the chemical formula)._ + +_(2) Second, converting oxides to element wt%. PIXL and LIBS give us oxides (e.g: K2O); we can convert the oxide into the wt% of the cation alone (K), which allows us to build a mineral formula from the ground up. We find this using the process below:_ + +e.g: + +data --> Roubion (sample #1) has value of .75 for K2O + +K2O molar mass = 94.2 g/mol (where n = 2, since there are 2 moles of K per mole of K2O) + +K molar mass = 39.1 g/mol + +oxide factor = 94.2 / (39.1 * n) = 1.205 + +element factor = 1 / 1.205 = .830 + +x = .75 * element_factor = .623 K + +This gives us the potassium (K) wt% of the given sample, that is, the amount of potassium available to build minerals from. + + + diff --git a/StudentNotebooks/Assignment04/walczd3-biweekly-10-08-2024.pdf b/StudentNotebooks/Assignment04/walczd3-biweekly-10-08-2024.pdf new file mode 100644 index 0000000..20d405a Binary files /dev/null and b/StudentNotebooks/Assignment04/walczd3-biweekly-10-08-2024.pdf differ diff --git a/StudentNotebooks/Assignment05/SupercamSamples.png b/StudentNotebooks/Assignment05/SupercamSamples.png new file mode 100644 index 0000000..78b7494 Binary files /dev/null and b/StudentNotebooks/Assignment05/SupercamSamples.png differ diff --git a/StudentNotebooks/Assignment05/TargetNames.png b/StudentNotebooks/Assignment05/TargetNames.png new file mode 100644 index 0000000..78bcb38 Binary files /dev/null and b/StudentNotebooks/Assignment05/TargetNames.png differ diff --git a/StudentNotebooks/Assignment05/mwatid-assignment05.Rmd b/StudentNotebooks/Assignment05/mwatid-assignment05.Rmd new file mode 100644 index 0000000..963d02e --- /dev/null +++ b/StudentNotebooks/Assignment05/mwatid-assignment05.Rmd @@ -0,0 +1,181 @@ +--- +title: "DAR F24 Project Status Notebook Template" +author:
"Dante Mwatibo" +date: "`r Sys.Date()`" +output: + pdf_document: + toc: yes + html_document: + toc: yes +subtitle: "Mars" +--- +## Weekly Work Summary + +* RCS ID: mwatid +* Project Name: Mars +* Summary of work since last week + + * Describe the important aspects of what you worked on and accomplished + +* Summary of github issues added and worked + + * Canonical Component Analysis on PIXL #147d + +* Summary of github commits + + * branch: dar-mwatid + * commit links: + +* List of presentations, papers, or other outputs + + * Include browsable links + +* List of references (if necessary) +* Indicate any use of group shared code base +* Indicate which parts of your described work were done by you or as part of joint efforts + +## Personal Contribution + +* Clearly defined, unique contribution(s) done by you: code, ideas, writing... +* Include github issues you've addressed if any + +## Analysis: Canonical Correlation Analysis (PIXL) + +Is it possible to use canonical correlation analysis to determine whether or not there is any correlation in the amount of a subset of minerals in PIXL and the campaign of a rock sample. + +### Data Preparation + +The data I will be using for this analysis is a subset of minerals in the PIXL data. + +1.) Load in the PIXL and SHERLOC data +2.) Scale the PIXL data +3.) Decide the number of minerals I will use for the analysis (6 in this case) +4.) 
Get a random subset of the minerals in PIXL (3 determines the size of the sample) + + +```{r, result01_data} +# loading the proper libraries +library(ggplot2) +library(ggtern) +library(magrittr) +library(dbplyr) +library(tidyr) +library(CCA) + +# Load the saved PIXL data with locations added +pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds") + +# Convert to factors +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], as.factor) + +# Make the matrix of just mineral percentage measurements +pixl.matrix <- pixl.df[,2:14] %>% scale() + +## LOADING IN THE SHERLOC DATA +# Read in data as provided. +sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds") + +# Clean up data types +sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral) +sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], + as.numeric) +# Transform NA's to 0 +sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0) + +# Reformat data so that rows are "abrasions" and columns list the presence of minerals. +# Do this by "pivoting" to a long format, and then back to the desired wide format. 
+ +sherloc_long <- sherloc_abrasion_raw %>% + pivot_longer(!Mineral, names_to = "Name", values_to = "Presence") + +# Make abrasion a factor +sherloc_long$Name <- as.factor(sherloc_long$Name) + +# Make it a matrix +sherloc.matrix <- sherloc_long %>% + pivot_wider(names_from = Mineral, values_from = Presence) + +# Get sample information from PIXL and add to measurements -- assumes order is the same +sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix) + +### ----------------------- getting a random sample ------------------------ ### +ncols <- 6 +nrows <- 7 +colsample <- sample(ncol(pixl.matrix), ncols) +pixl.df.s <- cbind(pixl.df[,"campaign"], pixl.matrix[,colsample]) +``` + +### Analysis: Methods and results + +I'll be performing Canonical Correlation Analysis(CCA) using the subset of the data that I got before. The goal of this analysis is to see whether or not it is possible to predict the campaign of a sample based purely on the PIXL mineral data. In order to apply canonical correlation in this manner, I have the first grouping of fields be all the mineral fields available. Next, I have the second grouping of fields be solely the binary fields of campaign (Crater Floor or Delta Front). Next, perform CCA using these two groupings of the sample. Finally, create a plot showing the results of CCA when extrapolated to the entire PIXL dataset and see how close/far you get. + +_Provide clearly commented analysis code; include code for tables and figures!_ + +```{r, result01_analysis} +# Include all analysis code, clearly commented +# If not possible, screen shots are acceptable. +# If your contributions included things that are not done in an R-notebook, +# (e.g. researching, writing, and coding in Python), you still need to do +# this status notebook in R. Describe what you did here and put any products +# that you created in github. If you are writing online documents (e.g. 
overleaf +# or google docs), you can include links to the documents in this notebook +# instead of actual text. +cc.results <- cancor(pixl.df.s[,2:(ncols + 1)], (as.numeric(pixl.df.s[,1]) - 1), xcenter=FALSE, ycenter=FALSE) +mineral.cancor <- pixl.matrix[,colsample] %*% as.matrix(cc.results$xcoef[,1]) +CC1.minerals <- as.data.frame(cbind(mineral.cancor, as.matrix(pixl.df[,"campaign"]))) +ggplot(CC1.minerals) + + geom_point(aes(x=mineral.cancor, y=campaign)) +``` + +### Discussion of results + +The goal of this analysis was to see if I could take any rock sample and derive the campaign from said rock sample based on the value returned from the matrix multiplication of the rock sample's PIXL mineral data and the CCA results. If the value is above or below certain threshold you could reasonably assume the sample to be from a certain campaign. I hypothesized these results would be pretty mixed or at the very least semi-ambiguous, however CCA worked even better than I initially expected, creating a lot of separation between the two groups even when extrapolated to samples that canonical correlation was not created with, which surprised me. + + +## Analysis: Canonical Correlation Analysis (SHERLOC) + +### Question being asked + +Is it possible to use canonical correlation analysis to determine whether or not there is any correlation in the amount of a subset of minerals in SHERLOC and the campaign of a rock sample. + +### Data Preparation + +1.) Reuse the SHERLOC dataset that has been loaded in before +2.) Decide the number of elements I will use for the analysis (6 in this case) +3.) 
Get a random subset of the elements in SHERLOC (2 determines the size of the sample) + +```{r, result02_data} +ncols <- 6 +colsample <- sample(colnames(sherloc.matrix[,-1]), ncols) +sherloc.df.s <- cbind(sherloc.df[,"campaign"], sherloc.matrix[,colsample]) +``` + +### Analysis: Methods and Results + +I'll be performing Canonical Correlation Analysis(CCA) using the subset of the data that I got before. The goal of this analysis is to see whether or not it is possible to predict the campaign of a sample based purely on the SHERLOC elemental data. In order to apply canonical correlation in this manner, I have the first grouping of fields be all the elemental fields available. Next, I have the second grouping of fields be solely the binary fields of campaign (Crater Floor or Delta Front). Next, perform CCA using these two groupings of the sample. Finally, create a plot showing the results of CCA when extrapolated to the entire SHERLOC dataset and see how close/far you get. + +```{r, result02_analysis} +# Include all analysis code, clearly commented +# If not possible, screen shots are acceptable. +# If your contributions included things that are not done in an R-notebook, +# (e.g. researching, writing, and coding in Python), you still need to do +# this status notebook in R. Describe what you did here and put any products +# that you created in github (documents, jupytor notebooks, etc). If you are writing online documents (e.g. overleaf +# or google docs), you can include links to the documents in this notebook +# instead of actual text. 
+cc.s.results <- cancor(sherloc.df.s[,2:(ncols + 1)], (as.numeric(sherloc.df.s[,1]) - 1), xcenter=FALSE, ycenter=FALSE) +elements.cancor <- as.matrix(sherloc.matrix[,rownames(cc.s.results$xcoef)]) %*% as.matrix(cc.s.results$xcoef[,1]) +CC1.elements <- as.data.frame(cbind(elements.cancor, as.matrix(sherloc.df[,"campaign"]))) +colnames(CC1.elements) <- c("elements.cancor", "campaign") +ggplot(CC1.elements) + + geom_point(aes(x=elements.cancor, y=campaign)) +``` + +### Discussion of results + +While it appears it is possible to create a discriminatory line or value as a result of a CCA manipulation of the data, I'm of the opinion that these numbers are too small to draw any meaningful differentiating number. This is because if such a number were to be drawn in accordance with the graph above, it would be a decimal with over 15 0's between the decimal place and any non-zero number. This could be due to the large amount of 0s in the SHERLOC data, as it's possible that CCA does not perform well when there are lots of 0s in the data fed into it. It is also possible that by multiplying all the values by a factor of 10 the data would come out differently, and in fact there might be a larger raw difference between the values resulting from using the CCA results on the SHERLOC data if the SHERLOC data were to be manipulated beforehand like that. + +## Summary and next steps + +In summary it appears possible to use PIXL, at the very least, as an estimator for campaign utilizing CCA analysis in the way I did. Next, I would like to figure out an algorithm or some way to automate the process to optimize the fields CCA should take in to maximize separation between groups. 
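The SHERLOC discussion above wonders whether multiplying the values by a factor of 10 would change the outcome. A minimal sketch on simulated data (not the SHERLOC values, and using `cancor`'s default centering rather than `xcenter=FALSE`) suggests what to expect: rescaling one variable set by a constant leaves the canonical correlation unchanged, and only the coefficients rescale to compensate. The names `x`, `y`, `fit1`, and `fit10` are made up for illustration.

```r
# Simulated stand-in data: 50 samples, 4 features, one response.
set.seed(1)
x <- matrix(rnorm(200), ncol = 4)
y <- x[, 1] + rnorm(50)

# Canonical correlation on the raw features and on features multiplied by 10.
fit1  <- cancor(x, y)
fit10 <- cancor(x * 10, y)

# The canonical correlation is the same in both fits.
all.equal(fit1$cor, fit10$cor)

# The first canonical coefficients differ by a constant ratio.
fit1$xcoef[, 1] / fit10$xcoef[, 1]
```

If this carries over to the real data, the very small projected values would reflect the scale of the coefficients rather than a lack of separation, so comparing groups on a standardized projection may be more informative than the raw magnitudes.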
+ diff --git a/StudentNotebooks/Assignment05/mwatid-assignment05.pdf b/StudentNotebooks/Assignment05/mwatid-assignment05.pdf new file mode 100644 index 0000000..a9f9168 Binary files /dev/null and b/StudentNotebooks/Assignment05/mwatid-assignment05.pdf differ diff --git a/StudentNotebooks/Assignment05/vanesm-assignment5.Rmd b/StudentNotebooks/Assignment05/vanesm-assignment5.Rmd new file mode 100644 index 0000000..61ed04f --- /dev/null +++ b/StudentNotebooks/Assignment05/vanesm-assignment5.Rmd @@ -0,0 +1,463 @@ +--- +title: "DAR F24 Project Status Notebook Template" +author: "Margo VanEsselstyn" +date: "`r Sys.Date()`" +output: + pdf_document: + toc: yes + html_document: + toc: yes +subtitle: "DAR Mars" +--- +```{r setup, include=FALSE} +# Set the default CRAN repository +local({r <- getOption("repos") + r["CRAN"] <- "http://cran.r-project.org" + options(repos=r) +}) + +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} +# Required packages for M20 LIBS analysis +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} + +if(!require("vegan")) { + install.packages("vegan") + library(vegan) +} + +if(!require("knitr")){ + install.packages("knitr") + library(knitr) +} + +if(!require("cluster")){ + install.packages("cluster") + library(cluster) +} + +if(!require("ggtern")){ + install.packages("ggtern") + library(ggtern) +} + +if(!require("caret")){ + install.packages("caret") + library(caret) +} + +if(!require("gridExtra")){ + install.packages("gridExtra") + library(gridExtra) +} + +if(!require("RColorBrewer")){ + install.packages("RColorBrewer") + library(RColorBrewer) +} + 
+knitr::opts_chunk$set(echo = TRUE) +``` + +## Weekly Work Summary + +* RCS ID: vanesm +* Project Name: Mars +* I updated the LIBS and PIXL combined data, categorized the LIBS data, and attempted to display the LIBS scct targets on a ternary diagram. +* I committed an updated version of the combined LIBS and PIXL data as well as a categorized LIBS data Rmd file + +## Analysis: Question 1 (LIBS Target Names) + +### Question being asked + +What can I learn about LIBS target names from the analysts notebook and other sources? + +### Data Preparation + +I am using the LIBS data, with some features removed. I am mainly focusing on the metadata. I added a new "type" column and am categorizing the targets based on a few different metrics. + +```{r, result01_data} +#load in LIBS data +libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") + +#Drop the standard deviation features, the sum of the percentages, +#the distance, and the total frequencies +libs.df <- libs.df %>% + select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) + +# Convert the points to numeric +libs.df$point <- as.numeric(libs.df$point) + +libsrenamed<-cbind(libs.df[,1:4],"type"=0,libs.df[5:13]) +``` + +I am adding a label to each scct target, based on descriptions from the file sent in the webex. 
Using the following reference table: + +![Supercam Onboard Sample Descriptions]("SupercamSamples.png") + +```{r, result01_data2} +libsrenamed<-libsrenamed %>% + mutate(type = ifelse(grepl("tsrich0404", target,ignore.case=T), + "BHVO-2 basalt and K sulfate mixture", type)) %>% + mutate(type = ifelse(grepl("LCMB0006", target,ignore.case=T), + "Chert", type)) %>% + mutate(type = ifelse(grepl("LCA530106", target,ignore.case=T), + "Calcite", type)) %>% + mutate(type = ifelse(grepl("PMIFS0505", target,ignore.case=T), + "Ferrosilite", type)) %>% + mutate(type = ifelse(grepl("TAPAG0206", target,ignore.case=T), + "Fluoro-Chloro-Hydro Apatite", type)) %>% + mutate(type = ifelse(grepl("PMIOR0507", target,ignore.case=T), + "Orthoclase", type)) %>% + mutate(type = ifelse(grepl("PMIDN0302", target,ignore.case=T), + "Diopside", type)) %>% + mutate(type = ifelse(grepl("PMIFA0306", target,ignore.case=T), + "Olivine", type)) %>% + mutate(type = ifelse(grepl("PMIAN0106", target,ignore.case=T), + "Andesine", type)) %>% + mutate(type = ifelse(grepl("PMIEN0602", target,ignore.case=T), + "Enstatite", type)) %>% + mutate(type = ifelse(grepl("TSERP0102", target,ignore.case=T), + "Serpentine/Talc", type)) %>% + mutate(type = ifelse(grepl("LBHVO20406", target,ignore.case=T), + "BHVO-2 standard basalt", type)) %>% + mutate(type = ifelse(grepl("LJSC10304", target,ignore.case=T), + "Mars soil analog", type)) %>% + mutate(type = ifelse(grepl("LANKE0101", target,ignore.case=T), + "Ankerite", type)) %>% + mutate(type = ifelse(grepl("LSIDE0101", target,ignore.case=T), + "Siderite", type)) %>% + mutate(type = ifelse(grepl("LJMN10106", target,ignore.case=T), + "JMN-1 standard Mn nodule", type)) %>% + mutate(type = ifelse(grepl("NTE010301", target,ignore.case=T), + "Basalt dopped in minor elements - Cu, Zn", type)) %>% + mutate(type = ifelse(grepl("NTE020106", target,ignore.case=T), + "Basalt dopped in minor elements - Mn, Ba, Cr", type)) %>% + mutate(type = ifelse(grepl("NTE030106", 
target,ignore.case=T), + "Basalt dopped in minor elements - Zn", type)) %>% + mutate(type = ifelse(grepl("NTE040106", target,ignore.case=T), + "Basalt dopped in minor elements - Li, Sr", type)) %>% + mutate(type = ifelse(grepl("NTE050301", target,ignore.case=T), + "Basalt dopped in minor elements - Ni", type)) %>% + mutate(type = ifelse(grepl("SHERG02", target,ignore.case=T), + "Shergottite", type)) %>% + mutate(type = ifelse(grepl("TITANIUM", target,ignore.case=T), + "Titanium", type)) +``` + +Based on [this article on nasa's website]( https://science.nasa.gov/blog/supercam-gains-new-artificial-intelligence-capabilities-with-aegis-upgrade/), the targets that contain the text "aegis" were taken by NASA's new AI that was implemented partway through perseverance's path, Autonomous Exploration for Gathering Increased Science. This allows the rover to take LIBS data autonomously. +There are two types of AEGIS, and I thought about trying to distinguish them AEGISlite/AEGISheavy, but it seems like the AEGISheavy update doesn't change the AEGIS LIBS functionality, it just allows for autonomous control of other Supercam measurements. As in my previous notebook, I just have these labeled with "AEGIS". + +```{r, result01_data3} +libsrenamed<-libsrenamed %>% + mutate(type= ifelse(grepl("aegis", target,ignore.case=T) & type=="0","AEGIS",type)) +``` + +When looking into what the suffix "scam" was referring to, (more specifically than just supercam), I decided to look at the analysts notebook documentation for sol 448, which was a sol that had a target name with "scam" in it. I found this chart, which indicates that they are named this way because a target for another measurement has the same name, and is in the same area. This could be a good way to potentially link LIBS and PIXL further. 
+![Description of Targets from Analysts Notebook Documentation, Sols 447-448]("TargetNames.png") + +I labeled the scam targets with the names of the other measurements taken at this target and SCAM. Note that the documentation for the target "montpezat_350_scam" is missing, so my label corresponding it to PIXL is just a guess. The documentation for the targets "pollock_knob_501_sca" and "villeplane_scam" are also missing, so their correlations with ZCAM are also guesses. Here, "AT-SCAM" means "All Techniques". + +```{r, result01_data4} +libsrenamed<-libsrenamed %>% + mutate(type= ifelse(grepl("buzzard_rocks_scam", target,ignore.case=T) & type=="0", + "PIXL-SCAM",type)) %>% + mutate(type= ifelse(grepl("alfalfa_378_scam", target,ignore.case=T) & type=="0", + "VISIR-Ramanx2-ZCAM-SCAM",type)) %>% + mutate(type= ifelse(grepl("chiniak_565_scam", target,ignore.case=T) & type=="0", + "AT-SCAM",type)) %>% + mutate(type= ifelse(grepl("garde_210_scam", target,ignore.case=T) & type=="0", + "AT-SCAM",type)) %>% + mutate(type= ifelse(grepl("guillaumes_168_scam", target,ignore.case=T) & type=="0", + "PIXL-SCAM",type)) %>% + mutate(type= ifelse(grepl("montpezat_350_scam", target,ignore.case=T) & type=="0", + "PIXL-SCAM",type)) %>% + mutate(type= ifelse(grepl("naltsos_scam", target,ignore.case=T) & type=="0", + "PIXL-SCAM",type)) %>% + mutate(type= ifelse(grepl("ouzel_falls_792_scam", target,ignore.case=T) & type=="0", + "AT-SCAM",type)) %>% + mutate(type= ifelse(grepl("pollock_knob_501_sca", target,ignore.case=T) & type=="0", + "ZCAM-SCAM",type)) %>% + mutate(type= ifelse(grepl("rose_river_falls_sca", target,ignore.case=T) & type=="0", + "?-SCAM",type)) %>% + mutate(type= ifelse(grepl("roubion_168_scam", target,ignore.case=T) & type=="0", + "ZCAM-PIXL-SCAM",type)) %>% + mutate(type= ifelse(grepl("villeplane_scam", target,ignore.case=T) & type=="0", + "ZCAM-SCAM",type)) %>% + mutate(type= ifelse(grepl("atmo_mountain_637_sc", target,ignore.case=T) & type=="0", + "ZCAMMS-SCAM",type)) 
%>% + mutate(type= ifelse(grepl("crosswind_lake_641_s", target,ignore.case=T) & type=="0", + "ZCAM-SCAM",type)) %>% + mutate(type=ifelse(type=="0","other",type)) +``` + +I started going through other targets to see if their names had importance, many of them have short descriptions but it seems like a lot of work to do manually right now, and maybe not the best use of time. Especially as many of the target names seem to be random based on these descriptions. + +```{r, result01_data5} +libsrenamed<-libsrenamed %>% + mutate(type= ifelse(target=="sei_________________", "other - fine soil",type)) %>% + mutate(type= ifelse(target=="naakih______________", "other - coarse soil",type)) +``` + +### Analysis/Discussion of Results + +For this portion of my notebook, I am not doing specific analysis, only trying to add more documentation to the data and give meaning to target names. + +For future analysis, I have some thoughts about how the newly categorized data should be analysed. Currently, LIBS analysts have been analysing all of the targets together, maybe taking subsets by cluster, but I think moving forward we should separate the data. + +It doesn't make sense to me to analyse the scct data in conjunction with the rest of the data, these targets are LIBS measurements of designated samples used to calibrate the machine. This data could be used for scaling or for understanding what samples are close to a certain reference, but they are not legitimate mars data, and I don't think they should be analysed as such. + +I am unsure whether the AEGIS data should be considered exactly the same as the rest of the LIBS data, functionally, they are both LIBS measurements of a rock, which indicates they should be the same. However, it seems like the AEGIS data is just taken when the rover has the time and capacity, so that time isn't wasted and the rover is always taking measurements, whereas the other LIBS samples are taken with more intention. 
+ +The SCAM data should definitely be analysed the same way the rest of the LIBS data is, but it could also be used to link data sets. + +## Analysis: Question 2 (Plotting reference targets) + +### Question being asked + +How can we distinguish the scct targets on a ternary plot + +### Data Preparation + +I am using the LIBS data prepared earlier, and reformatting it for a ternary diagram. + +```{r, result02_data} +# Include all data processing code (if necessary), clearly commented +libs.matrix <- as.matrix(libs.df[,6:13]) + +libs.tern <- as.data.frame(libs.matrix) %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O,TiO2)) + + +libs.tern<-cbind(libs.tern, "type"=libsrenamed$type, "target"=libsrenamed$target, + "shape"=libsrenamed$type) + +libs.tern<-libs.tern %>% mutate(shape = ifelse(grepl("SCAM", type, ignore.case=T), + "other", shape)) %>% + mutate(shape = ifelse(grepl("other", type, ignore.case=T), + "other", shape)) %>% + mutate(shape = ifelse(grepl("scct", target, ignore.case=T), "scct", shape)) + +libs.tern$shape<-as.factor(libs.tern$shape) + + + +``` + + +```{r} +km<-kmeans(libs.tern[,1:3],5) + +libs.tern<-as.data.frame(cbind(libs.tern,"cluster"=as.factor(km$cluster))) +``` +### Analysis: Methods and Results + +I am trying to differentiate the scct points from the other points on a ternary diagram, but I struggled a lot with the best method. I would like to have them labeled with what kind of rock they are, but labels on the graph don't make sense as they block the whole graph and the factor has too many levels to differentiate by color or shape. 
+```{r} +libs.tern.other<-libs.tern[libs.tern$shape=="other",] +libs.tern.scct<-libs.tern[libs.tern$shape=="scct",] + +#libs.scct.avg<-libs.tern.scct[, lapply(.SD, average), by= target] +libs.scct.avg<-aggregate(cbind(x,y,z) ~ type, data = libs.tern.scct, FUN = "mean") +libs.tern.other<-libs.tern.other[,c(1,2,3,7)] +libs.tern.other<-cbind(libs.tern.other,"type"=0) +libs.scct.avg<-cbind(libs.scct.avg[,2:4],"cluster"=libs.scct.avg$type,"type"=1) +libs.tern<-rbind(libs.scct.avg,libs.tern.other) +``` + +```{r} +libs.tern<-cbind(libs.tern,"num"=rownames(libs.tern),"legend"=0) +``` + +```{r} +libs.tern<-libs.tern %>% + mutate(legend=paste(num,cluster,sep=" ")) +``` + +```{r, result02_analysis} +nv = -0.06 #Vertical Adjustment +pn = position_nudge_tern(y=nv,x=-nv/2,z=-nv/2) + +ggtern(libs.tern, ggtern::aes(x=x,y=y,z=z)) + + geom_point(data=subset(libs.tern,type==0),aes(color=cluster),alpha=0.5) + + geom_point(data=subset(libs.tern,type==1))+ + theme_rgbw() + + labs(title="LIBS ternary Plot With Reference Samples", + x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K") + + theme_nomask()+ + theme(legend.position="bottom")+ +geom_text(position=pn,data=subset(libs.tern,type==1),aes(label=num),check_overlap=T) + + +``` +```{r} +kablelibstern<-cbind(point=libs.tern$num,"Description"=libs.tern$cluster) +kablelibstern<-kablelibstern[1:22,] + +kable(kablelibstern) +``` +### Discussion of results + +I don't have clear results because my graph is not very clear and is not labeled well. One thing I found interesting was that the points at the same scct target don't necessarily have the same value, they are very close, but not exactly the same, so there is some amount of error. + +## Analysis: Question 3 (Improving LIBS-PIXL combined data) + +### Question being asked + +I am trying to improve the combined PIXL-LIBS data. 
+ +### Data Preparation + +I am importing the data from the combined data Rds file in StudentData + +```{r, result03_data} +libs.pixl.combined <- readRDS("~/DAR-Mars-F24/StudentData/PIXL_LIBS_Combined.Rds") +``` + +### Analysis methods used + +Some of the matched LIBS and PIXL rows were matched with LIBS values that were scct targets, this means that our matching method earlier is not entirely correct. It makes sense that our method would miscategorize some of these scct targets to match PIXL data, because the rover is consistently recalibrating by taking LIBS measurements at these scct targets. + +```{r, result03_analysis} +libs.pixl.combined[991,] +``` + +I am removing the PIXL values that are matched with these scct targets + +```{r, result03_analysis2} +libs.pixl.combined<-libs.pixl.combined%>% + mutate(Long=ifelse(grepl("scct", target, ignore.case=T), NA, Long))%>% + mutate(sol.y=ifelse(grepl("scct", target, ignore.case=T), NA, sol.y))%>% + mutate(sample=ifelse(grepl("scct", target, ignore.case=T), NA, sample))%>% + mutate(name=ifelse(grepl("scct", target, ignore.case=T), NA, name))%>% + mutate(type=ifelse(grepl("scct", target, ignore.case=T), NA, type))%>% + mutate(campaign=ifelse(grepl("scct", target, ignore.case=T), NA, campaign))%>% + mutate(location=ifelse(grepl("scct", target, ignore.case=T), NA, location))%>% + mutate(abrasion=ifelse(grepl("scct", target, ignore.case=T), NA, abrasion)) +``` + +Now, we check which LIBS targets are specifically paired with a PIXL value, and if we successfully paired them + +```{r, result03_analysis3} +libs.pixl.combined<-cbind(libsrenamed$type,libs.pixl.combined) + +libs.pixl.combined[grepl("PIXL",libs.pixl.combined$`libsrenamed$type`),7:14] +``` +We did not pair these LIBS and PIXL targets, however, these targets are matched with PIXL targets that we don't have data for yet, so this makes sense. 
The only matched PIXL target we do have data for is the Roubion target, which doesn't make sense to match as it is an atmospheric sample. + +### Discussion of results + +Overall, our LIBS and PIXL matching is not perfect, and we need to continue to narrow it down, but it is somewhat better than I thought it was based on the two metrics above. + + +## Analysis: Question 4 (LIBS/PIXL Ternary Plot) + +### Question being asked + +_Provide in natural language a statement of what question you're trying to answer_ + +### Data Preparation + +_Provide in natural language a description of the data you are using for this analysis_ + +_Include a step-by-step description of how you prepare your data for analysis_ + +_If you're re-using dataframes prepared in another section, simply re-state what data you're using_ + +```{r, result04_data} +# Include all data processing code (if necessary), clearly commented +libsandpixl<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/PIXL_LIBS_Combined.Rds") +libsandpixl<-na.omit(libsandpixl) +libsandpixl<-cbind("index"=0,libsandpixl) +libsandpixl$index<-rownames(libsandpixl) +``` + + +```{r} +libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") + +#Drop the standard deviation features, the sum of the percentages, +#the distance, and the total frequencies +libs.df <- libs.df %>% + select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) + +libs.df<-cbind("index"=0,libs.df) +libs.df$index<-rownames(libs.df) +``` + + +```{r} +libs.df<-libs.df[libs.df$index %in% libsandpixl$index,] +``` + +### Analysis: Methods and results + +_Describe in natural language a statement of the analysis you're trying to do_ + +_Provide clearly commented analysis code; include code for tables and figures!_ + +```{r, result05_analysis} +libs.matrix <- as.matrix(libs.df[,7:14]) + +libs.tern <- as.data.frame(libs.matrix) %>% + 
mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O,TiO2)) + +``` + +```{r} +libs.tern<-cbind(libs.tern,"Abrasion"=as.factor(libsandpixl$abrasion)) +``` + + +```{r} +ggtern(libs.tern, ggtern::aes(x=x,y=y,z=z)) + + geom_point(data=libs.tern,aes(color=Abrasion,alpha=0.5)) + + theme_rgbw() + + labs(title="Mars LIBS Corresponding to PIXL", + x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K")+theme(legend.position="right") + + geom_point(data=libs.tern, + aes(color=Abrasion,alpha=0.5)) + + guides(alpha="none") +``` + + +### Discussion of results + +_Provide in natural language a clear discussion of your observations._ + + +## Summary and next steps + +I would like to continue to refine the PIXL and LIBS combined data, as well as continue learning about the naming conventions of the LIBS targets and how the LIBS data can be best broken up. + diff --git a/StudentNotebooks/StatusNotebookTemplate.html b/StudentNotebooks/Assignment05/vanesm-assignment5.html similarity index 70% rename from StudentNotebooks/StatusNotebookTemplate.html rename to StudentNotebooks/Assignment05/vanesm-assignment5.html index 6b09c55..ae09d24 100644 --- a/StudentNotebooks/StatusNotebookTemplate.html +++ b/StudentNotebooks/Assignment05/vanesm-assignment5.html @@ -9,9 +9,9 @@ - + - + DAR F24 Project Status Notebook Template @@ -352,252 +352,387 @@

DAR F24 Project Status Notebook Template

-

DAR Project Name Here (‘Mars’ or ‘DeFi’ or -‘CTBench’

-

Student Name

-

2024-09-18

+

DAR Mars

+

Margo VanEsselstyn

+

2024-10-23

-
-

Instructions (DELETE BEFORE SUBMISSION)

-  1. Create a new copy of this notebook in the AssignmentX sub-directory of your team’s github repository using the following naming convention
-       • rcsid_assignmentX.Rmd and rcsid_assignmentX.pdf
-       • For example, bennek_assignment03.Rmd
-  2. Document all the work you did on your assigned project this week using the outline below.
-  3. You MUST include figures and/or tables to illustrate your work. Screen shots are okay, but include something!
-  4. You MUST include links to other important resources (knitted HTMl files, Shiny apps). See the guide below for help.
-  5. Commit the source (.Rmd) and knitted (.html) versions of your notebook and push to github
-  6. Submit a pull request. Please notify Dr. Erickson if you don’t see your notebook merged within one day.
-  7. DO NOT MERGE YOUR PULL REQUESTS YOURSELF!!

See the Grading Rubric for guidance on how the contents of this -notebook will be graded on LMS or GradeScope.

-

Weekly Work Summary

-

NOTE: Follow an outline format; use bullets to -express individual points.

- -
-
-

Personal Contribution

-
-

Analysis: Question 1 (Provide short name)

+
+

Analysis: Question 1 (LIBS Target Names)

Question being asked

-

Provide in natural language a statement of what question you’re -trying to answer

+

What can I learn about LIBS target names from the Analyst's Notebook and other sources?

Data Preparation

-

Provide in natural language a description of the data you are -using for this analysis

-

Include a step-by-step description of how you prepare your data -for analysis

-

If you’re re-using dataframes prepared in another section, simply -re-state what data you’re using

-
# Include all data processing code (if necessary), clearly commented
+

I am using the LIBS data, with some features removed. I am mainly +focusing on the metadata. I added a new “type” column and am +categorizing the targets based on a few different metrics.

+
#load in LIBS data
+libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+
+#Drop the standard deviation features, the sum of the percentages, 
+#the distance, and the total frequencies
+libs.df <- libs.df %>% 
+  select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+
+# Convert the points to numeric
+libs.df$point <- as.numeric(libs.df$point)
+
+libsrenamed<-cbind(libs.df[,1:4],"type"=0,libs.df[5:13])
+
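The column-dropping step above lists every `_stdev` column by name. A minimal sketch of a pattern-based alternative, assuming a toy data frame whose column names mirror the chunk above (the values are invented, and `toy`, `drop`, and `trimmed` are hypothetical names):

```r
# Toy stand-in for the LIBS table; values are made up.
toy <- data.frame(
  target      = c("a", "b"),
  SiO2        = c(45.1, 50.3),
  SiO2_stdev  = c(1.2, 0.9),
  MgO         = c(8.0, 6.5),
  MgO_stdev   = c(0.4, 0.3),
  Total       = c(98.7, 99.2),
  distance_mm = c(2100, 2350),
  Tot.Em.     = c(1.1e6, 9.8e5)
)

# Flag every "*_stdev" column plus the three columns dropped by name.
drop <- grepl("_stdev$", names(toy)) |
  names(toy) %in% c("distance_mm", "Tot.Em.", "Total")

# Keep only the unflagged columns.
trimmed <- toy[, !drop, drop = FALSE]
names(trimmed)
```

Matching on the suffix means a newly added oxide with its own `_stdev` column would be dropped without editing the list.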

I am adding a label to each scct target, based on descriptions from the file sent in the Webex, using the following reference table:

+
+[Figure: Supercam Onboard Sample Descriptions]
-
-

Analysis: Methods and results

-

Describe in natural language a statement of the analysis you’re -trying to do

-

Provide clearly commented analysis code; include code for tables -and figures!

-
# Include all analysis code, clearly commented
-# If not possible, screen shots are acceptable. 
-# If your contributions included things that are not done in an R-notebook, 
-#   (e.g. researching, writing, and coding in Python), you still need to do 
-#   this status notebook in R.  Describe what you did here and put any products 
-#   that you created in github. If you are writing online documents (e.g. overleaf 
-#   or google docs), you can include links to the documents in this notebook 
-#   instead of actual text.
+
libsrenamed<-libsrenamed %>% 
+  mutate(type = ifelse(grepl("tsrich0404", target,ignore.case=T),
+                       "BHVO-2 basalt and K sulfate mixture", type)) %>%
+  mutate(type = ifelse(grepl("LCMB0006", target,ignore.case=T),
+                       "Chert", type)) %>%
+  mutate(type = ifelse(grepl("LCA530106", target,ignore.case=T),
+                       "Calcite", type)) %>%
+  mutate(type = ifelse(grepl("PMIFS0505", target,ignore.case=T),
+                       "Ferrosilite", type)) %>%
+  mutate(type = ifelse(grepl("TAPAG0206", target,ignore.case=T),
+                       "Fluoro-Chloro-Hydro Apatite", type)) %>%
+  mutate(type = ifelse(grepl("PMIOR0507", target,ignore.case=T),
+                       "Orthoclase", type)) %>%
+  mutate(type = ifelse(grepl("PMIDN0302", target,ignore.case=T),
+                       "Diopside", type)) %>%
+  mutate(type = ifelse(grepl("PMIFA0306", target,ignore.case=T),
+                       "Olivine", type)) %>%
+  mutate(type = ifelse(grepl("PMIAN0106", target,ignore.case=T),
+                       "Andesine", type)) %>%
+  mutate(type = ifelse(grepl("PMIEN0602", target,ignore.case=T),
+                       "Enstatite", type)) %>%
+  mutate(type = ifelse(grepl("TSERP0102", target,ignore.case=T),
+                       "Serpentine/Talc", type)) %>%
+  mutate(type = ifelse(grepl("LBHVO20406", target,ignore.case=T),
+                       "BHVO-2 standard basalt", type)) %>%
+  mutate(type = ifelse(grepl("LJSC10304", target,ignore.case=T),
+                       "Mars soil analog", type)) %>%
+  mutate(type = ifelse(grepl("LANKE0101", target,ignore.case=T),
+                       "Ankerite", type)) %>%
+  mutate(type = ifelse(grepl("LSIDE0101", target,ignore.case=T),
+                       "Siderite", type)) %>%
+  mutate(type = ifelse(grepl("LJMN10106", target,ignore.case=T),
+                       "JMN-1 standard Mn nodule", type)) %>%
+  mutate(type = ifelse(grepl("NTE010301", target,ignore.case=T),
+                       "Basalt dopped in minor elements - Cu, Zn", type)) %>%
+  mutate(type = ifelse(grepl("NTE020106", target,ignore.case=T),
+                       "Basalt dopped in minor elements - Mn, Ba, Cr", type)) %>%
+  mutate(type = ifelse(grepl("NTE030106", target,ignore.case=T),
+                       "Basalt dopped in minor elements - Zn", type)) %>%
+  mutate(type = ifelse(grepl("NTE040106", target,ignore.case=T),
+                       "Basalt dopped in minor elements - Li, Sr", type)) %>%
+  mutate(type = ifelse(grepl("NTE050301", target,ignore.case=T),
+                       "Basalt dopped in minor elements - Ni", type)) %>%
+  mutate(type = ifelse(grepl("SHERG02", target,ignore.case=T),
+                       "Shergottite", type)) %>%
+  mutate(type = ifelse(grepl("TITANIUM", target,ignore.case=T),
+                       "Titanium", type))
+
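The chain of `mutate(ifelse(grepl(...)))` calls above applies one pattern per step. The same idea can be sketched with a lookup table and a loop. This is only an illustration: `scct_labels` holds three of the pattern/label pairs from the chain, and the target names in `toy` are invented.

```r
# Hypothetical lookup table: pattern -> label (three entries copied from the chain above).
scct_labels <- c(
  LCMB0006  = "Chert",
  LCA530106 = "Calcite",
  TITANIUM  = "Titanium"
)

# Made-up target names in the style of the LIBS data; "0" marks unlabeled rows.
toy <- data.frame(
  target = c("scct_lcmb0006_sol100", "scct_titanium_sol101", "some_rock_200"),
  type   = "0",
  stringsAsFactors = FALSE
)

# Apply each pattern in turn; a later match overwrites an earlier one, as in the chain.
for (pattern in names(scct_labels)) {
  hit <- grepl(pattern, toy$target, ignore.case = TRUE)
  toy$type[hit] <- scct_labels[[pattern]]
}

toy$type
```

Keeping the pairs in one named vector makes it easier to review the labels against the reference table and to reuse them elsewhere.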

Based on this article on NASA’s website, the targets that contain the text “aegis” were taken by NASA’s new AI, Autonomous Exploration for Gathering Increased Science (AEGIS), which was implemented partway through Perseverance’s traverse. This allows the rover to take LIBS data autonomously. There are two types of AEGIS, and I considered distinguishing them as AEGISlite/AEGISheavy, but the AEGISheavy update doesn’t seem to change the AEGIS LIBS functionality; it only allows autonomous control of other Supercam measurements. As in my previous notebook, I label these targets “AEGIS”.

+
libsrenamed<-libsrenamed %>% 
+  mutate(type= ifelse(grepl("aegis", target,ignore.case=T) & type=="0","AEGIS",type))
+

To find out what the suffix “scam” refers to (more specifically than just Supercam), I looked at the Analyst's Notebook documentation for sol 448, a sol that has a target name with “scam” in it. I found this chart, which indicates that these targets are named this way because a target for another measurement has the same name and is in the same area. This could be a good way to link LIBS and PIXL further. [Figure: Description of Targets from Analysts Notebook Documentation, Sols 447-448]

+

I labeled each scam target with the names of the other measurements taken at that target, followed by SCAM. Note that the documentation for the target “montpezat_350_scam” is missing, so the label linking it to PIXL is just a guess. The documentation for the targets “pollock_knob_501_sca” and “villeplane_scam” is also missing, so their associations with ZCAM are also guesses. Here, the “AT” in “AT-SCAM” stands for “All Techniques”.

+
libsrenamed<-libsrenamed %>% 
+  mutate(type= ifelse(grepl("buzzard_rocks_scam", target,ignore.case=T) & type=="0",
+                      "PIXL-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("alfalfa_378_scam", target,ignore.case=T) & type=="0",
+                      "VISIR-Ramanx2-ZCAM-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("chiniak_565_scam", target,ignore.case=T) & type=="0",
+                      "AT-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("garde_210_scam", target,ignore.case=T) & type=="0",
+                      "AT-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("guillaumes_168_scam", target,ignore.case=T) & type=="0",
+                      "PIXL-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("montpezat_350_scam", target,ignore.case=T) & type=="0",
+                      "PIXL-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("naltsos_scam", target,ignore.case=T) & type=="0",
+                      "PIXL-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("ouzel_falls_792_scam", target,ignore.case=T) & type=="0",
+                      "AT-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("pollock_knob_501_sca", target,ignore.case=T) & type=="0",
+                      "ZCAM-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("rose_river_falls_sca", target,ignore.case=T) & type=="0",
+                      "?-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("roubion_168_scam", target,ignore.case=T) & type=="0",
+                      "ZCAM-PIXL-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("villeplane_scam", target,ignore.case=T) & type=="0",
+                      "ZCAM-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("atmo_mountain_637_sc", target,ignore.case=T) & type=="0",
+                      "ZCAMMS-SCAM",type)) %>%
+  mutate(type= ifelse(grepl("crosswind_lake_641_s", target,ignore.case=T) & type=="0",
+                      "ZCAM-SCAM",type)) %>%
+  mutate(type=ifelse(type=="0","other",type))
+
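The chain of `mutate()` calls above could also be written as a lookup table, which keeps the pattern-to-label pairs in one place. This is only a sketch under assumptions: `scam_labels`, `label_targets`, and the toy data frame are hypothetical names and made-up rows, only three of the labels are copied over for brevity, and it uses base R so it runs without extra packages.

```r
# Hypothetical lookup table: target-name pattern -> label
# (three entries copied from the mutate() chain, the rest omitted for brevity)
scam_labels <- c(
  "buzzard_rocks_scam" = "PIXL-SCAM",
  "alfalfa_378_scam"   = "VISIR-Ramanx2-ZCAM-SCAM",
  "chiniak_565_scam"   = "AT-SCAM"
)

# Label only rows that are still unlabeled ("0"), as the original chain does,
# then mark everything left over as "other"
label_targets <- function(target, type, labels) {
  for (pattern in names(labels)) {
    hit <- grepl(pattern, target, ignore.case = TRUE) & type == "0"
    type[hit] <- labels[[pattern]]
  }
  type[type == "0"] <- "other"
  type
}

# Toy stand-in for libsrenamed (made-up rows, for illustration only)
toy <- data.frame(
  target = c("buzzard_rocks_scam__", "alfalfa_378_scam____",
             "some_other_target___", "chiniak_565_scam____"),
  type   = c("0", "0", "0", "AEGIS"),
  stringsAsFactors = FALSE
)
toy$type <- label_targets(toy$target, toy$type, scam_labels)
print(toy)
```

The last toy row shows that a target already labeled in an earlier step (here “AEGIS”) keeps its label, which matches the `type=="0"` guard in the original code.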

I started going through the other targets to see whether their names carry meaning. Many of them have short descriptions, but labeling them manually seems like a lot of work right now and maybe not the best use of time, especially as many of the target names appear to be arbitrary based on these descriptions.

+
libsrenamed<-libsrenamed %>% 
+  mutate(type= ifelse(target=="sei_________________", "other - fine soil",type)) %>%
+  mutate(type= ifelse(target=="naakih______________", "other - coarse soil",type))
-
-

Discussion of results

-

Provide in natural language a clear discussion of your -observations.

+
+

Analysis/Discussion of Results

+

For this portion of my notebook, I am not doing specific analysis, only trying to add more documentation to the data and give meaning to target names.

+

For future analysis, I have some thoughts about how the newly categorized data should be analysed. Currently, LIBS analysts have been analysing all of the targets together, maybe taking subsets by cluster, but I think moving forward we should separate the data.

+

It doesn’t make sense to me to analyse the scct data in conjunction with the rest of the data. These targets are LIBS measurements of designated samples used to calibrate the instrument. This data could be used for scaling, or for understanding which samples are close to a certain reference, but it is not genuine Mars data, and I don’t think it should be analysed as such.
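Separating the calibration measurements from the Mars targets could look something like the following. This is a minimal sketch, assuming that every calibration target has “scct” in its name; `libs_toy`, `libs_calibration`, and `libs_mars` are hypothetical names, and the rows are made up.

```r
# Toy stand-in for the LIBS data (made-up values; the second scct name is invented)
libs_toy <- data.frame(
  target = c("scct_ljsc10304______", "naakih______________",
             "scct_example________", "sei_________________"),
  SiO2   = c(45.1, 50.2, 48.7, 43.9),
  stringsAsFactors = FALSE
)

# Flag calibration rows by name, then split into two data frames
is_scct <- grepl("scct", libs_toy$target, ignore.case = TRUE)
libs_calibration <- libs_toy[is_scct, ]   # calibration targets only
libs_mars        <- libs_toy[!is_scct, ]  # Mars surface targets only

print(libs_calibration)
print(libs_mars)
```

Keeping the two subsets as separate objects makes it harder to include calibration shots in a clustering run by accident.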

+

I am unsure whether the AEGIS data should be treated exactly the same as the rest of the LIBS data. Functionally, both are LIBS measurements of a rock, which suggests they should be treated the same. However, it seems that the AEGIS data is taken whenever the rover has the time and capacity, so that time isn’t wasted and the rover is always taking measurements, whereas the other LIBS targets are chosen with more intention.

+

The SCAM data should definitely be analysed the same way the rest of the LIBS data is, but it could also be used to link data sets.
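One possible first step toward linking data sets is to reduce each scam target name to a base name that could be compared with target names from other instruments. The sketch below is only a guess at the naming pattern: it assumes names are padded with underscores, that the “_scam” suffix may be truncated, and that a trailing number is a sol or sequence number. `base_target_name` is a hypothetical helper, not part of any existing code base.

```r
# Hypothetical helper: strip padding, the (possibly truncated) "_scam" suffix,
# and a trailing numeric token from a LIBS target name
base_target_name <- function(target) {
  x <- tolower(target)
  x <- sub("_+$", "", x)               # drop padding underscores
  x <- sub("_s(c(a(m)?)?)?$", "", x)   # drop "_scam", "_sca", "_sc" or "_s"
  x <- sub("_[0-9]+$", "", x)          # drop a trailing number (assumed sol/sequence)
  x
}

base_names <- base_target_name(c("guillaumes_168_scam", "pollock_knob_501_sca",
                                 "crosswind_lake_641_s", "naltsos_scam________"))
print(base_names)
```

A name that legitimately ends in “_s” would be shortened incorrectly by this rule, so the result should be checked against the documentation before being used as a join key.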

-
-

Analysis: Question 2 (Provide short name)

+
+

Analysis: Question 2 (Plotting reference targets)

Question being asked

-

Provide in natural language a statement of what question you’re -trying to answer

+

How can we distinguish the scct targets on a ternary plot?

Data Preparation

-

Provide in natural language a description of the data you are -using for this analysis

-

Include a step-by-step description of how you prepare your data -for analysis

-

If you’re re-using dataframes prepared in another section, simply -re-state what data you’re using

-
# Include all data processing code (if necessary), clearly commented
+

I am using the LIBS data prepared earlier, and reformatting it for a ternary diagram.

+
# Include all data processing code (if necessary), clearly commented
+libs.matrix <- as.matrix(libs.df[,6:13]) 
+
+libs.tern <- as.data.frame(libs.matrix) %>%
+  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>%
+  select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O,TiO2))
+
+
+libs.tern<-cbind(libs.tern, "type"=libsrenamed$type, "target"=libsrenamed$target, 
+                 "shape"=libsrenamed$type)
+
+libs.tern<-libs.tern %>% mutate(shape = ifelse(grepl("SCAM", type, ignore.case=T),
+                       "other", shape)) %>%
+  mutate(shape = ifelse(grepl("other", type, ignore.case=T),
+                       "other", shape)) %>%
+  mutate(shape = ifelse(grepl("scct", target, ignore.case=T), "scct", shape))
+
+libs.tern$shape<-as.factor(libs.tern$shape)
+
km<-kmeans(libs.tern[,1:3],5)
+
+libs.tern<-as.data.frame(cbind(libs.tern,"cluster"=as.factor(km$cluster)))
-
+
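Two details of the preparation above may be worth checking. The three coordinates are sums of oxide percentages divided by 100, so they do not necessarily add up to 1 (TiO2 is left out, and totals vary), and `kmeans()` starts from random centers, so cluster labels can change between runs. The sketch below shows one way to handle both, on made-up oxide values; the seed and the choice of two clusters are arbitrary assumptions for illustration.

```r
# Toy oxide table (made-up weight percents) using the LIBS column names
oxides <- data.frame(
  SiO2  = c(45, 50, 40, 47, 52, 42),
  Al2O3 = c(10, 12,  8, 11, 13,  9),
  FeOT  = c(15, 10, 20, 14,  9, 19),
  MgO   = c(10,  8, 12,  9,  7, 13),
  CaO   = c( 8,  9,  7,  8,  9,  6),
  Na2O  = c( 3,  4,  2,  3,  4,  2),
  K2O   = c( 1,  2,  1,  1,  2,  1)
)

# Same three groupings as in the notebook
tern <- with(oxides, data.frame(
  x = SiO2 + Al2O3,
  y = FeOT + MgO,
  z = CaO + Na2O + K2O
))

# Closure: rescale each row so that x + y + z = 1
tern <- tern / rowSums(tern)

# Fix the random seed so the cluster assignment is reproducible
set.seed(1)  # arbitrary seed, chosen only for illustration
km <- kmeans(tern, centers = 2)
print(cbind(tern, cluster = km$cluster))
```

Clustering on the rescaled proportions means the clusters are computed on the same quantities a ternary diagram displays, rather than on raw sums that also reflect each row's oxide total.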

Analysis: Methods and Results

-

Describe in natural language a statement of the analysis you’re -trying to do

-

Provide clearly commented analysis code; include code for tables -and figures!

-
# Include all analysis code, clearly commented
-# If not possible, screen shots are acceptable. 
-# If your contributions included things that are not done in an R-notebook, 
-#   (e.g. researching, writing, and coding in Python), you still need to do 
-#   this status notebook in R.  Describe what you did here and put any products 
-#   that you created in github (documents, jupytor notebooks, etc). If you are writing online documents (e.g. overleaf 
-#   or google docs), you can include links to the documents in this notebook 
-#   instead of actual text.
+

I am trying to differentiate the scct points from the other points on a ternary diagram, but I struggled a lot to find the best method. I would like to label them with what kind of rock they are, but text labels cover the whole graph, and the factor has too many levels to differentiate by color or shape.
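One way to reduce label clutter might be to label each scct target once, at the mean position of its points, instead of labeling every point. The sketch below only computes those mean positions on made-up coordinates; `scct_pts` and `centroids` are hypothetical names, and whether a text layer then reads well on the ternary plot would still need to be tried.

```r
# Toy ternary coordinates for repeated shots on two calibration targets (made-up)
scct_pts <- data.frame(
  target = c("scct_a", "scct_a", "scct_b", "scct_b"),
  x = c(0.60, 0.62, 0.40, 0.42),
  y = c(0.30, 0.28, 0.40, 0.38),
  z = c(0.10, 0.10, 0.20, 0.20)
)

# One row per target: the mean of x, y and z over that target's points
centroids <- aggregate(cbind(x, y, z) ~ target, data = scct_pts, FUN = mean)
print(centroids)
```

With one label per target, the number of labels drops from the number of points to the number of distinct scct targets.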

+
ggtern(libs.tern, ggtern::aes(x=x,y=y,z=z)) +
+  # non-scct points, colored by k-means cluster; alpha is a constant, so it is set outside aes()
+  geom_point(data=subset(libs.tern,shape!="scct"),aes(color=cluster),alpha=0.5) + 
+  theme_rgbw() + 
+  labs(title="Mars LIBS ternary Plot",
+       x="Si+Al",
+       y="Fe+Mg",
+       z="Ca+Na+K")+theme(legend.position="bottom") + 
+  # scct calibration targets, drawn on top with their own legend entry
+  geom_point(data=subset(libs.tern, shape=="scct"),
+            aes(color="scct"),alpha=0.5)
+

-
+

Discussion of results

-

Provide in natural language a clear discussion of your -observations.

+

I don’t have clear results because my graph is not very clear and is not labeled well. One thing I found interesting is that the points taken at the same scct target don’t necessarily have the same values. They are very close, but not exactly the same, so there is some amount of measurement error.
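The spread between repeated measurements of the same scct target could be quantified with a per-target standard deviation. This is a small sketch on made-up SiO2 values; `shots` and `spread` are hypothetical names, and the same call could be repeated for the other oxides.

```r
# Toy repeated measurements of two calibration targets (made-up SiO2 percents)
shots <- data.frame(
  target = c("scct_a", "scct_a", "scct_a", "scct_b", "scct_b", "scct_b"),
  SiO2   = c(45.0, 45.6, 44.4, 50.0, 51.0, 49.0)
)

# Standard deviation of SiO2 within each target
spread <- aggregate(SiO2 ~ target, data = shots, FUN = sd)
names(spread)[2] <- "SiO2_sd"
print(spread)
```

Because the calibration targets have a fixed composition, this within-target spread gives a rough sense of how much variation to expect from the measurement itself.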

-
-

Analysis: Question 3 (Provide short name)

+
+

Analysis: Question 3 (Improving LIBS-PIXL combined data)

Question being asked

-

Provide in natural language a statement of what question you’re -trying to answer

+

I am trying to improve the combined PIXL-LIBS data.

Data Preparation

-

Provide in natural language a description of the data you are -using for this analysis

-

Include a step-by-step description of how you prepare your data -for analysis

-

If you’re re-using dataframes prepared in another section, -re-state what data you’re using

-
# Include all data processing code (if necessary), clearly commented
+

I am importing the data from the combined data Rds file in StudentData.

+
libs.pixl.combined <- readRDS("~/DAR-Mars-F24/StudentData/PIXL_LIBS_Combined.Rds")

Analysis methods used

-

Describe in natural language a statement of the analysis you’re -trying to do

-

Provide clearly commented analysis code; include code for tables -and figures!

-
# Include all analysis code, clearly commented
-# If not possible, screen shots are acceptable. 
-# If your contributions included things that are not done in an R-notebook, 
-#   (e.g. researching, writing, and coding in Python), you still need to do 
-#   this status notebook in R.  Describe what you did here and put any products 
-#   that you created in github. If you are writing online documents (e.g. overleaf 
-#   or google docs), you can include links to the documents in this notebook 
-#   instead of actual text.
+

Some of the matched LIBS and PIXL rows pair PIXL data with LIBS measurements of scct targets, which means that our earlier matching method is not entirely correct. It makes sense that the method would mistakenly match some scct targets to PIXL data, because the rover regularly recalibrates by taking LIBS measurements of these scct targets.

+
libs.pixl.combined[991,]
+
##     sol.x    lat      lon               target point Long sol.y sample name
+## 991   592 18.451 77.40133 scct_ljsc10304______    19 <NA>    NA     NA <NA>
+##     type campaign location abrasion
+## 991 <NA>     <NA>     <NA>     <NA>
+

I am removing the PIXL values that are matched with these scct targets.

+
libs.pixl.combined<-libs.pixl.combined%>%
+  mutate(Long=ifelse(grepl("scct", target, ignore.case=T), NA, Long))%>%
+  mutate(sol.y=ifelse(grepl("scct", target, ignore.case=T), NA, sol.y))%>%
+  mutate(sample=ifelse(grepl("scct", target, ignore.case=T), NA, sample))%>%
+  mutate(name=ifelse(grepl("scct", target, ignore.case=T), NA, name))%>%
+  mutate(type=ifelse(grepl("scct", target, ignore.case=T), NA, type))%>%
+  mutate(campaign=ifelse(grepl("scct", target, ignore.case=T), NA, campaign))%>%
+  mutate(location=ifelse(grepl("scct", target, ignore.case=T), NA, location))%>%
+  mutate(abrasion=ifelse(grepl("scct", target, ignore.case=T), NA, abrasion))
+
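The same blanking step can be written with logical row indexing, which sets all PIXL columns at once. One reason to consider this form: if any of the PIXL columns is a factor, `ifelse()` returns the factor's integer codes rather than its labels, whereas assigning `NA` by index keeps the column type. The sketch uses a made-up two-row table with only two PIXL columns; `combined` and `pixl_cols` are hypothetical names.

```r
# Toy stand-in for the combined table (made-up rows; only two PIXL columns shown)
combined <- data.frame(
  target = c("scct_ljsc10304______", "naakih______________"),
  sample = c(3, 5),
  name   = factor(c("a", "b")),
  stringsAsFactors = FALSE
)

# In the notebook this vector would list all eight PIXL columns
pixl_cols <- c("sample", "name")

# Blank the PIXL columns on every row whose LIBS target is a calibration target
is_scct <- grepl("scct", combined$target, ignore.case = TRUE)
combined[is_scct, pixl_cols] <- NA
print(combined)
```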

Now, we check which LIBS targets are explicitly paired with a PIXL measurement (those whose type label contains “PIXL”), and whether our matching paired them successfully.

+
libs.pixl.combined<-cbind("libs_type"=libsrenamed$type,libs.pixl.combined)
+
+libs.pixl.combined[grepl("PIXL",libs.pixl.combined$libs_type),7:14]
+
##     Long sol.y sample name type campaign location abrasion
+## 197 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 198 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 199 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 200 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 201 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 275 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 276 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 277 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 278 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 279 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 280 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 281 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 282 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 283 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 284 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 285 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 286 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 287 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 576 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 577 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 578 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 579 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 580 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 581 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 582 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 583 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 584 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 585 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 711 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 712 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 713 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 714 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 715 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 716 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 717 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 718 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 719 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+## 720 <NA>    NA     NA <NA> <NA>     <NA>     <NA>     <NA>
+

Our matching did not pair these LIBS targets with PIXL data. However, these targets correspond to PIXL targets that we don’t have data for yet, so this makes sense. The only such PIXL target we do have data for is the Roubion target, which doesn’t make sense to match as it is an atmospheric sample.
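Rather than reading the printed rows by eye, the same check could be summarized as a count of matched rows per LIBS type. This is a sketch on made-up data, assuming that a non-missing PIXL `sample` value means the row was matched; `chk` and its column names are hypothetical.

```r
# Toy stand-in: LIBS type label and the matched PIXL sample number (NA = no match)
chk <- data.frame(
  libs_type = c("PIXL-SCAM", "PIXL-SCAM", "ZCAM-PIXL-SCAM", "other", "other"),
  sample    = c(NA, NA, 7, 3, NA),
  stringsAsFactors = FALSE
)

# Number of rows with a PIXL match, and total rows, for each LIBS type
matched <- tapply(!is.na(chk$sample), chk$libs_type, sum)
total   <- table(chk$libs_type)
print(rbind(matched = matched, total = total[names(matched)]))
```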

-
+

Discussion of results

-

Provide in natural language a clear discussion of your -observations.

+

Overall, our LIBS and PIXL matching is not perfect, and we need to continue refining it, but it is somewhat better than I thought it was, based on the two checks above.

Summary and next steps

-

Provide in natural language a clear summary and your proposed -next steps.

+

I would like to continue to refine the PIXL and LIBS combined data, as well as continue learning about the naming conventions of the LIBS targets and how the LIBS data can be best broken up.

diff --git a/StudentNotebooks/Assignment05/vanesm-assignment5.pdf b/StudentNotebooks/Assignment05/vanesm-assignment5.pdf new file mode 100644 index 0000000..9b2a2ff Binary files /dev/null and b/StudentNotebooks/Assignment05/vanesm-assignment5.pdf differ diff --git a/StudentNotebooks/StatusNotebookTemplate.Rmd b/StudentNotebooks/StatusNotebookTemplate.Rmd index e92f794..3b582c6 100644 --- a/StudentNotebooks/StatusNotebookTemplate.Rmd +++ b/StudentNotebooks/StatusNotebookTemplate.Rmd @@ -7,7 +7,7 @@ output: toc: yes pdf_document: toc: yes -subtitle: "DAR Project Name Here ('Mars' or 'DeFi' or 'CTBench'" +subtitle: "DAR Project Name Here: 'Mars' or 'DeFi' or 'CTBench'" --- ## Instructions (DELETE BEFORE SUBMISSION) @@ -52,13 +52,13 @@ See the Grading Rubric for guidance on how the contents of this notebook will be * Summary of github commits - * include branch name(s) - * include browsable links to all external files on github + * Include branch name(s) + * Include filenames for any added or changed files on github * Include links to shared Shiny apps * List of presentations, papers, or other outputs - * Include browsable links + * Include browsable links (ie Google Slides, et.al.) * List of references (if necessary) * Indicate any use of group shared code base