diff --git a/StudentData/LIBS_calibration_targets.Rds b/StudentData/LIBS_calibration_targets.Rds new file mode 100644 index 0000000..df90460 Binary files /dev/null and b/StudentData/LIBS_calibration_targets.Rds differ diff --git a/StudentData/PIXL_LIBS_Combined.Rds b/StudentData/PIXL_LIBS_Combined.Rds index 753cb0b..17bdf04 100644 Binary files a/StudentData/PIXL_LIBS_Combined.Rds and b/StudentData/PIXL_LIBS_Combined.Rds differ diff --git a/StudentData/README.md b/StudentData/README.md index 567a233..aadeef6 100644 --- a/StudentData/README.md +++ b/StudentData/README.md @@ -16,14 +16,16 @@ _You may need to use a more creative path name to read an Rds in the StudentData # Rds file introductions -**pixl_sol_coordinates.Rds** has all of the data in samples_pixl_wide.Rds with the latitude, longitude and sol found from the analysts notebook. Note that the Salette and Bearwallow samples have zeros as their latitude and longitude because the analysts notebook was giving errors for these sites. Following the conventions of this data set, though, most likely Salette has the same coordinates as Coulettes, and Bearwallow has the same coordinates as Hazeltop. +**pixl_sol_coordinates.Rds** has all of the data in samples_pixl_wide.Rds with the latitude, longitude and sol found from the analysts notebook. -**libs_v2** all the libs data with the columns renamed (meta data capitalized) so that they match with the other datasets and reordered to match with other data sets. +**libs_v1** all the libs data with the columns renamed (meta data capitalized) so that they match with the other datasets and reordered to match with other data sets. -**pixl_v2** all the pixl data with the columns renamed and reordered to match libs. +**pixl_v1** all the pixl data with the columns renamed and reordered to match libs. -**sherloc_v2** all the sherloc data, but after it's been turned into a data frame in the same format as sherloc. 
+**sherloc_v1** all the sherloc data, but after it's been turned into a data frame in the same format as sherloc. -**lithology_v2** all the lithology data with the columns renamed and reordered to match sherloc. +**lithology_v1** all the lithology data with the columns renamed and reordered to match sherloc. **sample_meta** all the meta data for the samples. This can be appended to pixl, sherloc, and lithology. + +**libs_typed.Rds** the libs data with a "type" column added that contains descriptors of the scct samples, as well as other sample descriptors from the analysts notebook diff --git a/StudentData/libs_typed.Rds b/StudentData/libs_typed.Rds new file mode 100644 index 0000000..eb09ee2 Binary files /dev/null and b/StudentData/libs_typed.Rds differ diff --git a/StudentData/v1_Data_Introduction.md b/StudentData/v1_Data_Introduction.md index 1f0acec..757ffab 100644 --- a/StudentData/v1_Data_Introduction.md +++ b/StudentData/v1_Data_Introduction.md @@ -2,18 +2,18 @@ All data sets are reordered such that meta data is capitalized and at the front of the data frame and ____ data is at the end. Additionally, the order and names of elemental compound in PIXL/LIBS and minerals in Lithology/Sherloc have been made to match. -# Libs -There is both meta data and feature data included in Libs, since their are no other data sets that use match the targets and thus no point in seperating it out. +# LIBS +There is both meta data and feature data included in LIBS, since their are no other data sets that use match the targets and thus no point in separating it out. **Meta data**: -- *Target*: Factor, name of location the libs was taken. -- *Point*: Numeric, intergers 1 through 28. The supercam is a semi-grid (28 dots, or "points", in rows of 6,5,6,5,6), and so for each target there are 28 "points" taken. -- *Sol*: Numeric, integers > 0. The Mars day (since start of mission) that the rover took the libs point was taken. -- *Lat*: Numeric. Part of the location data. 
Where the *rover* was when the libs point was taken. -- *Lon*: Numeric. Part of the location data. Where the *rover* was when the libs point was taken. +- *Target*: Character, name of location the LIBS was taken. +- *Point*: Factor, intergers 1 through 28. The supercam is a semi-grid (28 dots, or "points", in rows of 6,5,6,5,6), and so for each target there are 28 "points" taken. +- *Sol*: Numeric, integers > 0. The Mars day (since start of mission) that the rover took the LIBS point was taken. +- *Lat*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken. +- *Lon*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken. -**Feature data** is the same as PIXL, concentration of elemental compounds, though without a few of the elemental compounds PIXL includes. +**Feature data** (numeric) is the same as PIXL, concentration of elemental compounds, though without a few of the elemental compounds PIXL includes. Main change made to the LIBS data set is that the elemental compounds originally had "o" (lowercase letter o) to represent "Oxygen" has been changed to "O" (uppercase letter O). @@ -23,7 +23,9 @@ Sample data is the data for the, currently 16, samples. There are four data fram ## Sample Meta A data frame containing all the meta data for the samples. This can be appended to Pixl, Lithology, or Sherloc based on "Sample". -**Sample**: Numeric, integers 1 through 24. The sample #, 1 through 16. Note that this sample number does *not* match the sample numbers in the analyst notebook and initial reports, because they also count witness blank (control) samples in their count (ex: our "16" is their "18" because they have two witness blank samples before it). +**Sample**: Integer, 1 through 24. The sample #, 1 through 16. 
Note that this sample number does *not* match the sample numbers in the analyst notebook and initial reports, because they also count witness blank (control) samples in their count (ex: our "16" is their "18" because they have two witness blank samples before it). + +**Name**: Character, unique. Just what they decided to refer to the sample as. Also used in analyst notebook and initial reports. **Sol**: Numeric, integers > 0. The Mars day (since start of mission) that the sample was taken @@ -31,8 +33,6 @@ A data frame containing all the meta data for the samples. This can be appended **Lon**: Numeric. Part of the location data. Where the smaple was taken. -**Name**: Factor, unique. Just what they decided to refer to the sample as. Also used in analyst notebook and initial reports. - **Abrasion**: Factor, some duplication. This is the name they give to the bit of rock they abraided, used to indicate when two samples came from the same spot. @@ -48,11 +48,24 @@ Starts with the sample number. Following is all numeric elemental compound conce One change made to the PIXL data set is that the elemental compounds originally had "0" (zero) to represent "Oxygen" has been changed to "O" (uppercase letter O). ## Lithology -Starts with the sample number. Following columns are all binary data (0 or 1) representing the presence or absence of minerals. +Starts with the sample number. Following columns are all factor data (binary 0 or 1) representing the presence or absence of minerals. Main change to Lithology data set was the renaming and ordering of the minerals to match the sherloc data set. ## Sherloc -Starts with the sample number. Following columns are all "numeric" data (0 through 1) representing the certainty in the presence of minerals. This is discrete data though, with the only options being "0" (No detection), "0.25" (Possible but not confirmed), "0.5" (Almost certain, working on confirming), "0.75" (?), and "1" (confirmed). +Starts with the sample number. 
Following columns are all factor data representing the certainty in the presence of minerals. This is discrete data though, with the only options being "0" (No detection), "0.25" (Possible but not confirmed), "0.5" (Almost certain, working on confirming), "0.75" (?), and "1" (confirmed). Main change made to the SHERLOC data set is that in v1 it has already been converted into the same data frame format as the other data sets. + +# LIBS to PIXL data set +A data set connecting libs points and pixl samples based on lat/lon. + +Appended ".libs" to end of the LIBS meta data column names and ".pixl" to end of the PIXL meta data column names so that it is, at a glance, transparent which refers to which. + +Contains: + +**Target.libs**, **Point.libs**, **Sol.libs**, **Lat.libs**, **Lon.libs**: Exactly the same as the ones from LIBS + +**Sample.pixl**, **Sol.pixl**, **Lon.pixl**, **Abrasion.pixl**, **Campaign.pixl**, **Type.pixl**: "NA" if there is not a correlated pixl sample for the LIBS point. If a pixl Sample *does* correlate (based on lat/lon), then it's the same as the data in v1_sample_meta.Rds for that sample number. + +Note that "Lat.pixl" is missing. This is not intentional, and should hopefully be fixed soon. 
\ No newline at end of file diff --git a/StudentData/v1_consistent_data_naming.Rmd b/StudentData/v1_consistent_data_naming.Rmd index ad329fa..bb295b2 100644 --- a/StudentData/v1_consistent_data_naming.Rmd +++ b/StudentData/v1_consistent_data_naming.Rmd @@ -40,6 +40,9 @@ sherloc.df$Name <- as.factor(sherloc.df$Name) ## Make it a matrix sherloc.matrix <- sherloc.df %>% pivot_wider(names_from = Mineral, values_from = Presence) sherloc.df <- cbind(sherloc.matrix,pixl.df[,"sample"]) + +# pixl and libs combined data frame +pixl_libs.df <- readRDS("PIXL_LIBS_Combined.Rds") ``` # Renaming Columns @@ -72,6 +75,9 @@ colnames(sherloc.df) <- c("Name", "Organic matter","Sulfate+Organic matter","Other hydrated phases","Phyllosilicates", "Chlorite","Kaolinite (hydrous Al-clay)","Chromite","Ilmenite", "Zircon/Baddeleyite","Fe-Mg-clay minerals","Spinels","Sample") + +# Renaming Pixl and Libs combined data set +colnames(pixl_libs.df) <- c("Sol.libs","Lat.libs","Lon.libs","Target.libs","Point.libs","Lon.pixl","Sol.pixl","Sample.pixl","Name.pixl","Type.pixl","Campaign.pixl","Location.pixl","Abrasion.pixl") ``` # Creating Sample metadata data frame @@ -80,7 +86,7 @@ colnames(sherloc.df) <- c("Name", sample_meta.df <- qpcR:::cbind.na(pixl.df[,c("Sol","Lat","Lon","Type","Campaign","Abrasion","Name","Location")], lithology.df[,c("Sample","SampleType")]) # Reordering it -sample_meta.df <- sample_meta.df[,c("Sample","Sol","Lat","Lon","Name","Abrasion","Campaign","Type","SampleType")] +sample_meta.df <- sample_meta.df[,c("Sample","Name","Sol","Lat","Lon","Abrasion","Campaign","Type","SampleType")] # Changing atmospherics type from "N/A" to "Atmospheric" sample_meta.df[1,"Type"] <- "Atmospheric" @@ -119,8 +125,48 @@ sherloc.df <- sherloc.df[,c("Sample", "Organic matter","Sulfate+Organic matter","Other hydrated phases","Phyllosilicates", "Chlorite","Kaolinite (hydrous Al-clay)","Chromite","Ilmenite", "Zircon/Baddeleyite","Fe-Mg-clay minerals","Spinels")] + +# Resorting Pixl and Libs combined 
data set +pixl_libs.df <- pixl_libs.df[,c("Target.libs","Point.libs","Sol.libs","Lat.libs","Lon.libs", + "Sample.pixl","Name.pixl","Sol.pixl","Lon.pixl","Abrasion.pixl","Campaign.pixl","Type.pixl")] + +``` + +Check types and fix them (ex Sample, Sol, Lat, Lon -> numeric, Name -> character, Abrasion, Campaign, Type, SampleType -> Factor) +# Fixing data types +```{r} +# Pixl +## Already good! +## Sample is integer and concentrations are numeric! + +# Libs +libs.df$Point <- as.factor(libs.df$Point) # Was originally "character" + +# Lithology +lithology.df[,2:36] <- lapply(lithology.df[,2:36],as.factor) # Was originally "character" +lithology.df$Sample <- as.integer(lithology.df$Sample) #To match Pixl + +# Sherloc +sherloc.df[] <- data.frame(lapply(sherloc.df[],as.factor)) # Was originally "character" +sherloc.df$Sample <- as.integer(sherloc.df$Sample) # Back to original, since prior line changed it + +# Sample Meta +sample_meta.df$Sample <- as.integer(sample_meta.df$Sample) +# sample_meta.df$Name <- as.character(sample_meta.df$Name) # Already in the format! +sample_meta.df$Sol <- as.numeric(sample_meta.df$Sol) +sample_meta.df$Lat <- as.numeric(sample_meta.df$Lat) +sample_meta.df$Lon <- as.numeric(sample_meta.df$Lon) +sample_meta.df$Abrasion <- as.factor(sample_meta.df$Abrasion) +sample_meta.df$Campaign <- as.factor(sample_meta.df$Campaign) +sample_meta.df$Type <- as.factor(sample_meta.df$Type) +sample_meta.df$SampleType <- as.factor(sample_meta.df$SampleType) + +# Pixl and Libs combined +## Already good! 
+ ``` + # Saving New data frames ```{r} saveRDS(sample_meta.df, "v1_sample_meta.Rds") @@ -128,4 +174,5 @@ saveRDS(libs.df, "v1_libs.Rds") saveRDS(lithology.df, "v1_lithology.Rds") saveRDS(sherloc.df, "v1_sherloc.Rds") saveRDS(pixl.df, "v1_pixl.Rds") +saveRDS(pixl_libs.df, "v1_libs_to_sample.Rds") ``` \ No newline at end of file diff --git a/StudentData/v1_libs.Rds b/StudentData/v1_libs.Rds index c90d68b..64657a5 100644 Binary files a/StudentData/v1_libs.Rds and b/StudentData/v1_libs.Rds differ diff --git a/StudentData/v1_libs_to_sample.Rds b/StudentData/v1_libs_to_sample.Rds new file mode 100644 index 0000000..711ce9d Binary files /dev/null and b/StudentData/v1_libs_to_sample.Rds differ diff --git a/StudentData/v1_lithology.Rds b/StudentData/v1_lithology.Rds index dd22b91..04f1928 100644 Binary files a/StudentData/v1_lithology.Rds and b/StudentData/v1_lithology.Rds differ diff --git a/StudentData/v1_sample_meta.Rds b/StudentData/v1_sample_meta.Rds index 219183b..de902e4 100644 Binary files a/StudentData/v1_sample_meta.Rds and b/StudentData/v1_sample_meta.Rds differ diff --git a/StudentData/v1_sherloc.Rds b/StudentData/v1_sherloc.Rds index 4e3f395..5d3382d 100644 Binary files a/StudentData/v1_sherloc.Rds and b/StudentData/v1_sherloc.Rds differ diff --git a/StudentNotebooks/Assignment03/walczd3_assignment03.Rmd b/StudentNotebooks/Assignment03/walczd3_assignment03.Rmd new file mode 100644 index 0000000..e15cfcd --- /dev/null +++ b/StudentNotebooks/Assignment03/walczd3_assignment03.Rmd @@ -0,0 +1,628 @@ +--- +title: "DAR F24 Assignment 3 Notebook Template" +author: "David Walczyk" +date: "`r Sys.Date()`" +output: + html_document: + toc: yes + pdf_document: + toc: yes +subtitle: "DAR Project Name: Mars" +--- + + +## BiWeekly Work Summary + +**NOTE:** Follow an outline format; use bullets to express individual points. + +* RCS ID: **Always** include this! +* Project Name: **Always** include this! 
+* Summary of work since last week + + * Describe the important aspects of what you worked on and accomplished + + +* Summary of github commits + + * include branch name(s) + * include browsable links to all external files on github + * Include links to shared Shiny apps + +* List of presentations, papers, or other outputs + + * Include browsable links + +* List of references (if necessary) +* Indicate any use of group shared code base +* Indicate which parts of your described work were done by you or as part of joint efforts + +* **Required:** Provide illustrating figures and/or tables + +## Personal Contribution + +* Clearly defined, unique contribution(s) done by you: code, ideas, writing... +* Include github issues you've addressed if any + +_PACKAGES_ + +```{r} + +# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!! +# This section install packages if they are not already installed. +# This block will not be shown in the knit file. +knitr::opts_chunk$set(echo = TRUE) + +# Set the default CRAN repository +local({r <- getOption("repos") + r["CRAN"] <- "http://cran.r-project.org" + options(repos=r) +}) + +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} + +if (!require("ggplotify")) { + install.packages("ggplotify") + library(ggplotify) +} + +if (!require("car")) { + install.packages("car") + library(car) +} + +# Required packages for M20 LIBS analysis +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} +if (!require("apcluster")) { + install.packages("apcluster") + library(apcluster) +} +if (!require("vegan")) { + 
install.packages("vegan") + library(vegan) +} +if (!require("ape")) { + install.packages("ape") + library(ape) +} +if (!require("Matrix")) { + install.packages("Matrix") + library(Matrix) +} + +if (!require("gridExtra")) { + install.packages("gridExtra") + library(gridExtra) +} + +if (!require("umap")) { + install.packages("umap") + library(umap) +} + +if (!require("ggtern")) { + install.packages("ggtern") + library(ggtern) +} +``` + +_LOAD IN DATA_ + +```{r, result01_data} + +#-------------LIBS------------------- +# Load the saved LIBS data with locations added +libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") +libs.std_dev <- libs.df %>% + select((c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) +libs.df <- libs.df %>% + select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) + +# Convert the points to numeric +libs.df$point <- as.numeric(libs.df$point) + +# Review what we have +summary(libs.df) + +#----------PIXL---------------------- +# Load the saved PIXL data with locations added +pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds") + +pixl.df +# Convert to factors +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) + +# Review our dataframe +summary(pixl.df) + +#----------SHERLOC---------------------- +# Read in data as provided. 
+sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds") + +# Clean up data types +sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral) +sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], + as.numeric) +# Transform NA's to 0 +sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0) + +# Reformat data so that rows are "abrasions" and columns list the presence of minerals. +# Do this by "pivoting" to a long format, and then back to the desired wide format. + +sherloc_long <- sherloc_abrasion_raw %>% + pivot_longer(!Mineral, names_to = "Name", values_to = "Presence") + +# Make abrasion a factor +sherloc_long$Name <- as.factor(sherloc_long$Name) + +# Make it a matrix +sherloc.matrix <- sherloc_long %>% + pivot_wider(names_from = Mineral, values_from = Presence) + +# Get sample information from PIXL and add to measurements -- assumes order is the same + +sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix) + +# Review what we have +summary(sherloc.df) + + +# Load the saved lithology data with locations added +lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds") + +# Cast samples as numbers +lithology.df$sample <- as.numeric(lithology.df$sample) + +# Convert rest into factors +lithology.df[sapply(lithology.df, is.character)] <- + lapply(lithology.df[sapply(lithology.df, is.character)], + as.factor) + +# Keep only first 16 samples because the data for the rest of the samples is not available yet +lithology.df<-lithology.df[1:16,] + +# Create a matrix containing only the numeric measurements. The remaining features are metadata about the sample. 
+lithology.matrix <- sapply(lithology.df[,6:40],as.numeric)-1 + +# Review the structure of our matrix +str(lithology.matrix) +``` + + + + +## Analysis: Question 1 (Provide short name) + +### Question being asked + +_Are there any similarities or unusual trends in our LIBS data that we can compare to rover-collected PIXL samples?_ + +### Data Preparation + +* For analysis #1 I will be analyzing the differences in LIBS and PIXL data exclusively. LIBS data contains only 8 numerical features, all of which are contained in the PIXL data. 4 features are unique to the PIXL samples: P2O5, Cl, Cr2O3, and MnO. Therefore, aside from metadata like campaign (which will be included later) or rock type, I am preparing the data to analyze shared features. + +* The first step will be applying a variety of clustering algorithms including Affinity Propagation (AP), k-means, & Uniform Manifold Approximation and Projection (UMAP). I apply these because I'm interested to see whether similar features or clusters arise under different unsupervised methods. After clustering, to see where shared oxide features are most correlated I'll plot a PCA with the most prominent clustering technique on LIBS data and compare the eigenvectors to those of the PIXL PCA. + +* LIBS & PIXL datasets exclusively for this question. + + +### Analysis: Methods and results + +_Applying clustering algorithms to detect distinct clusters._ + + +```{r, result01_analysis} +#https://books.google.com/books?hl=en&lr=&id=spQ7FWsRX30C&oi=fnd&pg=PA3&dq=sedimentary+rocks&ots=T0fThFnYqm&sig=GbZZW_JuHjm9VmebYKaP1IRzSD8#v=onepage&q=sedimentary%20rocks&f=false +#https://www.osti.gov/servlets/purl/1409785 - LIBS +# - LIBS data is data not directly sampled by the Rover and its abrasion tool. Instead it is found by a projected laser from the SUPERCAM instrument that points at a specific rock and is able to distinguish the specified Polyatomic ions by wavelength intensity.
+ + +#PIXL samples lat and lon +samples <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples.Rds") +sample_coord <- samples[which(samples$name %in% pixl.df$name),c(3:5)] + +shared_pixl <- pixl.df[,c(5, 11, 4, 14, 3, 10, 2, 9, 15, 17)] +names(shared_pixl)[1:8] <- names(libs.df[,6:13]) +shared_libs <- cbind(libs.df[,6:13], lat = libs.df[,2], lon = libs.df[,3], point = libs.df[,5]) #features, lat, long, point(?) + +#AP Clustering on shared data +set.seed(4) +get_ap <- function( data) { + ap <- apcluster(negDistMat(r = 2), data, q = 0.001) + clusters <- ap@clusters + ap_clusters <- c(1:16) + for (i in seq(length(clusters))) { + num <- i + for (j in seq(length(clusters[[i]]))) { + ap_clusters[clusters[[i]][[j]]] = num + } + } + return (ap_clusters) +} + +ap_clusters.pixl <- get_ap(shared_pixl[,1:8]) +ap_clusters.libs <- get_ap(shared_libs[,1:8]) + +unique(ap_clusters.pixl) #k = 3 +unique(ap_clusters.libs) #k = 13 + + +#Find k-means cluster +wssplot <- function(data, df, nc=8, seed=4){ + wss <- data.frame(cluster=1:nc, quality=c(0)) + for (i in 1:nc){ + set.seed(seed) + wss[i,2] <- kmeans(data, centers=i)$tot.withinss + } + ggplot(data=wss,aes(x=cluster,y=quality)) + + geom_line() + + ggtitle(paste("Quality of k-means by Cluster -", df)) +} + +wssplot(shared_pixl[,1:8], "PIXL") #k = 3,4 +wssplot(shared_libs[,1:8], "LIBS") #k = 3,4,6 + +umapplot <- function(data, i, df) { + custom.config <- umap.defaults + custom.config$n_neighbors = i + UMAP <- umap(data, config = custom.config) + plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = paste("UMAP[",df,"] on nn = ",i, sep = "")) +} + +#Apply UMAP +#find optimal kNN +#PIXL, with all features under UMAP showed unrecognizable features. Shared features showed 2,4,6 possible clusters but did not really converge as nn rose + +#LIBS +#nn <- seq(5,25, 5) +#for (i in nn) { +# umapplot(shared_libs[,1:8],i, "LIBS") +#} +#nn = 25 for just LIBS data found 4 distinct clusters. 
One of the clusters right off the bat showed some form of separation at lower n_neighbors parameters but evidently started to come together as nn increased +custom.config <- umap.defaults +custom.config$n_neighbors = 25 +UMAP <- umap(shared_libs[,1:8], config = custom.config ) +UMAP.data <- UMAP$layout +plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = "UMAP on nn = 25") +abline( a =0, b = -3, h = 0,col = "blue") + +#use line: y = -3x to find which data points are to the left and right of the cluster line. +# y = -3x --> x = y / -3 +x <- UMAP.data[,2] / -3 +plot(x = x, y = UMAP.data[,2]) +umap_clusters <- rep(NA,nrow(UMAP.data)) +umap_clusters[which( (UMAP.data[,1] >= 2) & (UMAP.data[,2] > 0) )] = 1 +umap_clusters[which( (UMAP.data[,1] < 2) & (UMAP.data[,2] > 0) )] = 2 +umap_clusters[which( UMAP.data[,1] > 9)] = 3 #unique cluster isolated +umap_clusters[which( (UMAP.data[,1] < x) & (UMAP.data[,2] < 0)) ] = 4 +umap_clusters[ which( ( (UMAP.data[,1] > x ) & (UMAP.data[,1] < 9) ) & (UMAP.data[,2] < 0))] = 5 +ggplot(data.frame(UMAP.data), aes(x = X1, y = X2, color = as.factor(umap_clusters))) + + geom_point() + + labs(title = "UMAP cluster check: nn = 25") + +#run PCA with umap clustering on libs data + campaign type for pixl of shared features +pca.libs <- prcomp(shared_libs[,1:8], retx = T, center = T) +umap.biplot <- ggbiplot::ggbiplot(pca.libs, + groups = as.factor(umap_clusters), + circle = T, + varname.size = 2, + varname.color = "red", labels.size =8 + ) + theme_bw() + labs(title = "PCA with UMAP clusters") +#total PIXL +pca.pixl <- prcomp(pixl.df[,2:14], retx = T, center = T) +pixl.biplot <- ggbiplot::ggbiplot(pca.pixl, + groups = pixl.df$campaign, + circle = T, + varname.size = 2, + varname.color = "red", labels.size =8 + ) + theme_bw() + labs(title = "PIXL PCA by campaign") + + + + +#plot heatmaps +umap.centers <- data.frame() +unq <- unique(umap_clusters) +for (i in unq) { + arr <- colMeans(shared_libs[which(umap_clusters == i),1:8]) + umap.centers <-
rbind(umap.centers, arr) +} +names(umap.centers) <- colnames(shared_libs[,1:8]) + +pixl.centers <- data.frame() +for (i in unique(shared_pixl$campaign)) { + arr <- colMeans(pixl.df[which(shared_pixl$campaign == i), 2:14]) + pixl.centers <- rbind(pixl.centers, arr) +} +names(pixl.centers) <- names(pixl.df)[2:14] + +pixl.centers +u.heatmap <- pheatmap(umap.centers, scale = "none", main = "UMAP heatmap") +pxl.heatmap <- pheatmap(pixl.centers, scale = "none", main = "PIXL-Campaign heatmap", labels_row = unique(shared_pixl$campaign)) + +grid.arrange(pixl.biplot, umap.biplot, ncol = 2) +grid.arrange(as.ggplot(u.heatmap), as.ggplot(pxl.heatmap), ncol = 2) +#look at the pattern differences between high SiO2 concentrations with FeO-T + MgO & Al2O3 + CaO + +``` + +### Discussion of results + +* Campaign type is a powerful indicator of the type of chemical makeup that a sample will have. LIBS data is fairly erratic, and while we are getting information that can help classify how each cluster identified by UMAP is related to campaign type, it is hard to say whether or not this information is reliable. In the next analysis I will compare SHERLOC data to my found clusters, especially the delta front campaigns and the unusually high CaO cluster. + + +## Analysis: Question 2 (Provide short name) + +### Question being asked + +_According to papers in 1985 and 2010 regarding amelioration (glacial to interglacial stages) of lake basins and analysis of river basins and their trapped sediments, respectively, an index called the chemical index of alteration (CIA) was used to measure the level of weathering that rocks underwent as a result of chemical reactions (i.e., reactions with the water and other dissolved substances).
Can we use this index to get accurate information about how silicate rocks underwent some lasting form of chemical weathering that is indicative of the last stages of water on Mars and its effect on basin and delta front rocks?_ + +### Data Preparation + +* For analysis #2 I will be utilizing the lithology and PIXL datasets to calculate CIA for our samples, considering we know the definite campaign and sampling location they were at. + +* The immediate code below is just another way of plotting the concentration (in this case density) of all our features on different clusters. This is arbitrary and was just used as another way of plotting. In my analysis I first plotted all lithology points to find which of the silicates specified in Dr. Roger's lecture 2 were present. I copied these column names so that I could find the relative abundance of silicates in each sample; however, this was just a check for PIXL data and as of now has no direct meaning other than checking whether the row sum is > 0, meaning that silicates with CaO are present. Regardless, for PIXL and LIBS I found the distributions of CIA values for all samples and clusters, respectively. I then looked at the difference between genuine weathering criteria (CIA > 70 indicates some form of weathering occurred; there are specific ranges that differ in the literature so I just wanted to see as a whole whether samples > or < 70 differed). Finally, I plotted a ternary plot of all the values greater than 70 with the respective cation axes. + +* I will be re-using shared_libs and shared_pixl.
To find silicate-containing rocks I will be using the lithology.df +```{r, result02_data} +#for each cluster facet_wrap the features but color the cluster in +shared_libs$cluster <- umap_clusters +shared_libs.long <- shared_libs %>% + pivot_longer(cols = names(shared_libs)[1:8], names_to = "Variable", values_to = "Values") +ggplot(shared_libs.long, aes(x = Values, color = as.factor(cluster))) + + geom_density() + + facet_wrap(~Variable, scales = "free") + + labs(title = "Density plots of all variables") +#lat long map +sample_coord$name <- "PIXL" +coords <- rbind(data.frame(name = rep("LIBS", nrow(libs.df)), lat = libs.df$lat, lon = libs.df$lon), sample_coord) +ggplot(data = coords, aes(x = lat, y = lon, color = name)) + + geom_point() + +``` + +### Analysis: Methods and Results + +* I want to see if there is any clear form of chemical weathering between UMAP-identified clusters. + + +```{r, result02_analysis} +#Lithology heatmap - so note CaO references only the CaO that is only available in silicate rocks. Since we aren't exactly sure of the chemical composition of SHERLOC or lithology data, we just check to see if the samples contain silicates (rowSums[which(silicates)] > 0) +#silicates are found in accordance +pheatmap(lithology.matrix, scale = "none", main = "Lithology heatmap", labels_row = lithology.df$campaign) + +delta.minerals <- c( "Kaolinite", "Hydrated_Mg_Fe_Sulfate", "Fe_Mg_clay", "Mg_sulfate", "Spinels", "Zircon/Baddeleyite", "Chromite", "Ilmenite", "Fe_Mg_carbonate") #Isolated delta minerals; notice the abundance of Mg_Fe minerals +silicates <- c("quartz", "feldspar", "pyroxene", "olivine", "Fe_Mg_clay") +rowSums(lithology.matrix[,silicates]) #all samples satisfied + +#PIXL +CIA.pixl <- (pixl.df$Al203 / (pixl.df$Al203 + pixl.df$Na20 + pixl.df$K20 + pixl.df$Cao)) * 100 +hist(CIA.pixl) + +#LIBS +#do these values differ for each cluster?
+CIA.libs <- data.frame() +for (i in unq) { + arr <- shared_libs[which(umap_clusters == i), 1:8] + CIA <- cbind((arr$Al2O3 / (arr$Al2O3 + arr$Na2O + arr$K2O + arr$CaO)) * 100, cluster = i, index = as.numeric(rownames(arr))) + CIA.libs <- rbind(CIA.libs, CIA) +} +names(CIA.libs) <- c("CIA", "cluster","index") + +ggplot(data = CIA.libs, aes(x = CIA)) + + geom_histogram() + + facet_wrap(~as.factor(cluster), scales = "free") + +#cluster 1 - 5 +# (1) unimodal peak around 45-50 = virtually no weathering. Higher values > 60 should be looked at more in depth +# (2) same as (1) but another smaller peak around 55. again no weathering, but higher values should be kept +# (3) virtually no weathering. I'm wondering if these LIBS samples are benign or what. They have high Ca values and about nothing else +# (4) Unimodal at 50 +# (5) Left-skewed unimodal around 50, with some higher values + +clusters.gt70 <-CIA.libs[which(CIA.libs$CIA >= 70),] +clusters.gt70.centers <- data.frame() +for (i in unique(clusters.gt70$cluster)) { #no cluster 3 due to its low values + arr <- clusters.gt70[which(clusters.gt70$cluster == i), ] + means <- colMeans(shared_libs[arr$index,1:8]) + clusters.gt70.centers <- rbind(clusters.gt70.centers, means) +} + +names(clusters.gt70.centers) <- names(shared_libs)[1:8] +gt.70.heatmap <- pheatmap(clusters.gt70.centers, scale = "none", main = "CIA >= 70") + +clusters.lt70 <- CIA.libs[which(CIA.libs$CIA < 70),] +clusters.lt70.centers <- data.frame() +for (i in unique(clusters.lt70$cluster)) { + arr <- clusters.lt70[which(clusters.lt70$cluster == i), ] + means <- colMeans(shared_libs[arr$index,1:8]) + clusters.lt70.centers <- rbind(clusters.lt70.centers, means) +} + +names(clusters.lt70.centers) <- names(shared_libs)[1:8] +lt.70.heatmap <- pheatmap(clusters.lt70.centers, scale = "none", main = "CIA < 70") + +grid.arrange(as.ggplot(gt.70.heatmap), as.ggplot(lt.70.heatmap), ncol = 2) + +#really just alumina is in higher quantities.
+# This is an indicator of more intense weathering.
+
+libs_ternary <- shared_libs %>%
+  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="LIBS", CIA = NA) %>%
+  select(c(13:17))
+
+libs_ternary[clusters.gt70$index, 5] <- "CIA>70"
+libs_ternary[clusters.lt70$index, 5] <- "CIA<70"
+
+
+pixl_ternary <- shared_pixl %>%
+  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="PIXL", CIA = "CIA<70") %>%
+  select(c(11:15))
+
+
+
+ggtern(libs_ternary, ggtern::aes(x = x, y = y, z = z, color = CIA, shape = value)) +
+  geom_point() +
+  geom_point(data = pixl_ternary, aes(x = x, y = y, z = z, color = "PIXL", shape = value)) +
+  theme_rgbw() +
+  labs(x="Si+Al",
+       y="Fe+Mg",
+       z="Ca+Na+K",
+       title = 'LIBS and PIXL ternary clustered by CIA')
+
+# All PIXL data points have CIA < 70
+
+ggtern(libs_ternary, ggtern::aes(x = x, y= y, z = z, color = as.factor(umap_clusters))) +
+  geom_point() +
+  theme_rgbw() +
+  labs(x="Si+Al",
+       y="Fe+Mg",
+       z="Ca+Na+K",
+       title = "UMAP Clustered on Major Cations")
+
+```
+
+### Discussion of results
+
+* Splitting the LIBS points at CIA = 70 shows no large difference in cation abundances apart from Al2O3. Al2O3 is the numerator of the CIA, so it makes sense that it would be the lead indicator of weathering purely based on the formula. Ca, Na & K also enter the calculation, but the plots point instead towards Fe + Mg and raise the question of how weathering might be affected by Fe + Mg concentrations. In all our heatmaps, Fe and Mg either moved in unison or differed only slightly. One quick question would be: if our alumina values differ, what is the relationship between Fe and Mg (in terms of wt%), and does this matter?
+
+* Also, a quick thought: the points in the middle of the ternary plot have relatively high concentrations of all three cation groups; what does that say about their mineralogy?
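
A minimal sketch of the CIA calculation used in this question, for comparison with the chunk above. The index is conventionally computed from molar proportions, with CaO restricted to the silicate-bound fraction, whereas the notebook divides raw wt% values. The helper `cia_molar`, the `molar_mass` vector, and the `toy` rows are hypothetical names and made-up numbers for illustration, and the sketch assumes all CaO is silicate-bound because the data do not let us separate it.

```r
# Molar masses (g/mol) of the four oxides that enter the CIA.
molar_mass <- c(Al2O3 = 101.96, CaO = 56.08, Na2O = 61.98, K2O = 94.20)

# Hypothetical helper: CIA from oxide wt% columns, using molar proportions.
# Assumption: every bit of CaO is treated as silicate-bound (no carbonate or
# phosphate correction), so values for Ca-rich points are a lower bound.
cia_molar <- function(df) {
  oxides <- names(molar_mass)
  stopifnot(all(oxides %in% names(df)))
  # Convert each wt% column to moles by dividing by its molar mass.
  moles <- sweep(as.matrix(df[, oxides]), 2, molar_mass, "/")
  denom <- rowSums(moles)
  # Rows with no measured oxides get NA instead of NaN.
  ifelse(denom > 0, 100 * moles[, "Al2O3"] / denom, NA_real_)
}

# Made-up rows: a fresh-looking composition and an alumina-rich one.
toy <- data.frame(Al2O3 = c(15, 30), CaO = c(10, 1),
                  Na2O = c(3, 0.5), K2O = c(2, 0.5))
cia_molar(toy)
```

The alumina-rich second row scores higher than the first, which matches the observation that Al2O3 drives the index. Applying the same helper to `shared_libs[, 1:8]` would show how far the wt%-based values shift once molar proportions are used.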
+
+ ## Analysis: Question 3 (Provide short name)
+
+### Question being asked
+
+_Does the abundance of certain minerals provide any insight on ternary plots of major cations?_
+
+### Data Preparation
+
+* To check for differences in mineral abundance, I will use the lithology data frame to see whether there are any clear-cut differences between samples, especially across campaign type and rock type.
+
+* The steps for this analysis are very simple. I'm going to find the size of each sample (i.e. the row sum of its mineral indicators), use those sizes on ternary plots of the major cations and of the less-studied elements, and then see whether total wt% correlates with mineral abundance. The correlation turns out to be weak (as seen at the bottom), which suggests that something else is going on.
+
+_lithology.df, shared_libs, shared_pixl_
+
+```{r, result03_data}
+# Include all data processing code (if necessary), clearly commented
+
+```
+
+### Analysis methods used
+
+* For this last analysis I just wanted to look at unusual elements relative to each sample's total mineral abundance. There is no direct reason for the way I split the ternary axes, but I am aware that SO3 and Cl are associated with lake environments and that TiO2 and Cr2O3 are very unreactive.
+
+* We don't see any outstanding results from this analysis, but we can see that abundance is weakly related to total wt%. Our r^2 with the regular Pearson coefficient is ~27%, while Kendall's non-parametric test gives ~20%. This is an extreme oversimplification, and I believe that comparing total wt% to certain mineral types might actually prove to be substantial and might aid in prediction tasks later on.
With regard to our ternary plots, while there aren't many outstanding results, we see that Roubion had the largest abundance of minerals (13 in total) but that the sedimentary samples (which gravitated towards SO3 & Cl concentrations) actually had the lowest abundance of total minerals. If we were to create a specialized metric that accounted for certain groups of minerals and compared it to total wt% and/or cation groups, this might prove interesting.
+
+```{r, result03_analysis}
+# Point size for each of the 16 samples on the ternary plot depends on the sample's mineral abundance from the lithology data. Hover over, can we see which
+
+# Characterize the size of points by their total abundance per sample at first.
+sizes <- rowSums(lithology.matrix)
+
+ggtern(pixl_ternary, ggtern::aes(x = x, y= y, z = z)) +
+  geom_point(aes(size = sizes)) +
+  theme_rgbw() +
+  labs(x="Si+Al",
+       y="Fe+Mg",
+       z="Ca+Na+K",
+       title = "Ternary Plot with major Cations, sized by mineral abundance")
+# No clearly defined pattern, but together with our last plot, the point that tends inward is another reason to think that samples with higher total wt% likely have a larger variety of minerals.
+# I'm going to alter the Ca+Na+K axis just to see if there is an effect on other features
+
+
+
+# Look at unusual elements (column names follow pixl.df's spelling, with the digit 0 in place of the letter O)
+revised_pixl_ternary <- pixl.df %>%
+  mutate(x=(S03+Cl)/100,y=(P205+Mno)/100,z=(Ti02+Cr203 )/100) %>%
+  select(c(20:22)) %>%
+  mutate(sizes = sizes)
+
+ggtern(revised_pixl_ternary, ggtern::aes(x = x, y= y, z = z)) +
+  geom_point(aes(size = sizes, color = pixl.df$type)) +
+  theme_rgbw() +
+  labs(x="SO3+Cl",
+       y="P2O5+MnO",
+       z="TiO2+Cr2O3",
+       title = "Ternary Plot of usually undocumented Cations")
+
+
+hist(rowSums(pixl.df[,2:14]))
+hist(sizes)
+plot(x = rowSums(pixl.df[,2:14]), y = sizes)
+cor.test(x = rowSums(pixl.df[,2:14]), y = sizes, method = 'kendall') # non-parametric correlation test (Kendall's tau)
+
+
+```
+
+
+### Discussion of results
+
+* Overall, our results show several distinct clusters, especially ones with high CaO concentration, MgO + FeO-T patterns, and SiO2 patterns, plus slight differences in other oxides that might affect how LIBS data are classified relative to the sampled PIXL values. Cation abundance within clusters is a powerful classifier and can be an indicator of rock morphology. I think clustering the LIBS data points and plotting the PIXL samples on top is an important first step towards connecting more data to classify the surface of Mars. LIBS is the most important and most detailed dataset, as it gives us insight into abnormal values as well as where on the surface the mineralogy changes. From now on, further clustering of LIBS seems arbitrary and redundant; instead I think the focus should be on how Lithology connects to PIXL and, ultimately, how the oxide concentrations from LIBS can explain which minerals might be present. In addition, comparing elemental chemistry with the chemical formulas of minerals will also prove useful, as it shows how the different mineral formulas differ.
This might provide insight into how the rocks studied ended up the way they are, as well as a proxy for potential life. That is less realistic, but as the rover moves to different spots we might see clear-cut results that are indicative of life. Finally, at the risk of being redundant, finding direct correlations between certain compounds and the abundance, and thus variance, of minerals within each sample might give us insight into how we can predict what exactly LIBS is analyzing.
+
+
+
+## Summary and next steps
+
+* The main next step, I think, is to find these direct correlations and really dive into the literature on how rock types differ in mineralogy and in elemental composition. In addition, does this vary with campaign type, depth (z) within the crater, and the landscape (geomorphology) that surrounds the rover's course? Personally, I think if there is any life it is underground. The surface of Mars is far too cold to support life now, and its atmosphere is far too inhospitable to life. Taking a deep look at the poles might be interesting as well, since existing work (https://marsed.asu.edu/mep/ice/polar-caps) indicates that they might contain atmospheric water vapor frozen together with carbon dioxide. That being said, given the rover's RIMFAX ground-penetrating radar, which captures underground stratification, I wonder if life could survive below ground (Wierzchos et al. 2012). UVB and UVC radiation is detrimental to organisms, and Mars lacks a developed ozone layer to block it; thus, life might be able to thrive in endolithic colonies within underground fissures.
+
+* Finally, creating a metric that helps us understand SHERLOC data a little better would be very beneficial. A prototype metric would compare each mineral's chemical formula to the concentrations of compounds (or just oxides) from the PIXL and LIBS datasets.
It would take the atomic or molecular mass of each element or compound, multiply it by the number of atoms/ions present, and then compare that mass to the total mass of the chemical formula ([mass / total mass] x 100). This value would be compared to the weight % of the matching PIXL feature. Summing these differences over every compound/element and subtracting the total from 1 would be a good indicator of how much of that mineral is explained by our data. That being said, it will be difficult to account for oxides versus individual elements, as well as for 'other' compounds.
+
+$$1 - \sum_{i=1}^{n} \left| m_i - p_i \right|$$
+
+where $m_i$ is the share of element/compound $i$ in the mineral's formula, $p_i$ is the matching PIXL share (both written as fractions rather than percentages so that the result is comparable to 1), and $n$ is the number of elements/compounds.
+
+
+
+
+
+
+
+
+
+## Notes
+
+- See which lat and long correspond to similar values of our samples. Maybe we will see more. Apply clustering to the data: using AP clustering, find clusters and check their overlap with the prior PCA plots (specifically by campaign) and heatmaps to analyze the concentration of clusters.
+
+- SHERLOC has been used recently at Cheyava Falls. What signs, in both our samples and our LIBS data, show that chemical reactions compatible with life have occurred? (https://www.nasa.gov/missions/mars-2020-perseverance/perseverance-rover/nasas-perseverance-rover-scientists-find-intriguing-mars-rock/)
+
+- I think it is worth retrieving the chemical formulas of pure minerals, comparing the weight % of each elemental feature, and seeing by how much our minerals differ from their pure counterparts.
+- Iron and phosphorus were found in the spotted rocks at Cheyava Falls. Can we track the amounts of iron and phosphorus, or of chemically related elements that might also provide evidence of life?
+
+- What differentiates abiotic from biotic sources (coming from organisms, decomposed matter)?
+
+- What is the importance of sulfur trioxide on river beds?
+
+- Silicon dioxide?
+
+- Why are iron oxides (FeO-T) and magnesium oxide compatible?
What about their relationship makes them vary in the same direction?
+
+- Clusters both exhibit small proportions of Na2O, TiO2, K2O, Al2O3 & CaO.
+- SiO2, iron oxide (FeO-T) and magnesium oxide are of interest. There always seems to be a specific cluster with a very high amount of SiO2.
+
+- Can we get access to RIMFAX data? (Wierzchos et al. 2012)
+- Fissures might show promise of life. Endolithic colonies usually go underground to escape solar radiation (however, there is lots of UVB and UVC).
diff --git a/StudentNotebooks/Assignment03/walczd3_assignment03.html b/StudentNotebooks/Assignment03/walczd3_assignment03.html
new file mode 100644
index 0000000..96fb678
--- /dev/null
+++ b/StudentNotebooks/Assignment03/walczd3_assignment03.html
@@ -0,0 +1,1464 @@
+
+
+
+
+
+ + + + + + + + + + +NOTE: Follow an outline format; use bullets to +express individual points.
+RCS ID: Always include this!
Project Name: Always include this!
Summary of work since last week
+Summary of github commits
+List of presentations, papers, or other outputs
+List of references (if necessary)
Indicate any use of group shared code base
Indicate which parts of your described work were done by you or +as part of joint efforts
Required: Provide illustrating figures and/or +tables
PACKAGES
+# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!!
+# This section install packages if they are not already installed.
+# This block will not be shown in the knit file.
+knitr::opts_chunk$set(echo = TRUE)
+
+# Set the default CRAN repository
+local({r <- getOption("repos")
+ r["CRAN"] <- "http://cran.r-project.org"
+ options(repos=r)
+})
+
+if (!require("pandoc")) {
+ install.packages("pandoc")
+ library(pandoc)
+}
+## Loading required package: pandoc
+if (!require("ggplotify")) {
+ install.packages("ggplotify")
+ library(ggplotify)
+}
+## Loading required package: ggplotify
+if (!require("car")) {
+ install.packages("car")
+ library(car)
+}
+## Loading required package: car
+## Loading required package: carData
+# Required packages for M20 LIBS analysis
+if (!require("rmarkdown")) {
+ install.packages("rmarkdown")
+ library(rmarkdown)
+}
+## Loading required package: rmarkdown
+##
+## Attaching package: 'rmarkdown'
+## The following objects are masked from 'package:pandoc':
+##
+## pandoc_available, pandoc_convert, pandoc_version
+if (!require("tidyverse")) {
+ install.packages("tidyverse")
+ library(tidyverse)
+}
+## Loading required package: tidyverse
+## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
+## ✔ dplyr 1.1.4 ✔ readr 2.1.5
+## ✔ forcats 1.0.0 ✔ stringr 1.5.1
+## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
+## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
+## ✔ purrr 1.0.2
+## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
+## ✖ dplyr::filter() masks stats::filter()
+## ✖ dplyr::lag() masks stats::lag()
+## ✖ dplyr::recode() masks car::recode()
+## ✖ purrr::some() masks car::some()
+## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
+if (!require("stringr")) {
+ install.packages("stringr")
+ library(stringr)
+}
+
+if (!require("ggbiplot")) {
+ install.packages("ggbiplot")
+ library(ggbiplot)
+}
+## Loading required package: ggbiplot
+if (!require("pheatmap")) {
+ install.packages("pheatmap")
+ library(pheatmap)
+}
+## Loading required package: pheatmap
+if (!require("apcluster")) {
+ install.packages("apcluster")
+ library(apcluster)
+}
+## Loading required package: apcluster
+##
+## Attaching package: 'apcluster'
+##
+## The following object is masked from 'package:stats':
+##
+## heatmap
+if (!require("vegan")) {
+ install.packages("vegan")
+ library(vegan)
+}
+## Loading required package: vegan
+## Loading required package: permute
+## Loading required package: lattice
+## This is vegan 2.6-8
+if (!require("ape")) {
+ install.packages("ape")
+ library(ape)
+}
+## Loading required package: ape
+##
+## Attaching package: 'ape'
+##
+## The following object is masked from 'package:dplyr':
+##
+## where
+if (!require("Matrix")) {
+ install.packages("Matrix")
+ library(Matrix)
+}
+## Loading required package: Matrix
+##
+## Attaching package: 'Matrix'
+##
+## The following objects are masked from 'package:tidyr':
+##
+## expand, pack, unpack
+if (!require("gridExtra")) {
+ install.packages("gridExtra")
+ library(gridExtra)
+}
+## Loading required package: gridExtra
+##
+## Attaching package: 'gridExtra'
+##
+## The following object is masked from 'package:dplyr':
+##
+## combine
+if (!require("umap")) {
+ install.packages("umap")
+ library(umap)
+}
+## Loading required package: umap
+if (!require("ggtern")) {
+ install.packages("ggtern")
+ library(ggtern)
+}
+## Loading required package: ggtern
+## Registered S3 methods overwritten by 'ggtern':
+## method from
+## grid.draw.ggplot ggplot2
+## plot.ggplot ggplot2
+## print.ggplot ggplot2
+## --
+## Remember to cite, run citation(package = 'ggtern') for further info.
+## --
+##
+## Attaching package: 'ggtern'
+##
+## The following objects are masked from 'package:gridExtra':
+##
+## arrangeGrob, grid.arrange
+##
+## The following objects are masked from 'package:ggplot2':
+##
+## aes, annotate, ggplot, ggplot_build, ggplot_gtable, ggplotGrob,
+## ggsave, layer_data, theme_bw, theme_classic, theme_dark,
+## theme_gray, theme_light, theme_linedraw, theme_minimal, theme_void
+LOAD IN DATA
+#-------------LIBS-------------------
+# Load the saved LIBS data with locations added
+libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+libs.std_dev <- libs.df %>%
+ select((c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+ MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+libs.df <- libs.df %>%
+ select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+ MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+
+# Convert the points to numeric
+libs.df$point <- as.numeric(libs.df$point)
+
+# Review what we have
+summary(libs.df)
+## sol lat lon target
+## Min. : 15.0 Min. :18.43 Min. :77.34 Length:1932
+## 1st Qu.: 281.0 1st Qu.:18.44 1st Qu.:77.36 Class :character
+## Median : 557.0 Median :18.46 Median :77.40 Mode :character
+## Mean : 565.1 Mean :18.46 Mean :77.40
+## 3rd Qu.: 872.0 3rd Qu.:18.48 3rd Qu.:77.44
+## Max. :1019.0 Max. :18.50 Max. :77.45
+## point SiO2 TiO2 Al2O3
+## Min. : 1.000 Min. : 0.00 Min. :0.0000 Min. : 0.000
+## 1st Qu.: 3.000 1st Qu.:42.04 1st Qu.:0.0300 1st Qu.: 3.080
+## Median : 5.000 Median :45.80 Median :0.3200 Median : 4.925
+## Mean : 5.776 Mean :43.47 Mean :0.3719 Mean : 6.246
+## 3rd Qu.: 8.000 3rd Qu.:49.23 3rd Qu.:0.6400 3rd Qu.: 8.533
+## Max. :28.000 Max. :76.12 Max. :2.4000 Max. :38.350
+## FeOT MgO CaO Na2O
+## Min. : 0.29 Min. : 0.29 Min. : 0.080 Min. :0.0000
+## 1st Qu.:13.27 1st Qu.: 5.72 1st Qu.: 1.830 1st Qu.:0.9775
+## Median :20.21 Median :12.78 Median : 3.625 Median :1.5200
+## Mean :20.07 Mean :16.47 Mean : 4.726 Mean :1.7600
+## 3rd Qu.:25.45 3rd Qu.:27.83 3rd Qu.: 4.622 3rd Qu.:2.4000
+## Max. :82.68 Max. :45.21 Max. :52.130 Max. :7.5200
+## K2O
+## Min. : 0.0000
+## 1st Qu.: 0.0000
+## Median : 0.3000
+## Mean : 0.5909
+## 3rd Qu.: 0.7800
+## Max. :34.8700
+#----------PIXL----------------------
+# Load the saved PIXL data with locations added
+pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds")
+
+pixl.df
+## # A tibble: 16 × 19
+## sample Na20 Mgo Al203 Si02 P205 S03 Cl K20 Cao Ti02 Cr203
+## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+## 1 1 5.55 2.64 7.56 38.3 1.65 2.69 3.4 0.75 7.77 1.47 0.03
+## 2 2 4.67 2.21 6.97 43.8 2.76 3.21 1.48 1.06 7.62 2.49 0.01
+## 3 3 1.93 19.2 2.42 39.4 0.48 0.78 0.66 0.18 2.94 0.37 0.26
+## 4 4 1.87 12.8 2.36 40.3 0.28 1.66 0.94 0.2 2.94 0.99 0.29
+## 5 5 4.5 0.73 11.6 57.1 0.84 1 2.08 1.9 4.31 0.59 0
+## 6 6 1.87 12.8 2.36 40.3 0.28 1.66 0.94 0.2 2.94 0.99 0.29
+## 7 7 1.87 12.8 2.36 40.3 0.28 1.66 0.94 0.2 2.94 0.99 0.29
+## 8 8 4.5 0.73 11.6 57.1 0.84 1 2.08 1.9 4.31 0.59 0
+## 9 9 4.5 0.73 11.6 57.1 0.84 1 2.08 1.9 4.31 0.59 0
+## 10 10 1.8 22.7 1.7 22.6 0.1 2.6 4.5 0.3 1.8 0.2 0.2
+## 11 11 1.8 22.7 1.7 22.6 0.1 2.6 4.5 0.3 1.8 0.2 0.2
+## 12 12 1.9 13.1 5 32.5 0.6 20 0.4 0.1 1.5 0.8 0.1
+## 13 13 1.9 13.1 5 32.5 0.6 20 0.4 0.1 1.5 0.8 0.1
+## 14 14 1 19.1 1.8 30.8 0.1 3.8 2 0 3.3 0.7 1.9
+## 15 15 1 19.1 1.8 30.8 0.1 3.8 2 0 3.3 0.7 1.9
+## 16 16 2.1 12.4 5.32 31.4 0.57 21.5 1.14 0.19 5.72 0.64 0.11
+## # ℹ 7 more variables: Mno <dbl>, `FeO-T` <dbl>, name <chr>, type <chr>,
+## # campaign <chr>, location <chr>, abrasion <chr>
+# Convert to factors
+pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)],
+ as.factor)
+
+# Review our dataframe
+summary(pixl.df)
+## sample Na20 Mgo Al203
+## Min. : 1.00 Min. :1.000 Min. : 0.730 Min. : 1.700
+## 1st Qu.: 4.75 1st Qu.:1.853 1st Qu.: 2.533 1st Qu.: 2.220
+## Median : 8.50 Median :1.900 Median :12.800 Median : 3.710
+## Mean : 8.50 Mean :2.672 Mean :11.682 Mean : 5.072
+## 3rd Qu.:12.25 3rd Qu.:4.500 3rd Qu.:19.100 3rd Qu.: 7.117
+## Max. :16.00 Max. :5.550 Max. :22.700 Max. :11.600
+##
+## Si02 P205 S03 Cl
+## Min. :22.60 Min. :0.1000 Min. : 0.780 Min. :0.400
+## 1st Qu.:31.22 1st Qu.:0.2350 1st Qu.: 1.495 1st Qu.:0.940
+## Median :38.85 Median :0.5250 Median : 2.600 Median :1.740
+## Mean :38.55 Mean :0.6512 Mean : 5.562 Mean :1.846
+## 3rd Qu.:41.17 3rd Qu.:0.8400 3rd Qu.: 3.800 3rd Qu.:2.080
+## Max. :57.10 Max. :2.7600 Max. :21.530 Max. :4.500
+##
+## K20 Cao Ti02 Cr203
+## Min. :0.0000 Min. :1.500 Min. :0.2000 Min. :0.000
+## 1st Qu.:0.1600 1st Qu.:2.655 1st Qu.:0.5900 1st Qu.:0.025
+## Median :0.2000 Median :3.120 Median :0.7000 Median :0.155
+## Mean :0.5800 Mean :3.688 Mean :0.8194 Mean :0.355
+## 3rd Qu.:0.8275 3rd Qu.:4.310 3rd Qu.:0.9900 3rd Qu.:0.290
+## Max. :1.9000 Max. :7.770 Max. :2.4900 Max. :1.900
+##
+## Mno FeO-T name type
+## Min. :0.1000 Min. :13.24 Atsah : 1 Igneous :8
+## 1st Qu.:0.2800 1st Qu.:16.71 Bearwallow: 1 N/A :1
+## Median :0.4000 Median :23.86 Coulettes : 1 Sedimentary:7
+## Mean :0.3812 Mean :21.45 Hahonih : 1
+## 3rd Qu.:0.4900 3rd Qu.:25.70 Hazeltop : 1
+## Max. :0.6900 Max. :30.05 Kukaklek : 1
+## (Other) :10
+## campaign location abrasion
+## Crater Floor:9 01 : 1 Alfalfa :2
+## Delta Front :7 02 : 1 Bellegrade :2
+## 03 : 1 Berry Hollow:2
+## 04 : 1 Dourbes :2
+## 05 : 1 Novarupta :2
+## 06 : 1 Quartier :2
+## (Other):10 (Other) :4
+#----------SHERLOC----------------------
+# Read in data as provided.
+sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds")
+
+# Clean up data types
+sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral)
+sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)],
+ as.numeric)
+# Transform NA's to 0
+sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0)
+
+# Reformat data so that rows are "abrasions" and columns list the presence of minerals.
+# Do this by "pivoting" to a long format, and then back to the desired wide format.
+
+sherloc_long <- sherloc_abrasion_raw %>%
+ pivot_longer(!Mineral, names_to = "Name", values_to = "Presence")
+
+# Make abrasion a factor
+sherloc_long$Name <- as.factor(sherloc_long$Name)
+
+# Make it a matrix
+sherloc.matrix <- sherloc_long %>%
+ pivot_wider(names_from = Mineral, values_from = Presence)
+
+# Get sample information from PIXL and add to measurements -- assumes order is the same
+
+sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix)
+
+# Review what we have
+summary(sherloc.df)
+## sample type campaign abrasion
+## Min. : 1.00 Igneous :8 Crater Floor:9 Alfalfa :2
+## 1st Qu.: 4.75 N/A :1 Delta Front :7 Bellegrade :2
+## Median : 8.50 Sedimentary:7 Berry Hollow:2
+## Mean : 8.50 Dourbes :2
+## 3rd Qu.:12.25 Novarupta :2
+## Max. :16.00 Quartier :2
+## (Other) :4
+## Name Plagioclase Sulfate Ca-sulfate
+## Atsah : 1 Min. :0.0000 Min. :0.0000 Min. :0.0000
+## Bearwallow: 1 1st Qu.:0.0000 1st Qu.:0.1875 1st Qu.:0.0000
+## Coulettes : 1 Median :0.0000 Median :1.0000 Median :0.0000
+## Hahonih : 1 Mean :0.1875 Mean :0.6562 Mean :0.3438
+## Hazeltop : 1 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
+## Kukaklek : 1 Max. :1.0000 Max. :1.0000 Max. :1.0000
+## (Other) :10
+## Hydrated Ca-sulfate Mg-sulfate Hydrated Sulfates Hydrated Mg-Fe sulfate
+## Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.0000
+## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000
+## Median :0.000 Median :0.0000 Median :0.000 Median :0.0000
+## Mean :0.125 Mean :0.1875 Mean :0.125 Mean :0.1875
+## 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.000 3rd Qu.:0.0000
+## Max. :1.000 Max. :1.0000 Max. :1.000 Max. :1.0000
+##
+## Perchlorates Na-perchlorate Amorphous Silicate Phosphate
+## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000
+## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
+## Median :0.0000 Median :0.00000 Median :0.0000 Median :0.0000
+## Mean :0.0625 Mean :0.03125 Mean :0.1406 Mean :0.2031
+## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.2500 3rd Qu.:0.3125
+## Max. :1.0000 Max. :0.50000 Max. :0.5000 Max. :1.0000
+##
+## Pyroxene Olivine Carbonate Fe-Mg carbonate
+## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
+## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.4375 1st Qu.:0.000
+## Median :1.0000 Median :0.6250 Median :1.0000 Median :0.000
+## Mean :0.6875 Mean :0.5312 Mean :0.7344 Mean :0.125
+## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.000
+## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
+##
+## Hydrated Carbonates Disordered Silicates Feldspar Quartz
+## Min. :0 Min. :0.000 Min. :0.000 Min. :0.00000
+## 1st Qu.:0 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.00000
+## Median :0 Median :0.000 Median :0.000 Median :0.00000
+## Mean :0 Mean :0.125 Mean :0.125 Mean :0.03125
+## 3rd Qu.:0 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.00000
+## Max. :0 Max. :1.000 Max. :1.000 Max. :0.25000
+##
+## Apatite FeTi oxides Halite Iron oxide
+## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
+## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
+## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
+## Mean :0.1406 Mean :0.1406 Mean :0.04688 Mean :0.2812
+## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.5000
+## Max. :1.0000 Max. :1.0000 Max. :0.25000 Max. :1.0000
+##
+## Hydrated Iron oxide Organic matter Sulfate+Organic matter
+## Min. :0.00000 Min. :0.0000 Min. :0.0000
+## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
+## Median :0.00000 Median :1.0000 Median :0.0000
+## Mean :0.01562 Mean :0.5938 Mean :0.2188
+## 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:0.2500
+## Max. :0.25000 Max. :1.0000 Max. :1.0000
+##
+## Other hydrated phases Phyllosilicates Chlorite
+## Min. :0.0000 Min. :0.00000 Min. :0.0000
+## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
+## Median :0.2500 Median :0.00000 Median :0.0000
+## Mean :0.4375 Mean :0.09375 Mean :0.0625
+## 3rd Qu.:1.0000 3rd Qu.:0.06250 3rd Qu.:0.0000
+## Max. :1.0000 Max. :0.50000 Max. :0.5000
+##
+## Kaolinite (hydrous Al-clay) Chromite Ilmenite Zircon/Baddeleyite
+## Min. :0.0000 Min. :0.000 Min. :0.000 Min. :0.000
+## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
+## Median :0.0000 Median :0.000 Median :0.000 Median :0.000
+## Mean :0.1875 Mean :0.125 Mean :0.125 Mean :0.125
+## 3rd Qu.:0.0000 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.000
+## Max. :1.0000 Max. :1.000 Max. :1.000 Max. :1.000
+##
+## Fe-Mg-clay minerals Spinels
+## Min. :0.0000 Min. :0.0000
+## 1st Qu.:0.0000 1st Qu.:0.0000
+## Median :0.0000 Median :0.0000
+## Mean :0.1875 Mean :0.0625
+## 3rd Qu.:0.0000 3rd Qu.:0.0000
+## Max. :1.0000 Max. :0.5000
+##
+# Load the saved lithology data with locations added
+lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds")
+
+# Cast samples as numbers
+lithology.df$sample <- as.numeric(lithology.df$sample)
+
+# Convert rest into factors
+lithology.df[sapply(lithology.df, is.character)] <-
+ lapply(lithology.df[sapply(lithology.df, is.character)],
+ as.factor)
+
+# Keep only first 16 samples because the data for the rest of the samples is not available yet
+lithology.df<-lithology.df[1:16,]
+
+# Create a matrix containing only the numeric measurements. The remaining features are metadata about the sample.
+lithology.matrix <- sapply(lithology.df[,6:40],as.numeric)-1
+
+# Review the structure of our matrix
+str(lithology.matrix)
+## num [1:16, 1:35] 0 0 0 0 0 0 0 1 1 0 ...
+## - attr(*, "dimnames")=List of 2
+## ..$ : NULL
+## ..$ : chr [1:35] "feldspar" "plagioclase" "pyroxene" "olivine" ...
+Are there any similarities or unusual trends in our LIBS data +that we can compare to rover collected PIXL samples?
+For analysis #1 I will be analyzing the differences between LIBS and
+PIXL data exclusively. LIBS data contains only 8 numerical features, all
+of which are contained in the PIXL data. 5 features are unique to the
+PIXL samples: P2O5, SO3, Cl, Cr2O3, and MnO. Therefore, aside from metadata
+like campaign (which will be included later) or rock type, I am
+preparing the data to analyze shared features.
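
The shared-feature preparation described above can be sketched with a name lookup instead of column positions, which keeps the LIBS and PIXL columns aligned even if the column order changes. This is only an illustration: `name_map`, `harmonize_pixl`, and the one-row `toy_pixl` table are hypothetical, and the PIXL spellings (digit 0 in place of the letter O) are copied from the `pixl.df` summary in this notebook.

```r
# Hypothetical lookup: LIBS-style oxide name -> column name as spelled in pixl.df.
name_map <- c(SiO2 = "Si02", TiO2 = "Ti02", Al2O3 = "Al203", FeOT = "FeO-T",
              MgO = "Mgo", CaO = "Cao", Na2O = "Na20", K2O = "K20")

# Select the shared oxides by name and rename them to the LIBS convention.
harmonize_pixl <- function(df, map = name_map) {
  missing <- setdiff(map, names(df))
  if (length(missing) > 0) {
    stop("columns not found: ", paste(missing, collapse = ", "))
  }
  out <- df[, unname(map), drop = FALSE]
  names(out) <- names(map)
  out
}

# Toy one-row table with made-up values, standing in for pixl.df.
toy_pixl <- data.frame(Na20 = 5.55, Mgo = 2.64, Al203 = 7.56, Si02 = 38.3,
                       K20 = 0.75, Cao = 7.77, Ti02 = 1.47, `FeO-T` = 20.0,
                       check.names = FALSE)
harmonize_pixl(toy_pixl)
```

The result has the eight oxides in the same order as the LIBS feature columns, so it could stand in for the first eight columns of `shared_pixl`.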
The first step will be applying a variety of unsupervised
+methods: Affinity Propagation (AP) and k-means for clustering, and Uniform
+Manifold Approximation and Projection (UMAP) for a low-dimensional embedding
+in which clusters are picked out by eye. I apply these because I’m
+interested to see whether similar features or clusters arise under
+different methods. After
+clustering, to see where shared oxide features are most correlated, I’ll
+plot a PCA of the LIBS data colored by the most prominent clustering and
+compare the eigenvectors to those of the PIXL PCA.
LIBS & PIXL datasets exclusively for this question.
Applying clustering algorithms to detect distinct +clusters.
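
One way to check whether different methods recover similar clusters is to cross-tabulate their labels. The sketch below does this on made-up two-group data with k-means and hierarchical clustering from base R; `hclust` is only a stand-in here and is not used in this notebook, and `toy`, `km_labels`, `hc_labels`, and `agreement` are hypothetical names. The same `table()` call would work on `umap_clusters` and `ap_clusters.libs`.

```r
set.seed(4)

# Toy stand-in for the oxide features: two well-separated groups of 20 points.
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2)
)

# Labels from two different clustering methods on the same rows.
km_labels <- kmeans(toy, centers = 2, nstart = 10)$cluster
hc_labels <- cutree(hclust(dist(toy), method = "ward.D2"), k = 2)

# Contingency table: rows are k-means clusters, columns are hclust clusters.
# If the methods agree, each row has a single non-zero cell, although the
# label numbers themselves need not match.
agreement <- table(kmeans = km_labels, hclust = hc_labels)
agreement
```

Cluster numbers are arbitrary, so agreement shows up as one dominant cell per row rather than as matching labels along the diagonal.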
+#https://books.google.com/books?hl=en&lr=&id=spQ7FWsRX30C&oi=fnd&pg=PA3&dq=sedimentary+rocks&ots=T0fThFnYqm&sig=GbZZW_JuHjm9VmebYKaP1IRzSD8#v=onepage&q=sedimentary%20rocks&f=false
+#https://www.osti.gov/servlets/purl/1409785 - LIBS
+# - LIBS data is not sampled directly by the rover's abrasion tool. Instead, the SuperCam instrument fires a laser at a specific rock target and estimates the oxide abundances from the emission-line intensities at characteristic wavelengths.
+
+
+#PIXL samples lat and lon
+samples <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples.Rds")
+sample_coord <- samples[which(samples$name %in% pixl.df$name),c(3:5)]
+
+shared_pixl <- pixl.df[,c(5, 11, 4, 14, 3, 10, 2, 9, 15, 17)]
+names(shared_pixl)[1:8] <- names(libs.df[,6:13])
+shared_libs <- cbind(libs.df[,6:13], lat = libs.df[,2], lon = libs.df[,3], point = libs.df[,5]) #features, lat, long, point(?)
+
+#AP Clustering on shared data
+set.seed(4)
+get_ap <- function( data) {
+ ap <- apcluster(negDistMat(r = 2), data, q = 0.001)
+ clusters <- ap@clusters
+ ap_clusters <- rep(NA_integer_, nrow(data)) # one cluster label per row of data
+ for (i in seq(length(clusters))) {
+ num <- i
+ for (j in seq(length(clusters[[i]]))) {
+ ap_clusters[clusters[[i]][[j]]] = num
+ }
+ }
+ return (ap_clusters)
+}
+
+ap_clusters.pixl <- get_ap(shared_pixl[,1:8])
+ap_clusters.libs <- get_ap(shared_libs[,1:8])
+
+unique(ap_clusters.pixl) #k = 3
+## [1] 3 1 2
+unique(ap_clusters.libs) #k = 13
+## [1] 7 1 2 3 12 13 4 9 8 10 5 6 11
+#Find k-means cluster
+wssplot <- function(data, df, nc=8, seed=4){
+ wss <- data.frame(cluster=1:nc, quality=c(0))
+ for (i in 1:nc){
+ set.seed(seed)
+ wss[i,2] <- kmeans(data, centers=i)$tot.withinss
+ }
+ ggplot(data=wss,aes(x=cluster,y=quality)) +
+ geom_line() +
+ ggtitle(paste("Quality of k-means by Cluster -", df))
+}
+
+wssplot(shared_pixl[,1:8], "PIXL") #k = 3,4
+
+wssplot(shared_libs[,1:8], "LIBS") #k = 3,4,6
+
+umapplot <- function(data, i, df) {
+ custom.config <- umap.defaults
+ custom.config$n_neighbors = i
+ UMAP <- umap(data, config = custom.config)
+ plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = paste("UMAP[",df,"] on nn = ",i, sep = ""))
+}
+
+#Apply UMAP
+#find optimal kNN
+#PIXL, with all features under UMAP showed unrecognizable features. Shared features showed 2,4,6 possible clusters but did not really converge as nn rose
+
+#LIBS
+#nn <- seq(5,25, 5)
+#for (i in nn) {
+# umapplot(shared_libs[,1:8],i, "LIBS")
+#}
+# nn = 25 on just the LIBS data found 4 distinct clusters. One of the clusters showed some separation right away at lower n_neighbors values but started to come together as nn increased
+custom.config <- umap.defaults
+custom.config$n_neighbors = 25
+UMAP <- umap(shared_libs[,1:8], config = custom.config )
+UMAP.data <- UMAP$layout
+plot(x = UMAP$layout[,1], y = UMAP$layout[,2], main = "UMAP on nn = 25")
+abline( a =0, b = -3, h = 0,col = "blue")
+
+#use line: y = -3x to find which data points are to the left and right of the cluster line.
+# y = -3x --> x = y / -3
+x <- UMAP.data[,2] / -3
+plot(x = x, y = UMAP.data[,2])
+
+umap_clusters <- rep(NA,nrow(UMAP.data))
+umap_clusters[which( (UMAP.data[,1] >= 2) & (UMAP.data[,2] > 0) )] = 1
+umap_clusters[which( (UMAP.data[,1] < 2) & (UMAP.data[,2] > 0) )] = 2
+umap_clusters[which( UMAP.data[,1] > 9)] = 3 #unique cluster isolated
+umap_clusters[which( (UMAP.data[,1] < x) & (UMAP.data[,2] < 0)) ] = 4
+umap_clusters[ which( ( (UMAP.data[,1] > x ) & (UMAP.data[,1] < 9) ) & (UMAP.data[,2] < 0))] = 5
+ggplot(data.frame(UMAP.data), aes(x = X1, y = X2, color = as.factor(umap_clusters))) +
+ geom_point() +
+ labs(title = "UMAP cluster check: nn = 25")
+
+#run PCA with umap clustering on libs data + campaign type for pixl of shared features
+pca.libs <- prcomp(shared_libs[,1:8], retx = T, center = T)
+umap.biplot <- ggbiplot::ggbiplot(pca.libs,
+ groups = as.factor(umap_clusters),
+ circle = T,
+ varname.size = 2,
+ varname.color = "red", labels.size =8
+ ) + theme_bw() + labs(title = "PCA with UMAP clusters")
+#total PIXL
+pca.pixl <- prcomp(pixl.df[,2:14], retx = T, center = T)
+pixl.biplot <- ggbiplot::ggbiplot(pca.pixl,
+ groups = pixl.df$campaign,
+ circle = T,
+ varname.size = 2,
+ varname.color = "red", labels.size =8
+ ) + theme_bw() + labs(title = "PCA with PIXL campaigns")
+
+
+
+
+#plot heatmaps
+umap.centers <- data.frame()
+unq <- unique(umap_clusters)
+for (i in unq) {
+ arr <- colMeans(shared_libs[which(umap_clusters == i),1:8])
+ umap.centers <- rbind(umap.centers, arr)
+}
+names(umap.centers) <- colnames(shared_libs[,1:8])
+
+pixl.centers <- data.frame()
+for (i in unique(shared_pixl$campaign)) {
+ arr <- colMeans(pixl.df[which(shared_pixl$campaign == i), 2:14])
+ pixl.centers <- rbind(pixl.centers, arr)
+}
+names(pixl.centers) <- names(pixl.df)[2:14]
+
+pixl.centers
+## Na20 Mgo Al203 Si02 P205 S03 Cl K20
+## 1 3.473333 7.186667 6.536667 45.96667 0.9166667 1.628889 1.622222 0.9211111
+## 2 1.642857 17.462857 3.188571 29.02286 0.3100000 10.618571 2.134286 0.1414286
+## Cao Ti02 Cr203 Mno FeO-T
+## 1 4.453333 1.0077778 0.1300000 0.4633333 20.98
+## 2 2.702857 0.5771429 0.6442857 0.2757143 22.05
+u.heatmap <- pheatmap(umap.centers, scale = "none", main = "UMAP heatmap")
+
+pxl.heatmap <- pheatmap(pixl.centers, scale = "none", main = "PIXL-Campaign heatmap", labels_row = unique(shared_pixl$campaign))
+
+grid.arrange(pixl.biplot, umap.biplot, ncol = 2)
+
+grid.arrange(as.ggplot(u.heatmap), as.ggplot(pxl.heatmap), ncol = 2)
+
+#look at the pattern differences between high SiO2 concentrations with FeO-T + MgO & Al2O3 + CaO
+According to papers from 1985 and 2010 on the amelioration (glacial to interglacial stages) of lake basins and on river basins and their trapped sediments, respectively, the chemical index of alteration (CIA) measures how much weathering rocks underwent as a result of chemical reactions (i.e., reactions with water and other dissolved substances). Can we use this index to learn how silicate rocks underwent lasting chemical weathering that is indicative of the last stages of water on Mars and its effect on basin and delta-front rocks?
+For analysis #2 I will use the lithology and PIXL datasets to calculate the CIA for our samples, since we know the campaign and sampling location of each.
The code immediately below is just another way of plotting the concentration (in this case, density) of all our features by cluster. In my analysis I first plotted all lithology points to find which of the silicates specified in Dr. Roger’s lecture 2 were present. I copied these column names so that I could find the relative abundance of silicates in each sample; however, this was only a check for the PIXL data, and as of now it has no direct meaning other than testing whether the silicate count is > 0, meaning that silicates with CaO are available. Regardless, for PIXL and LIBS I found the distributions of CIA values for all samples and clusters, respectively. I then compared samples above and below a weathering threshold (CIA > 70 indicates that some form of weathering occurred; the specific ranges differ in the literature, so I only wanted to see whether samples above and below 70 differed as a whole). Finally, I made a ternary plot of all the values greater than 70 with the respective cation axes.
I will be re-using shared_libs and shared_pixl. To find silicate-containing rocks I will be using lithology.df.
#for each cluster facet_wrap the features but color the cluster in
+shared_libs$cluster <- umap_clusters
+shared_libs.long <- shared_libs %>%
+ pivot_longer(cols = names(shared_libs)[1:8], names_to = "Variable", values_to = "Values")
+ggplot(shared_libs.long, aes(x = Values, color = as.factor(cluster))) +
+ geom_density() +
+ facet_wrap(~Variable, scales = "free") +
+ labs(title = "Density plots of all variables")
+
+#lat long map
+sample_coord$name <- "PIXL"
+coords <- rbind(data.frame(name = rep("LIBS", nrow(libs.df)), lat = libs.df$lat, lon = libs.df$lon), sample_coord)
+ggplot(data = coords, aes(x = lat, y = lon, color = name)) +
+ geom_point()
+
+#Lithology heatmap - note that CaO here refers only to the CaO that is available in silicate rocks. Since we aren't exactly sure of the chemical composition behind the SHERLOC or lithology data, we just check whether the samples contain silicates (rowSums over the silicate columns > 0)
+#silicates are found in accordance
+pheatmap(lithology.matrix, scale = "none", main = "Lithology heatmap", labels_row = lithology.df$campaign)
+
+delta.minerals <- c( "Kaolinite", "Hydrated_Mg_Fe_Sulfate", "Fe_Mg_clay", "Mg_sulfate", "Spinels", "Zircon/Baddeleyite", "Chromite", "Ilmenite", "Fe_Mg_carbonate") #Isolated delta minerals; notice the abundance of Mg_Fe minerals
+silicates <- c("quartz", "feldspar", "pyroxene", "olivine", "Fe_Mg_clay")
+rowSums(lithology.matrix[,silicates]) #all samples satisfied
+## [1] 1 1 1 2 2 2 2 4 4 2 2 1 1 1 1 1
+#PIXL
+CIA.pixl <- (pixl.df$Al203 / (pixl.df$Al203 + pixl.df$Na20 + pixl.df$K20 + pixl.df$Cao)) * 100
+hist(CIA.pixl)
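The CIA is commonly defined on molar proportions of the oxides rather than on raw wt%, with CaO restricted to the CaO held in silicates. The sketch below shows one way the wt% values could be converted to moles before taking the ratio. The helper name `cia_molar` and the example values are hypothetical and not part of the original analysis, and no correction for non-silicate CaO is applied.

```r
# Molar masses (g/mol) of the four oxides used in the index.
molar_mass <- c(Al2O3 = 101.96, CaO = 56.08, Na2O = 61.98, K2O = 94.20)

# Hypothetical helper: CIA from oxide wt%, using molar proportions.
# CaO is used as-is, i.e. no correction for non-silicate CaO is applied.
cia_molar <- function(Al2O3, CaO, Na2O, K2O) {
  al <- Al2O3 / molar_mass[["Al2O3"]]
  ca <- CaO   / molar_mass[["CaO"]]
  na <- Na2O  / molar_mass[["Na2O"]]
  k  <- K2O   / molar_mass[["K2O"]]
  100 * al / (al + ca + na + k)
}

# Made-up wt% values for two samples, for illustration only.
example <- data.frame(Al2O3 = c(6.5, 15.0), CaO = c(4.5, 1.0),
                      Na2O  = c(3.5, 0.5),  K2O = c(0.9, 0.5))
cia_example <- with(example, cia_molar(Al2O3, CaO, Na2O, K2O))
round(cia_example, 1)
```

Because Al2O3 has the largest molar mass of the four oxides, the molar index comes out lower than the wt%-based ratio for the same sample, so a threshold such as 70 does not carry over unchanged between the two versions.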
+
+#LIBS
+#do these values differ for each cluster?
+CIA.libs <- data.frame()
+for (i in unq) {
+ arr <- shared_libs[which(umap_clusters == i), 1:8]
+ CIA <- cbind((arr$Al2O3 / (arr$Al2O3 + arr$Na2O + arr$K2O + arr$CaO)) * 100, cluster = i, index = as.numeric(rownames(arr)))
+ CIA.libs <- rbind(CIA.libs, CIA)
+}
+names(CIA.libs) <- c("CIA", "cluster","index")
+
+ggplot(data = CIA.libs, aes(x = CIA)) +
+ geom_histogram() +
+ facet_wrap(~as.factor(cluster), scales = "free")
+## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+
+#cluster 1 - 5
+# (1) unimodal peak around 45-50 = virtually no weathering. Higher values > 60 should be looked at more in depth
+# (2) same as (1) but another smaller peak around 55. again no weathering, but higher values should be kept
+# (3) virtually no weathering. I'm wondering if these LIBS samples are benign or what. They have high Ca values and about nothing else
+# (4) Unimodal at 50
+# (5) Left skewed-unimodal around 50, with some higher values
+
+clusters.gt70 <-CIA.libs[which(CIA.libs$CIA >= 70),]
+clusters.gt70.centers <- data.frame()
+for (i in unique(clusters.gt70$cluster)) { #no cluster 3 due to its low values
+ arr <- clusters.gt70[which(clusters.gt70$cluster == i), ]
+ means <- colMeans(shared_libs[arr$index,1:8])
+ clusters.gt70.centers <- rbind(clusters.gt70.centers, means)
+}
+
+names(clusters.gt70.centers) <- names(shared_libs)[1:8]
+gt.70.heatmap <- pheatmap(clusters.gt70.centers, scale = "none", main = "CIA >= 70")
+
+clusters.lt70 <- CIA.libs[which(CIA.libs$CIA < 70),]
+clusters.lt70.centers <- data.frame()
+for (i in unique(clusters.lt70$cluster)) { #includes cluster 3, whose values are all low
+ arr <- clusters.lt70[which(clusters.lt70$cluster == i), ]
+ means <- colMeans(shared_libs[arr$index,1:8])
+ clusters.lt70.centers <- rbind(clusters.lt70.centers, means)
+}
+
+names(clusters.lt70.centers) <- names(shared_libs)[1:8]
+lt.70.heatmap <- pheatmap(clusters.lt70.centers, scale = "none", main = "CIA < 70")
+
+grid.arrange(as.ggplot(gt.70.heatmap), as.ggplot(lt.70.heatmap), ncol = 2)
+
+#essentially only alumina (Al2O3) is in higher quantities. This is an indicator of more intense weathering.
+
+libs_ternary <- shared_libs %>%
+ mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="LIBS", CIA = NA) %>%
+ select(c(13:17))
+
+libs_ternary[clusters.gt70$index, 5] <- "CIA>70"
+libs_ternary[clusters.lt70$index, 5] <- "CIA<70"
+
+
+pixl_ternary <- shared_pixl %>%
+ mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, value="PIXL", CIA = "CIA<70") %>%
+ select(c(11:15))
+
+
+
+ggtern(libs_ternary, ggtern::aes(x = x, y = y, z = z, color = CIA, shape = value)) +
+ geom_point() +
+ geom_point(data = pixl_ternary, aes(x = x, y = y, z = z, color = "PIXL", shape = value)) +
+ theme_rgbw() +
+ labs(x="Si+Al",
+ y="Fe+Mg",
+ z="Ca+Na+K",
+ title = 'LIBS and PIXL ternary clustered by CIA')
+## Warning in geom_point(data = pixl_ternary, aes(x = x, y = y, z = z, color =
+## "PIXL", : Ignoring unknown aesthetics: z
+
+#All PIXL data points are < 70
+
+ggtern(libs_ternary, ggtern::aes(x = x, y= y, z = z, color = as.factor(umap_clusters))) +
+ geom_point() +
+ theme_rgbw() +
+ labs(x="Si+Al",
+ y="Fe+Mg",
+ z="Ca+Na+K",
+ title = "UMAP Clustered on Major Cations")
+
+There isn’t a large difference in cation densities between weathering conditions aside from Al2O3, but this is the main oxide in the CIA formula, so it makes sense that it would be the lead indicator of weathering purely based on the formula. Ca, Na and K also enter the calculation, however, which leaves the question of how weathering relates to Fe + Mg concentrations. In all our heatmaps, Fe and Mg either moved in unison or differed only slightly. One quick question would be: if our alumina values differ, what is the relationship between Fe and Mg (in terms of %), and does this matter?
Also, a quick thought: the points in the middle of the ternary plot have relatively high concentrations of all cation combinations; what does that say about their mineralogy?
Does the abundance of certain minerals provide any insight on ternary plots of major cations?
+In order to check the differences in mineral abundance, I will use the lithology data frame to see whether there are any clear-cut differences in mineral abundance between data points, especially between campaign types and rock types.
The steps for this analysis are very simple. I’m going to find the size of each sample (i.e. the total sum of its row of minerals), compare the sizes on ternary graphs of the major cations and of the under-studied cations, and then see whether the total wt% of a sample correlates with its abundance of minerals. Finding that it does not (as seen at the bottom) suggests that something else is going on.
lithology.df, shared_libs, shared_pixl
+# Include all data processing code (if necessary), clearly commented
+For this last analysis I just wanted to look at unusual elements compared to the relative abundance of total minerals. There is no direct reason for the way I split the cation nodes, but I am aware that SO3 and Cl are associated with lake ecosystems and that TiO2 and Cr2O3 are very unreactive based on their wt%.
We don’t see any outstanding results from this analysis: abundance is only weakly related to total wt%. The regular Pearson correlation is ~0.27, while Kendall’s non-parametric tau is ~0.20 (p = 0.33), which is not statistically significant. This is an extreme oversimplification, and I believe that comparing total wt% to certain mineral types might prove more substantial and might aid in prediction tasks later on. With regard to our ternary plots, while there aren’t many outstanding results, we see that Roubion had the largest abundance of minerals (13 in total), but that the sedimentary samples (which gravitated towards SO3 and Cl concentrations) actually had the lowest abundance of total minerals. If we were to create some form of specialized metric that accounted for certain groups of minerals and compared it to total wt% and/or cation groups, this might prove interesting.
#size of points for Lithology is dependent on the abundance of each mineral within the 16 samples ON a ternary plot. Hover over, can we see which
+
+#characterize size of points by their total abundance per sample at first.
+sizes <- rowSums(lithology.matrix)
+
+ggtern(pixl_ternary, ggtern::aes(x = x, y= y, z = z)) +
+ geom_point(aes(size = sizes)) +
+ theme_rgbw() +
+ labs(x="Si+Al",
+ y="Fe+Mg",
+ z="Ca+Na+K",
+ title = "Ternary Plot with major Cations, sized by mineral abundance")
+
+#no real defined pattern, but the point that tends inward in our last plot is another reason to think that samples with higher wt% likely have a larger variety of minerals.
+#I'm going to alter the Ca+Na+K axis just to see if there is an effect on other features
+
+
+
+#look at unusual elements
+revised_pixl_ternary <- pixl.df %>%
+ mutate(x=(S03+Cl)/100,y=(P205+Mno)/100,z=(Ti02+Cr203 )/100) %>%
+ select(c(20:22)) %>%
+ mutate(sizes = sizes)
+
+ggtern(revised_pixl_ternary, ggtern::aes(x = x, y= y, z = z)) +
+ geom_point(aes(size = sizes, color = pixl.df$type)) +
+ theme_rgbw() +
+ labs(x="S03+Cl",
+ y="P205+MnO",
+ z="Ti02+Cr203",
+ title = "Ternary Plot of usually undocumented Cations")
+
+hist(rowSums(pixl.df[,2:14]))
+
+hist(sizes)
+
+plot(x = rowSums(pixl.df[,2:14]), y = sizes)
+
+cor.test(x = rowSums(pixl.df[,2:14]), y = sizes, method = 'kendall') #non-parametric cor.test
+## Warning in cor.test.default(x = rowSums(pixl.df[, 2:14]), y = sizes, method =
+## "kendall"): Cannot compute exact p-value with ties
+##
+## Kendall's rank correlation tau
+##
+## data: rowSums(pixl.df[, 2:14]) and sizes
+## z = 0.97775, p-value = 0.3282
+## alternative hypothesis: true tau is not equal to 0
+## sample estimates:
+## tau
+## 0.196399
+The main next step, I think, is to find these direct correlations and really dive into the literature on how rock types differ in mineralogy and how elemental composition can differ. In addition, does this vary by campaign type, depth (z) within the crater, and the landscape (geomorphology) surrounding the rover’s course? Personally, I think if there is any life it is underground. The surface of Mars is far too cold to support life now, and its atmosphere is far too inhospitable to life. Taking a deep look at the poles might be interesting as well, as there is pre-existing knowledge (https://marsed.asu.edu/mep/ice/polar-caps) that they might contain water vapor from the atmosphere solidified together with carbon dioxide. That being said, given the rover’s RIMFAX ground-penetrating radar, which captures underground stratification, I wonder whether life could survive under the ground (Wierzchos et al. 2012)? UVC and UVB radiation is detrimental to organisms because Mars lacks a developed ozone layer; thus, life might be able to thrive in endolithic colonies within underground fissures.
Finally, creating a metric that helps us understand SHERLOC data a little better would be very beneficial. A prototype metric would compare each mineral’s chemical formula to the concentrations (of oxides or elements) in the PIXL and LIBS datasets. It would take the atomic or molecular weight of each element or compound, multiply it by the number of atoms/ions present, and then compare that mass to the total mass of the chemical formula ([mass / total mass] x 100). This value would be compared to the weight % of the corresponding PIXL feature. Summing these over every compound/element and finding the difference from 1 would be a good indicator of how much of that mineral is explained by our data. That being said, it will be difficult to account for oxides and individual elements, as well as for ‘other’ compounds.
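As a rough sketch of the [mass / total mass] x 100 step, the snippet below computes the ideal element wt% of one mineral from its formula and compares it with a measured composition. The helper `element_wt_pct`, the choice of forsterite, and the `measured` values are illustrative assumptions. PIXL reports oxides instead of elements, so a real version would need an oxide-to-element conversion first.

```r
# Standard atomic masses (g/mol) for the elements used below.
atomic_mass <- c(Mg = 24.305, Si = 28.086, O = 15.999)

# Hypothetical helper: wt% of each element in a mineral, given the atom
# counts per formula unit as a named vector.
element_wt_pct <- function(atoms) {
  mass <- atoms * atomic_mass[names(atoms)]
  100 * mass / sum(mass)
}

# Forsterite (Mg2SiO4) as an example of a "pure" mineral formula.
forsterite <- c(Mg = 2, Si = 1, O = 4)
ideal <- element_wt_pct(forsterite)

# Made-up measured wt% for the same elements; the gap from the ideal
# composition is one candidate for the "how much is explained" score.
measured <- c(Mg = 30, Si = 19, O = 44)
gap <- ideal - measured[names(ideal)]
round(rbind(ideal, measured, gap), 2)
```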
see which lat and long correspond to similar values of our +samples. Maybe we will see more. Apply clustering to data. Using AP +clustering find clusters and see the overlap of prior PCA plots +(Specifically campaign) and heatmaps to analyze concentration of +clusters
SHERLOC has been used recently at Cheyava Falls. What signs, both in our samples and in our LIBS data, show that chemical reactions that could support life have occurred? #https://www.nasa.gov/missions/mars-2020-perseverance/perseverance-rover/nasas-perseverance-rover-scientists-find-intriguing-mars-rock/
I think it would be worthwhile to retrieve the chemical formulas of pure mineral types, compare them against the weight % of the different elemental features, and see how far these minerals differ from their pure counterparts.
Iron and P were found on these blotchy rocks at Cheyava Falls. Can we track the amount of iron and P, or of chemically related elements, that would also provide evidence of life?
what differentiates these between abiotic and biotic (coming from +animals, decomposed matter?)
what is the importance of Sulfur trioxide on river beds?
silicon dioxide?
why are Ferric(is) oxides and Magnesium oxide compatible? What +about their relationship exposes their variance in the same +direction?
clusters both exhibit small proportions of Na2O, TiO2, K2O, Al2O3 +& CaO
SiO2, ferric and magnesium oxide are of interest. SiO2 always +seems to have a specific cluster with a very high amount of +SiO2
can we get access to RIMFAX data? (Wierzchos et +al. 2012)
fissures might show promise of life. endolithic colonies usually +go underground to escape solar radiation (however, lots of UVB and +UVC)
Create a new copy of this notebook in the
-AssignmentX
sub-directory of your team’s github repository
-using the following naming convention
rcsid_assignmentX.Rmd
and
-rcsid_assignmentX.pdf
bennek_assignment03.Rmd
Document all the work you did on your assigned -project this week using the outline below.
You MUST include figures and/or tables to illustrate your work. -Screen shots are okay, but include something!
You MUST include links to other important resources (knitted HTMl -files, Shiny apps). See the guide below for help.
Commit the source (.Rmd
) and knitted
-(.html
) versions of your notebook and push to
-github
Submit a pull request. Please notify -Dr. Erickson if you don’t see your notebook merged within one -day.
DO NOT MERGE YOUR PULL REQUESTS -YOURSELF!!
See the Grading Rubric for guidance on how the contents of this -notebook will be graded on LMS or GradeScope.
-NOTE: Follow an outline format; use bullets to -express individual points.
-RCS ID: Always include this!
Project Name: Always include this!
Summary of work since last week
-NEW: Summary of github issues added and worked
-Summary of github commits
-List of presentations, papers, or other outputs
-List of references (if necessary)
Indicate any use of group shared code base
Indicate which parts of your described work were done by you or -as part of joint efforts
Required: Provide illustrating figures and/or -tables
Provide in natural language a statement of what question you’re -trying to answer
+What can I learn about LIBS target names from the analysts notebook +and other sources?
Provide in natural language a description of the data you are -using for this analysis
-Include a step-by-step description of how you prepare your data -for analysis
-If you’re re-using dataframes prepared in another section, simply -re-state what data you’re using
-# Include all data processing code (if necessary), clearly commented
+I am using the LIBS data, with some features removed. I am mainly +focusing on the metadata. I added a new “type” column and am +categorizing the targets based on a few different metrics.
+#load in LIBS data
+libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+
+#Drop the standard deviation features, the sum of the percentages,
+#the distance, and the total frequencies
+libs.df <- libs.df %>%
+ select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+ MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+
+# Convert the points to numeric
+libs.df$point <- as.numeric(libs.df$point)
+
+libsrenamed<-cbind(libs.df[,1:4],"type"=0,libs.df[5:13])
+I am adding a label to each scct target, based on descriptions from the file sent in the webex, using the following reference table:
+Describe in natural language a statement of the analysis you’re -trying to do
-Provide clearly commented analysis code; include code for tables -and figures!
-# Include all analysis code, clearly commented
-# If not possible, screen shots are acceptable.
-# If your contributions included things that are not done in an R-notebook,
-# (e.g. researching, writing, and coding in Python), you still need to do
-# this status notebook in R. Describe what you did here and put any products
-# that you created in github. If you are writing online documents (e.g. overleaf
-# or google docs), you can include links to the documents in this notebook
-# instead of actual text.
+libsrenamed<-libsrenamed %>%
+ mutate(type = ifelse(grepl("tsrich0404", target,ignore.case=T),
+ "BHVO-2 basalt and K sulfate mixture", type)) %>%
+ mutate(type = ifelse(grepl("LCMB0006", target,ignore.case=T),
+ "Chert", type)) %>%
+ mutate(type = ifelse(grepl("LCA530106", target,ignore.case=T),
+ "Calcite", type)) %>%
+ mutate(type = ifelse(grepl("PMIFS0505", target,ignore.case=T),
+ "Ferrosilite", type)) %>%
+ mutate(type = ifelse(grepl("TAPAG0206", target,ignore.case=T),
+ "Fluoro-Chloro-Hydro Apatite", type)) %>%
+ mutate(type = ifelse(grepl("PMIOR0507", target,ignore.case=T),
+ "Orthoclase", type)) %>%
+ mutate(type = ifelse(grepl("PMIDN0302", target,ignore.case=T),
+ "Diopside", type)) %>%
+ mutate(type = ifelse(grepl("PMIFA0306", target,ignore.case=T),
+ "Olivine", type)) %>%
+ mutate(type = ifelse(grepl("PMIAN0106", target,ignore.case=T),
+ "Andesine", type)) %>%
+ mutate(type = ifelse(grepl("PMIEN0602", target,ignore.case=T),
+ "Enstatite", type)) %>%
+ mutate(type = ifelse(grepl("TSERP0102", target,ignore.case=T),
+ "Serpentine/Talc", type)) %>%
+ mutate(type = ifelse(grepl("LBHVO20406", target,ignore.case=T),
+ "BHVO-2 standard basalt", type)) %>%
+ mutate(type = ifelse(grepl("LJSC10304", target,ignore.case=T),
+ "Mars soil analog", type)) %>%
+ mutate(type = ifelse(grepl("LANKE0101", target,ignore.case=T),
+ "Ankerite", type)) %>%
+ mutate(type = ifelse(grepl("LSIDE0101", target,ignore.case=T),
+ "Siderite", type)) %>%
+ mutate(type = ifelse(grepl("LJMN10106", target,ignore.case=T),
+ "JMN-1 standard Mn nodule", type)) %>%
+ mutate(type = ifelse(grepl("NTE010301", target,ignore.case=T),
+ "Basalt dopped in minor elements - Cu, Zn", type)) %>%
+ mutate(type = ifelse(grepl("NTE020106", target,ignore.case=T),
+ "Basalt dopped in minor elements - Mn, Ba, Cr", type)) %>%
+ mutate(type = ifelse(grepl("NTE030106", target,ignore.case=T),
+ "Basalt dopped in minor elements - Zn", type)) %>%
+ mutate(type = ifelse(grepl("NTE040106", target,ignore.case=T),
+ "Basalt dopped in minor elements - Li, Sr", type)) %>%
+ mutate(type = ifelse(grepl("NTE050301", target,ignore.case=T),
+ "Basalt dopped in minor elements - Ni", type)) %>%
+ mutate(type = ifelse(grepl("SHERG02", target,ignore.case=T),
+ "Shergottite", type)) %>%
+ mutate(type = ifelse(grepl("TITANIUM", target,ignore.case=T),
+ "Titanium", type))
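The chain above repeats the same `grepl`/`ifelse` step for every calibration target. As an alternative sketch, the pattern-to-label pairs could live in one named vector that a small helper loops over. The helper `label_targets` and the example target names are hypothetical, and only a few of the pairs are copied here. Unlike the chain, where a later match overwrites an earlier one, this version keeps the first match, which gives the same result as long as no target matches two patterns.

```r
# A few pattern -> label pairs copied from the chain above; the full table
# would list all of them.
scct_labels <- c(
  LCMB0006  = "Chert",
  LCA530106 = "Calcite",
  PMIFA0306 = "Olivine",
  TITANIUM  = "Titanium"
)

# Hypothetical helper: return the label of the first pattern found in each
# target name (case-insensitive), or `default` if no pattern matches.
label_targets <- function(target, labels, default = "0") {
  out <- rep(default, length(target))
  for (pattern in names(labels)) {
    hit <- grepl(pattern, target, ignore.case = TRUE) & out == default
    out[hit] <- labels[[pattern]]
  }
  out
}

# Made-up target names in the style of the LIBS data.
targets <- c("scct_lcmb0006_______", "scct_titanium_______", "sei_________________")
label_targets(targets, scct_labels)
```

Keeping the pairs in one table also makes it easier to check them against the reference table by eye.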
+Based on this article on NASA’s website, the targets that contain the text “aegis” were taken by AEGIS (Autonomous Exploration for Gathering Increased Science), NASA’s new AI that was implemented partway through Perseverance’s path. This allows the rover to take LIBS data autonomously. There are two types of AEGIS, and I thought about trying to distinguish them (AEGISlite/AEGISheavy), but it seems that the AEGISheavy update doesn’t change the AEGIS LIBS functionality; it just allows for autonomous control of other SuperCam measurements. As in my previous notebook, I just have these labeled “AEGIS”.
+libsrenamed<-libsrenamed %>%
+ mutate(type= ifelse(grepl("aegis", target,ignore.case=T) & type=="0","AEGIS",type))
+When looking into what the suffix “scam” refers to (more specifically than just SuperCam), I looked at the analysts notebook documentation for sol 448, a sol that had a target name with “scam” in it. I found this chart, which indicates that these targets are named this way because a target for another measurement has the same name and is in the same area. This could be a good way to link LIBS and PIXL further.
+I labeled the scam targets with the names of the other measurements taken at the target, plus SCAM. Note that the documentation for the target “montpezat_350_scam” is missing, so my label associating it with PIXL is just a guess. The documentation for the targets “pollock_knob_501_sca” and “villeplane_scam” is also missing, so their associations with ZCAM are also guesses. Here, “AT-SCAM” means “All Techniques”.
+libsrenamed<-libsrenamed %>%
+ mutate(type= ifelse(grepl("buzzard_rocks_scam", target,ignore.case=T) & type=="0",
+ "PIXL-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("alfalfa_378_scam", target,ignore.case=T) & type=="0",
+ "VISIR-Ramanx2-ZCAM-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("chiniak_565_scam", target,ignore.case=T) & type=="0",
+ "AT-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("garde_210_scam", target,ignore.case=T) & type=="0",
+ "AT-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("guillaumes_168_scam", target,ignore.case=T) & type=="0",
+ "PIXL-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("montpezat_350_scam", target,ignore.case=T) & type=="0",
+ "PIXL-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("naltsos_scam", target,ignore.case=T) & type=="0",
+ "PIXL-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("ouzel_falls_792_scam", target,ignore.case=T) & type=="0",
+ "AT-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("pollock_knob_501_sca", target,ignore.case=T) & type=="0",
+ "ZCAM-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("rose_river_falls_sca", target,ignore.case=T) & type=="0",
+ "?-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("roubion_168_scam", target,ignore.case=T) & type=="0",
+ "ZCAM-PIXL-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("villeplane_scam", target,ignore.case=T) & type=="0",
+ "ZCAM-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("atmo_mountain_637_sc", target,ignore.case=T) & type=="0",
+ "ZCAMMS-SCAM",type)) %>%
+ mutate(type= ifelse(grepl("crosswind_lake_641_s", target,ignore.case=T) & type=="0",
+ "ZCAM-SCAM",type)) %>%
+ mutate(type=ifelse(type=="0","other",type))
+I started going through other targets to see if their names had importance. Many of them have short descriptions, but labeling them manually seems like a lot of work right now, and maybe not the best use of time, especially as many of the target names seem to be random based on these descriptions.
+libsrenamed<-libsrenamed %>%
+ mutate(type= ifelse(target=="sei_________________", "other - fine soil",type)) %>%
+ mutate(type= ifelse(target=="naakih______________", "other - coarse soil",type))
Provide in natural language a clear discussion of your -observations.
+For this portion of my notebook, I am not doing specific analysis, +only trying to add more documentation to the data and give meaning to +target names.
+For future analysis, I have some thoughts about how the newly +categorized data should be analysed. Currently, LIBS analysts have been +analysing all of the targets together, maybe taking subsets by cluster, +but I think moving forward we should separate the data.
+It doesn’t make sense to me to analyse the scct data in conjunction with the rest of the data: these targets are LIBS measurements of designated samples used to calibrate the instrument. This data could be used for scaling or for understanding which samples are close to a certain reference, but they are not legitimate Mars data, and I don’t think they should be analysed as such.
+I am unsure whether the AEGIS data should be considered exactly the same as the rest of the LIBS data. Functionally, both are LIBS measurements of a rock, which indicates they should be treated the same. However, it seems like the AEGIS data is taken whenever the rover has the time and capacity, so that time isn’t wasted and the rover is always taking measurements, whereas the other LIBS samples are taken with more intention.
+The SCAM data should definitely be analysed the same way the rest of +the LIBS data is, but it could also be used to link data sets.
Provide in natural language a statement of what question you’re -trying to answer
+How can we distinguish the scct targets on a ternary plot?
Provide in natural language a description of the data you are -using for this analysis
-Include a step-by-step description of how you prepare your data -for analysis
-If you’re re-using dataframes prepared in another section, simply -re-state what data you’re using
-# Include all data processing code (if necessary), clearly commented
+I am using the LIBS data prepared earlier, and reformatting it for a +ternary diagram.
+# Include all data processing code (if necessary), clearly commented
+libs.matrix <- as.matrix(libs.df[,6:13])
+
+libs.tern <- as.data.frame(libs.matrix) %>%
+ mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>%
+ select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O,TiO2))
+
+
+libs.tern<-cbind(libs.tern, "type"=libsrenamed$type, "target"=libsrenamed$target,
+ "shape"=libsrenamed$type)
+
+libs.tern<-libs.tern %>% mutate(shape = ifelse(grepl("SCAM", type, ignore.case=T),
+ "other", shape)) %>%
+ mutate(shape = ifelse(grepl("other", type, ignore.case=T),
+ "other", shape)) %>%
+ mutate(shape = ifelse(grepl("scct", target, ignore.case=T), "scct", shape))
+
+libs.tern$shape<-as.factor(libs.tern$shape)
+set.seed(4) #fix the seed so the k-means clusters are reproducible
+km<-kmeans(libs.tern[,1:3],5)
+
+libs.tern<-as.data.frame(cbind(libs.tern,"cluster"=as.factor(km$cluster)))
Describe in natural language a statement of the analysis you’re -trying to do
-Provide clearly commented analysis code; include code for tables -and figures!
-# Include all analysis code, clearly commented
-# If not possible, screen shots are acceptable.
-# If your contributions included things that are not done in an R-notebook,
-# (e.g. researching, writing, and coding in Python), you still need to do
-# this status notebook in R. Describe what you did here and put any products
-# that you created in github (documents, jupytor notebooks, etc). If you are writing online documents (e.g. overleaf
-# or google docs), you can include links to the documents in this notebook
-# instead of actual text.
+I am trying to differentiate the scct points from the other points on a ternary diagram, but I struggled a lot with the best method. I would like to have them labeled with what kind of rock they are, but labels on the graph don’t make sense, as they block the whole graph, and the factor has too many levels to differentiate by color or shape.
+ggtern(libs.tern, ggtern::aes(x=x,y=y,z=z)) +
+ geom_point(data=subset(libs.tern,shape!="scct"),aes(color=cluster,alpha=0.5)) +
+ theme_rgbw() +
+ labs(title="Mars LIBS ternary Plot",
+ x="Si+Al",
+ y="Fe+Mg",
+ z="Ca+Na+K")+theme(legend.position="bottom") +
+ geom_point(data=subset(libs.tern, shape=="scct"),
+ aes(color="scct",alpha=0.5)) +
+ guides(alpha="none")
+
Provide in natural language a clear discussion of your -observations.
+I don’t have clear results because my graph is not very clear and is not labeled well. One thing I found interesting was that the points at the same scct target don’t necessarily have the same values; they are very close, but not exactly the same, so there is some amount of measurement error.
Provide in natural language a statement of what question you’re -trying to answer
+I am trying to improve the combined PIXL-LIBS data.
Provide in natural language a description of the data you are -using for this analysis
-Include a step-by-step description of how you prepare your data -for analysis
-If you’re re-using dataframes prepared in another section, -re-state what data you’re using
-# Include all data processing code (if necessary), clearly commented
+I am importing the data from the combined data Rds file in +StudentData
+libs.pixl.combined <- readRDS("~/DAR-Mars-F24/StudentData/PIXL_LIBS_Combined.Rds")
Describe in natural language a statement of the analysis you’re -trying to do
-Provide clearly commented analysis code; include code for tables -and figures!
-# Include all analysis code, clearly commented
-# If not possible, screen shots are acceptable.
-# If your contributions included things that are not done in an R-notebook,
-# (e.g. researching, writing, and coding in Python), you still need to do
-# this status notebook in R. Describe what you did here and put any products
-# that you created in github. If you are writing online documents (e.g. overleaf
-# or google docs), you can include links to the documents in this notebook
-# instead of actual text.
+Some of the matched LIBS and PIXL rows were matched with LIBS values from scct targets, which means that our earlier matching method is not entirely correct. It makes sense that the method would mistakenly match some of these scct targets to PIXL data, because the rover consistently recalibrates by taking LIBS measurements at these targets.
+libs.pixl.combined[991,]
+## sol.x lat lon target point Long sol.y sample name
+## 991 592 18.451 77.40133 scct_ljsc10304______ 19 <NA> NA NA <NA>
+## type campaign location abrasion
+## 991 <NA> <NA> <NA> <NA>
+I am removing the PIXL values that are matched with these scct targets.
+# Set the PIXL metadata columns to NA on every row whose LIBS target is an scct calibration target
+libs.pixl.combined <- libs.pixl.combined %>%
+ mutate(across(c(Long, sol.y, sample, name, type, campaign, location, abrasion),
+ ~ ifelse(grepl("scct", target, ignore.case = TRUE), NA, .x)))
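A quick sanity check after this step is to confirm that no scct row still carries PIXL metadata. The sketch below repeats the masking in base R on a tiny made-up data frame and then counts leftover non-NA values; `combined_toy`, its values, and the reduced column set are hypothetical stand-ins for `libs.pixl.combined`.

```r
# Hypothetical miniature of the combined data (values are invented)
combined_toy <- data.frame(
  target = c("scct_ljsc10304______", "Roubion", "SCCT_example", "example_rock"),
  sample = c(3, NA, 5, 7),
  name   = c("A", NA, "B", "C"),
  stringsAsFactors = FALSE
)

# Columns that came from PIXL and should be blank on calibration rows
pixl_cols <- c("sample", "name")

# Flag calibration rows by a case-insensitive match on the target name
is_scct <- grepl("scct", combined_toy$target, ignore.case = TRUE)

# Blank the PIXL columns on those rows
combined_toy[is_scct, pixl_cols] <- NA

# Count non-NA PIXL values remaining on scct rows; every count should be 0
leftover <- colSums(!is.na(combined_toy[is_scct, pixl_cols, drop = FALSE]))
print(leftover)
```

Rows whose target does not contain "scct" are left unchanged, so legitimate matches keep their PIXL metadata.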
+Now we check which LIBS targets are specifically paired with a PIXL value, and whether we successfully paired them.
+libs.pixl.combined<-cbind(libsrenamed$type,libs.pixl.combined)
+
+libs.pixl.combined[grepl("PIXL",libs.pixl.combined$`libsrenamed$type`),7:14]
+## Long sol.y sample name type campaign location abrasion
+## 197 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 198 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 199 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 200 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 201 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 275 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 276 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 277 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 278 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 279 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 280 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 281 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 282 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 283 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 284 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 285 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 286 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 287 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 576 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 577 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 578 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 579 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 580 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 581 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 582 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 583 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 584 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 585 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 711 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 712 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 713 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 714 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 715 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 716 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 717 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 718 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 719 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+## 720 <NA> NA NA <NA> <NA> <NA> <NA> <NA>
+We did not pair these LIBS and PIXL targets. However, they correspond to PIXL targets that we don’t have data for yet, so this makes sense. The only matched PIXL target we do have data for is Roubion, which doesn’t make sense to match because it is an atmospheric sample.
Provide in natural language a clear discussion of your -observations.
+Overall, our LIBS and PIXL matching is not perfect, and we need to continue to refine it, but based on the two checks above it is somewhat better than I thought.
Provide in natural language a clear summary and your proposed -next steps.
+I would like to continue to refine the PIXL and LIBS combined data, as well as continue learning about the naming conventions of the LIBS targets and how the LIBS data can be best broken up.