diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.Rmd b/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.Rmd new file mode 100755 index 0000000..5422496 --- /dev/null +++ b/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.Rmd @@ -0,0 +1,692 @@ +--- +title: "Data Analytics Research Individual Final Project Report Mars" +author: "Ashton Compton" +date: "Fall 2024" +output: + pdf_document: + toc: yes + toc_depth: '3' + html_notebook: default + html_document: + toc: yes + toc_depth: 3 + toc_float: yes + number_sections: yes + theme: united +--- + +# DAR Project and Group Members + +* Project name: Mars +* Project team members: + - **Ashton Compton** + - Aadi Lahiri + - CJ Marino + - Nicolas Morawski + - Dante Mwatibo + - Charlotte Peterson + - Doña Roberts + - Margo VanEsselstyn + - David Walczyk + - Xuanting Wang + +## Things to check before you submit (DELETE BEFORE SUBMITTING) ## +* Have you done all the required components of the notebook in the format required? + +* Is your document readable as a research paper even if all the code is suppressed? + + Try suppressing all the code using hint below and see if this is true. +* Did you proofread your document? Does it use complete sentences and good grammar? +* Is every figure/table clearly labeled and titled? +* Does every figure serve a purpose? + + Does the figure/table have a useful title? **Hint:** What _question_ does the figure answer? + + You can put extra (non-essential) figures/tables in your **Appendix**. + + Is the figured/tables captioned? + + Are the figure/tables and its associated findings discussed in the text? + + Is it clear which figure/tables is being discussed? **Hint:** use captions! +* **CRITICAL:** Have you given enough information for someone to reproduce, understand and extend your results? + + Where can they *find* the data and code that you used? + + Have you *described* the data that used? 
+ + Have you *documented* your code? + + Have you stated where code is located? + + Are your figures/tables *clearly labeled*? + + Did you *discuss each figure and your findings*? + + Did you use good grammar and *proofread* your results? + + Finally, have you *committed* your work to github and made a *pull request*? + +* Summarize ALL of your work that is worthy of being preserved in this notebook; Feel free to include work in the appendix at end. It will not be judged as being part of the research document but rather as additional information to be preserved. **if you don't show and/or link to your work here, it doesn't exist for us!** + + +* You **MUST** include figures and/or tables to illustrate your work. *Screen shots or pngs are okay for work generated outside the notebook*. + +* You **MUST** include links to other important resources (knitted HTML files, Shiny apps). See the guide below for help. + +5. Commit the source (`.Rmd`), pdf (`.pdf`) and knitted (`.html`) versions of your notebook and push to github. Turn in the pdf version to lms. + + +See LMS for guidance on how the contents of this notebook will be graded. + +**DELETE THE SECTIONS ABOVE!** + + +# 0.0 Preliminaries. + +*R Notebooks are meant to be dynamic documents. Provide any relevant technical guidance for users of your notebook. Also take care of any preliminaries, such as required packages. Sample text:* + +This report is generated from an R Markdown file that includes all the R code necessary to produce the results described and embedded in the report. Code blocks can be suppressed from output for readability using the chunk option `{R, echo=show}` in the code block header. If `show <- FALSE` the code block will be suppressed; if `show <- TRUE` then the code will be shown. 
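As an alternative to adding `echo=show` to every chunk header, the same switch can be applied once through knitr's global chunk options. The sketch below assumes the `knitr` package is installed and reuses the `show` variable described above; it is an illustration and not part of the original notebook setup.

```r
# Minimal sketch: control code visibility for all chunks from one place.
# Assumes the knitr package is installed.
show <- TRUE  # set to FALSE to hide code in the knitted report

# opts_chunk$set() changes the default chunk options for every later chunk,
# so individual headers no longer need echo=show.
knitr::opts_chunk$set(echo = show)
```

A setting like this would normally sit in the first chunk of the notebook, so that every later chunk inherits it.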
+ +```{r} +# Set to TRUE to expand R code blocks; set to FALSE to collapse R code blocks +show <- TRUE +``` + + +Executing this R notebook requires some subset of the following packages: + +* `pandoc` +* `rmarkdown` +* `tidyverse` +* `stringr` +* `ggbiplot` +* `pheatmap` +* `knitr` +* `paletteer` +* `plotly` +* `GGally` + +These will be installed and loaded as necessary (code suppressed). + + +```{r, include=FALSE} +# This code will install required packages if they are not already installed +# ALWAYS INSTALL YOUR PACKAGES LIKE THIS! +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} + +# Required packages for M20 LIBS analysis +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} + +if (!require("knitr")) { + install.packages("knitr") + library(knitr) +} + +if (!require("paletteer")) { + install.packages("paletteer") + library(paletteer) +} + +if (!require("plotly")) { + install.packages("plotly") + library(plotly) +} + +if (!require("GGally")) { + install.packages("GGally") + library(GGally) +} + +if(!require("ggtern")){ + install.packages("ggtern") + library(ggtern) +} +``` + +# 1.0 Project Introduction + +_Describe your project and your approaches at a high level. Give enough information that a researcher examing your notebook can understand what this notebook is about. _ + +The team had access to data from the first 16 samples from the Mars Perseverance rover. Each sample was assigned a campaign, either Crater Floor or Delta Front. This paper is focused on the task of finding differences between the two campaigns in the data. 
Selecting subsets of the data and visualizing them with graphs proved effective for finding notable differences between the campaigns. + +# 2.0 Organization of Report + +_Give report organization including list of major findings. Sample is provided. Please be sure to edit appropriately and remove this statement._ + +This report is organized as follows: + +* Section 3.0. Finding 1: Igneous and Sedimentary Rock Mixes across Campaigns. Igneous and Sedimentary rock types appear as a mix across the two campaigns, Delta Front and Crater Floor. + +* Section 4.0. Finding 2: PIXL should not be log scaled for machine learning efforts + +* Section 5.0. Finding 3: Rock type trends from Igneous to Sedimentary as Silicon + Aluminum cation composition decreases + +* Section 6.0. Finding 4: Preparation of results for Mars Mission Minder App + + +# 3.0 Finding 1: Sedimentary and Igneous Mixes across Campaigns + +Originally, data from NASA implied that samples from Crater Floor were only igneous and samples from Delta Front were only sedimentary; however, this was found to be false. Within Delta Front there are signs of igneous rock mixed with sedimentary rock. This is evident from mineral distributions within Delta Front samples. Likewise, sedimentary rock is present in Crater Floor. + +## 3.1 Data, Code, and Resources + +1. v1_sample_meta.Rds contains Meta Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds) + +2. v1_lithology.Rds contains Lithology Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds) + +3. v1_pixl.Rds contains PIXL Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds) + +4. 
v1_libs.Rds contains LIBS data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds) + + +The data is read in and stored in dataframes. The first (atmospheric) sample is removed from each dataset, since it is flawed. A campaign column is added to each dataframe. + +```{r, include=FALSE} +# Code +#Load in data +### +# Load the saved lithology data with locations added +#lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds") +meta.df<- readRDS("~/DAR-Mars-F24/StudentData/v1_sample_meta.Rds") + +#Select 2:16 +meta.df <- meta.df[2:16,] + +#Read in v1 lithology +lithology.df<- readRDS("~/DAR-Mars-F24/StudentData/v1_lithology.Rds") + +# Cast samples as numbers +lithology.df$Sample <- as.numeric(lithology.df$Sample) + +# Convert rest into factors +lithology.df[sapply(lithology.df, is.character)] <- + lapply(lithology.df[sapply(lithology.df, is.character)], + as.factor) + +# Keep only first 16 samples because the data for the rest of the samples is not available yet +#Also i'm getting rid of the atmospheric sample for now +lithology.df<-lithology.df[2:16,] + +lithology.df$campaign <- meta.df$Campaign +### +#Used for map +pixl_pos.df<- meta.df %>% select(Lat, Lon) + +# Load the saved PIXL data with locations added +pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds") + +# Convert to factors +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) + +#Get rid of atmospheric sample +pixl.df <- pixl.df[2:16,] + +#Make pixl matrix +pixl.matrix <- as.matrix(pixl.df) + +#Add campaign +pixl.df$campaign <- meta.df$Campaign +# # Make the matrix of just mineral percentage measurements +# pixl.matrix <- pixl.df[,2:14] + +#Do Libs +libs.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs_to_sample.Rds") +### +# Read in data as provided. 
+sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds") + +# Clean up data types +sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral) +sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], + as.numeric) +# Transform NA's to 0 +sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0) + +# Reformat data so that rows are "abrasions" and columns list the presence of minerals. +# Do this by "pivoting" to a long format, and then back to the desired wide format. + +sherloc_long <- sherloc_abrasion_raw %>% + pivot_longer(!Mineral, names_to = "Name", values_to = "Presence") + +# Make abrasion a factor +sherloc_long$Name <- as.factor(sherloc_long$Name) + +# Make it a matrix +sherloc.matrix <- sherloc_long %>% + pivot_wider(names_from = Mineral, values_from = Presence) + +#Remove atmospheric sample +sherloc.matrix <- sherloc.matrix[2:16,] + +# Get sample information from PIXL and add to measurements -- assumes order is the same + +sherloc.df <- cbind(pixl.df[,c("Sample")],sherloc.matrix) + +# Measurements are everything except first column +sherloc.matrix<-as.matrix(sherloc.matrix[,-1]) + +### +#Add in wss plot for elbow method clustering +wssplot <- function(data, nc = 15, seed =10, title="Quality of k-means by Cluster") { + wss <- data.frame(cluster=1:nc, quality=c(0)) + for (i in 1:nc){ + set.seed(seed) + wss[i,2] <- kmeans(data, centers=i)$tot.withinss} + ggplot(data=wss,aes(x=cluster,y=quality)) + + geom_line() + + ggtitle(title) +} + +# Fix the global RNG seed so later random steps are reproducible +seed <- 14 +set.seed(seed) +``` + +For this finding, Lithology data is grouped by campaign. The number of times a mineral occurs across all samples is counted and a plot is generated. 
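The grouping and counting step described above can be sketched on a toy table. This is a minimal illustration that assumes a binary presence/absence layout; the mineral names and values are hypothetical, and base R `aggregate` stands in for the tidyverse pipeline the notebook uses.

```r
# Hypothetical presence/absence table: 1 = mineral detected in the sample.
# The values are made up for illustration and are not the mission data.
toy <- data.frame(
  campaign = c("Crater Floor", "Crater Floor", "Delta Front", "Delta Front"),
  Olivine  = c(1, 1, 0, 1),
  Sulfate  = c(0, 1, 1, 1)
)

# Sum the 1s within each campaign: the result is the number of samples
# in that campaign where each mineral was detected.
counts <- aggregate(cbind(Olivine, Sulfate) ~ campaign, data = toy, FUN = sum)
print(counts)
```

Each row of `counts` is one campaign and each mineral column holds its occurrence count, which is the quantity the bar chart in this section displays.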
+ +```{r} +# Include all data processing code (if necessary), clearly commented +#Start with lithology +#Group by campaign & remove metadata +lithology.df.sorted <- lithology.df %>% group_by(campaign) %>% select(-c(Sample)) + +#Turn into long form and only keep positive cases +#Pivot every column except the last one (campaign); parentheses make the intended range explicit +lithology.df.sorted <- lithology.df.sorted %>% pivot_longer(1:(ncol(lithology.df.sorted)-1),names_to = "Feature", values_to="Factor") %>% filter(Factor == 1) + +#Count # of identical cases +lithology.df.sorted <- lithology.df.sorted %>% count(Feature) + +#Sort, Crater Floor is High to low & Delta Front is added back in low to high +lithology.df.sorted <- lithology.df.sorted %>% filter(campaign == "Crater Floor") %>% arrange(desc(n)) %>% ungroup() %>% add_row(lithology.df.sorted %>% filter(campaign == "Delta Front") %>% arrange(n)) + +#Make plot for lithology +sherlocPlot <- ggplot(lithology.df.sorted, aes(x=factor(Feature, levels = (Feature %>% unique())), y = n, fill = campaign)) + + geom_col(position=position_dodge(preserve="total"), width=0.6) + + theme(panel.grid.major.x=element_blank(), axis.text.x = element_text(angle = 60, vjust = 1.0, hjust=1, size = 12)) + + labs(x="", y="Count") + + ggtitle("SHERLOC Dataset, Total Mineral Occurrences, Samples grouped by Campaign") + + scale_fill_manual(values=c('#d6001c','#54585a')) +``` + +## 3.2 Contribution + + +Data preparation started from assignment 1. Doña Roberts instructed on how to include updated student data (v1's). Otherwise solo work. + +## 3.3 Methods Description + +A goal was to produce graphs to visualize the data so that differences between campaigns could be spotted. The first way to achieve this was using the Lithology dataset. The team decided to use SHERLOC instead of Lithology moving forward, since the two represent the same data and differ only in that one is numeric while the other is binary. Lithology was still used to produce the graph for this analysis for the sake of simplicity. 
+ +Method 3.3.1 +Lithology was cleaned of unnecessary metadata first. Then, points were grouped by their campaign. The total occurrence of each mineral measured in Lithology was determined. Lastly, these totals were graphed using the grouped data. + +## 3.4 Result and Discussion + +It can be determined that sedimentary and igneous minerals appear across both campaigns. There are multiple igneous minerals that appear in samples from Delta Front. This can be seen in the following graph (Method 3.3.1). + + +Title: Total Mineral Occurrence with Samples grouped by Campaign +```{r, result02_data} +sherlocPlot +``` + +Description: Represents the SHERLOC dataset. Lithology points were grouped by their campaign. The number of times a mineral occurred across all samples was counted and totaled. These totals were plotted above using the grouped data. This graph makes it easy to see differences between the two campaigns. Notes on this are below. +Igneous minerals counted in Delta Front were Chromite, Ilmenite, Spinels, and Zircon. Sedimentary minerals counted in Crater Floor were Hydrated Calcium Sulfate and Hydrated Sulfates, which are found in evaporated environments. + +## 3.5 Conclusions, Limitations, and Future Work. + +Campaigns Delta Front and Crater Floor do not exclusively contain Sedimentary and Igneous minerals, respectively. They each contain a mix of both types, especially Delta Front. + +Limitations of this analysis are the small number of data points and the lack of communication with a geologist. + +Future work includes looking into the details of igneous and sedimentary rock distributions across new campaigns and more general environments like latitude and longitude. + + +# 4.0 Finding 2: PIXL should not be log scaled + +## 4.1 Data, Code, and Resources + +1. v1_sample_meta.Rds contains Meta Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds) + +2. 
v1_pixl.Rds contains PIXL Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds) + +A new dataframe is made by applying log10 to the entire PIXL dataframe. Several plots are prepared for presentation. + +```{r} +# Include all data processing code (if necessary), clearly commented +#First replace 0.0 entries with 0.00001 so they don't scale to -Inf +pixl.matrix[pixl.matrix == 0] <- 0.00001 +#Apply log10 to every entry in pixl.matrix & get new scaled df +pixl.scaled <- log10(pixl.matrix) + +#Create an elbow plot for both pixl.matrix & pixl.scaled +# wssplot(pixl.matrix, nc=8, seed=14, 'Unscaled') +# wssplot(pixl.scaled, nc=8, seed=14, "Scaled") +#Comment Out to suppress for report + +#Do kmeans for both matrices +#Reset the RNG seed (seed is 14, defined earlier) so the clusters are reproducible +set.seed(seed) +unscaled.kmeans <- kmeans(pixl.matrix, 3) +scaled.kmeans <- kmeans(pixl.scaled, 3) + +#Do pca for both matrices +unscaled.pca <- prcomp(pixl.matrix) +scaled.pca <- prcomp(pixl.scaled) + +#Make biplots +unscaled.plot <- ggbiplot::ggbiplot(unscaled.pca, + labels = pixl.df$type, + groups = as.factor(unscaled.kmeans$cluster)) + + ggtitle("Unscaled Pixl") + +scaled.plot <- ggbiplot::ggbiplot(scaled.pca, + labels = pixl.df$type, + groups = as.factor(scaled.kmeans$cluster)) + + ggtitle("Scaled Pixl") + + +#pheatmap(pixl.scaled, scale="none") +``` + +## 4.2 Contribution + +Solo + +## 4.3 Methods Description + +A goal was to scale the PIXL dataframe to get better results when applying machine learning to it. The entire PIXL dataframe was scaled by applying the log10 function to each entry, after replacing zero entries with 0.00001. + +4.3.1 +Produce Elbow Plots to determine the best number of clusters for clustering. 
It was determined three clusters was the best for both dataframes. + +4.3.2 +Do kmeans on both scaled and non-scaled PIXL to cluster both dataframes. Then produce pheatmaps to represent both dataframes with the points grouped by cluster. + +4.3.3 +Do PCA on both scaled and non-scaled PIXL. Then produce biplots to visualize the PCAs. + +## 4.4 Result and Discussion + +Clustering wasn't clearly better in the scaled PIXL dataframe. See pheatmaps below (Method 4.3.2) + +PHeatmaps for Scaled and Non-scaled PIXL Dataframes +```{r} +#Produce heatmaps for both +pheatmap(unscaled.kmeans$centers, scale="none", main="Unscaled Pixl") +pheatmap(scaled.kmeans$centers, scale="none", main="Scaled Pixl") +``` + +Description: PHeatmaps of both Non-scaled and scaled PIXL Dataframes. Color corresponds to the variation of each point from the average value within the dataframe. PIXL samples are combined together by their corresponding cluster number. + +Next PCA results were considered, and log scaling wasn't clearly better here either. See biplots below (Method 4.3.3) + +Biplots for Scaled and Non-scaled PIXL Dataframes. +```{r} +unscaled.plot +scaled.plot +``` + +Description: PCA can represent data with multiple features/dimensions using only two dimensions. The above plots show the general distributions of the features of the two PIXL dataframes. The black arrows represent increasing amounts of the labelled compound. + +## 4.5 Conclusions and Future Work. + +It was found the scaled PIXL dataframe did not produce better results than the original PIXL dataframe. It was concluded log scaling PIXL doesn't yield better results. + +Another attempt at scaling PIXL was done by Aadi, trying to use earth reference data to scale PIXL. + +At the moment, no other ideas for scaling pixl have come up. + +# 5.0 Finding 3: Gradient from Igneous to Sedimentary Rock + +## 5.1 Data, Code, and Resources + +1. 
v1_sample_meta.Rds contains Meta Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds) + +2. v1_pixl.Rds contains PIXL Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds) + +3. v1_libs.Rds contains LIBS data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds) + + + +```{r} +#take the sums of the specific elements +pixl_ternary <- pixl.df %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>% + drop_na() + +#This is for the labels on the Ternary Plot below +pixl_ternary <- cbind(pixl_ternary, Sample_display= + c("2","3","4,6,7","5,8,9","","","","", + "10,11","","12,13","","14,15","","16")) + +# Load the saved LIBS data with locations added +libs.df1 <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds") + +libs.df1$Point <- as.numeric(libs.df1$Point) + +#suppressing warnings here because target and point do not get a mean calculated, +#but thats is fine as we have the target anyways and point is no longer relevant +suppressWarnings( + libs.uniquetar <- + aggregate(libs.df1, list(Target = libs.df1$Target), mean)) + +#drop target and point from the data frame +libs.uniquetar <- libs.uniquetar %>% select(!c(Target,Point)) + +#Sum the elements we are looking at +libs.df1 <- libs.df1 %>% + mutate(y = (FeOT + MgO) / 100, z = (CaO+Na2O+K2O) / 100, x = (SiO2 + Al2O3) / 100) + +#Same thing but aggregate +libs.uniquetar <- libs.uniquetar %>% + mutate(y = (FeOT + MgO) / 100, z = (CaO+Na2O+K2O) / 100, x = (SiO2 + Al2O3) / 100) + +libs_ternplot <- libs.df1 %>% select(c(x,y,z)) +libs_ternplot2 <- libs.uniquetar %>% select(c(x,y,z)) + +set.seed(1234) + 
+#kmeans on the original data +tern.km <- kmeans(libs_ternplot, centers=4) + +libs_ternplot <- cbind(libs_ternplot, cluster=as.factor(tern.km$cluster)) + +#kmeans on the aggregate data +tern.km2 <- kmeans(as.matrix(libs_ternplot2), centers=4) + +libs_ternplot2 <- cbind(libs_ternplot2, cluster=as.factor(tern.km2$cluster)) + +#ternary plot for LIBS data +ternPlot <- ggtern() + + #color by cluster + geom_point(data=libs_ternplot, aes(x=x,y=y,z=z, colour = cluster), alpha = 0.5) + + scale_colour_manual(values=c('#d6001c','#54585a','#9ea2a2','#000')) + + labs(title="Sample Cation Compositions", + subtitle="LIBS data Clustered by Cation Group with PIXL samples by Campaign", + x="Si+Al2", + y="Fe+Mg", + z="Ca+Na2+K2") + + #Add pixl + geom_point(data=pixl_ternary, aes(x=x,y=y,z=z, shape=campaign), colour='green', size=3) + + # #Add labels to PIXL data corresponding to sample number + geom_text(data=pixl_ternary, + ggtern::aes(x=x, y=y, z=z, label=Sample_display, + hjust = ifelse(x > 0.43, 1, -0.1), # Horizontal adjust to avoid overlap + vjust = ifelse(x == 0.3668, 1.3, + ifelse(x == 0.375, 1, ifelse(x > 0.43, 1.5, -0.3))), + fontface="bold"), + size=3, colour='green') + + theme_bw() +``` + +## 5.2 Contribution + +Great credit to Aadi Lahiri & Nicolas Morawski, the original creators of the ternary plot used to derive the main finding of this section. + +The author modified the original plot to suit the campaign section of the Mars Mission Minder App. + +## 5.3 Methods Description + +The cation compositions for each point in LIBS and each sample in PIXL were calculated. Then a ternary plot was produced with both the LIBS points, grouped by cluster, and the PIXL samples, grouped by campaign. + +## 5.4 Result and Discussion + +Following the direction of decreasing Silicon and Aluminum, PIXL samples transition from broadly Igneous to Sedimentary rock. 
This can be seen as the campaign transitions from Crater Floor to Delta Front, which largely correspond to Igneous and Sedimentary rock samples, respectively. + +Cation Compositions of LIBS Points and PIXL Samples, grouped by Cluster and Campaign, Respectively +```{r} +ternPlot +``` + +Description: LIBS points were clustered using kmeans and PIXL samples were grouped by campaign. The cation compositions of all involved points were calculated and plotted above, with the data grouping indicated by legends. + +## 5.5 Conclusions and Future Work. + +There is a transition within the PIXL data from Igneous to Sedimentary rock in the direction of decreasing Silicon and Aluminum content in the rock samples. + +Future work includes research into the significance of Silicon + Aluminum cation composition in geological environments. + +Future work also includes looking for a similar trend in the LIBS points. This could be done using the updated LIBS datasets. + +# 6.0 Finding 4: Campaign Focused App Contribution + +## 6.1 Data, Code, and Resources + +1. v1_sample_meta.Rds contains Meta Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds) + +2. v1_lithology.Rds contains Lithology Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds) + +3. v1_pixl.Rds contains PIXL Data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds) + +4. 
v1_libs.Rds contains LIBS data +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds) + +```{r} +#Make box plot for PIXL +pixl.lf <- pixl.df %>% select(-c(Sample)) %>% pivot_longer(1:13) +colnames(pixl.lf)<- c("campaign", "feature", "value") +pixlPlot <- ggplot(data = pixl.lf, aes(x=factor(feature, levels = (feature %>% unique())), y=value, color = campaign)) + + geom_boxplot() + + scale_y_log10() + + ggtitle("PIXL, Compound Distribution by Campaign") + + labs(x="", y="log10 scale from percent composition") + + scale_colour_manual(values=c('#d6001c','#54585a')) + +#Make box plot for LIBS +libs.lf <- libs.df %>% filter(Distance <= 7) %>% select(8:18) %>% select(-c("Lat.pixl","Lon.pixl")) + +colnames(libs.lf)[2:9] <- colnames(pixl.df)[2:9] +colnames(libs.lf)[1] <- "campaign" + +libs.lf <- libs.lf %>% pivot_longer(2:9) +colnames(libs.lf)<- c("campaign", "feature", "value") +libsPlot <- ggplot(data = libs.lf, aes(x=factor(feature, levels = (feature %>% unique())), y=value, color = campaign)) + + geom_boxplot() + + scale_y_log10() + + ggtitle("LIBS, Compound Distribution by Campaign") + + labs(x="", y="log10 scale from percent composition") + + scale_colour_manual(values=c('#d6001c','#54585a')) +``` + +## 6.2 Contribution + +Mostly solo work; includes the updated student data and the ternary plot by Aadi and Nicolas. + +## 6.3 Methods Description + +A tab was prepared for inclusion in the final Mars Mission Minder 2D App. A series of relevant graphs was created for the general campaign comparison question. The first graph is the one featured in Finding 1. Another two graphs feature PIXL and LIBS. For the PIXL graph, samples were grouped by campaign. The distribution of each compound measured in PIXL was then plotted. Likewise, for the LIBS plot, points within 7 meters of a sample were selected and then grouped by their campaign. 
The distribution of each compound measured in LIBS was then plotted with the filtered and grouped data. Lastly, the ternary plot from Finding 3 was included. + +## 6.4 Result and Discussion + +Plot Representing SHERLOC in Campaign Analysis +Total Mineral Occurrence with Samples grouped by Campaign +```{r} +sherlocPlot +``` + +Description: From the SHERLOC dataset, samples were grouped by campaign. Then the total count of each mineral occurrence was graphed using the grouped data. + +Plot Representing PIXL in Campaign Analysis +Sample Compound Distributions, Samples grouped by Campaign +```{r} +pixlPlot +``` + +Description: From the PIXL dataset, samples were grouped by campaign. Then the distribution of each compound measured in PIXL was graphed using the grouped data. + +Plot Representing LIBS in Campaign Analysis +Sample Compound Distributions, Samples grouped by Campaign +```{r} +libsPlot +``` + +Description: LIBS points within 7 meters of a sample were selected and then grouped by their campaign. The distributions of each compound measured in LIBS were plotted using the filtered and grouped data. + +Plot Representing Cation Compositions +Cation Compositions of LIBS and PIXL points +```{r} +ternPlot +``` + +Description: Cation compositions of each point in LIBS and each sample in PIXL were calculated, then plotted with LIBS points grouped by clusters and PIXL samples grouped by campaign. + +## 6.5 Conclusions and Future Work. + +Plots representing the difference between campaigns were successfully gathered together to create a final tab for the Mars Mission Minder App. + +Future work would include a combined version with Evangaline's (Xuanting) campaign comparison work. + +# Bibliography + +* `pandoc` +* `rmarkdown` +* `tidyverse` +* `stringr` +* `ggbiplot` +* `pheatmap` +* `knitr` +* `paletteer` +* `plotly` +* `GGally` + + +# Appendix + +Finding 2 (Section 4.0) investigated whether log scaling PIXL produces better results. 
The two plots below were generated as a part of this analysis. From these, it was concluded three clusters should be made for PIXL and the scaled PIXL dataframe. + +Title: Elbow Plots for unscaled and scaled versions of PIXL +```{r} +#Create an elbow plot for both pixl.matrix & pixl.scaled +wssplot(pixl.matrix, nc=8, seed=14, 'Unscaled') +wssplot(pixl.scaled, nc=8, seed=14, "Scaled") +``` + +Description: Elbow plots show the quality of clustering compared to the number of clusters created for a dataframe. These represent the unscaled and scaled PIXL dataframes respectively. Note the seed was set to 14 for this analysis, which is important for reproducing the same results. \ No newline at end of file diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.html b/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.html new file mode 100644 index 0000000..f072fcf --- /dev/null +++ b/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.html @@ -0,0 +1,2190 @@ + + + + + + + + + + + + + + +Data Analytics Research Individual Final Project Report Mars + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+
+ +
+ + + + + + + +
+

1 DAR Project and Group +Members

+
    +
  • Project name: Mars
  • +
  • Project team members: +
      +
    • Ashton Compton
    • +
    • Aadi Lahiri
    • +
    • CJ Marino
    • +
    • Nicolas Morawski
    • +
    • Dante Mwatibo
    • +
    • Charlotte Peterson
    • +
    • Doña Roberts
    • +
    • Margo VanEsselstyn
    • +
    • David Walczyk
    • +
    • Xuanting Wang
    • +
  • +
+
+


+
+
+
+

2 0.0 Preliminaries.

+


+

This report is generated from an R Markdown file that includes all the R code necessary to produce the results described and embedded in the report. Code blocks can be suppressed from the output for readability by using {R, echo=show} in the code block header. If show <- FALSE, the code block is suppressed; if show <- TRUE, the code is shown.

+
# Set to TRUE to expand R code blocks; set to FALSE to collapse R code blocks 
+show <- TRUE
+ +

Executing this R notebook requires some subset of the following +packages:

+
    +
  • pandoc
  • +
  • rmarkdown
  • +
  • tidyverse
  • +
  • stringr
  • +
  • ggbiplot
  • +
  • pheatmap
  • +
  • knitr
  • +
  • paletteer
  • +
  • plotly
  • +
  • GGally
  • +
+

These will be installed and loaded as necessary (code +suppressed).
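The install-and-load step itself is suppressed in the knitted output. A minimal sketch of how such a step could look is given below; the helper name `missing_packages` is hypothetical and not taken from the notebook, and the suppressed code may well differ.

```r
# Hypothetical helper (not from the notebook): report which packages are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package names as listed above.
required <- c("pandoc", "rmarkdown", "tidyverse", "stringr", "ggbiplot",
              "pheatmap", "knitr", "paletteer", "plotly", "GGally")

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install anything that is missing and then attach the packages:
# if (length(to_install) > 0) install.packages(to_install)
# invisible(lapply(required, library, character.only = TRUE))
```

Checking with `requireNamespace()` first avoids reinstalling packages that are already available.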

+ +
+
+

3 1.0 Project +Introduction

+


+

The team had access to data from the first 16 samples collected by the Mars Perseverance rover. Each sample was assigned to one of two campaigns, Crater Floor or Delta Front. This report focuses on finding differences between the two campaigns in the data. Selecting subsets of the data and visualizing them with graphs proved effective for finding notable differences between the campaigns.

+
+
+

4 2.0 Organization of +Report

+


+

This report is organized as follows:

+
    +
  • Section 3.0. Finding 1: Igneous and Sedimentary Rock Mixes across +Campaigns. Igneous and Sedimentary rock types appear as a mix across the +two campaigns Delta Front and Crater Floor

  • +
  • Section 4.0. Finding 2: PIXL should not be log scaled for machine +learning efforts

  • +
  • Section 5.0. Finding 3: Rock type trends from Igneous to +Sedimentary as Silicon + Aluminum cation composition decreases

  • +
  • Section 6.0. Finding 4: Preparation of results for Mars Mission +Minder App

  • +
+
+
+

5 3.0 Finding 1: +Sedimentary and Igneous Mixes across Campaigns

+

Initially, data from NASA implied that samples from Crater Floor were exclusively igneous and samples from Delta Front were exclusively sedimentary; this turned out to be false. Delta Front shows signs of igneous rock mixed with sedimentary rock, as is evident from the mineral distributions within Delta Front samples. Likewise, sedimentary rock is present in Crater Floor.

+
+

5.1 3.1 Data, Code, and +Resources

+
    +
  1. v1_sample_meta.Rds contains Meta Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds

  2. +
  3. v1_lithology.Rds contains Lithology Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds

  4. +
  5. v1_pixl.Rds contains PIXL Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds

  6. +
  7. v1_libs.Rds contains LIBS data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds

  8. +
+

The data is read in and stored in dataframes. The first (atmospheric) +sample is removed from each dataset, since it is flawed. A campaign +column is added to each dataframe.
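The loading step is not shown in the knitted output. The sketch below illustrates the operations described (drop the first sample, attach a campaign column) on a small made-up table. The toy values, the campaign labels, and the helper name `prepare_dataset` are assumptions for illustration only; in the notebook the tables would come from `readRDS()` on the files listed above, and how the campaign labels are derived is not shown.

```r
# Hypothetical helper: attach campaign labels, then drop the first (atmospheric) sample.
prepare_dataset <- function(df, campaign) {
  stopifnot(nrow(df) == length(campaign))
  df$campaign <- campaign
  df[-1, , drop = FALSE]   # row 1 is assumed to be the atmospheric sample
}

# Toy stand-in for a table that would normally be read with readRDS().
toy.df <- data.frame(Sample = 1:5, Olivine = c(0, 1, 1, 0, 1))

# Made-up campaign labels, one per sample.
toy.campaign <- c(NA, "Crater Floor", "Crater Floor", "Delta Front", "Delta Front")

toy.clean <- prepare_dataset(toy.df, toy.campaign)
print(toy.clean)
```

Adding the campaign column as the last column matches the layout that the later pivoting code relies on.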

+

For this finding, Lithology data is grouped by campaign. The number +of times a mineral occurs across all samples is counted and a plot is +generated.

+
# Start with the lithology data: group by campaign and drop the sample identifier.
lithology.df.sorted <- lithology.df %>%
  group_by(campaign) %>%
  select(-c(Sample))

# Pivot to long form (every column except campaign is a mineral indicator)
# and keep only positive detections.
lithology.df.sorted <- lithology.df.sorted %>%
  pivot_longer(-campaign, names_to = "Feature", values_to = "Factor") %>%
  filter(Factor == 1)

# Count how many times each mineral was detected within each campaign.
lithology.df.sorted <- lithology.df.sorted %>% count(Feature)

# Sort: Crater Floor from high to low, then append Delta Front from low to high.
lithology.df.sorted <- lithology.df.sorted %>%
  filter(campaign == "Crater Floor") %>%
  arrange(desc(n)) %>%
  ungroup() %>%
  add_row(lithology.df.sorted %>% filter(campaign == "Delta Front") %>% arrange(n))

# Bar plot of mineral counts, with bars dodged by campaign.
sherlocPlot <- ggplot(lithology.df.sorted,
                      aes(x = factor(Feature, levels = unique(Feature)), y = n, fill = campaign)) +
  geom_col(position = position_dodge(preserve = "total"), width = 0.6) +
  theme(panel.grid.major.x = element_blank(),
        axis.text.x = element_text(angle = 60, vjust = 1.0, hjust = 1, size = 12)) +
  labs(x = "", y = "Count") +
  ggtitle("SHERLOC Dataset, Total Mineral Occurrences, Samples grouped by Campaign") +
  scale_fill_manual(values = c('#d6001c', '#54585a'))
+
+
+

5.2 3.2 Contribution

+

Data preparation began in Assignment 1. Doña Roberts explained how to include the updated student data (the v1 files). The rest of this finding is solo work.

+
+
+

5.3 3.3 Methods +Description

+

One goal was to produce graphs that visualize the data so that differences between campaigns could be spotted. The first approach used the Lithology dataset. The team later decided to use SHERLOC instead of Lithology, since the two represent the same data, one in numeric form and the other in binary form. Lithology was still used to produce the graph for this analysis for the sake of simplicity.

+

Method 3.3.1: Lithology was first cleaned of unnecessary metadata. Then, points were grouped by campaign and the total number of occurrences of each mineral measured in Lithology was determined. Lastly, these totals were graphed using the grouped data.
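The counting step in Method 3.3.1 can also be sketched in base R on a toy table. The values and mineral columns below are made up for illustration and are not mission data; `rowsum()` is used here as a compact alternative to the pivot-and-count approach in Section 3.1, not as the notebook's own method.

```r
# Toy binary lithology table: 1 = mineral detected in the sample, 0 = not detected.
toy.lith <- data.frame(
  campaign = c("Crater Floor", "Crater Floor", "Delta Front", "Delta Front", "Delta Front"),
  Olivine  = c(1, 1, 0, 1, 0),
  Sulfate  = c(0, 1, 1, 1, 1)
)

# rowsum() adds up the rows of a numeric matrix within each group,
# giving one row of detection counts per campaign.
mineral.counts <- rowsum(as.matrix(toy.lith[, c("Olivine", "Sulfate")]),
                         group = toy.lith$campaign)
print(mineral.counts)
```

Unlike the long-form approach, this keeps minerals with zero detections in a campaign as explicit zeros, which can be convenient when comparing the two campaigns side by side.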

+
+
+

5.4 3.4 Result and +Discussion

+

Sedimentary and igneous minerals appear in both campaigns. In particular, multiple igneous minerals appear in samples from Delta Front, as shown in the following graph (Method 3.3.1).

+

Title: Total Mineral Occurrence with Samples grouped by Campaign

+
sherlocPlot
+

+

Description: Represents the SHERLOC dataset. Lithology points were grouped by campaign, and the number of times each mineral occurred across all samples was counted. These totals are plotted above by campaign, which makes differences between the two campaigns easy to see. Igneous minerals counted in Delta Front were Chromite, Ilmenite, Spinels, and Zircon. In Crater Floor, Hydrated Calcium Sulfate and Hydrated Sulfates are sedimentary minerals found in evaporative environments.

+
+
+

5.5 3.5 Conclusions, +Limitations, and Future Work.

+

Campaigns Delta Front and Crater Floor do not exclusively contain sedimentary and igneous minerals, respectively. Each contains a mix of both types, especially Delta Front.

Limitations of this analysis are the small number of data points and the lack of communication with a geologist.

Future work includes examining igneous and sedimentary rock distributions in more detail across new campaigns and against more general location variables such as latitude and longitude.

+
+
+
+

6 4.0 Finding 2: PIXL should not be log scaled

+
+

6.1 4.1 Data, Code, and +Resources

+
    +
  1. v1_sample_meta.Rds contains Meta Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds

  2. +
  3. v1_pixl.Rds contains PIXL Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds

  4. +
+

A new dataframe is made by applying log10 to the entire PIXL +dataframe. Several plots are prepared for presentation.

+
+#First replace 0.0 entries with 0.00001 so they don't scale to inf
+pixl.matrix[pixl.matrix == 0] <- 0.00001
+#Apply log10 to every entry in pixl.matrix & get new scaled df
+pixl.scaled <- log10(pixl.matrix)
+
+#Create an elbow plot for both pixl.matrix & pixl.scaled
+# wssplot(pixl.matrix, nc=8, seed=14, 'Unscaled')
+# wssplot(pixl.scaled, nc=8, seed=14, "Scaled")
+#Comment Out to suppress for report
+
+#Do kmeans for both matrices
+unscaled.kmeans <- kmeans(pixl.matrix, 3)
+scaled.kmeans <- kmeans(pixl.scaled, 3)
+
+#Do pca for both matrices
+unscaled.pca <- prcomp(pixl.matrix)
+scaled.pca <- prcomp(pixl.scaled)
+
+#Make biplots
+unscaled.plot <- ggbiplot::ggbiplot(unscaled.pca,
+                   labels = pixl.df$type,
+                   groups = as.factor(unscaled.kmeans$cluster)) +
+                   ggtitle("Unscaled Pixl")
+
+scaled.plot <- ggbiplot::ggbiplot(scaled.pca,
+                   labels = pixl.df$type,
+                   groups = as.factor(scaled.kmeans$cluster)) +
+                   ggtitle("Scaled Pixl")
+
+
+#pheatmap(pixl.scaled, scale="none")
+
+
+

6.2 4.2 Contribution

+

Solo

+
+
+

6.3 4.3 Methods +Description

+

One goal was to scale the PIXL dataframe in the hope of improving machine learning results on it. The entire PIXL dataframe was scaled with the log10 function, which applies a base-10 logarithm to each entry in the dataframe.
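A small worked example may help show why zeros need handling before log scaling. The matrix below is made up for illustration; the floor value 0.00001 is the one used in the code in Section 4.1.

```r
# Toy composition matrix (percent); one entry is exactly zero.
toy.pixl <- matrix(c(45.2, 0, 12.5, 0.8), nrow = 2,
                   dimnames = list(c("s1", "s2"), c("SiO2", "MgO")))

# log10(0) is -Inf, so zeros cannot be log-scaled directly.
raw.log <- log10(toy.pixl)
print(raw.log["s2", "SiO2"])

# Replace zeros with a small floor value, then apply log10 to every entry.
floor.value <- 0.00001
toy.floored <- toy.pixl
toy.floored[toy.floored == 0] <- floor.value
toy.scaled <- log10(toy.floored)
print(toy.scaled)
```

The floored zero lands far below the other log-scaled values, so entries that were zero can end up dominating distance-based methods such as k-means. This may be one reason log scaling did not clearly help, although the notebook does not test that explanation.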

+

4.3.1 Produce elbow plots to determine the best number of clusters. Three clusters were found to be best for both dataframes.

4.3.2 Run k-means on both the scaled and unscaled PIXL data to cluster each dataframe. Then produce pheatmaps of both dataframes with the points grouped by cluster.

4.3.3 Run PCA on both the scaled and unscaled PIXL data. Then produce biplots to visualize the PCAs.
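The clustering and PCA steps can be sketched on toy data as follows. The data, the seed value, and the use of `nstart` are assumptions for illustration. The code chunk shown in Section 4.1 calls `kmeans()` without setting a seed in that chunk, so cluster labels there may vary between runs.

```r
set.seed(14)  # fixed seed so the toy example is repeatable (assumed value)

# Toy data: two well-separated groups of 10 points in 3 dimensions.
toy.mat <- rbind(matrix(rnorm(30, mean = 0, sd = 0.1), ncol = 3),
                 matrix(rnorm(30, mean = 5, sd = 0.1), ncol = 3))
colnames(toy.mat) <- c("A", "B", "C")

# k-means with several random starts; nstart reduces sensitivity to initialization.
toy.km <- kmeans(toy.mat, centers = 2, nstart = 10)

# PCA on the same matrix; prcomp() centers the columns by default.
toy.pca <- prcomp(toy.mat)

# Proportion of total variance captured by each principal component.
var.explained <- toy.pca$sdev^2 / sum(toy.pca$sdev^2)

print(table(toy.km$cluster))
print(round(var.explained, 3))
```

Comparing `var.explained` between the unscaled and log-scaled matrices would be one quantitative way to back up the visual comparison of the biplots.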

+
+
+

6.4 4.4 Result and +Discussion

+

Clustering was not clearly better on the scaled PIXL dataframe. See the pheatmaps below (Method 4.3.2).

+

PHeatmaps for Scaled and Non-scaled PIXL Dataframes

+
#Produce heatmaps for both
+pheatmap(unscaled.kmeans$centers, scale="none", main="Unscaled Pixl")
+

+
pheatmap(scaled.kmeans$centers, scale="none", main="Scaled Pixl")
+

+

Description: PHeatmaps of both the unscaled and scaled PIXL dataframes. Color corresponds to the value of each compound at the cluster center (no additional row or column scaling is applied). PIXL samples are combined by their corresponding cluster number.

+

Next, PCA results were considered, and log scaling was not clearly better here either. See the biplots below (Method 4.3.3).

+

Biplots for Scaled and Non-scaled PIXL Dataframes.

+
unscaled.plot
+

+
scaled.plot
+

+

Description: PCA can represent data with many features/dimensions using only two dimensions. The plots above show the general distributions of the features of the two PIXL dataframes. The black arrows point in the direction of increasing amounts of the labelled compound.

+
+
+

6.5 4.5 Conclusions and +Future Work.

+

The scaled PIXL dataframe did not produce better clustering or PCA results than the original PIXL dataframe, so it was concluded that log scaling PIXL does not yield better results.

Aadi made another attempt at scaling PIXL, using Earth reference data.

At the moment, there are no other ideas for scaling PIXL.

+
+
+
+

7 5.0 Finding 3: Gradient +from Igneous to Sedimentary Rock

+
+

7.1 5.1 Data, Code, and +Resources

+
    +
  1. v1_sample_meta.Rds contains Meta Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds

  2. +
  3. v1_pixl.Rds contains PIXL Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds

  4. +
  5. v1_libs.Rds contains LIBS data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds

  6. +
+
#take the sums of the specific elements
+pixl_ternary <- pixl.df %>%
+  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>%
+  select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>%
+  drop_na()
+
+#This is for the labels on the Ternary Plot below
+pixl_ternary <- cbind(pixl_ternary, Sample_display=
+                        c("2","3","4,6,7","5,8,9","","","","",
+                          "10,11","","12,13","","14,15","","16"))
+
+# Load the saved LIBS data with locations added
+libs.df1 <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds")
+
+libs.df1$Point <- as.numeric(libs.df1$Point)
+
+#Suppress warnings here: Target and Point cannot have a mean calculated,
+#which is fine because Target is kept as the grouping key and Point is no longer relevant
+suppressWarnings(
+  libs.uniquetar <- 
+    aggregate(libs.df1, list(Target = libs.df1$Target), mean))
+
+#drop target and point from the data frame
+libs.uniquetar <- libs.uniquetar %>% select(!c(Target,Point))
+
+#Sum the elements we are looking at
+libs.df1 <- libs.df1 %>% 
+  mutate(y = (FeOT + MgO) / 100, z = (CaO+Na2O+K2O) / 100, x = (SiO2 + Al2O3) / 100)
+
+#Same thing but aggregate
+libs.uniquetar <- libs.uniquetar %>% 
+  mutate(y = (FeOT + MgO) / 100, z = (CaO+Na2O+K2O) / 100, x = (SiO2 + Al2O3) / 100)
+
+libs_ternplot <- libs.df1 %>% select(c(x,y,z))
+libs_ternplot2 <- libs.uniquetar %>% select(c(x,y,z))
+
+set.seed(1234)
+
+#kmeans on the original data
+tern.km <- kmeans(libs_ternplot, centers=4)
+
+libs_ternplot <- cbind(libs_ternplot, cluster=as.factor(tern.km$cluster))
+
+#kmeans on the aggregate data
+tern.km2 <- kmeans(as.matrix(libs_ternplot2), centers=4)
+
+libs_ternplot2 <- cbind(libs_ternplot2, cluster=as.factor(tern.km2$cluster))
+
+#ternary plot for LIBS data
+ternPlot <- ggtern() +
+  #color by cluster
+  geom_point(data=libs_ternplot, aes(x=x,y=y,z=z, colour = cluster), alpha = 0.5) +
+  scale_colour_manual(values=c('#d6001c','#54585a','#9ea2a2','#000')) +
+  labs(title="Sample Cation Compositions",
+        subtitle="LIBS data Clustered by Cation Group with PIXL samples by Campaign",
+        x="Si+Al2",
+        y="Fe+Mg",
+        z="Ca+Na2+K2") +
+  #Add pixl
+  geom_point(data=pixl_ternary, aes(x=x,y=y,z=z, shape=campaign), colour='green', size=3) +
+  # #Add labels to PIXL data corresponding to sample number
+   geom_text(data=pixl_ternary,
+            ggtern::aes(x=x, y=y, z=z, label=Sample_display,
+             hjust = ifelse(x > 0.43, 1, -0.1),   # Horizontal adjust to avoid overlap
+             vjust = ifelse(x == 0.3668, 1.3, 
+                            ifelse(x == 0.375, 1, ifelse(x > 0.43, 1.5, -0.3))),
+             fontface="bold"),
+             size=3, colour='green') +
+  theme_bw()
+
## Warning in geom_point(data = libs_ternplot, aes(x = x, y = y, z = z, colour =
+## cluster), : Ignoring unknown aesthetics: z
+
## Warning in geom_point(data = pixl_ternary, aes(x = x, y = y, z = z, shape =
+## campaign), : Ignoring unknown aesthetics: z
+
## Warning in geom_text(data = pixl_ternary, ggtern::aes(x = x, y = y, z = z, :
+## Ignoring unknown aesthetics: z
+
+
+

7.2 5.2 Contribution

+

Great credit goes to Aadi Lahiri and Nicolas Morawski, the original creators of the ternary plot used to derive the main finding of this section.

The author modified the original plot to suit the campaign section of the Mars Mission Minder App.

+
+
+

7.3 5.3 Methods +Description

+

The cation compositions for each point in LIBS and each sample in +PIXL were calculated. Then a ternary plot was produced with both the +LIBS points, grouped by cluster, and the PIXL samples grouped by +campaign.
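The grouping of oxides into the three ternary axes can be sketched as below. The oxide values are made up for illustration, and the explicit rescaling step (closure) is an addition for clarity: ternary plots are generally drawn on proportions that sum to one, so the sketch makes that rescaling visible rather than claiming it is done in the notebook code.

```r
# Toy oxide table in percent (made-up values, not mission data).
toy.ox <- data.frame(SiO2 = c(45, 30), Al2O3 = c(10, 5), FeOT = c(20, 25),
                     MgO = c(10, 15), CaO = c(8, 10), Na2O = c(3, 1), K2O = c(1, 0.5))

# Same three groupings as the code in Section 5.1.
tern <- with(toy.ox, data.frame(
  x = (SiO2 + Al2O3) / 100,
  y = (FeOT + MgO) / 100,
  z = (CaO + Na2O + K2O) / 100
))

# Closure: rescale each row so that x + y + z = 1 (ternary coordinates).
tern.closed <- tern / rowSums(tern)
print(round(tern.closed, 3))
```

After closure, a lower `x` value means a smaller Si + Al share relative to the other two groups, which is the direction discussed in the results.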

+
+
+

7.4 5.4 Result and Discussion

+

Following the direction of decreasing Silicon and Aluminum, PIXL samples transition from broadly igneous to sedimentary rock. This mirrors the transition from the Crater Floor campaign to the Delta Front campaign, which largely correspond to igneous and sedimentary rock samples, respectively.

+

Cation Compositions of LIBS Points and PIXL Samples, grouped by +Cluster and Campaign, Respectively

+
ternPlot
+

+

Description: LIBS points were clustered using kmeans and PIXL samples +were grouped by campaign. The cation compositions of all involved points +were calculated and plotted above, with the data grouping indicated by +legends.

+
+
+

7.5 5.5 Conclusions and +Future Work.

+

There is a transition within the PIXL data from igneous to sedimentary rock in the direction of decreasing Silicon and Aluminum content in the rock samples.

Future work includes researching the significance of Silicon + Aluminum cation composition in geological environments.

A similar trend could also be sought in the LIBS points, using the updated LIBS datasets.

+
+
+
+

8 6.0 Finding 4: Campaign +Focused App Contribution

+
+

8.1 6.1 Data, Code, and +Resources

+
    +
  1. v1_sample_meta.Rds contains Meta Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds

  2. +
  3. v1_lithology.Rds contains Lithology Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds

  4. +
  5. v1_pixl.Rds contains PIXL Data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds

  6. +
  7. v1_libs.Rds contains LIBS data https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds

  8. +
+
#Make box plot for PIXL
+pixl.lf <- pixl.df %>% select(-c(Sample)) %>% pivot_longer(1:13)
+colnames(pixl.lf)<- c("campaign", "feature", "value")
+pixlPlot <- ggplot(data = pixl.lf, aes(x=factor(feature, levels = (feature %>% unique())), y=value, color = campaign)) +
+  geom_boxplot() +
+  scale_y_log10() +
+  ggtitle("PIXL, Compound Distribution by Campaign") +
+  labs(x="", y="log10 scale from percent composition") +
+  scale_colour_manual(values=c('#d6001c','#54585a'))
+
+#Make box plot for LIBS
+libs.lf <- libs.df %>% filter(Distance <= 7) %>% select(8:18) %>% select(-c("Lat.pixl","Lon.pixl"))
+
+colnames(libs.lf)[2:9] <- colnames(pixl.df)[2:9]
+colnames(libs.lf)[1] <- "campaign"
+
+libs.lf <- libs.lf %>% pivot_longer(2:9)
+colnames(libs.lf)<- c("campaign", "feature", "value")
+libsPlot <- ggplot(data = libs.lf, aes(x=factor(feature, levels = (feature %>% unique())), y=value, color = campaign)) +
+  geom_boxplot() +
+  scale_y_log10() +
+  ggtitle("LIBS, Compound Distribution by Campaign") +
+  labs(x="", y="log10 scale from percent composition") +
+  scale_colour_manual(values=c('#d6001c','#54585a'))
+
+
+

8.2 6.2 Contribution

+

Mostly solo work; it includes the updated student data and the ternary plot by Aadi and Nicolas.

+
+
+

8.3 6.3 Methods +Description

+

A tab was prepared for inclusion in the final Mars Mission Minder 2D App, consisting of a series of relevant graphs for the general campaign comparison question. The first graph is the one featured in Finding 1. Two further graphs feature PIXL and LIBS. For the PIXL graph, samples were grouped by campaign and the distribution of each compound measured in PIXL was plotted. Likewise, for the LIBS graph, points within 7 meters of a sample were selected and grouped by campaign, and the distribution of each compound measured in LIBS was then plotted with the filtered and grouped data. Lastly, the ternary plot from Finding 3 was included.
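The LIBS filtering and reshaping described above can be sketched in base R on a toy table. The values and the two compound columns are made up for illustration; only the `Distance <= 7` condition is taken from the code in Section 6.1.

```r
# Toy LIBS-like table (made-up values). Distance is in meters to the nearest sample.
toy.libs <- data.frame(
  campaign = c("Crater Floor", "Crater Floor", "Delta Front", "Delta Front"),
  Distance = c(2.5, 12.0, 6.9, 7.0),
  SiO2     = c(44, 50, 38, 41),
  MgO      = c(9, 7, 12, 11)
)

# Keep only points within 7 m of a sample (boundary included).
near <- toy.libs[toy.libs$Distance <= 7, ]

# Long form: one row per (point, compound) pair, ready for a grouped box plot.
compounds <- c("SiO2", "MgO")
toy.long <- data.frame(
  campaign = rep(near$campaign, times = length(compounds)),
  feature  = rep(compounds, each = nrow(near)),
  value    = unlist(near[compounds], use.names = FALSE)
)
print(toy.long)
```

The long table has the same three columns (campaign, feature, value) that the box plot code expects.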

+
+
+

8.4 6.4 Result and +Discussion

+

Plot Representing SHERLOC in Campaign Analysis: Total Mineral Occurrence with Samples grouped by Campaign

+
sherlocPlot
+

+

Description: From the SHERLOC dataset, samples were grouped by +campaign. Then the total count of each mineral occurrence was graphed +using the grouped data.

+

Plot Representing PIXL in Campaign Analysis: Sample Compound Distributions, Samples grouped by Campaign

+
pixlPlot
+
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
+
## Warning: Removed 5 rows containing non-finite outside the scale range
+## (`stat_boxplot()`).
+

+

Description: From PIXL dataset, samples were grouped by campaign. +Then the distribution of each compound measured in PIXL was graphed +using the grouped data.

+

Plot Representing LIBS in Campaign Analysis: Sample Compound Distributions, Samples grouped by Campaign

+
libsPlot
+
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
+
## Warning: Removed 118 rows containing non-finite outside the scale range
+## (`stat_boxplot()`).
+

+

Description: LIBS points within 7 meters of a sample were selected +and then grouped by their campaign. The distributions of each compound +measured in LIBS were plotted using the filtered and grouped data.

+

Plot Representing Cation Compositions: Cation Compositions of LIBS and PIXL points

+
ternPlot
+

+

Description: Cation compositions of each point in LIBS and each sample in PIXL were calculated, then plotted with LIBS points grouped by clusters and PIXL samples grouped by campaign.

8.5 6.5 Conclusions and Future Work.

+

Plots representing the difference between campaigns were successfully +gathered together to create a final tab for the Mars Mission Minder +App.

+

Future work would include combining this tab with Evangaline's (Xuanting's) campaign comparison work.

+
+
+
+

+
+

10 Appendix

+

Finding 2 (Section 4.0) investigated whether log scaling PIXL produces better results. The two plots below were generated as part of this analysis. From these, it was concluded that three clusters should be used for both the unscaled and the scaled PIXL dataframes.

+

Title: Elbow Plots for unscaled and scaled versions of PIXL

+
#Create an elbow plot for both pixl.matrix & pixl.scaled
+wssplot(pixl.matrix, nc=8, seed=14, 'Unscaled')
+

+
wssplot(pixl.scaled, nc=8, seed=14, "Scaled")
+

+

Description: Elbow plots show the quality of clustering as a function of the number of clusters created for a dataframe. These represent the unscaled and scaled PIXL dataframes, respectively. Note that the seed was set to 14 for this analysis, which is needed to reproduce the same results.
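The definition of `wssplot()` is not shown in this notebook. The sketch below is a hypothetical stand-in that computes the quantity an elbow plot usually displays, the total within-cluster sum of squares for each number of clusters. The function name `wss_curve`, the use of `nstart`, and the toy data are assumptions; the notebook's own function may behave differently.

```r
# Hypothetical elbow computation (not the notebook's wssplot()).
wss_curve <- function(data, nc = 8, seed = 14) {
  wss <- numeric(nc)
  # With one cluster, the within-cluster sum of squares is the total sum of squares.
  wss[1] <- sum(scale(data, center = TRUE, scale = FALSE)^2)
  for (k in 2:nc) {
    set.seed(seed)  # same seed for every k so the curve is repeatable
    wss[k] <- sum(kmeans(data, centers = k, nstart = 25)$withinss)
  }
  wss
}

# Toy data: three well-separated groups of 20 points in 2 dimensions.
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.2), ncol = 2),
             matrix(rnorm(40, mean = 4, sd = 0.2), ncol = 2),
             matrix(rnorm(40, mean = 8, sd = 0.2), ncol = 2))

wss <- wss_curve(toy, nc = 6)
plot(seq_along(wss), wss, type = "b",
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```

On this toy data the curve drops sharply up to three clusters and flattens afterwards, which is the kind of "elbow" used to pick the number of clusters.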

+
+ + + +
+
+ +
+ + + + + + + + + + + + + + + + diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.pdf b/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.pdf new file mode 100644 index 0000000..819c99b Binary files /dev/null and b/StudentNotebooks/Assignment08_FinalProjectNotebook/compta_finalF24.pdf differ