diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/morawn_finalProjectF24.Rmd b/StudentNotebooks/Assignment08_FinalProjectNotebook/morawn_finalProjectF24.Rmd new file mode 100755 index 0000000..dff5ea9 --- /dev/null +++ b/StudentNotebooks/Assignment08_FinalProjectNotebook/morawn_finalProjectF24.Rmd @@ -0,0 +1,448 @@ +--- +title: "Data Analytics Research Individual Final Project Report - Mars" +author: "Nicolas Morawski" +date: "Fall 2024" +output: + html_document: + toc: yes + toc_depth: 3 + toc_float: yes + number_sections: yes + theme: united + html_notebook: default + pdf_document: + toc: yes + toc_depth: '3' +--- + + + + + + +# DAR Project and Group Members + + +* Project name: _Mars_ +* GitHub ID: _dar-morawn_ +* Project team members: Dante Mwatibo, Doña Roberts, David Walcyzk, Xuanting Wang, Ashton +Compton, Margo VanEsselstyn, Charlotte Peterson, CJ Marino, Aadi Lahiri + + +# 0.0 Preliminaries. + +This report is generated from an R Markdown file that includes all the R code necessary to produce the results described and embedded in the report. Code blocks can be suppressed from output for readability using the chunk option `{R, echo=show}` in the code block header. If `show <- FALSE` the code block will be suppressed; if `show <- TRUE` then the code will be shown. + +```{r} +# Set to TRUE to expand R code blocks; set to FALSE to collapse R code blocks +show <- TRUE +``` + + +Executing this R notebook requires some subset of the following packages: + +* `ggplot2` +* `knitr` +* `tidyverse` +* `ggtern` +* `pandoc` +* `pheatmap` +* `caret` +* `ggbiplot` +* `stringr` + +These will be installed and loaded as necessary (code suppressed). + + +```{r, include=FALSE} +# This code will install required packages if they are not already installed +# ALWAYS INSTALL YOUR PACKAGES LIKE THIS! 
+if (!require("ggplot2")) { + install.packages("ggplot2") + library(ggplot2) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} +if (!require("caret")) { + install.packages("caret") + library(caret) +} +if (!require("knitr")) { + install.packages("knitr") + library(knitr) +} +if (!require("BBmisc")) { + install.packages("BBmisc") + library(BBmisc) +} +if (!require("ggtern")) { + install.packages("ggtern") + library(ggtern) +} +``` + +# 1.0 Project Introduction + + +The Mars Project is focused on data from the 2020 Mars Perseverance Rover. The goal of the mission is to +look for ancient microbial life or forms of water on Mars (things that could suggest life). Perseverance uses +multiple instruments, including PIXL (Planetary Instrument for X-Ray Lithochemistry), SHERLOC (Scanning +Habitable Environments with Raman and Luminescence for Organics and Chemicals) and SuperCam. +SuperCam combines multiple spectroscopy techniques to measure properties of materials on Mars, +including LIBS (laser-induced breakdown spectroscopy). This notebook primarily focuses on the PIXL and LIBS data we +have been given, along with the implementation of plots of that data by the team in an R Shiny app. + + +# 2.0 Organization of Report + +This report is organized as follows: + + +* Section 3.0: Finding 1: Mars MissionMinder App. I was a member of the App team, where I helped substantially with the original design and wireframing of the proposed app and then implemented various pages/tabs. + +* Section 4.0: Finding 2: Plotting LIBS Samples. 
This includes some graphs and analyses I made of the LIBS dataset. + +* Section 5.0: Finding 3: Relating LIBS samples to PIXL Samples. This was some early work I did in relating the PIXL and LIBS samples, but it did not amount to much. + +* Section 6.0: Overall conclusions and suggestions + + +# 3.0 Finding 1: Mars MissionMinder App + + +I spent most of my time planning and developing the Mars MissionMinder App, the 2D app that allows the Mars team to cohesively display the graphs and analyses that were made using the given 2020 Perseverance Rover data. I was the main lead in planning and designing the app, and was a contributor in coding and implementing parts of the live app, mainly the PIXL and LIBS comparison pages. + + +## 3.1 Data, Code, and Resources + + +1. The live, up-to-date Mars MissionMinder App: +[https://lp01.idea.rpi.edu/shiny/erickj4/MarsMissionMinder-F24/](https://lp01.idea.rpi.edu/shiny/erickj4/MarsMissionMinder-F24/) + +2. The most recent wireframe demo of the app: +[https://www.figma.com/proto/xWna1t30YmaSaIIAClolMv/Mars-Mission-Minder-Draft3](https://www.figma.com/proto/xWna1t30YmaSaIIAClolMv/Mars-Mission-Minder-Draft3?node-id=4001-248&node-type=canvas&t=CWBhUfytapX4NoF4-0&scaling=contain&content-scaling=fixed&page-id=0%3A1&starting-point-node-id=4001%3A248). + +3. The GitHub repository for the Mars MissionMinder App: +[https://github.rpi.edu/DataINCITE/MarsMissionMinder-F24](https://github.rpi.edu/DataINCITE/MarsMissionMinder-F24) + +4. The final slides the team used to present the MissionMinder App: +[https://docs.google.com/presentation/d/16RcCLHMOodJQVmY9HusaeqJhfeiwEq7t3sJX3CwSsF8/edit#slide=id.p](https://docs.google.com/presentation/d/16RcCLHMOodJQVmY9HusaeqJhfeiwEq7t3sJX3CwSsF8/edit#slide=id.p) + + +## 3.2 Contribution + + +This section was joint work, as I was a member of the App team, which consisted of Doña Roberts, David Walcyzk, CJ Marino, Dr. John Erickson, and myself. 
+ + +## 3.3 Methods Description + + +Before I began work on the Mars MissionMinder app itself, I was tasked with making a wireframe of what the app could potentially look like. I used Figma to draw up my plans, and went through 4-5 different iterations before moving on to work on the actual app. I took a great amount of feedback into consideration, mainly from Dr. Kristen Bennett, Dr. John Erickson, and Dr. Karyn Rogers. The second listed resource in Section 3.1 is an interactive demo of the wireframe. The MissionMinder app itself was built collaboratively in R Shiny. I worked a lot with CJ, Aadi, and Doña to work out which pages would be present in the app. I worked on the presentation of the PIXL and LIBS datasets, which consists of plots and graphs relating to the data's most important features, as well as on how the sample selector would work. I will go into more detail about this later. + + +## 3.4 Result and Discussion + + +All of this work led to the currently live Mars MissionMinder App. The point of the app is to display the many different aspects of Mars captured by the Perseverance Rover. Upon opening the app, you are greeted by a satellite view of the Mars surface, with highlighted points that indicate the PIXL sample sites and the locations where LIBS data was recorded. On the left side is a sample selector, which allows the user to select among the PIXL samples, with the option to select by rover campaign. There are several pages that go along with the map. The first page is the "Comparison" page. Here, we show plots and analyses for the three main datasets: the PIXL data, LIBS data, and SHERLOC data. Each dataset has its own section. The next page is the "Exploration" tab, which is used to help us answer the important questions that we want to address. 
These include the relationship between the PIXL and LIBS datasets, the relationship between the PIXL and SHERLOC datasets, how the rover campaigns compare, further analysis into the geology of Mars, and how we can predict the existence of organic matter. Finally, there is also the "Sample Report" tab, which gives a brief overview of each mineral sample taken. + +This section goes over some of the plots I worked on for the app, along with how they are, or will be, presented in the app. To start, here are some screenshots of the PIXL comparison page: + + +This is a heatmap that presents all of the selected PIXL samples and their weights. It is interactive, and one can hover over each cell to see the values. It is a good way to highlight the high presence of some elements, while also showing the insignificance of others. + + +This is a ternary diagram that plots all of the selected PIXL samples. In this instance, the samples are colored by the campaign they belong to, but this feature is bound to change. It will most likely change so the end user can color the data by a chosen feature (campaign, rock type, location, k-means clustering, etc.). This screenshot also includes a latitude/longitude scatter plot on the left, which shows the location of all of the samples taken. While the current version of this map is not mine, the original plot was. Dr. Erickson has since made the necessary changes to make it much more interactive and visually pleasing. + +I have also made some contributions towards the LIBS comparison page. Most of the plots currently present are not my own (done by Aadi and David), but I made great strides towards the plots' implementation in the app: + + +The ternary diagram presented is one I made, as seen below in Section 4.4. Aadi helped me make it more presentable, along with adding the PIXL samples as reference points. 
This was an incredibly important plot for the entire project, as it served as the backbone for many further, more complex analyses that other group members worked on (mainly the work of Charlotte and Margo). + + +This is a Principal Component Analysis (PCA) done on the LIBS dataset, colored and clustered consistently with the other analyses done by the Mars team. I received some help from David with its implementation. The best part of this plot is its interactivity, as it allows users to scale the data, choose the number of clusters graphed, and change the color of the points to indicate different features, as desired. The plot is really good at highlighting which composition weights are high in each cluster. For example, Cluster 4 generally has a very high concentration of Aluminum, but a very low average concentration of Magnesium. + + +## 3.5 Conclusions, Limitations, and Future Work. + + +All in all, the app is incredibly important for all of those who have been working with the Perseverance rover data, from us RPI students to the scientists at NASA. There is still a lot of work to be done, which includes adding further plots and analyses, more interactivity, and further polishing of the appearance. + + +# 4.0 Finding 2: Plotting LIBS Samples + + +In the first half of the semester, I worked on several analyses of the LIBS dataset. These plots and graphs delved into the mineral composition as well as the location of Perseverance's SuperCam targets. + + +## 4.1 Data, Code, and Resources + + +1. supercam_libs_moc_loc.Rds, which is the LIBS dataset [https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/Data/supercam_libs_moc_loc.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/Data/supercam_libs_moc_loc.Rds) + +2. 
LIBS_calibration_targets.Rds, which are the Earth calibration targets for SuperCam [https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/StudentData/LIBS_calibration_targets.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/StudentData/LIBS_calibration_targets.Rds) + + +## 4.2 Contribution + + +This section was solo work, but I received feedback from Dr. Kristen Bennett and Dr. John Erickson. + + +## 4.3 Methods Description + + +I used only the provided LIBS dataset for these analyses. I clustered the LIBS data using k-means into four separate clusters, which are consistent throughout the rest of the plots. The first thing I wanted to look into was how all of the LIBS samples compared compositionally. We were told which cations the NASA scientists were mainly keeping an eye out for, so I made a ternary diagram plotting the samples onto three axes: Si+Al, Fe+Mg, and Ca+Na+K. The next graph was a scatter plot intended to highlight the location of all of the LIBS samples, to see whether there was a relation between the mineral composition and the location. I also took the time to parse the information about the Earth samples the Perseverance rover carried with it, and plotted these points on a ternary diagram. The idea was to make this data readily accessible and allow for further analyses and comparisons. + +```{r} +libs_data <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") +libs_target_data <- readRDS("~/DAR-Mars-F24/StudentData/LIBS_calibration_targets.Rds") +``` + +## 4.4 Result and Discussion + + +To plot the LIBS data on the ternary diagram, I need to slightly manipulate the data, so only the important features are included. 
+```{r} +# Chosen features +libs_trim <- libs_data %>% select(c(SiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) + +# Chosen features converted from percentages to fractions (divided by 100) for the ternary axes +libs_ternary <- libs_trim %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>% + drop_na() +``` + +I then perform k-means on the data to cluster all of the samples, so I can draw conclusions about the data. +```{r} +# K-means +set.seed(10) +k <- 4 +tern.km <- kmeans(libs_ternary, k) + +libs_ternary_clustered <- cbind(libs_ternary, cluster=as.factor(tern.km$cluster)) +tern_clusters <- libs_ternary_clustered$cluster +``` + +These clusters will be consistent with the other plots. We can now graph the data. +```{r} +ggtern(data= libs_ternary_clustered, mapping=ggtern::aes(x=x,y=y,z=z)) + + geom_point(aes(color=tern_clusters)) + #tern_clusters + theme_rgbw() + + labs(title="Perseverance LIBS Samples Mineral Compositions", + x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K") +``` + +Of the four LIBS clusters, three show a clear trend (Clusters 1, 3, and 4): an inverse relationship between the concentration of the Iron + Magnesium cations and the Silicon + Aluminum cations. Cluster 2 is quite interesting though, as it is the only cluster that shows high concentrations in Calcium + Sodium + Potassium. + +To graph the LIBS data by location, a little more manipulation is needed. The clusters chosen above are used again for this plot. 
+```{r} +libs_loc <- libs_data %>% select(c(SiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O, lat, lon)) + +libs_loc_ternary <- libs_loc %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>% + select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>% + drop_na() +# Cluster labels are attached by position, which assumes drop_na() keeps the same rows here as it did for libs_ternary +libs_loc_ternary <- cbind(libs_loc_ternary, cluster=as.factor(tern.km$cluster)) + +ggplot(libs_loc_ternary, aes(x=lon, y=lat, colour=cluster)) + + geom_point() + + ggtitle("Clustered LIBS Data Graphed by Location") +``` + +When comparing these clusters to their geographic location, there is no strict spatial separation. Cluster 3 is primarily located in the latter half of Perseverance's journey, while Cluster 4 is mainly in the first half. Interestingly enough, again, Cluster 2 differs greatly from the rest of the clusters, being present mainly in the middle part of Perseverance's journey. + +I have also included the work I did to graph the Earth calibration samples onto a ternary plot. While my own contribution here is small, these Earth calibration samples are incredibly important. The Perseverance Rover carries a calibration target plate with several Earth-based samples attached, which act as controls to test instrument performance and ensure accurate readings over the course of the mission. These reference minerals allow SuperCam to fine-tune its LIBS laser, so it could be important for us to factor this into our research and our calculations with regards to data scaling. Since these calibration targets are scanned for the same composition weights as the PIXL and LIBS samples, we now have a very useful way to compare the Mars geology to the familiar Earth geology. 
+```{r} +libs_target_trim <- libs_target_data %>% select(c(Description,Si, Al, Fe, Mg, Ca, Na, K)) + +#Selecting features +libs_target_ternary <- libs_target_trim %>% + mutate(x=(Si+Al),y=(Fe+Mg),z=(Ca+Na+K)) %>% + select(-c(Si,Al,Fe,Mg,Ca,Na,K)) %>% + drop_na() +SampleNames <- libs_target_ternary$Description + +ggtern(libs_target_ternary, ggtern::aes(x=x,y=y,z=z)) + + geom_point(aes(color=SampleNames)) + + theme_rgbw() + + labs(title="LIBS Earth Calibration Targets", + x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K") +``` + + +## 4.5 Conclusions, Limitations, and Future Work. + + +All in all, these plots I made were useful, but to fully capitalize on them, it would be good to compare the LIBS samples alongside the PIXL samples, something that many of my other group members have already looked into doing. + + +# 5.0 Finding 3: Relating LIBS samples to PIXL Samples + + +This section includes something I did in the very beginning of the semester. It is not very useful looking back, but I am still including it. The idea was to try to find what PIXL samples corresponded to which LIBS samples. + + +## 5.1 Data, Code, and Resources + + +1. supercam_libs_moc_loc.Rds, which is the LIBS dataset [https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/Data/supercam_libs_moc_loc.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/Data/supercam_libs_moc_loc.Rds) + +2. samples_pixl_wide.Rds, which is the PIXL dataset [https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/samples_pixl_wide.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/samples_pixl_wide.Rds) + +3. v1_pixl.Rds, which is the updated PIXL dataset, properly formatted [https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds) + +4. 
v1_libs.Rds, which is the updated LIBS dataset, properly formatted [https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds) + + +## 5.2 Contribution + + +This was solo work, and an original idea that did not help as much as I originally thought it would. + + +## 5.3 Methods Description + + +We know that the LIBS data points were manually chosen by a team of scientists as potential points of interest, and these points of interest influence the campaign/pathing of the Perseverance rover. I wanted to figure out whether these decisions were reflected in the actual mineral samples taken, as represented by the PIXL data, and whether the PIXL samples were good indicators of the surrounding geology. To do this, I used the PIXL data, along with the unscaled and clustered LIBS data from before. I purposely left out Sample 1 of the PIXL data, as it was an atmospheric sample and could potentially interfere with clustering. Also, I scaled the PIXL data in two ways: (1) scaled/normalized in relation to the LIBS data and (2) scaled using scale(). The second method is exhibited below; the first is shown later. + +```{r} +libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") +pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds") +``` + + +## 5.4 Result and Discussion + + +Before I could do anything, I needed to clean and cluster the LIBS data. 
+ +```{r} +libs.df <- libs.df %>% + select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) + +libs.df$point <- as.numeric(libs.df$point) + +# Make a matrix containing only the LIBS measurements for each mineral +libs.matrix <- as.matrix(libs.df[,6:13]) + +set.seed(10) +k <- 5 +km_reg <- kmeans(libs.matrix,k) +cluster1 <- km_reg$cluster +libs.reg.clustered <- cbind(libs.df,cluster1) +``` + +Next, I needed to get the PIXL data ready by making sure it had the same columns as the clustered LIBS data. + +```{r} +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) +pixl.df <- pixl.df[2:16,] #Excluding first, atmospheric sample +# Make the matrix of just mineral percentage measurements +pixl.matrix <- pixl.df[,2:14] +pixl.matrix.scaled <- pixl.df[,2:14] %>% scale() +``` + +I start by scaling and normalizing the LIBS data, then apply the same treatment to the PIXL data. I then give the LIBS data the same clustering found before, and classify the 15 PIXL samples using KNN. I also perform k-means clustering on the pre-scaled PIXL data and compare how the two methods differ in their PIXL results. 
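The scale-then-classify idea in the paragraph above can be sketched on synthetic data. This version uses base R `scale()` and `class::knn()` rather than the `caret` functions used in this notebook, and every value, column name, label, and the choice `k = 5` is an assumption made for illustration only.

```r
library(class)  # provides knn(); a recommended package shipped with R

set.seed(10)
# Synthetic "training" measurements in two well-separated groups (hypothetical values)
train <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 1), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 1), ncol = 2)
)
colnames(train) <- c("SiO2", "FeOT")
labels <- factor(rep(c("A", "B"), each = 50))

# Fit the centering and scaling on the training data only
train_scaled <- scale(train)
centers <- attr(train_scaled, "scaled:center")
sds     <- attr(train_scaled, "scaled:scale")

# Apply the same centering and scaling to the new samples
test <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("SiO2", "FeOT")))
test_scaled <- scale(test, center = centers, scale = sds)

# Classify each new sample by majority vote among its k nearest training points
pred <- knn(train = train_scaled, test = test_scaled, cl = labels, k = 5)
print(pred)
```

Reusing the training centers and standard deviations keeps the new samples in the same coordinate system as the labelled data they are compared against.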
+ +```{r, result01_analysis} +# Prepare dataset for clustering by selecting the columns of interest and renaming them to match the LIBS column names +pixl_trim <- pixl.df %>% + dplyr::select(c("Na20","Mgo","Al203","Si02", "K20","Cao","Ti02","FeO-T")) %>% + rename("Na2O"="Na20","MgO"="Mgo","Al2O3"="Al203","SiO2"="Si02","K2O"="K20", + "CaO"="Cao","TiO2"="Ti02","FeOT"="FeO-T") +libs_trim <- libs.df %>% + dplyr::select(c("Na2O","MgO","Al2O3","SiO2", "K2O","CaO","TiO2", "FeOT")) + +# Normalize/scale training/test data +scaler <- preProcess(libs_trim, method = c("center", "scale")) +train <- predict(scaler, libs_trim) +test.pixl <- predict(scaler, pixl_trim) + +# KNN model +classtrain <- as.factor(libs.reg.clustered$cluster) +train.df <- cbind(train,classtrain) +model<- knn3(classtrain ~ ., data = train.df, k = 40) +pixl.class <- predict(model,test.pixl, type="class") +pixl.predicted <- cbind(pixl.df,pixl.class) +#IMPORTANT: Use for heatmap +pixl.classified.scaled <- cbind(test.pixl, pixl.class) + +# PIXL K-means +pixl.matrix.scaled <- pixl.df %>% dplyr::select(c("Na20","Mgo","Al203","Si02", + "K20","Cao","Ti02","FeO-T")) %>% scale() +set.seed(10) +k <- 4 +km2 <- kmeans(pixl.matrix.scaled,k) +cluster <- km2$cluster +pixl.kmean <- cbind(pixl.df,cluster) + +# Heatmaps. Plenty of room to change/fix/adjust. +pheatmap(km2$centers,scale="none", main="K-means Clustered PIXL Average Mineral Composition Weights") + +heatmap.data = data.frame(matrix(nrow = 0, ncol = ncol(km2$centers))) +colnames(heatmap.data) = colnames(km2$centers) +for (x in 1:6) { + test.df <- pixl.classified.scaled %>% filter(pixl.class == x) + if (dim(test.df)[1] != 0) { + test.df<- test.df[ , !(names(test.df) %in% c("pixl.class"))] + heatmap.data[nrow(heatmap.data)+1,] <- colMeans(test.df) + } +} + +pheatmap(heatmap.data,scale="none", main="KNN Classified PIXL Average Mineral Composition Weights") + +``` + +I used 5 clusters when clustering the LIBS data, and used the same 5 on the PIXL KNN classification. 
However, I noticed one of the clusters completely disappeared, and only 4 clusters are highlighted. This is why I made the decision to perform k-means on the PIXL data with k=4. In comparing the two heatmaps, there are a lot of similarities. In both, cluster 5 has the most traces of SiO2, and cluster 4 has the most traces of FeO-T. There are other clusters that also line up with very minimal traces of MgO and SiO2. The best overarching way to compare, I believe, is to compare the column dendrograms. The big problem with this analysis was that I included the insignificant cation data, which very likely skewed my results. There are some other errors in my thought process, and a little too much ambiguity to get a proper understanding of anything. For this reason, I scrapped this idea and moved on. + +## 5.5 Conclusions, Limitations, and Future Work. + + +All in all, this finding was not very useful, and looking back at it, somewhat confusing. I would not recommend doing this again, but if one did, much more planning and understanding of the data would be needed. + + +# 6.0 Overall conclusions and suggestions + + +# Bibliography + + +* [Cousin21] Cousin, A., Sautter, V., Fabre, C., Dromart, G., Montagnac, G., Drouet, C., Meslin, P. Y., Gasnault, O., Beyssac, O., Bernard, S., Cloutis, E., Forni, O., Beck, P., Fouchet, T., Johnson, J. R., Lasue, J., Ollila, A. M., De Parseval, P., Gouy, S., & Caron, B. (2021). SuperCam calibration targets on board the perseverance rover: Fabrication and quantitative characterization. Spectrochimica Acta Part B: Atomic Spectroscopy, 106341. 
https://doi.org/10.1016/j.sab.2021.106341 + +```{r} +citation("ggtern") +``` + + diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/morawn_finalProjectF24.html b/StudentNotebooks/Assignment08_FinalProjectNotebook/morawn_finalProjectF24.html new file mode 100644 index 0000000..22eefa9 --- /dev/null +++ b/StudentNotebooks/Assignment08_FinalProjectNotebook/morawn_finalProjectF24.html @@ -0,0 +1,2088 @@ + + + + +
+ + + + + + + + + +This report is generated from an R Markdown file that includes all
+the R code necessary to produce the results described and embedded in
+the report. Code blocks can be suppressed from output for readability
+using the chunk option {R, echo=show}
in the code block
+header. If show <- FALSE
the code block will be
+suppressed; if show <- TRUE
then the code will be
+shown.
# Set to TRUE to expand R code blocks; set to FALSE to collapse R code blocks
+show <- TRUE
+
+Executing this R notebook requires some subset of the following +packages:
+ggplot2
knitr
tidyverse
ggtern
pandoc
pheatmap
caret
ggbiplot
stringr
These will be installed and loaded as necessary (code +suppressed).
+ +The Mars Project is focused on data from the 2020 Mars Perseverance +Rover. The goal of the mission is to look for ancient microbial life or +forms of water on Mars (things that could suggest life). Perseverance +uses multiple instruments, including PIXL (Planetary Instrument for +X-Ray Lithochemistry), SHERLOC (Scanning Habitable Environments with +Raman and Luminescence for Organics and Chemicals) and SuperCam. +SuperCam combines multiple spectroscopy techniques to measure +properties of materials on Mars, including LIBS (laser-induced breakdown +spectroscopy). This notebook primarily focuses on the PIXL and LIBS data +we have been given, along with the implementation of plots of that data +by the team in an R Shiny app.
+This report is organized as follows:
+Section 3.0: Finding 1: Mars MissionMinder App. I was a member of +the App team, where I greatly helped with the original design and +wireframing of the proposed app, along with then implementing various +pages/tabs.
Section 4.0: Finding 2: Plotting LIBS Samples. This includes some +graphs and analyses I made of the LIBS dataset.
Section 5.0: Finding 3: Relating LIBS samples to PIXL Samples. +This was some early work I did in relating the PIXL and LIBS samples, +but it did not amount to much.
Section 6.0 Overall conclusions and suggestions
I spent most of my time planning and developing the Mars +MissionMinder App, the 2D app that allows the Mars team to cohesively +display the graphs and analyses that were made using the given 2020 +Perseverance Rover data. I was the main lead in planning and designing +the app, and was a contributor in coding and implementing parts of the +live app, mainly the PIXL and LIBS comparison pages.
+The live, up-to-date Mars MissionMinder App: https://lp01.idea.rpi.edu/shiny/erickj4/MarsMissionMinder-F24/
The most recent wireframe demo of the app:
+https://www.figma.com/proto/xWna1t30YmaSaIIAClolMv/Mars-Mission-Minder-Draft3.
The GitHub repository for the Mars MissionMinder App: https://github.rpi.edu/DataINCITE/MarsMissionMinder-F24
The final slides the team used to present the MissionMinder App: https://docs.google.com/presentation/d/16RcCLHMOodJQVmY9HusaeqJhfeiwEq7t3sJX3CwSsF8/edit#slide=id.p
This section was joint work, as I was a member of the App team, which +consisted of Doña Roberts, David Walcyzk, CJ Marino, Dr. John Erickson, +and myself.
+Before I began work on the Mars MissionMinder app itself, I was +tasked with making a wireframe of what the app could potentially +look like. I used Figma to draw up my plans, and went through 4-5 +different iterations before moving on to work on the actual app. I took +a great amount of feedback into consideration, mainly from Dr. Kristen +Bennett, Dr. John Erickson, and Dr. Karyn Rogers. The second listed +resource in Section 3.1 is an interactive demo of the wireframe. The +MissionMinder app itself was built collaboratively in R Shiny. I worked +a lot with CJ, Aadi, and Doña to work out which pages would be present in +the app. I worked on the presentation of the PIXL and LIBS +datasets, which consists of plots and graphs relating to the data’s most +important features, as well as on how the sample selector would work. I +will go into more detail about this later.
+All of this work led to the currently live Mars +MissionMinder App. The point of the app is to display the +many different aspects of Mars captured by the Perseverance Rover. Upon +opening the app, you are greeted by a satellite view of the Mars +surface, with highlighted points that indicate the PIXL sample +sites and the locations where LIBS data was recorded. On the +left side is a sample selector, which allows the user to select among +the PIXL samples, with the option to select by rover campaign. +There are several pages that go along with the map. The first page is +the “Comparison” page. Here, we show plots and analyses for the three +main datasets: the PIXL data, LIBS data, and SHERLOC data. Each dataset +has its own section. The next page is the “Exploration” tab, which is +used to help us answer the important questions that we want to address. +These include the relationship between the PIXL and LIBS datasets, +the relationship between the PIXL and SHERLOC datasets, how the rover campaigns compare, further +analysis into the geology of Mars, and how we can predict the existence +of organic matter. Finally, there is also the “Sample Report” tab, which +gives a brief overview of each mineral sample taken.
+This section goes over some of the plots I worked on for the app, +along with how they are, or will be, presented in the app. To start, here +are some screenshots of the PIXL comparison page:
+ This is a heatmap that
+presents all of the selected PIXL samples and their weights. It is
+interactive, and one can hover over each cell to see the values. It is a
+good way to highlight the high presence of some elements, while also
+showing the insignificance of others.
This is
+a ternary diagram that plots all of the selected PIXL samples. In this
+instance, the samples are colored by whatever campaign they are located
+in, but this feature is bound to change. It will most likely change so the
+end user has the ability to color the data by a chosen feature
+(campaign, rock type, location, k-means clustering, etc.). This
+screenshot also includes a latitude/longitude scatter plot on the left,
+which shows the location of all of the taken samples. While the current
+instance of this map is not mine, the original plot was mine.
+Dr. Erickson has since made the necessary changes to make it much more
+interactive and visually pleasing.
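To make the ternary diagram described above more concrete, the sketch below shows how three grouped components can be reduced to fractions that sum to one, which is what fixes a sample's position inside the triangle. The helper name `ternary_fractions` and the two sample rows are hypothetical and are not real PIXL measurements; ternary plotting packages typically apply this kind of normalization themselves.

```r
# Hypothetical helper: collapse oxide weights into three ternary components
# and normalize each row so the three components sum to 1.
ternary_fractions <- function(df) {
  parts <- data.frame(
    SiAl  = df$SiO2 + df$Al2O3,
    FeMg  = df$FeOT + df$MgO,
    CaNaK = df$CaO + df$Na2O + df$K2O
  )
  parts / rowSums(parts)
}

# Two made-up samples in weight percent (illustrative only)
samples <- data.frame(
  SiO2 = c(45, 30), Al2O3 = c(5, 10), FeOT = c(20, 25), MgO = c(10, 15),
  CaO = c(12, 10), Na2O = c(6, 8), K2O = c(2, 2)
)

fractions <- ternary_fractions(samples)
print(round(fractions, 3))
```

Because only the relative proportions matter, two samples with different total weights but the same ratios land on the same point of the diagram.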
I have also made some contributions towards the LIBS comparison page. +Most of the plots currently present are not my own (done by Aadi and +David), but I made great strides towards the plots’ implementations on +the app:
+
+The ternary diagram presented is one I did, as seen below in section
+4.4. Aadi helped me make it more presentable, along with adding the PIXL
+samples as reference points. This was an incredibly important plot for
+the entire project, as it proved to be a large backbone to many further,
+more complex analyses that other group members worked on (mainly the
+works of Charlotte and Margo).
This is a
+Principal Component Analysis (PCA) done on the LIBS dataset, colored and
+clustered in a consistent manner that has been done throughout other
+analyses done by the MARS team. I received some help from David with its
+implementation. The best part of this plot is its interactivity, as it
+allows users to scale the data, choose the number of clusters graphed,
+as well as change the color of the points to indicate different
+features, as desired. The plot is really good at highlighting
+correlations between the different clusters and their high composition
+weights. For example, Cluster 4 generally has a very high concentration
+of Aluminum, but a very low average concentration of Magnesium.
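The kind of PCA-plus-cluster view described above can be sketched in base R. This is only an illustration under stated assumptions: the data frame `toy` and its values are made up, the choice to scale before clustering is mine, and none of this is the code used in the app.

```r
set.seed(10)

# Hypothetical oxide weights for 60 made-up targets (not real LIBS values)
toy <- data.frame(
  SiO2  = rnorm(60, mean = 45, sd = 5),
  Al2O3 = rnorm(60, mean = 10, sd = 2),
  MgO   = rnorm(60, mean = 12, sd = 3),
  FeOT  = rnorm(60, mean = 18, sd = 4)
)

# PCA on centred, unit-variance columns so no single oxide dominates
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Cluster the same scaled data; nstart repeats the random initialisation
km <- kmeans(scale(toy), centers = 4, nstart = 25)

# Scores on the first two components, ready to colour by cluster in a plot
scores <- data.frame(pca$x[, 1:2], cluster = factor(km$cluster))

# Per-cluster mean composition, the kind of summary used to describe a cluster
cluster_means <- aggregate(toy, by = list(cluster = km$cluster), FUN = mean)
print(cluster_means)
```

Reading `cluster_means` next to the score plot is one way to back up statements such as one cluster being high in one oxide and low in another.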
All in all, the app is incredibly important for everyone who +has been working with the Perseverance rover data, from us RPI +students to the scientists at NASA. There is still a lot of work to be +done, which includes adding more plots and analyses, more +interactivity, and further polishing the appearance.
+In the first half of the semester, I worked on several analyses of +the LIBS dataset. These plots and graphs delved into the +mineral composition as well as the location of Perseverance’s +SuperCam targets.
+supercam_libs_moc_loc.Rds, which is the LIBS dataset https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/Data/supercam_libs_moc_loc.Rds
LIBS_calibration_targets.Rds, which are the Earth calibration +targets for SuperCam https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/StudentData/LIBS_calibration_targets.Rds
This section was solo work, but I received feedback from Dr. Kristen +Bennett and Dr. John Erickson.
+I solely used the provided LIBS dataset for these analyses. I +clustered the LIBS data using k-means into four separate clusters, which +are consistent throughout the rest of the plots. The first thing I +wanted to look into was how all of the LIBS samples compared +compositionally. We were told which cations the +NASA scientists were mainly keeping an eye out for, so I made a ternary +diagram plotting the samples onto three axes, Si+Al, Fe+Mg, and Ca+Na+K. +The next graph was a scatter plot, with the intention to highlight the +location of all of the LIBS samples, to potentially see if there was a +relation between the mineral composition and the location. I also took +the time to parse the information about the Earth samples the +Perseverance rover carried with it, and plotted these points on a +ternary diagram. The idea was to make the data readily +accessible and to allow for further analyses and comparisons.
+libs_data <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+libs_target_data <- readRDS("~/DAR-Mars-F24/StudentData/LIBS_calibration_targets.Rds")
+To plot the LIBS data on the ternary diagram, I need to slightly +manipulate the data, so only the important features are included.
+# Chosen features
+libs_trim <- libs_data %>% select(c(SiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O))
+
+# Chosen features combined into three axes and converted from weight percent to fractions
+libs_ternary <- libs_trim %>%
+ mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>%
+ select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>%
+ drop_na()
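A ternary diagram only shows the relative proportions of its three parts, so the grouped values are effectively rescaled to sum to one per sample when plotted. The sketch below makes that rescaling (closure) explicit in base R. The oxide values are hypothetical, and `axes` and `closed` are names introduced here for illustration only.

```r
# Hypothetical oxide weight percentages for three made-up targets
oxides <- data.frame(
  SiO2 = c(45, 50, 40), Al2O3 = c(10, 12, 8),
  FeOT = c(20, 15, 25), MgO   = c(10, 8, 15),
  CaO  = c(5, 6, 4),    Na2O  = c(2, 3, 1), K2O = c(1, 1, 0.5)
)

# Group the oxides into the three ternary axes used in this section
axes <- data.frame(
  x = oxides$SiO2 + oxides$Al2O3,            # Si+Al
  y = oxides$FeOT + oxides$MgO,              # Fe+Mg
  z = oxides$CaO + oxides$Na2O + oxides$K2O  # Ca+Na+K
)

# Closure: divide each row by its own total so that x + y + z = 1
closed <- axes / rowSums(axes)
print(round(closed, 3))
```

Dividing by 100, as done above, keeps the original weight scale. The closed version may be preferable if the clustering should see exactly what the diagram shows, but that is a design choice rather than a requirement.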
+I then perform k-means on the data to cluster all of the samples, so +I can draw conclusions about the data.
+# K-means
+set.seed(10)
+k <- 4
+tern.km <- kmeans(libs_ternary, k)
+
+libs_ternary_clustered <- cbind(libs_ternary, cluster=as.factor(tern.km$cluster))
+tern_clusters <- libs_ternary_clustered$cluster
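The value k = 4 is fixed in the code above. One common way to sanity-check such a choice is an elbow plot of the total within-cluster sum of squares. The sketch below runs on synthetic data with three planted groups, so `toy` and its values are assumptions and say nothing about the right k for the LIBS data.

```r
set.seed(10)

# Synthetic stand-in for the ternary coordinates: three planted groups
toy <- rbind(
  matrix(rnorm(150, mean = 0.2, sd = 0.03), ncol = 3),
  matrix(rnorm(150, mean = 0.5, sd = 0.03), ncol = 3),
  matrix(rnorm(150, mean = 0.8, sd = 0.03), ncol = 3)
)
colnames(toy) <- c("x", "y", "z")

# Total within-cluster sum of squares for k = 1..8;
# nstart = 25 reruns k-means from several random starts and keeps the best fit
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)

print(data.frame(k = ks, wss = round(wss, 4)))
# plot(ks, wss, type = "b")  # the bend in this curve suggests a reasonable k
```

Setting `nstart` above 1 also makes the cluster assignment less sensitive to the random seed, which helps when the same clusters are reused across several plots.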
+These clusters will be consistent with the other plots. We can now +graph the data.
+ggtern(data= libs_ternary_clustered, mapping=ggtern::aes(x=x,y=y,z=z)) +
+ geom_point(aes(color=tern_clusters)) + #tern_clusters
+ theme_rgbw() +
+ labs(title="Perseverance LIBS Samples Mineral Compositions",
+ x="Si+Al",
+ y="Fe+Mg",
+ z="Ca+Na+K")
+Of the four determined LIBS clusters, three show clear +correlations (Clusters 1, 3, and 4). These three clusters exhibit an inverse +relationship between the concentration of the Iron + Magnesium cations +and the Silicon + Aluminum cations. Cluster 2 is quite interesting +though, as it is the only cluster that shows high concentrations of +Calcium + Sodium + Potassium.
+To graph the LIBS data in regards to location, a little more +manipulation is needed. The clusters chosen above are used again for +this plot.
+libs_loc <- libs_data %>% select(c(SiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O, lat, lon))
+
+libs_loc_ternary <- libs_loc %>%
+ mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>%
+ select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>%
+ drop_na()
+libs_loc_ternary <- cbind(libs_loc_ternary, cluster=as.factor(tern.km$cluster))
+
+ggplot(libs_loc_ternary, aes(x=lon, y=lat, colour=cluster)) +
+ geom_point() +
+ ggtitle("Clustered LIBS Data Graphed by Location")
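The location plot reuses cluster labels from the earlier k-means fit, which only lines up if the same rows survive `drop_na()` in both steps. Whether the real dataset has missing coordinates is not known here. The sketch below shows one defensive pattern on made-up data: write the labels back by row position, then filter. All names and values are hypothetical.

```r
# Hypothetical data: a missing latitude in row 2, a missing composition in row 4
toy <- data.frame(
  x   = c(0.60, 0.50, 0.70, NA,   0.55),
  y   = c(0.30, 0.40, 0.20, 0.30, 0.35),
  z   = c(0.10, 0.10, 0.10, 0.10, 0.10),
  lat = c(18.44, NA,   18.45, 18.46, 18.47),
  lon = c(77.40, 77.41, 77.42, 77.43, 77.44)
)

# Only rows with a complete composition can be clustered
comp_ok <- complete.cases(toy[, c("x", "y", "z")])

set.seed(10)
km <- kmeans(toy[comp_ok, c("x", "y", "z")], centers = 2)

# Write labels back by row position; rows that were not clustered stay NA
toy$cluster <- NA_integer_
toy$cluster[comp_ok] <- km$cluster

# Keep rows that have both a cluster label and coordinates for the map
located <- toy[!is.na(toy$cluster) & complete.cases(toy[, c("lat", "lon")]), ]
print(located)
```

With this pattern a row can never receive another row's label, even when the two filtering steps drop different rows.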
+When comparing these clusters to their geographic location, there is +no clear spatial separation. Cluster 3 is primarily located in the later half of +Perseverance’s journey, while Cluster 4 is mainly in the first half. +Interestingly enough, again, Cluster 2 differs greatly from the rest of +the clusters, being present mainly in the middle part of Perseverance’s +journey.
+I have also included the work I did to graph the Earth calibration +samples onto a ternary plot. While my own contribution here was small, +these Earth calibration samples are incredibly important. The +Perseverance Rover carries a calibration target plate with several +Earth-based samples attached, which act as controls to test instrument +performance and ensure accurate readings over the course of the mission. +These reference minerals allow SuperCam to fine-tune its LIBS laser, so +it could be important for us to factor this into our research and our +calculations with regard to data scaling. Since these calibration +targets are scanned for the same composition weights as the PIXL and +LIBS samples, we now have a very useful way to compare the Mars geology +to the familiar Earth geology.
+libs_target_trim <- libs_target_data %>% select(c(Description,Si, Al, Fe, Mg, Ca, Na, K))
+
+#Selecting features
+libs_target_ternary <- libs_target_trim %>%
+ mutate(x=(Si+Al),y=(Fe+Mg),z=(Ca+Na+K)) %>%
+ select(-c(Si,Al,Fe,Mg,Ca,Na,K)) %>%
+ drop_na()
+SampleNames <- libs_target_ternary$Description
+
+ggtern(libs_target_ternary, ggtern::aes(x=x,y=y,z=z)) +
+ geom_point(aes(color=SampleNames)) +
+ theme_rgbw() +
+ labs(title="LIBS Earth Calibration Targets",
+ x="Si+Al",
+ y="Fe+Mg",
+ z="Ca+Na+K")
+All in all, these plots I made were useful, but to fully capitalize +on them, it would be good to compare the LIBS samples alongside the PIXL +samples, something that many of my other group members have already +looked into doing.
+This section includes something I did in the very beginning of the +semester. It is not very useful looking back, but I am still including +it. The idea was to try to find which PIXL samples corresponded to which +LIBS samples.
+supercam_libs_moc_loc.Rds, which is the LIBS dataset https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/2fbb9b7988d536656bb118a0d8e0b644392ca09a/Data/supercam_libs_moc_loc.Rds
samples_pixl_wide.Rds, which is the PIXL dataset https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/samples_pixl_wide.Rds
v1_pixl.Rds, which is the updated PIXL dataset, properly +formatted https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds
v1_libs.Rds, which is the updated LIBS dataset, properly +formatted https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds
This was solo work, and an original idea that did not help as much as +I originally thought it would.
+We know that the LIBS data points were manually chosen by a team of +scientists due to potential points of interest, and these interesting +points influence the campaign/pathing of the Perseverance rover. I +wanted to try to figure out if these decisions were reflected by the +actual mineral samples taken, exhibited by the PIXL data, and if the +PIXL samples were good indicators of the surrounding geology. To do +this, I used the PIXL data, along with the unscaled & clustered +LIBS data from before. I purposely left out Sample 1 of the PIXL data, +as it was an atmospheric sample and could potentially mess with +clustering. Also, I scaled the PIXL data in two ways: (1) +Scaled/normalized in relation to LIBS data and (2) Scaled using scale(). +The second method is exhibited below, the first will be shown later.
+libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds")
+Before I could do anything, I needed to clean and cluster the LIBS +data.
+libs.df <- libs.df %>%
+ select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+ MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+
+libs.df$point <- as.numeric(libs.df$point)
+
+# Make a matrix containing only the LIBS measurements for each mineral
+libs.matrix <- as.matrix(libs.df[,6:13])
+
+set.seed(10)
+k <- 5
+km_reg <- kmeans(libs.matrix,k)
+cluster1 <- km_reg$cluster
+libs.reg.clustered <- cbind(libs.df,cluster1)
+Next, I needed to get the PIXL data ready by making sure it had the +same columns as the clustered LIBS data.
+pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)],
+ as.factor)
+pixl.df <- pixl.df[2:16,] #Excluding first, atmospheric sample
+# Make the matrix of just mineral percentage measurements
+pixl.matrix <- pixl.df[,2:14]
+pixl.matrix.scaled <- pixl.df[,2:14] %>% scale()
+I start with scaling and normalizing the LIBS data, then perform the +same treatment to the PIXL data. I then give the LIBS data the same +clustering found from before, and classify the 15 PIXL samples using +KNN. I also perform k-means clustering on the pre-scaled PIXL data and +compare how the two methods differ in their PIXL results.
+# Prepare dataset for clustering selecting specific columns of interest and putting in a matrix
+pixl_trim <- pixl.df %>%
+ dplyr::select(c("Na20","Mgo","Al203","Si02", "K20","Cao","Ti02","FeO-T")) %>%
+ rename("Na2O"="Na20","MgO"="Mgo","Al2O3"="Al203","SiO2"="Si02","K2O"="K20",
+ "CaO"="Cao","TiO2"="Ti02","FeOT"="FeO-T")
+libs_trim <- libs.df %>%
+ dplyr::select(c("Na2O","MgO","Al2O3","SiO2", "K2O","CaO","TiO2", "FeOT"))
+
+# Normalize/scale training/test data
+scaler <- preProcess(libs_trim, method = c("center", "scale"))
+train <- predict(scaler, libs_trim)
+test.pixl <- predict(scaler, pixl_trim)
+
+# KNN model
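+# The cluster column was added above as `cluster1`; use its full name rather than relying on partial matching
+classtrain <- as.factor(libs.reg.clustered$cluster1)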
+train.df <- cbind(train,classtrain)
+model<- knn3(classtrain ~ ., data = train.df, k = 40)
+pixl.class <- predict(model,test.pixl, type="class")
+pixl.predicted <- cbind(pixl.df,pixl.class)
+#IMPORTANT: Use for heatmap
+pixl.classified.scaled <- cbind(test.pixl, pixl.class)
+
+# PIXL K-means
+pixl.matrix.scaled <- pixl.df %>% dplyr::select(c("Na20","Mgo","Al203","Si02",
+ "K20","Cao","Ti02","FeO-T")) %>% scale()
+set.seed(10)
+k <- 4
+km2 <- kmeans(pixl.matrix.scaled,k)
+cluster <- km2$cluster
+pixl.kmean <- cbind(pixl.df,cluster)
+
+# Heatmaps. Plenty of room to change/fix/adjust.
+pheatmap(km2$centers,scale="none", main="K-means Clustered PIXL Average Mineral Composition Weights")
+heatmap.data = data.frame(matrix(nrow = 0, ncol = ncol(km2$centers)))
+colnames(heatmap.data) = colnames(km2$centers)
+for (x in 1:5) { # one pass per LIBS cluster (k = 5 above); empty classes are skipped
+ test.df <- pixl.classified.scaled %>% filter(pixl.class == x)
+ if (dim(test.df)[1] != 0) {
+ test.df<- test.df[ , !(names(test.df) %in% c("pixl.class"))]
+ heatmap.data[nrow(heatmap.data)+1,] <- colMeans(test.df)
+ }
+}
+
+pheatmap(heatmap.data,scale="none", main="KNN Classified PIXL Average Mineral Composition Weights")
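The key idea in the block above is that the test data (PIXL) is centred and scaled with statistics learned from the training data (LIBS), and only then classified by its nearest neighbours. The base R sketch below shows that pattern on synthetic data. The helper `knn_predict`, the value k = 5, and all data values are assumptions made for illustration, not the `caret` calls used above.

```r
set.seed(10)

# Hypothetical training set: two well-separated groups in two features
train <- rbind(
  matrix(rnorm(40, mean = 10, sd = 1), ncol = 2),
  matrix(rnorm(40, mean = 20, sd = 1), ncol = 2)
)
labels <- factor(rep(c(1, 2), each = 20))

# Two hypothetical new samples to classify
test <- matrix(c(10.5, 9.5,
                 19.0, 21.0), ncol = 2, byrow = TRUE)

# Learn centring and scaling from the training data only, then reuse for the test data
mu      <- colMeans(train)
sdev    <- apply(train, 2, sd)
train_s <- scale(train, center = mu, scale = sdev)
test_s  <- scale(test,  center = mu, scale = sdev)

# Minimal k-nearest-neighbour classifier: Euclidean distance, majority vote
knn_predict <- function(train_x, train_y, new_x, k) {
  apply(new_x, 1, function(p) {
    d     <- sqrt(rowSums(sweep(train_x, 2, p)^2))
    votes <- table(train_y[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

pred <- knn_predict(train_s, labels, test_s, k = 5)
print(pred)
```

Scaling the test set with its own mean and standard deviation would put the two datasets on different scales, which is why the training statistics are reused. With a large k relative to cluster size, small clusters can be outvoted and never predicted.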
+I used 5 clusters when clustering the LIBS data, and used the same 5 +on the PIXL KNN classification. However, I noticed one of the clusters +completely disappeared, and only 4 clusters are highlighted. This is why +I made the decision to perform k-means on the PIXL data with k=4. In +comparing the two heatmaps, there are a lot of similarities. In both, +cluster 5 has the most traces of SiO2, and cluster 4 has the most traces +of FeO-T. There are other clusters that also line up with very minimal +traces of MgO and SiO2. The best overarching way to compare, I believe, +is to compare the column dendrograms. The big problem with doing this +analysis was that I included the insignificant cation data, which very +likely skewed my results. There are some other errors with my thought +process, and a little too much ambiguity to get a proper understanding +of anything. For this reason, I scrapped this idea and moved on.
+All in all, this finding was not very useful, and looking back at it, +somewhat confusing. I would not recommend doing this again, but if one +did, much more planning and understanding of the data would be needed.
+citation("ggtern")
+## To cite ggtern in publications use:
+##
+## Hamilton NE, Ferry M (2018). "ggtern: Ternary Diagrams Using
+## ggplot2." _Journal of Statistical Software, Code Snippets_, *87*(3),
+## 1-17. doi:10.18637/jss.v087.c03
+## <https://doi.org/10.18637/jss.v087.c03>.
+##
+## A BibTeX entry for LaTeX users is
+##
+## @Article{,
+## title = {{ggtern}: Ternary Diagrams Using {ggplot2}},
+## author = {Nicholas E. Hamilton and Michael Ferry},
+## journal = {Journal of Statistical Software, Code Snippets},
+## year = {2018},
+## volume = {87},
+## number = {3},
+## pages = {1--17},
+## doi = {10.18637/jss.v087.c03},
+## }
+