diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd b/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd new file mode 100644 index 0000000..651c6b2 --- /dev/null +++ b/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd @@ -0,0 +1,688 @@ +--- +title: "Data Analytics Research Individual Final Project Report" +author: "Aadi Lahiri" +date: "Fall 2024" +output: + pdf_document: + toc: yes + toc_depth: '3' + html_notebook: default + html_document: + toc: yes + toc_depth: 3 + toc_float: yes + number_sections: yes + theme: united +--- + +# DAR Project and Group Members + +* Project name: Mars +* Project team members: Charlotte Peterson, Doña Roberts, David Walcyzk, Xuanting Wang, Ashton Compton, Margo VanEsselstyn, Nicolas Morawski, CJ Marino, Dante Mwatibo + + +# 0.0 Preliminaries. + +Executing this R notebook requires some subset of the following packages: + +* `ggplot2` +* `pandoc` +* `rmarkdown` +* `tidyverse` +* `stringr` +* `pheatmap` +* `caret` +* `knitr` +* `BBmisc` +* `ggtern` +* `glue` + +These will be installed and loaded as necessary (code suppressed). 
+ +```{r, include=FALSE} + +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} + +if (!require("caret")) { + install.packages("caret") + library(caret) +} + +# ggbiplot is used for the PCA biplot in Section 5.4 +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("knitr")) { + install.packages("knitr") + library(knitr) +} + +if (!require("BBmisc")) { + install.packages("BBmisc") + library(BBmisc) +} + +if (!require("ggtern")) { + install.packages("ggtern") + library(ggtern) +} + +if (!require("glue")) { + install.packages("glue") + library(glue) +} + +``` + +# 1.0 Project Introduction + +In 2020, the Perseverance rover was sent to Mars to collect information about the planet. One of its instruments, SuperCam, measures the elemental composition of different samples using a technique known as LIBS, or Laser-Induced Breakdown Spectroscopy. This notebook analyzes the LIBS data to see what we can learn about the rocks on Mars and what that can tell us about the planet itself. + +# 2.0 Organization of Report + +This report is organized as follows: + + +* Section 3.0. Finding 1: Graphing the cation compositions of every LIBS sample. The cations were broken into 3 groups: Silicon and Aluminium, Iron and Magnesium, and lastly Calcium, Potassium, and Sodium. + +* Section 4.0. Finding 2: Comparing the Alkali and Silica compositions of each sample in order to classify igneous rock type. + +* Section 5.0. Finding 3: By scaling the LIBS samples by Earth reference data, we can see how LIBS samples on Mars compare to LIBS samples on Earth. 
+ +* Section 6.0 Overall conclusions and suggestions + +* Section 7.0 Appendix: This section describes additional work that may be helpful for future work: understanding LIBS elemental composition. + + +# 3.0 Finding 1: Cation Analysis + +Each sample was given 3 features: the sum of its iron and magnesium, the sum of its silicon and aluminium, and the sum of its calcium, potassium, and sodium. We cluster the samples by those three measurements, then graph the clusters on a ternary plot. By adding PIXL samples to this plot, we can tell where each rock type, igneous and sedimentary, lies in this plot and among the samples. + + +## 3.1 Data, Code, and Resources + +1. lahira-finalProjectF24.Rmd (with knit pdf and html) is this notebook. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08-FinalProjectNotebook/lahira-finalProjectF24.Rmd) + +2. v1_libs.Rds is the Rds file containing the LIBS data, with all Earth reference samples removed. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds). + +3. v1_pixl.Rds is the Rds file containing the PIXL data. It only contains the elemental compositions for each element. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds). + +4. v1_sample_meta.Rds is the Rds file containing the PIXL sample metadata. It is used here to get the rock type of each PIXL sample. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds). + + +Each sample in the LIBS and PIXL data has columns giving specific elemental compositions. 
We just have to add them together to get the setup for the Ternary plot. + +The LIBS data used here has all the Earth reference samples filtered out. The Earth reference samples were brought by the Perseverance rover for calibration of SuperCam, and as such those samples are not actually from Mars and shouldn't be included. + +We also drop the first PIXL sample, as it is an atmospheric sample. + + + +```{r } +# Data Processing for the Ternary Graph + +# Load the saved LIBS data with metadata added (libs without earth) +libswe <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds") + +#Sum the elements we are looking at +libswe <- libswe %>% + mutate(y = (FeOT + MgO) / 100, z = (CaO+Na2O+K2O) / 100, x = (SiO2 + Al2O3) / 100) %>% + select(c(x,y,z)) %>% + drop_na() + +# PIXL data added +pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds") +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) + +#adding PIXL metadata to get PIXL rock type for each sample +pixl_meta <- readRDS("~/DAR-Mars-F24/StudentData/v1_sample_meta.Rds") + +pixl.df <- pixl.df[2:16,] #Excluding first, atmospheric sample +pixl_meta <- pixl_meta[2:16,8] #Same rows; column 8 is used as the rock type + +new_pixl_trim <- cbind(pixl.df,type = pixl_meta) + +#take the sums of the specific elements, and rename type column +pixl_ternary <- new_pixl_trim %>% + mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, PIXL_Rock_Type = type) %>% + select(c(x,y,z,PIXL_Rock_Type)) %>% + drop_na() + +#This is for the labels on the Ternary Plot below +pixl_ternary <- cbind(pixl_ternary, Sample_display= + c("2","3","4,6,7","5,8,9","","","","", + "10,11","","12,13","","14,15","","16")) + +``` + + +## 3.2 Contribution + +The code for this plot started with Dr. Erickson's graph, which Nicolas Morawski adapted for his own use. I then adapted Nicolas's code and turned it into what is displayed now. 
+ +## 3.3 Methods Description + + +A ternary plot is a triangular plot that shows the relative proportions of three components, which together sum to 100%. We use this plot to see how the different clusters of the LIBS samples fall with respect to the PIXL samples, which have already been labeled according to rock type. By analyzing the graph afterwards, we are able to make some classifications of the clusters. We are also looking for outliers and any groups of data which are different from the others. + + +## 3.4 Result and Discussion + +I start by clustering the samples by the three measurements we made in the data preparation section. We use k-means clustering on three variables: x, y, and z. Here x is the sum of the Silicon and Aluminium composition of a sample, y is the sum of Iron and Magnesium, and z is the sum of Calcium, Sodium, and Potassium. We use these specific combinations because they are grouped by cations, and their relationship is used in geological identification. We will use these clusters throughout this and other reports in order to learn more about the LIBS data. + + +```{r } +libs_ternplot <- libswe %>% select(c(x,y,z)) + +set.seed(1234) + +#kmeans on the original data +tern.km <- kmeans(libs_ternplot, centers=4) + +libs_ternplot <- cbind(libs_ternplot, LIBS_Cluster=as.factor(tern.km$cluster)) +``` + +After clustering, the cluster labels are attached to the data being graphed so we can see the clusters in the graph. 
+ +```{r} +#ternary plot for LIBS data +ggtern(libs_ternplot, ggtern::aes(x=x, y=y, z=z, cluster=LIBS_Cluster)) + + #color by cluster + geom_point(aes(color=LIBS_Cluster), alpha = 0.5) + + theme_rgbw() + + theme(plot.caption = element_text(hjust = 0)) + # Left-align the caption + labs(title="Mars 2020 LIBS Ternary Plot with PIXL Data", + subtitle=stringr::str_wrap(glue("LIBS data Clustered by Cation Group with PIXL samples by Rock Type With Earth Reference Data Removed."), 60), + caption = stringr::str_wrap(glue("The LIBS samples are the colored circles, and the PIXL samples are the black circles and triangles."), 80), + x="Si+Al", + y="Fe+Mg", + z="Ca+Na+K") + + #suppress warnings here because of some warning with aes() + #add PIXL samples - atmospheric onto the ternary plot + suppressWarnings(geom_point( + data=pixl_ternary, ggtern::aes(x=x, y=y, z=z, + cluster=PIXL_Rock_Type, shape=PIXL_Rock_Type), + size = 2)) + + #Add labels to PIXL data corresponding to sample number + suppressWarnings(geom_text(data=pixl_ternary, + ggtern::aes(x=x, y=y, z=z, label=Sample_display, cluster=PIXL_Rock_Type, + hjust = ifelse(x > 0.43, 1, -0.1), # Horizontal adjust to avoid overlap + vjust = ifelse(x == 0.3668, 1.3, + ifelse(x == 0.375, 1, ifelse(x > 0.43, 1.5, -0.3))), + fontface="bold"), + size=2.7)) +``` + +From the ternary diagram, we see that most of the samples are high in both Fe+Mg and Si+Al, and low in Ca+Na+K. The samples that have higher Ca+Na+K tend to have lower Fe+Mg and much lower Si+Al. From the clustering we see that cluster one tends to have high concentration of Si+Al and Fe+Mg, and very low concentrations of Ca+Na+K. Cluster 2 is mostly Fe+Mg, with a little Si+Al. Cluster 1 is very diverse, but tends to be the samples with higher concentrations of Ca+Na. Clusters 3 and 4 are similar, but Cluster 3 generally has higher Si+Al than Cluster 4. 
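The cluster descriptions above can be checked numerically against the k-means centers rather than read off the plot alone. The following is a minimal sketch on made-up data: the names `toy`, `toy_km`, `toy_centers`, and `toy_sizes` are hypothetical and the values are invented for illustration, not taken from the LIBS data. In the notebook itself, the same calls would presumably apply to `tern.km`.

```r
# Toy proportions standing in for the x (Si+Al), y (Fe+Mg), z (Ca+Na+K) columns.
# These numbers are invented for illustration only.
toy <- data.frame(
  x = c(0.60, 0.58, 0.62, 0.30, 0.28, 0.32),
  y = c(0.30, 0.32, 0.28, 0.60, 0.62, 0.58),
  z = c(0.10, 0.10, 0.10, 0.10, 0.10, 0.10)
)

set.seed(1234)
toy_km <- kmeans(toy, centers = 2)

# Mean x, y, z per cluster: a numeric summary of where each cluster sits
toy_centers <- round(toy_km$centers, 2)
print(toy_centers)

# Number of samples assigned to each cluster
toy_sizes <- table(toy_km$cluster)
print(toy_sizes)
```

Reading the centers row by row gives one (Si+Al, Fe+Mg, Ca+Na+K) triple per cluster, which makes statements such as "mostly Fe+Mg" straightforward to verify.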
+ +Looking at the PIXL data, we see that sedimentary samples tend to be higher in Fe and Mg and lower in Si and Al, and are associated mostly with Clusters 2, 3, and 4. The igneous samples are the opposite, and appear in Clusters 1, 3, and 4. We see that Cluster 2 and Cluster 1 are pretty dissimilar, while the opposite is true for Clusters 3 and 4. Moving from Cluster 1 to Cluster 2, the associated PIXL samples go from igneous to sedimentary, which could indicate a transition from igneous rock to sedimentary rock or vice versa. + +## 3.5 Conclusions, Limitations, and Future Work. + +When I presented this graph to Dr. Rogers, she said that Cluster 2 interested her the most, as it appears to be out of the ordinary. I do not know exactly why that cluster stood out, but analyzing it further could tell us something about Mars. + + + +# 4.0 Finding 2: Alkali Silica Analysis + +After producing the ternary graph, I was interested in learning more about each sample. I decided to plot each LIBS sample (with Earth reference data removed) on a Total Alkali vs. Silica (TAS) plot, to classify the samples by igneous rock type. + +## 4.1 Data, Code, and Resources + +1. lahira-finalProjectF24.Rmd (with knit pdf and html) is this notebook. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08-FinalProjectNotebook/lahira-finalProjectF24.Rmd) + +2. v1_libs.Rds is the Rds file containing the LIBS data, with all Earth reference samples removed. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds). + + +We are creating a total alkali silica plot from the same data as we used in Section 3. 
The x variable will be silica (SiO2), and the y variable will be the sum of the alkali oxides, Sodium (Na2O) and Potassium (K2O). The Earth reference samples are again excluded. + + +```{r} +# Load the saved LIBS data with metadata added (libs without earth) +libswe <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds") + +#Data Processing for the Alk Sil Plot + +#Add the total alkali column (y) +libs_alksil <- libswe %>% + select(c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) %>% + mutate(y = Na2O + K2O) +``` + +## 4.2 Contribution + +All this code was written by me. I based the igneous rock classification diagram on https://www2.tulane.edu/~sanelson/eens212/igrockclassif.htm. + +## 4.3 Methods Description + +A Total Alkali Silica (TAS) plot is used to classify igneous rock samples by their silica and alkali content. I made this graph to see which samples were most likely igneous. + +## 4.4 Result and Discussion + +We graph the LIBS data points like any other scatter plot, and I then add line segments to help visually break down the classification. We reuse the clustering from the Ternary plot. 
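Reusing cluster labels with `cbind` assumes that the rows of the TAS table line up one-to-one with the rows that were clustered. A minimal sketch of a guard for that assumption follows, on made-up data; the `toy_*` names and all values are hypothetical, and this is only an illustration of the idea, not the notebook's own code.

```r
# Toy stand-in for the LIBS table (values are made up for illustration only)
toy_libs <- data.frame(
  SiO2 = c(45, 52, NA, 60, 41, 48),
  Na2O = c(2.0, 3.1, 1.5, 4.0, 0.8, 2.5),
  K2O  = c(0.5, 1.2, 0.3, 2.1, 0.1, 0.9)
)

# Drop incomplete rows once, up front, so every later table shares the same rows
toy_complete <- toy_libs[complete.cases(toy_libs), ]

set.seed(1234)
toy_km <- kmeans(toy_complete, centers = 2)

# Guard against silent misalignment before attaching the labels
stopifnot(length(toy_km$cluster) == nrow(toy_complete))

# Build the TAS table from the same rows that were clustered
toy_tas <- data.frame(
  SiO2    = toy_complete$SiO2,
  alkali  = toy_complete$Na2O + toy_complete$K2O,
  cluster = as.factor(toy_km$cluster)
)
```

The design choice here is to remove rows with missing values before clustering and to derive every plotted table from that same filtered data, so a label can never be attached to the wrong sample.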
+ +```{r} +#CODE FOR TAS PLOT + +#Drop every column except Silicon and Alkali content +libs_alksil <- libs_alksil %>% + select(c(SiO2, y)) + +#Cluster them according to the kmeans from the Ternary plot +libs_alksil <- cbind(libs_alksil, cluster=as.factor(tern.km$cluster)) + +#Plot for original LIBS data +ggplot() + + geom_point(data = libs_alksil, + mapping = aes(x=SiO2, y=y, color=as.character(cluster)), + #add alpha so that labels on graph are visible + alpha = 0.3) + + #Add Line segments and labels for the igneous rocks reference + geom_segment(aes(x=41,y=0, xend=41, yend=7)) + + geom_segment(aes(x=45,y=0, xend=45, yend=5)) + + geom_segment(aes(x=52,y=0, xend=52, yend=5)) + + geom_segment(aes(x=57,y=1, xend=57, yend=6)) + + geom_segment(aes(x=53,y=9, xend=57, yend=6)) + + geom_segment(aes(x=48.5,y=11.5, xend=53, yend=9)) + + geom_segment(aes(x=63,y=2, xend=63, yend=7)) + + geom_segment(aes(x=52,y=5, xend=69, yend=8)) + + geom_segment(aes(x=69,y=8, xend=73, yend=3)) + + geom_segment(aes(x=69,y=8, xend=69, yend=13)) + + geom_segment(aes(x=41,y=3, xend=45, yend=3)) + + geom_segment(aes(x=45,y=5, xend=52, yend=5)) + + geom_segment(aes(x=45,y=5, xend=61, yend=13)) + + geom_segment(aes(x=49.3,y=7.2, xend=52, yend=5)) + + geom_segment(aes(x=45,y=9.4, xend=49.3, yend=7.2)) + + geom_segment(aes(x=41,y=7, xend=52.7, yend=14)) + + geom_segment(aes(x=58,y=11.6, xend=63, yend=7)) + + geom_segment(aes(x=58,y=11.6, xend=49, yend=15.5)) + + annotate("text",x=43,y=1.5,label="Picro-\nbasalt", size = 2) + + annotate("text",x=43.1,y=6.7,label="Tephrite", size = 2) + + annotate("text",x=43.3,y=5.7,label="Basanite", size = 2) + + annotate("text",x=48,y=3.5,label="Basalt", size = 2) + + annotate("text",x=39,y=10,label="Foidite", size = 2) + + annotate("text",x=42,y=14,label="Trachy-\nbasalt", size = 2) + + #to point at the correct area + geom_segment(aes(x=42,y=12.6, xend=49, yend=5.8), color="red") + + annotate("text",x=54.6,y=3.5,label="Basaltic\nandesite", size = 2) + + 
annotate("text",x=60,y=4.6,label="Andesite", size = 2) + + annotate("text",x=48.5,y=10,label="Phono-", size = 2) + + annotate("text",x=48.5,y=9,label="Tephrite", size = 2) + + annotate("text",x=47,y=18,label="Basaltic\ntrachy-\nandesite", size = 2) + + geom_segment(aes(x=47,y=15.5, xend=53, yend=6.5), color="red") + + annotate("text",x=53,y=11.8,label="Tephri-\nphonolite", size = 2) + + annotate("text",x=57.4,y=8.8,label="Trachy-\nandesite", size = 2) + + annotate("text",x=67,y=5,label="Dacite", size = 2) + + annotate("text",x=72,y=8.5,label="Rhyolite", size = 2) + + annotate("text",x=65,y=9,label="Trachydacite", size = 2) + + annotate("text",x=62.5,y=11.5,label="Trachydacite", size = 2) + + annotate("text",x=56.5,y=14.5,label="Phonolite", size = 2) + + theme_minimal() + + xlim(0,78) + + ylim(-1,20) + + labs(title = "Total Alkali vs. Silica Plot for LIBS \nWith Earth Reference Data Removed", + x = "Si", + y = "Na + K", + color="Cluster") + +``` + +While Clusters 1, 3 and 4 fall into different places in the igneous rock classification diagram, Cluster 2 is separated from the other three clusters. Cluster 4 looks like it is mostly basalt and picrobasalt, while Clusters 1 and 3 fall into many different categories in this diagram. Cluster 1 is also interesting, as it is widely scattered, and some points in Cluster 1 do not even fall within the igneous rock classification diagram. I think this is because Cluster 1 contains some points which are outliers in the Ternary plot, such as the points in the bottom right corner of the Ternary plot. + +## 4.5 Conclusions, Limitations, and Future Work. + +Once again we see that Cluster 2 is different from the other clusters. When speaking with Dr. Rogers, she pointed out that a Total Alkali Silica plot with an igneous rock classification diagram will only make sense if all the samples are igneous rocks. 
While we cannot confirm the rock types of all the clusters, Cluster 2 clearly does not group with the other clusters on the igneous classification diagram, which is consistent with its association with the sedimentary PIXL samples in the Ternary plot. Further analysis of Cluster 2 could tell us more about it. + + +# 5.0 Finding 3: Earth Scaled LIBS + +With the LIBS samples, we were also given the Earth reference data in terms of quartiles for each element. I scale the LIBS samples by the quartile information to see how Mars LIBS data compares to Earth LIBS data, and what exactly is different between them. + +## 5.1 Data, Code, and Resources + +1. lahira-finalProjectF24.Rmd (with knit pdf and html) is this notebook. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08-FinalProjectNotebook/lahira-finalProjectF24.Rmd) + +2. v1_libs.Rds is the Rds file containing the LIBS data, with all Earth reference samples removed. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds). + +3. v1_pixl.Rds is the Rds file containing the PIXL data. It only contains the elemental compositions for each element. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds). + +4. LIBS_training_set_quartiles.Rds is the Rds file containing the LIBS Earth reference data. +[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/LIBS_training_set_quartiles.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/LIBS_training_set_quartiles.Rds). + +We compare the original LIBS data with the scaled LIBS data. 
We scale the libs data by starting with the Earth Reference data, which gives median and quartile data for elemental composition on Earth. By scaling the Mars LIBS data by the Earth element composition quartile data, we will see how the Mars LIBS data compares to rock samples on Earth. + +While we are going to Earth Scale the LIBS data, we are also going to Earth Scale the PIXL data just to compare the two and see what we can learn. We will also plot the original data for reference. + +We continue to remove the Earth reference samples from the LIBS dataset. + +```{r} +# Load the saved LIBS data with metadata added (libs without earth) +libs.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds") + +libs_earth <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/LIBS_training_set_quartiles.Rds") + +libs_trim <- libs.df %>% select(c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) %>% + rowwise() %>% mutate(Si= (SiO2-libs_earth[3,2])/(libs_earth[4,2] - libs_earth[2,2]), + Ti= (TiO2-libs_earth[3,3])/(libs_earth[4,3] - libs_earth[2,3]), + Al= (Al2O3-libs_earth[3,4])/(libs_earth[4,4] - libs_earth[2,4]), + Fe= (FeOT-libs_earth[3,5])/(libs_earth[4,5] - libs_earth[2,5]), + Mg= (MgO-libs_earth[3,6])/(libs_earth[4,6] - libs_earth[2,6]), + Ca= (CaO-libs_earth[3,7])/(libs_earth[4,7] - libs_earth[2,7]), + Na= (Na2O-libs_earth[3,8])/(libs_earth[4,8] - libs_earth[2,8]), + K= (K2O-libs_earth[3,9])/(libs_earth[4,9] - libs_earth[2,9])) %>% + select(!c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) + +# PIXL data added +pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds") +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) +pixl.df <- pixl.df[2:16,] #Excluding first, atmospheric sample + +new_pixl_trim <- pixl.df + +new_pixl_trim_we <- pixl.df %>% select(c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) %>% + rowwise() %>% mutate(Si= (SiO2-libs_earth[3,2])/(libs_earth[4,2] - libs_earth[2,2]), + Ti= 
(TiO2-libs_earth[3,3])/(libs_earth[4,3] - libs_earth[2,3]), + Al= (Al2O3-libs_earth[3,4])/(libs_earth[4,4] - libs_earth[2,4]), + Fe= (FeOT-libs_earth[3,5])/(libs_earth[4,5] - libs_earth[2,5]), + Mg= (MgO-libs_earth[3,6])/(libs_earth[4,6] - libs_earth[2,6]), + Ca= (CaO-libs_earth[3,7])/(libs_earth[4,7] - libs_earth[2,7]), + Na= (Na2O-libs_earth[3,8])/(libs_earth[4,8] - libs_earth[2,8]), + K= (K2O-libs_earth[3,9])/(libs_earth[4,9] - libs_earth[2,9])) %>% + select(!c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) + +``` + + +## 5.2 Contribution + +I wrote all the code for this section. + +## 5.3 Methods Description + +We start with basic analysis, then move to more complex methods. We begin with boxplots to examine which features differ from the Earth reference samples. We also create a density plot to add to the analysis of the boxplots. We then cluster the scaled data with k-means, make a heatmap of the cluster centers, and run PCA and graph its biplot. + +## 5.4 Result and Discussion + +We start with a boxplot of the unscaled LIBS and PIXL data. We graph them side by side for comparison. 
+ +```{r} +ggplot() + + # Boxplots for Si + geom_boxplot(aes( + x = factor(paste("Si", c(rep("LIBS", length(libs.df$SiO2)), rep("PIXL", length(new_pixl_trim$SiO2)))), + levels = c("Si LIBS", "Si PIXL")), + y = c(libs.df$SiO2, new_pixl_trim$SiO2), + fill = c(rep("LIBS", length(libs.df$SiO2)), rep("PIXL", length(new_pixl_trim$SiO2))) + )) + + + # Boxplots for Fe + geom_boxplot(aes( + x = factor(paste("Fe", c(rep("LIBS", length(libs.df$FeOT)), rep("PIXL", length(new_pixl_trim$FeOT)))), + levels = c("Fe LIBS", "Fe PIXL")), + y = c(libs.df$FeOT, new_pixl_trim$FeOT), + fill = c(rep("LIBS", length(libs.df$FeOT)), rep("PIXL", length(new_pixl_trim$FeOT))) + )) + + + # Boxplots for Mg + geom_boxplot(aes( + x = factor(paste("Mg", c(rep("LIBS", length(libs.df$MgO)), rep("PIXL", length(new_pixl_trim$MgO)))), + levels = c("Mg LIBS", "Mg PIXL")), + y = c(libs.df$MgO, new_pixl_trim$MgO), + fill = c(rep("LIBS", length(libs.df$MgO)), rep("PIXL", length(new_pixl_trim$MgO))) + )) + + + # Boxplots for Al + geom_boxplot(aes( + x = factor(paste("Al", c(rep("LIBS", length(libs.df$Al2O3)), rep("PIXL", length(new_pixl_trim$Al2O3)))), + levels = c("Al LIBS", "Al PIXL")), + y = c(libs.df$Al2O3, new_pixl_trim$Al2O3), + fill = c(rep("LIBS", length(libs.df$Al2O3)), rep("PIXL", length(new_pixl_trim$Al2O3))) + )) + + + # Boxplots for Ca + geom_boxplot(aes( + x = factor(paste("Ca", c(rep("LIBS", length(libs.df$CaO)), rep("PIXL", length(new_pixl_trim$CaO)))), + levels = c("Ca LIBS", "Ca PIXL")), + y = c(libs.df$CaO, new_pixl_trim$CaO), + fill = c(rep("LIBS", length(libs.df$CaO)), rep("PIXL", length(new_pixl_trim$CaO))) + )) + + + # Boxplots for Na + geom_boxplot(aes( + x = factor(paste("Na", c(rep("LIBS", length(libs.df$Na2O)), rep("PIXL", length(new_pixl_trim$Na2O)))), + levels = c("Na LIBS", "Na PIXL")), + y = c(libs.df$Na2O, new_pixl_trim$Na2O), + fill = c(rep("LIBS", length(libs.df$Na2O)), rep("PIXL", length(new_pixl_trim$Na2O))) + )) + + + # Boxplots for K + geom_boxplot(aes( + x = 
factor(paste("K", c(rep("LIBS", length(libs.df$K2O)), rep("PIXL", length(new_pixl_trim$K2O)))), + levels = c("K LIBS", "K PIXL")), + y = c(libs.df$K2O, new_pixl_trim$K2O), + fill = c(rep("LIBS", length(libs.df$K2O)), rep("PIXL", length(new_pixl_trim$K2O))) + )) + + + # Boxplots for Ti + geom_boxplot(aes( + x = factor(paste("Ti", c(rep("LIBS", length(libs.df$TiO2)), rep("PIXL", length(new_pixl_trim$TiO2)))), + levels = c("Ti LIBS", "Ti PIXL")), + y = c(libs.df$TiO2, new_pixl_trim$TiO2), + fill = c(rep("LIBS", length(libs.df$TiO2)), rep("PIXL", length(new_pixl_trim$TiO2))) + )) + + + # Labels and theme + labs( + title = "Comparison of Unscaled LIBS and PIXL Data", + x = "Element (Source)", + y = "wt. (%)", + fill = "Source" + ) + + scale_fill_manual(values = c("LIBS" = "lightblue", "PIXL" = "lightcoral")) + + scale_x_discrete(labels = c( + "Si LIBS" = " Si", "Si PIXL" = "", + "Ti LIBS" = " Ti", "Ti PIXL" = "", + "Al LIBS" = " Al", "Al PIXL" = "", + "Fe LIBS" = " Fe", "Fe PIXL" = "", + "Mg LIBS" = " Mg", "Mg PIXL" = "", + "Ca LIBS" = " Ca", "Ca PIXL" = "", + "Na LIBS" = " Na", "Na PIXL" = "", + "K LIBS" = " K", "K PIXL" = "" + )) + + theme_minimal() +``` + +Now we plot the scaled LIBS and PIXL data. 
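For reference, the scaling applied in Section 5.1 can be written as a formula. This is a sketch under an assumption that the indexing in the code suggests but the notebook does not state explicitly: rows 2, 3, and 4 of `libs_earth` hold the first quartile, median, and third quartile of each oxide in the Earth reference set. Under that assumption, each oxide value $x$ is transformed as

$$
x_{\mathrm{scaled}} = \frac{x - \mathrm{median}_{\mathrm{Earth}}}{Q_{3,\mathrm{Earth}} - Q_{1,\mathrm{Earth}}}
$$

A scaled value of 0 then means the sample matches the Earth median for that oxide, and a difference of 1 between two scaled values corresponds to one Earth interquartile range.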
+ +```{r} +ggplot() + + # Boxplots for Mg + geom_boxplot(aes( + x = factor(paste("Mg", c(rep("LIBS", length(libs_trim$Mg$MgO)), rep("PIXL", length(new_pixl_trim_we$Mg$MgO)))), + levels = c("Mg LIBS", "Mg PIXL")), + y = c(libs_trim$Mg$MgO, new_pixl_trim_we$Mg$MgO), + fill = c(rep("LIBS", length(libs_trim$Mg$MgO)), rep("PIXL", length(new_pixl_trim_we$Mg$MgO))) + )) + + + # Boxplots for Fe + geom_boxplot(aes( + x = factor(paste("Fe", c(rep("LIBS", length(libs_trim$Fe$FeOT)), rep("PIXL", length(new_pixl_trim_we$Fe$FeOT)))), + levels = c("Fe LIBS", "Fe PIXL")), + y = c(libs_trim$Fe$FeOT, new_pixl_trim_we$Fe$FeOT), + fill = c(rep("LIBS", length(libs_trim$Fe$FeOT)), rep("PIXL", length(new_pixl_trim_we$Fe$FeOT))) + )) + + + # Boxplots for Ca + geom_boxplot(aes( + x = factor(paste("Ca", c(rep("LIBS", length(libs_trim$Ca$CaO)), rep("PIXL", length(new_pixl_trim_we$Ca$CaO)))), + levels = c("Ca LIBS", "Ca PIXL")), + y = c(libs_trim$Ca$CaO, new_pixl_trim_we$Ca$CaO), + fill = c(rep("LIBS", length(libs_trim$Ca$CaO)), rep("PIXL", length(new_pixl_trim_we$Ca$CaO))) + )) + + + # Boxplots for Na + geom_boxplot(aes( + x = factor(paste("Na", c(rep("LIBS", length(libs_trim$Na$Na2O)), rep("PIXL", length(new_pixl_trim_we$Na$Na2O)))), + levels = c("Na LIBS", "Na PIXL")), + y = c(libs_trim$Na$Na2O, new_pixl_trim_we$Na$Na2O), + fill = c(rep("LIBS", length(libs_trim$Na$Na2O)), rep("PIXL", length(new_pixl_trim_we$Na$Na2O))) + )) + + + # Boxplots for Ti + geom_boxplot(aes( + x = factor(paste("Ti", c(rep("LIBS", length(libs_trim$Ti$TiO2)), rep("PIXL", length(new_pixl_trim_we$Ti$TiO2)))), + levels = c("Ti LIBS", "Ti PIXL")), + y = c(libs_trim$Ti$TiO2, new_pixl_trim_we$Ti$TiO2), + fill = c(rep("LIBS", length(libs_trim$Ti$TiO2)), rep("PIXL", length(new_pixl_trim_we$Ti$TiO2))) + )) + + + # Boxplots for K + geom_boxplot(aes( + x = factor(paste("K", c(rep("LIBS", length(libs_trim$K$K2O)), rep("PIXL", length(new_pixl_trim_we$K$K2O)))), + levels = c("K LIBS", "K PIXL")), + y = c(libs_trim$K$K2O, 
new_pixl_trim_we$K$K2O), + fill = c(rep("LIBS", length(libs_trim$K$K2O)), rep("PIXL", length(new_pixl_trim_we$K$K2O))) + )) + + + # Boxplots for Si + geom_boxplot(aes( + x = factor(paste("Si", c(rep("LIBS", length(libs_trim$Si$SiO2)), rep("PIXL", length(new_pixl_trim_we$Si$SiO2)))), + levels = c("Si LIBS", "Si PIXL")), + y = c(libs_trim$Si$SiO2, new_pixl_trim_we$Si$SiO2), + fill = c(rep("LIBS", length(libs_trim$Si$SiO2)), rep("PIXL", length(new_pixl_trim_we$Si$SiO2))) + )) + + + # Boxplots for Al + geom_boxplot(aes( + x = factor(paste("Al", c(rep("LIBS", length(libs_trim$Al$Al2O3)), rep("PIXL", length(new_pixl_trim_we$Al$Al2O3)))), + levels = c("Al LIBS", "Al PIXL")), + y = c(libs_trim$Al$Al2O3, new_pixl_trim_we$Al$Al2O3), + fill = c(rep("LIBS", length(libs_trim$Al$Al2O3)), rep("PIXL", length(new_pixl_trim_we$Al$Al2O3))) + )) + + + # Labels and theme + labs( + title = "Comparison of Earth Scaled LIBS and Earth Scaled PIXL Data", + caption = "LIBS and PIXL samples scaled by the Earth reference data. \nThe dashed lines mark scaled values of -1, 0 (the Earth median), and 1.", + x = "Element (Source)", + y = "Scaled Value", + fill = "Source" + ) + + scale_fill_manual(values = c("LIBS" = "lightblue", "PIXL" = "lightcoral")) + + scale_x_discrete(labels = c( + "Si LIBS" = " Si", "Si PIXL" = "", + "Ti LIBS" = " Ti", "Ti PIXL" = "", + "Al LIBS" = " Al", "Al PIXL" = "", + "Fe LIBS" = " Fe", "Fe PIXL" = "", + "Mg LIBS" = " Mg", "Mg PIXL" = "", + "Ca LIBS" = " Ca", "Ca PIXL" = "", + "Na LIBS" = " Na", "Na PIXL" = "", + "K LIBS" = " K", "K PIXL" = "" + )) + + theme_minimal() + + geom_hline(yintercept = c(-1, 0, 1), linetype = "dashed", color = "red") + + theme(plot.caption = element_text(hjust = 0)) + +``` + +We immediately notice that Iron and Magnesium have the greatest scaled range, which suggests they may be the most informative features in this dataset. We continue with a density plot to see if this observation is repeated. 
+ +```{r} +ggplot() + + geom_density(aes(x = libs_trim$Si$SiO2, color = "Si"), fill = "#1f78b4", alpha = 0.3) + # Blue + geom_density(aes(x = libs_trim$Ti$TiO2, color = "Ti"), fill = "#e31a1c", alpha = 0.3) + # Red + geom_density(aes(x = libs_trim$Al$Al2O3, color = "Al"), fill = "#33a02c", alpha = 0.3) + # Green + geom_density(aes(x = libs_trim$Fe$FeOT, color = "Fe"), fill = "#ff7f00", alpha = 0.3) + # Orange + geom_density(aes(x = libs_trim$Mg$MgO, color = "Mg"), fill = "#6a3d9a", alpha = 0.3) + # Purple + geom_density(aes(x = libs_trim$Ca$CaO, color = "Ca"), fill = "#b15928", alpha = 0.3) + # Brown + geom_density(aes(x = libs_trim$Na$Na2O, color = "Na"), fill = "#a6cee3", alpha = 0.3) + # Light Blue + geom_density(aes(x = libs_trim$K$K2O, color = "K"), fill = "#fb9a99", alpha = 0.3) + # Pink + labs(title = "Earth Scaled LIBS by Elemental Composition", x = "Value", y = "Density") + + scale_color_manual(name = "Elements", + values = c("Si" = "#1f78b4", "Ti" = "#e31a1c", "Al" = "#33a02c", + "Fe" = "#ff7f00", "Mg" = "#6a3d9a", "Ca" = "#b15928", + "Na" = "#a6cee3", "K" = "#fb9a99")) + + theme_minimal() +``` + +The density plot again shows that Magnesium and Iron have the greatest range of all the elements. Potassium, in contrast, has a very tall, narrow density peak, meaning its scaled values vary little across samples, so it could potentially be removed from the data set. + + +```{r} +set.seed(1234) +km <- kmeans(libs_trim,4) + +cluster.df<-data.frame(cluster= 1:4, size=km$size) +kable(cluster.df,caption="Samples per cluster") + +pheatmap(km$centers,scale="none", main="Clusters by Element Composition on Earth Scaled LIBS") +``` + +We see that there is a good distribution of samples in each cluster, which tells us no cluster is under- or over-represented. From the heatmap, we see once again that Iron and Magnesium are the biggest indicators of each cluster. 
+ +```{r} +pca_libs <- prcomp(libs_trim, scale=FALSE) + +ggbiplot::ggbiplot(pca_libs, + groups = as.factor(km$cluster)) + + labs(title="Biplot of PCA on Earth Scaled LIBS") +``` + +The biplot confirms what we saw in the heatmap: some clusters are associated with Magnesium, some with Iron, and some with both. We also see that Cluster 4 is different from the other clusters. + +## 5.5 Conclusions, Limitations, and Future Work. + +Iron and magnesium are the biggest indicators in this dataset, as shown by the analysis. I would recommend that future analysis of LIBS focus on those two elements, or at the very least include them. I would also like to do some research on the significance of the presence of iron and magnesium in rock samples. + +# Bibliography + +* https://www.mdpi.com/1996-1944/16/20/6641 [Królicka23] +* https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/ [NASA21] +* https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2023EA002829 [Veneranda23] +* https://www.mdpi.com/2072-4292/13/23/4773 [Liu21] + +* ggtern for the ternary diagram + + +# Appendix + +I wanted to investigate why the elemental compositions of each sample did not add up to 100, which would mean that 100% of the rock has been identified. For some samples, the sum of the elemental compositions can be in the low 80s or in the 120s. This variance seems to throw the accuracy of all the measurements into question, so I wanted to see why exactly this occurs. + +In the paper Application of Laser-Induced Breakdown Spectroscopy for Depth Profiling of Multilayer and Graded Materials (https://www.mdpi.com/1996-1944/16/20/6641), in section 5 the authors discuss how LIBS data can be hard to analyze in situations where traditional calibration techniques cannot be used, such as remote sensing in space missions, which is exactly the situation in which this data was collected. 
In this situation, a calibration-free method is used to determine elemental composition, which requires several specific criteria to be met. The problem is that with matrix effects, laser parameters, and experimental configurations, it is hard to be completely accurate when analyzing concentrations. + +Some of the difficulty also comes from the fact that the LIBS system itself does not behave the same way every time. Post-landing Major Element Quantification Using SuperCam Laser Induced Breakdown Spectroscopy (https://www.sciencedirect.com/science/article/pii/S0584854721003049) reports that the LIBS laser is not exactly the same every time it is fired. The size of the ablation, which is essentially the mini crater left behind by the laser, can change. The temperature of the plasma can vary as well from one firing of the laser to the next. These changes can affect the calibration-free model for LIBS elemental analysis, which can cause the issues with total elemental composition that we see. + diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.pdf b/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.pdf new file mode 100644 index 0000000..bfb0caf Binary files /dev/null and b/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.pdf differ