---
title: "Data Analytics Research Individual Final Project Report"
author: "Aadi Lahiri"
date: "Fall 2024"
output:
  pdf_document:
    toc: yes
    toc_depth: '3'
  html_notebook: default
  html_document:
    toc: yes
    toc_depth: 3
    toc_float: yes
    number_sections: yes
    theme: united
---
# DAR Project and Group Members
* Project name: Mars
* Project team members: Charlotte Peterson, Doña Roberts, David Walcyzk, Xuanting Wang, Ashton Compton, Margo VanEsselstyn, Nicolas Morawski, CJ Marino, Dante Mwatibo
# 0.0 Preliminaries.
Executing this R notebook requires some subset of the following packages:
* `ggplot2`
* `ggbiplot`
* `pandoc`
* `rmarkdown`
* `tidyverse`
* `stringr`
* `pheatmap`
* `caret`
* `knitr`
* `BBmisc`
* `ggtern`
* `glue`
These will be installed and loaded as necessary (code suppressed).
```{r, include=FALSE}
if (!require("pandoc")) {
install.packages("pandoc")
library(pandoc)
}
if (!require("rmarkdown")) {
install.packages("rmarkdown")
library(rmarkdown)
}
if (!require("tidyverse")) {
install.packages("tidyverse")
library(tidyverse)
}
if (!require("stringr")) {
install.packages("stringr")
library(stringr)
}
if (!require("pheatmap")) {
install.packages("pheatmap")
library(pheatmap)
}
if (!require("caret")) {
install.packages("caret")
library(caret)
}
# ggplot2 is attached with tidyverse above; ggbiplot is used for the PCA biplot
if (!require("ggbiplot")) {
install.packages("ggbiplot")
library(ggbiplot)
}
if (!require("knitr")) {
install.packages("knitr")
library(knitr)
}
if (!require("BBmisc")) {
install.packages("BBmisc")
library(BBmisc)
}
if (!require("ggtern")) {
install.packages("ggtern")
library(ggtern)
}
if (!require("glue")) {
install.packages("glue")
library(glue)
}
```
# 1.0 Project Introduction
In 2020, the Perseverance rover was sent to Mars to collect information about the planet. One of its instruments, SuperCam, measures the elemental composition of different samples using Laser-Induced Breakdown Spectroscopy (LIBS). This notebook analyzes the LIBS data to see what we can learn about the rocks on Mars and what they can tell us about the planet itself.
# 2.0 Organization of Report
This report is organized as follows:
* Section 3.0 Finding 1: Graphing the cation compositions of every LIBS sample. The cations are broken into three groups: silicon and aluminium; iron and magnesium; and calcium, potassium, and sodium.
* Section 4.0 Finding 2: Comparing the alkali and silica compositions of each sample in order to classify igneous rock type.
* Section 5.0 Finding 3: Scaling the LIBS samples by Earth reference data to see how LIBS samples on Mars compare to LIBS samples on Earth.
* Section 6.0 Overall conclusions and suggestions
* Section 7.0 Appendix: This section describes additional work that may be helpful in future work: understanding LIBS elemental composition.
# 3.0 Finding 1: Cation Analysis
Each sample was given three features: the sum of its iron and magnesium, the sum of its silicon and aluminium, and the sum of its calcium, potassium, and sodium. We cluster the samples by those three measurements, then draw the clusters on a ternary plot. By adding PIXL samples to this plot, we can tell where each rock type, igneous and sedimentary, lies in the plot and among the LIBS samples.
## 3.1 Data, Code, and Resources
1. lahira-finalProjectF24.Rmd (with knit pdf and html) is this notebook.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08-FinalProjectNotebook/lahira-finalProjectF24.Rmd)
2. v1_libs.Rds is the Rds file containing the LIBS data, with all Earth reference samples removed.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds).
3. v1_pixl.Rds is the Rds file containing the PIXL data. It contains only the elemental compositions of each sample.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds).
4. v1_sample_meta.Rds is the Rds file containing the PIXL sample metadata, which we use to get the rock type of each PIXL sample.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds).
Each sample in the LIBS and PIXL data has columns holding the composition of specific elements. We add the relevant columns together to get the three variables for the ternary plot.
The LIBS data is filtered to remove all the Earth reference samples. The Earth reference samples were brought by the Perseverance rover for calibration of SuperCam during the journey, so those samples are not actually from Mars and should not be included.
We also drop the first PIXL sample, as it is an atmospheric sample.
```{r }
# Data Processing for the Ternary Graph
# Load the saved LIBS data with metadata added (libs without earth)
libswe <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds")
#Sum the elements we are looking at
libswe <- libswe %>%
mutate(y = (FeOT + MgO) / 100, z = (CaO+Na2O+K2O) / 100, x = (SiO2 + Al2O3) / 100) %>%
select(c(x,y,z)) %>%
drop_na()
# PIXL data added
pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds")
pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)],
as.factor)
#adding PIXL metadata to get PIXL rock type for each sample
pixl_meta <- readRDS("~/DAR-Mars-F24/StudentData/v1_sample_meta.Rds")
pixl.df <- pixl.df[2:16,] #Excluding first, atmospheric sample
pixl_meta <- pixl_meta[2:16,8] #Excluding first, atmospheric sample
new_pixl_trim <- cbind(pixl.df,type = pixl_meta)
#take the sums of the specific elements, and rename type column
pixl_ternary <- new_pixl_trim %>%
mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, PIXL_Rock_Type = type) %>%
select(c(x,y,z,PIXL_Rock_Type)) %>%
drop_na()
#This is for the labels on the Ternary Plot below
pixl_ternary <- cbind(pixl_ternary, Sample_display=
c("2","3","4,6,7","5,8,9","","","","",
"10,11","","12,13","","14,15","","16"))
```
## 3.2 Contribution
The code for this plot started with Dr. Erickson's graph. Nicolas Morawski adapted it for his own use, and I then adapted Nicolas's code into what is displayed now.
## 3.3 Methods Description
A ternary plot is a triangular plot of three components whose values sum to a constant, here 100%. We use this plot to see how the different clusters of the LIBS samples fall with respect to the PIXL samples, which have already been labeled according to rock type. By analyzing the graph afterwards, we are able to make some classifications of the clusters. We are also looking for outliers and any groups of data that differ from the others.
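As a minimal sketch of that idea, three-part compositions can be rescaled so that each sample sums to 1, which places every sample at a single point inside the triangle. The data frame `toy` and its values are made up for illustration and are not taken from the LIBS data.

```r
# Hypothetical three-part compositions (not real LIBS values)
toy <- data.frame(
  x = c(0.55, 0.40),  # Si + Al
  y = c(0.30, 0.45),  # Fe + Mg
  z = c(0.10, 0.05)   # Ca + Na + K
)

# Closure: divide each row by its own total so the three parts sum to 1
closed <- toy / rowSums(toy)

# Every row total is now 1
rowSums(closed)
```

Because other oxides such as TiO2 are left out of the three sums, the raw x, y, and z values generally do not add up to 1 on their own, which is why a rescaling step of this kind matters.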
## 3.4 Result and Discussion
I start by clustering the samples on the three measurements made in the data preparation section. We use k-means clustering on three variables: x, y, and z. x is the sum of the silicon and aluminium composition of a sample, y is the sum of iron and magnesium, and z is the sum of calcium, sodium, and potassium. We use these specific combinations because they group the cations, and their relationship is used in geological identification. We will use these clusters throughout this and other reports in order to learn more about the LIBS data.
```{r }
libs_ternplot <- libswe %>% select(c(x,y,z))
set.seed(1234)
#kmeans on the original data
tern.km <- kmeans(libs_ternplot, centers=4)
libs_ternplot <- cbind(libs_ternplot, LIBS_Cluster=as.factor(tern.km$cluster))
```
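The chunk above fixes the number of centers at four. One common way to sanity-check such a choice, sketched here on simulated data, is to compare the total within-cluster sum of squares across several values of k and look for an elbow. The matrix `toy` is a made-up stand-in for the x, y, z columns, and none of the numbers come from the LIBS data.

```r
set.seed(1234)

# Simulated three-column data with three well-separated groups (not real LIBS values)
toy <- rbind(
  matrix(rnorm(150, mean = 0.2, sd = 0.03), ncol = 3),
  matrix(rnorm(150, mean = 0.5, sd = 0.03), ncol = 3),
  matrix(rnorm(150, mean = 0.8, sd = 0.03), ncol = 3)
)

# Total within-cluster sum of squares for k = 1..6
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 10)$tot.withinss)

# An elbow in this curve suggests a reasonable number of clusters
plot(ks, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")
```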
The cluster labels are attached to the data being graphed, so we can color the samples by cluster in the plot.
```{r}
#ternary plot for LIBS data
ggtern(libs_ternplot, ggtern::aes(x=x, y=y, z=z, cluster=LIBS_Cluster)) +
#color by cluster
geom_point(aes(color=LIBS_Cluster), alpha = 0.5) +
theme_rgbw() +
theme(plot.caption = element_text(hjust = 0)) + # Left-align the caption
labs(title="Mars 2020 LIBS Ternary Plot with PIXL Data",
subtitle=stringr::str_wrap(glue("LIBS data Clustered by Cation Group with PIXL samples by Rock Type With Earth Reference Data Removed."), 60),
caption = stringr::str_wrap(glue("The LIBS samples are the colored circles, and the PIXL samples are the black circles and triangles."), 80),
x="Si+Al",
y="Fe+Mg",
z="Ca+Na+K") +
#suppress warnings here because of some warning with aes()
#add PIXL samples - atmospheric onto the ternary plot
suppressWarnings(geom_point(
data=pixl_ternary, ggtern::aes(x=x, y=y, z=z,
cluster=PIXL_Rock_Type, shape=PIXL_Rock_Type),
size = 2)) +
#Add labels to PIXL data corresponding to sample number
suppressWarnings(geom_text(data=pixl_ternary,
ggtern::aes(x=x, y=y, z=z, label=Sample_display, cluster=PIXL_Rock_Type,
hjust = ifelse(x > 0.43, 1, -0.1), # Horizontal adjust to avoid overlap
vjust = ifelse(x == 0.3668, 1.3,
ifelse(x == 0.375, 1, ifelse(x > 0.43, 1.5, -0.3))),
fontface="bold"),
size=2.7))
```
From the ternary diagram, we see that most of the samples are high in both Fe+Mg and Si+Al, and low in Ca+Na+K. The samples that have higher Ca+Na+K tend to have lower Fe+Mg and much lower Si+Al. From the clustering we see that cluster one tends to have high concentrations of Si+Al and Fe+Mg, and very low concentrations of Ca+Na+K. Cluster 2 is mostly Fe+Mg, with a little Si+Al. Cluster 1 is very diverse, but tends to contain the samples with higher concentrations of Ca+Na. Clusters 3 and 4 are similar, but Cluster 3 generally has higher Si+Al than Cluster 4.
Looking at the PIXL data, we see that sedimentary samples tend to be higher in Fe and Mg and lower in Si and Al, and are associated mostly with Clusters 2, 3, and 4. The igneous samples are the opposite, and appear in Clusters 1, 3, and 4. Cluster 1 and Cluster 2 are quite dissimilar, while Clusters 3 and 4 are similar to each other. Moving from Cluster 1 to Cluster 2, the samples go from igneous to sedimentary, which could indicate a transition from igneous rock to sedimentary rock or vice versa.
## 3.5 Conclusions, Limitations, and Future Work.
After I presented this graph to Dr. Rogers, she said that Cluster 2 interested her the most, as that cluster seems to be out of the ordinary. I do not know exactly why that cluster was interesting, but analyzing it in the future could perhaps tell us something about Mars.
# 4.0 Finding 2: Alkali Silica Analysis
After producing the ternary graph, I was interested in learning more about each sample. I decided to plot each LIBS sample, with the Earth reference data removed, on a Total Alkali vs. Silica (TAS) plot in order to classify the samples by igneous rock type.
## 4.1 Data, Code, and Resources
1. lahira-finalProjectF24.Rmd (with knit pdf and html) is this notebook.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08-FinalProjectNotebook/lahira-finalProjectF24.Rmd)
2. v1_libs.Rds is the Rds file containing the LIBS data, with all Earth reference samples removed.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds).
We create a total alkali silica plot from the same data used in Section 3. The x variable is silica (SiO2), and the y variable is the sum of the alkali oxides, sodium (Na2O) and potassium (K2O). Once again, the Earth reference samples are excluded.
```{r}
# Load the saved LIBS data with metadata added (libs without earth)
libswe <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds")
#Data Processing for the Alk Sil Plot
#Add the total alkali column (y)
libs_alksil <- libswe %>%
select(c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) %>%
mutate(y = Na2O + K2O)
```
## 4.2 Contribution
All this code was written by me. I made the igneous rock classification diagram from https://www2.tulane.edu/~sanelson/eens212/igrockclassif.htm.
## 4.3 Methods Description
A Total Alkali Silica (TAS) plot is used to classify igneous rocks by their silica content and their combined sodium and potassium oxide content. I made this graph to see which samples were most likely igneous.
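To make the idea concrete, the silica boundaries at 45, 52, and 63 wt. % correspond to the conventional ultrabasic, basic, intermediate, and acid classes. The sketch below bins hypothetical samples by silica alone. It ignores the alkali axis, so it is only a rough first pass and not the full TAS classification. The data frame `toy` is invented for illustration.

```r
# Hypothetical oxide weight percentages (not real LIBS values)
toy <- data.frame(
  SiO2 = c(43.0, 48.5, 58.0, 70.0),
  Na2O = c(1.0, 2.5, 3.5, 4.0),
  K2O  = c(0.2, 0.5, 1.5, 4.5)
)

# Total alkali is the y axis of a TAS plot
toy$total_alkali <- toy$Na2O + toy$K2O

# Bin by silica only, using the 45 / 52 / 63 wt. % boundaries
toy$silica_class <- cut(
  toy$SiO2,
  breaks = c(-Inf, 45, 52, 63, Inf),
  labels = c("ultrabasic", "basic", "intermediate", "acid"),
  right = FALSE
)

toy
```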
## 4.4 Result and Discussion
We graph the LIBS data points like any other scatter plot, then add line segments to help visually break down the classification. We reuse the clustering from the ternary plot.
```{r}
#CODE FOR TAS PLOT
#Drop every column except Silicon and Alkali content
libs_alksil <- libs_alksil %>%
select(c(SiO2, y))
#Reuse the k-means clusters from the ternary plot (assumes drop_na() removed no rows in Section 3.1, so the rows still line up)
libs_alksil <- cbind(libs_alksil, cluster=as.factor(tern.km$cluster))
#Plot for original LIBS data
ggplot() +
geom_point(data = libs_alksil,
mapping = aes(x=SiO2, y=y, color=as.character(cluster)),
#add alpha so that labels on graph are visible
alpha = 0.3) +
#Add Line segments and labels for the igneous rocks reference
geom_segment(aes(x=41,y=0, xend=41, yend=7)) +
geom_segment(aes(x=45,y=0, xend=45, yend=5)) +
geom_segment(aes(x=52,y=0, xend=52, yend=5)) +
geom_segment(aes(x=57,y=1, xend=57, yend=6)) +
geom_segment(aes(x=53,y=9, xend=57, yend=6)) +
geom_segment(aes(x=48.5,y=11.5, xend=53, yend=9)) +
geom_segment(aes(x=63,y=2, xend=63, yend=7)) +
geom_segment(aes(x=52,y=5, xend=69, yend=8)) +
geom_segment(aes(x=69,y=8, xend=73, yend=3)) +
geom_segment(aes(x=69,y=8, xend=69, yend=13)) +
geom_segment(aes(x=41,y=3, xend=45, yend=3)) +
geom_segment(aes(x=45,y=5, xend=52, yend=5)) +
geom_segment(aes(x=45,y=5, xend=61, yend=13)) +
geom_segment(aes(x=49.3,y=7.2, xend=52, yend=5)) +
geom_segment(aes(x=45,y=9.4, xend=49.3, yend=7.2)) +
geom_segment(aes(x=41,y=7, xend=52.7, yend=14)) +
geom_segment(aes(x=58,y=11.6, xend=63, yend=7)) +
geom_segment(aes(x=58,y=11.6, xend=49, yend=15.5)) +
annotate("text",x=43,y=1.5,label="Picro-\nbasalt", size = 2) +
annotate("text",x=43.1,y=6.7,label="Tephrite", size = 2) +
annotate("text",x=43.3,y=5.7,label="Basanite", size = 2) +
annotate("text",x=48,y=3.5,label="Basalt", size = 2) +
annotate("text",x=39,y=10,label="Foidite", size = 2) +
annotate("text",x=42,y=14,label="Trachy-\nbasalt", size = 2) +
#to point at the correct area
geom_segment(aes(x=42,y=12.6, xend=49, yend=5.8), color="red") +
annotate("text",x=54.6,y=3.5,label="Basaltic\nandesite", size = 2) +
annotate("text",x=60,y=4.6,label="Andesite", size = 2) +
annotate("text",x=48.5,y=10,label="Phono-", size = 2) +
annotate("text",x=48.5,y=9,label="Tephrite", size = 2) +
annotate("text",x=47,y=18,label="Basaltic\ntrachy-\nandesite", size = 2) +
geom_segment(aes(x=47,y=15.5, xend=53, yend=6.5), color="red") +
annotate("text",x=53,y=11.8,label="Tephri-\nphonolite", size = 2) +
annotate("text",x=57.4,y=8.8,label="Trachy-\nandesite", size = 2) +
annotate("text",x=67,y=5,label="Dacite", size = 2) +
annotate("text",x=72,y=8.5,label="Rhyolite", size = 2) +
annotate("text",x=65,y=9,label="Trachydacite", size = 2) +
annotate("text",x=62.5,y=11.5,label="Trachydacite", size = 2) +
annotate("text",x=56.5,y=14.5,label="Phonolite", size = 2) +
theme_minimal() +
xlim(0,78) +
ylim(-1,20) +
labs(title = "Total Alkali vs. Silica Plot for LIBS \nWith Earth Reference Data Removed",
x = "SiO2 (wt. %)",
y = "Na2O + K2O (wt. %)",
color="Cluster")
```
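The boundary segments above are added one call at a time. One possible alternative, sketched here with only the first three vertical boundaries, is to store the endpoints in a data frame and draw them with a single `geom_segment()` layer. The name `tas_lines` is my own, and the remaining rows would need to be filled in from the chunk above.

```r
library(ggplot2)

# Endpoints of the first three vertical boundaries from the chunk above
tas_lines <- data.frame(
  x    = c(41, 45, 52),
  y    = c(0, 0, 0),
  xend = c(41, 45, 52),
  yend = c(7, 5, 5)
)

# One layer draws every row of tas_lines
p <- ggplot() +
  geom_segment(data = tas_lines,
               aes(x = x, y = y, xend = xend, yend = yend)) +
  theme_minimal()

p
```

Keeping the boundaries in a table makes it easier to check each endpoint against the reference diagram and to reuse the same boundaries in other plots.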
While Clusters 1, 3, and 4 fall into different places in the igneous rock classification diagram, Cluster 2 is separated from the other three clusters. Cluster 4 looks like it is mostly basalt and picrobasalt, while Clusters 1 and 3 fall into many different categories in this diagram. Cluster 1 is also interesting, as it is widely scattered, and some points in Cluster 1 do not even fall inside the igneous rock classification diagram. I think this is because Cluster 1 contains some points that are outliers in the ternary plot, such as the points in the bottom right corner of the ternary plot.
## 4.5 Conclusions, Limitations, and Future Work.
Once again we see that Cluster 2 is different from the other clusters. When I spoke with Dr. Rogers, she pointed out that a Total Alkali Silica plot with an igneous rock classification diagram only makes sense if all the samples are igneous rocks. While we cannot confirm the rock types of all the clusters, Cluster 2 clearly does not fall with the other igneous rocks, which is supported by the fact that it is associated with the sedimentary PIXL samples in the ternary plot. I think further analysis of Cluster 2 could tell us more about it.
# 5.0 Finding 3: Earth Scaled LIBS
With the LIBS samples, we were also given the Earth reference data in terms of quartiles for each element. I scale the LIBS samples by the quartile information to see how the Mars LIBS data compares to Earth LIBS data and what exactly differs between them.
## 5.1 Data, Code, and Resources
1. lahira-finalProjectF24.Rmd (with knit pdf and html) is this notebook.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/lahira-finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08-FinalProjectNotebook/lahira-finalProjectF24.Rmd)
2. v1_libs.Rds is the Rds file containing the LIBS data, with all Earth reference samples removed.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds).
3. v1_pixl.Rds is the Rds file containing the PIXL data. It contains only the elemental compositions of each sample.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds).
4. LIBS_training_set_quartiles.Rds is the rds containing the LIBS earth reference data.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/LIBS_training_set_quartiles.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/LIBS_training_set_quartiles.Rds).
We compare the original LIBS data with the scaled LIBS data. The Earth reference data gives the median and quartiles of each element's composition on Earth. Each Mars LIBS value is scaled by subtracting the Earth median and dividing by the Earth interquartile range (third quartile minus first quartile), which shows how the Mars LIBS data compares to rock samples on Earth.
In addition to Earth scaling the LIBS data, we also Earth scale the PIXL data to compare the two and see what we can learn. We also plot the original data for reference.
We continue to remove the Earth reference samples from the LIBS dataset.
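The Earth scaling can be written as a small helper, sketched below. The name `earth_scale` and the quartile values are made up for illustration, and the sketch assumes the first quartile, median, and third quartile are available as plain numbers.

```r
# Scale values by a reference median and interquartile range
# (earth_scale is a hypothetical helper, not part of the notebook)
earth_scale <- function(value, q1, med, q3) {
  (value - med) / (q3 - q1)
}

# Made-up reference quartiles and sample values for one oxide
q1  <- 40
med <- 50
q3  <- 60
sio2 <- c(50, 60, 45)

# 0 means "at the reference median"; 0.5 means half an IQR above it
earth_scale(sio2, q1, med, q3)
```

When the reference table is a tibble, extracting a single cell with `[[row, col]]` rather than `[row, col]` returns a plain number instead of a 1x1 table, which in general keeps the scaled columns as ordinary numeric vectors.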
```{r}
# Load the saved LIBS data with metadata added (libs without earth)
libs.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs.Rds")
libs_earth <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/LIBS_training_set_quartiles.Rds")
libs_trim <- libs.df %>% select(c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) %>%
rowwise() %>% mutate(Si= (SiO2-libs_earth[3,2])/(libs_earth[4,2] - libs_earth[2,2]),
Ti= (TiO2-libs_earth[3,3])/(libs_earth[4,3] - libs_earth[2,3]),
Al= (Al2O3-libs_earth[3,4])/(libs_earth[4,4] - libs_earth[2,4]),
Fe= (FeOT-libs_earth[3,5])/(libs_earth[4,5] - libs_earth[2,5]),
Mg= (MgO-libs_earth[3,6])/(libs_earth[4,6] - libs_earth[2,6]),
Ca= (CaO-libs_earth[3,7])/(libs_earth[4,7] - libs_earth[2,7]),
Na= (Na2O-libs_earth[3,8])/(libs_earth[4,8] - libs_earth[2,8]),
K= (K2O-libs_earth[3,9])/(libs_earth[4,9] - libs_earth[2,9])) %>%
select(!c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O))
# PIXL data added
pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds")
pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)],
as.factor)
pixl.df <- pixl.df[2:16,] #Excluding first, atmospheric sample
new_pixl_trim <- pixl.df
new_pixl_trim_we <- pixl.df %>% select(c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O)) %>%
rowwise() %>% mutate(Si= (SiO2-libs_earth[3,2])/(libs_earth[4,2] - libs_earth[2,2]),
Ti= (TiO2-libs_earth[3,3])/(libs_earth[4,3] - libs_earth[2,3]),
Al= (Al2O3-libs_earth[3,4])/(libs_earth[4,4] - libs_earth[2,4]),
Fe= (FeOT-libs_earth[3,5])/(libs_earth[4,5] - libs_earth[2,5]),
Mg= (MgO-libs_earth[3,6])/(libs_earth[4,6] - libs_earth[2,6]),
Ca= (CaO-libs_earth[3,7])/(libs_earth[4,7] - libs_earth[2,7]),
Na= (Na2O-libs_earth[3,8])/(libs_earth[4,8] - libs_earth[2,8]),
K= (K2O-libs_earth[3,9])/(libs_earth[4,9] - libs_earth[2,9])) %>%
select(!c(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O))
```
## 5.2 Contribution
I wrote all the code for this section.
## 5.3 Methods Description
We start with basic analysis, then move to more complex methods. We begin with boxplots to examine which features differ from the Earth reference samples. We also create a density plot to add to the analysis of the boxplots. We then make a heatmap of the scaled data, run PCA, and graph the biplot of the PCA.
## 5.4 Result and Discussion
We start with a boxplot of the unscaled LIBS and PIXL data. We graph them side by side for comparison.
```{r}
ggplot() +
# Boxplots for Si
geom_boxplot(aes(
x = factor(paste("Si", c(rep("LIBS", length(libs.df$SiO2)), rep("PIXL", length(new_pixl_trim$SiO2)))),
levels = c("Si LIBS", "Si PIXL")),
y = c(libs.df$SiO2, new_pixl_trim$SiO2),
fill = c(rep("LIBS", length(libs.df$SiO2)), rep("PIXL", length(new_pixl_trim$SiO2)))
)) +
# Boxplots for Fe
geom_boxplot(aes(
x = factor(paste("Fe", c(rep("LIBS", length(libs.df$FeOT)), rep("PIXL", length(new_pixl_trim$FeOT)))),
levels = c("Fe LIBS", "Fe PIXL")),
y = c(libs.df$FeOT, new_pixl_trim$FeOT),
fill = c(rep("LIBS", length(libs.df$FeOT)), rep("PIXL", length(new_pixl_trim$FeOT)))
)) +
# Boxplots for Mg
geom_boxplot(aes(
x = factor(paste("Mg", c(rep("LIBS", length(libs.df$MgO)), rep("PIXL", length(new_pixl_trim$MgO)))),
levels = c("Mg LIBS", "Mg PIXL")),
y = c(libs.df$MgO, new_pixl_trim$MgO),
fill = c(rep("LIBS", length(libs.df$MgO)), rep("PIXL", length(new_pixl_trim$MgO)))
)) +
# Boxplots for Al
geom_boxplot(aes(
x = factor(paste("Al", c(rep("LIBS", length(libs.df$Al2O3)), rep("PIXL", length(new_pixl_trim$Al2O3)))),
levels = c("Al LIBS", "Al PIXL")),
y = c(libs.df$Al2O3, new_pixl_trim$Al2O3),
fill = c(rep("LIBS", length(libs.df$Al2O3)), rep("PIXL", length(new_pixl_trim$Al2O3)))
)) +
# Boxplots for Ca
geom_boxplot(aes(
x = factor(paste("Ca", c(rep("LIBS", length(libs.df$CaO)), rep("PIXL", length(new_pixl_trim$CaO)))),
levels = c("Ca LIBS", "Ca PIXL")),
y = c(libs.df$CaO, new_pixl_trim$CaO),
fill = c(rep("LIBS", length(libs.df$CaO)), rep("PIXL", length(new_pixl_trim$CaO)))
)) +
# Boxplots for Na
geom_boxplot(aes(
x = factor(paste("Na", c(rep("LIBS", length(libs.df$Na2O)), rep("PIXL", length(new_pixl_trim$Na2O)))),
levels = c("Na LIBS", "Na PIXL")),
y = c(libs.df$Na2O, new_pixl_trim$Na2O),
fill = c(rep("LIBS", length(libs.df$Na2O)), rep("PIXL", length(new_pixl_trim$Na2O)))
)) +
# Boxplots for K
geom_boxplot(aes(
x = factor(paste("K", c(rep("LIBS", length(libs.df$K2O)), rep("PIXL", length(new_pixl_trim$K2O)))),
levels = c("K LIBS", "K PIXL")),
y = c(libs.df$K2O, new_pixl_trim$K2O),
fill = c(rep("LIBS", length(libs.df$K2O)), rep("PIXL", length(new_pixl_trim$K2O)))
)) +
# Boxplots for Ti
geom_boxplot(aes(
x = factor(paste("Ti", c(rep("LIBS", length(libs.df$TiO2)), rep("PIXL", length(new_pixl_trim$TiO2)))),
levels = c("Ti LIBS", "Ti PIXL")),
y = c(libs.df$TiO2, new_pixl_trim$TiO2),
fill = c(rep("LIBS", length(libs.df$TiO2)), rep("PIXL", length(new_pixl_trim$TiO2)))
)) +
# Labels and theme
labs(
title = "Comparison of Unscaled LIBS and PIXL Data",
x = "Element (Source)",
y = "wt. (%)",
fill = "Source"
) +
scale_fill_manual(values = c("LIBS" = "lightblue", "PIXL" = "lightcoral")) +
scale_x_discrete(labels = c(
"Si LIBS" = " Si", "Si PIXL" = "",
"Ti LIBS" = " Ti", "Ti PIXL" = "",
"Al LIBS" = " Al", "Al PIXL" = "",
"Fe LIBS" = " Fe", "Fe PIXL" = "",
"Mg LIBS" = " Mg", "Mg PIXL" = "",
"Ca LIBS" = " Ca", "Ca PIXL" = "",
"Na LIBS" = " Na", "Na PIXL" = "",
"K LIBS" = " K", "K PIXL" = ""
)) +
theme_minimal()
```
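The chunk above builds one layer per element. A possible shorter route, shown here as a sketch on simulated data, is to stack both sources, reshape to long format with `tidyr::pivot_longer()`, and let a single `geom_boxplot()` handle the grouping. The data frames `libs_toy` and `pixl_toy` are invented stand-ins with only two oxide columns.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1234)

# Simulated stand-ins for the LIBS and PIXL tables (not real data)
libs_toy <- data.frame(SiO2 = rnorm(50, 45, 5), FeOT = rnorm(50, 20, 4))
pixl_toy <- data.frame(SiO2 = rnorm(15, 40, 5), FeOT = rnorm(15, 25, 4))

# Stack the two sources and reshape to one row per measurement
long <- bind_rows(LIBS = libs_toy, PIXL = pixl_toy, .id = "Source") %>%
  pivot_longer(cols = -Source, names_to = "Oxide", values_to = "wt")

# One boxplot layer, dodged by source within each oxide
ggplot(long, aes(x = Oxide, y = wt, fill = Source)) +
  geom_boxplot() +
  scale_fill_manual(values = c("LIBS" = "lightblue", "PIXL" = "lightcoral")) +
  labs(x = "Oxide", y = "wt. (%)", fill = "Source") +
  theme_minimal()
```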
Now we plot the scaled LIBS and PIXL data.
```{r}
ggplot() +
# Boxplots for Mg
geom_boxplot(aes(
x = factor(paste("Mg", c(rep("LIBS", length(libs_trim$Mg$MgO)), rep("PIXL", length(new_pixl_trim_we$Mg$MgO)))),
levels = c("Mg LIBS", "Mg PIXL")),
y = c(libs_trim$Mg$MgO, new_pixl_trim_we$Mg$MgO),
fill = c(rep("LIBS", length(libs_trim$Mg$MgO)), rep("PIXL", length(new_pixl_trim_we$Mg$MgO)))
)) +
# Boxplots for Fe
geom_boxplot(aes(
x = factor(paste("Fe", c(rep("LIBS", length(libs_trim$Fe$FeOT)), rep("PIXL", length(new_pixl_trim_we$Fe$FeOT)))),
levels = c("Fe LIBS", "Fe PIXL")),
y = c(libs_trim$Fe$FeOT, new_pixl_trim_we$Fe$FeOT),
fill = c(rep("LIBS", length(libs_trim$Fe$FeOT)), rep("PIXL", length(new_pixl_trim_we$Fe$FeOT)))
)) +
# Boxplots for Ca
geom_boxplot(aes(
x = factor(paste("Ca", c(rep("LIBS", length(libs_trim$Ca$CaO)), rep("PIXL", length(new_pixl_trim_we$Ca$CaO)))),
levels = c("Ca LIBS", "Ca PIXL")),
y = c(libs_trim$Ca$CaO, new_pixl_trim_we$Ca$CaO),
fill = c(rep("LIBS", length(libs_trim$Ca$CaO)), rep("PIXL", length(new_pixl_trim_we$Ca$CaO)))
)) +
# Boxplots for Na
geom_boxplot(aes(
x = factor(paste("Na", c(rep("LIBS", length(libs_trim$Na$Na2O)), rep("PIXL", length(new_pixl_trim_we$Na$Na2O)))),
levels = c("Na LIBS", "Na PIXL")),
y = c(libs_trim$Na$Na2O, new_pixl_trim_we$Na$Na2O),
fill = c(rep("LIBS", length(libs_trim$Na$Na2O)), rep("PIXL", length(new_pixl_trim_we$Na$Na2O)))
)) +
# Boxplots for Ti
geom_boxplot(aes(
x = factor(paste("Ti", c(rep("LIBS", length(libs_trim$Ti$TiO2)), rep("PIXL", length(new_pixl_trim_we$Ti$TiO2)))),
levels = c("Ti LIBS", "Ti PIXL")),
y = c(libs_trim$Ti$TiO2, new_pixl_trim_we$Ti$TiO2),
fill = c(rep("LIBS", length(libs_trim$Ti$TiO2)), rep("PIXL", length(new_pixl_trim_we$Ti$TiO2)))
)) +
# Boxplots for K
geom_boxplot(aes(
x = factor(paste("K", c(rep("LIBS", length(libs_trim$K$K2O)), rep("PIXL", length(new_pixl_trim_we$K$K2O)))),
levels = c("K LIBS", "K PIXL")),
y = c(libs_trim$K$K2O, new_pixl_trim_we$K$K2O),
fill = c(rep("LIBS", length(libs_trim$K$K2O)), rep("PIXL", length(new_pixl_trim_we$K$K2O)))
)) +
# Boxplots for Si
geom_boxplot(aes(
x = factor(paste("Si", c(rep("LIBS", length(libs_trim$Si$SiO2)), rep("PIXL", length(new_pixl_trim_we$Si$SiO2)))),
levels = c("Si LIBS", "Si PIXL")),
y = c(libs_trim$Si$SiO2, new_pixl_trim_we$Si$SiO2),
fill = c(rep("LIBS", length(libs_trim$Si$SiO2)), rep("PIXL", length(new_pixl_trim_we$Si$SiO2)))
)) +
# Boxplots for Al
geom_boxplot(aes(
x = factor(paste("Al", c(rep("LIBS", length(libs_trim$Al$Al2O3)), rep("PIXL", length(new_pixl_trim_we$Al$Al2O3)))),
levels = c("Al LIBS", "Al PIXL")),
y = c(libs_trim$Al$Al2O3, new_pixl_trim_we$Al$Al2O3),
fill = c(rep("LIBS", length(libs_trim$Al$Al2O3)), rep("PIXL", length(new_pixl_trim_we$Al$Al2O3)))
)) +
# Labels and theme
labs(
title = "Comparison of Earth Scaled LIBS and Earth Scaled PIXL Data",
caption = "LIBS and PIXL samples scaled by the Earth reference data. \nThe dashed lines mark the Earth median (0) and one interquartile range above and below it (+1 and -1).",
x = "Element (Source)",
y = "Scaled Value",
fill = "Source"
) +
scale_fill_manual(values = c("LIBS" = "lightblue", "PIXL" = "lightcoral")) +
scale_x_discrete(labels = c(
"Si LIBS" = " Si", "Si PIXL" = "",
"Ti LIBS" = " Ti", "Ti PIXL" = "",
"Al LIBS" = " Al", "Al PIXL" = "",
"Fe LIBS" = " Fe", "Fe PIXL" = "",
"Mg LIBS" = " Mg", "Mg PIXL" = "",
"Ca LIBS" = " Ca", "Ca PIXL" = "",
"Na LIBS" = " Na", "Na PIXL" = "",
"K LIBS" = " K", "K PIXL" = ""
)) +
theme_minimal() +
geom_hline(yintercept = c(-1, 0, 1), linetype = "dashed", color = "red") +
theme(plot.caption = element_text(hjust = 0))
```
We immediately notice that iron and magnesium have the greatest scaled range, which means they may be the most informative features in this dataset. We continue with a density plot to see if this observation is repeated.
```{r}
ggplot() +
geom_density(aes(x = libs_trim$Si$SiO2, color = "Si"), fill = "#1f78b4", alpha = 0.3) + # Blue
geom_density(aes(x = libs_trim$Ti$TiO2, color = "Ti"), fill = "#e31a1c", alpha = 0.3) + # Red
geom_density(aes(x = libs_trim$Al$Al2O3, color = "Al"), fill = "#33a02c", alpha = 0.3) + # Green
geom_density(aes(x = libs_trim$Fe$FeOT, color = "Fe"), fill = "#ff7f00", alpha = 0.3) + # Orange
geom_density(aes(x = libs_trim$Mg$MgO, color = "Mg"), fill = "#6a3d9a", alpha = 0.3) + # Purple
geom_density(aes(x = libs_trim$Ca$CaO, color = "Ca"), fill = "#b15928", alpha = 0.3) + # Brown
geom_density(aes(x = libs_trim$Na$Na2O, color = "Na"), fill = "#a6cee3", alpha = 0.3) + # Light Blue
geom_density(aes(x = libs_trim$K$K2O, color = "K"), fill = "#fb9a99", alpha = 0.3) + # Pink
labs(title = "Earth Scaled LIBS by Elemental Composition", x = "Value", y = "Density") +
scale_color_manual(name = "Elements",
values = c("Si" = "#1f78b4", "Ti" = "#e31a1c", "Al" = "#33a02c",
"Fe" = "#ff7f00", "Mg" = "#6a3d9a", "Ca" = "#b15928",
"Na" = "#a6cee3", "K" = "#fb9a99")) +
theme_minimal()
```
Once again, the density plot shows that magnesium and iron have the greatest range of all the elements. We also notice that potassium has an extremely high, narrow density peak, meaning its scaled values vary very little, and thus it could potentially be removed from the data set.
```{r}
set.seed(1234)
km <- kmeans(libs_trim,4)
cluster.df<-data.frame(cluster= 1:4, size=km$size)
kable(cluster.df,caption="Samples per cluster")
pheatmap(km$centers,scale="none", main="Clusters by Element Composition on Earth Scaled LIBS")
```
We see that the samples are reasonably well distributed across the clusters, which tells us no cluster is under- or over-represented. From the heatmap, we see once again that iron and magnesium are the biggest indicators of each cluster.
```{r}
pca_libs <- prcomp(libs_trim, scale. = FALSE)
ggbiplot::ggbiplot(pca_libs,
groups = as.factor(km$cluster)) +
labs(title="Biplot of PCA on Earth Scaled LIBS")
```
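The biplot confirms what we saw in the heatmap: some clusters are associated with magnesium, some with iron, and some with both. We also see that Cluster 4 is different from the other clusters. Note that these clusters come from k-means on the Earth scaled data, so their numbering does not correspond to the clusters from the ternary plot.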
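A biplot shows only the first two components, so it helps to check how much variance they capture and which variables drive them. The sketch below does this on simulated data with `prcomp()`. The matrix `toy` and its column names are placeholders, not the scaled LIBS values.

```r
set.seed(1234)

# Simulated stand-in for scaled data: two high-variance columns, two low
toy <- cbind(
  Fe = rnorm(200, sd = 3),
  Mg = rnorm(200, sd = 2),
  Si = rnorm(200, sd = 0.5),
  K  = rnorm(200, sd = 0.1)
)

pca_toy <- prcomp(toy, scale. = FALSE)

# Proportion of total variance explained by each component
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
round(var_explained, 3)

# Loadings show which variables drive the first two components
pca_toy$rotation[, 1:2]
```

Without rescaling to unit variance, the columns with the widest spread dominate the leading components, which is one reason wide-ranging elements stand out in an unscaled PCA.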
## 5.5 Conclusions, Limitations, and Future Work.
Iron and magnesium are the biggest indicators in this dataset, as shown by the analysis. I would recommend that future analysis of LIBS focus on those two elements, or at the very least include them. I would also like to do some research on the significance of the presence of iron and magnesium in rock samples.
# Bibliography
* https://www.mdpi.com/1996-1944/16/20/6641 [Królicka23]
* https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/ [NASA21]
* https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2023EA002829 [Veneranda23]
* https://www.mdpi.com/2072-4292/13/23/4773 [Liu21]
* ggtern for the ternary diagram
# Appendix
I wanted to investigate why the elemental compositions of each sample do not add up to 100, which would mean that 100% of the rock has been identified. For some samples, the sum of the elemental compositions can be in the low 80s or in the 120s. This variance seems to throw the accuracy of all the measurements into question, so I wanted to see why exactly it occurs.
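A quick way to see how far each sample is from 100% is to sum the oxide columns per row. The sketch below uses an invented data frame `toy` and an arbitrary tolerance of 5 percentage points. Both are assumptions for illustration only.

```r
# Invented oxide weight percentages (not real LIBS values)
toy <- data.frame(
  SiO2 = c(45, 38, 55), TiO2 = c(1, 0.5, 1.5), Al2O3 = c(10, 6, 14),
  FeOT = c(20, 18, 25), MgO = c(12, 14, 15),
  CaO = c(7, 4, 6), Na2O = c(3, 2, 3), K2O = c(1, 0.5, 1.5)
)

# Total of the eight major oxides for each sample
toy$total <- rowSums(toy)

# Flag samples whose total is more than 5 points away from 100
toy$off_total <- abs(toy$total - 100) > 5

toy[, c("total", "off_total")]
```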
In Section 5 of the paper Application of Laser-Induced Breakdown Spectroscopy for Depth Profiling of Multilayer and Graded Materials (https://www.mdpi.com/1996-1944/16/20/6641), the authors discuss how LIBS data can be hard to analyze in situations where traditional calibration techniques cannot be used, such as remote sensing in space missions, which is exactly the situation in which this data was collected. In this situation, a calibration-free method is used to determine elemental composition, which requires several specific criteria to be met. The problem is that matrix effects, laser parameters, and experimental configurations make it hard to be completely accurate when analyzing concentrations.
Some of the difficulty also comes from the fact that the LIBS system itself is not the same every time. Post-landing Major Element Quantification Using SuperCam Laser Induced Breakdown Spectroscopy (https://www.sciencedirect.com/science/article/pii/S0584854721003049) shows that the LIBS laser does not behave exactly the same every time it is fired. The size of the ablation, which is essentially the mini crater left behind by the laser, can change. The temperature of the plasma can vary as well, from one firing of the laser to the next. These changes can affect the calibration-free model for LIBS elemental analysis, which can cause the issues with total elemental composition that we see.