---
title: "Data Analytics Research Individual Final Project Report - Mars"
author: "Charlotte Peterson"
date: "Fall 2024"
output:
  pdf_document:
    toc: yes
    toc_depth: '3'
  html_notebook: default
  html_document:
    toc: yes
    toc_depth: 3
    toc_float: yes
    number_sections: yes
    theme: united
---


# DAR Project and Group Members

* Project name: Mars
* GitHub ID: dar-peterc
* Project team members: Dante Mwatibo, Doña Roberts, David Walcyzk, Xuanting Wang, Ashton Compton, Margo VanEsselstyn, Nicolas Morawski, CJ Marino, Aadi Lahiri

# 0.0 Preliminaries.

```{r}
# Set to TRUE to expand R code blocks; set to FALSE to collapse R code blocks
show <- TRUE
```

Executing this R notebook requires some subset of the following packages:

* `ggplot2`
* `tidyverse`
* `pandoc`
* `rmarkdown`
* `stringr`
* `ggbiplot`
* `knitr`
* `rpart`
* `rpart.plot`
* `caret`
* `ggrepel`
* `ggtern`
* `geosphere`


These will be installed and loaded as necessary (code suppressed).

<!-- The `include=FALSE` option prevents your code from being shown at all -->
```{r, include=FALSE}
# This code will install required packages if they are not already installed
# ALWAYS INSTALL YOUR PACKAGES LIKE THIS!
if (!require("ggplot2")) {
   install.packages("ggplot2")
   library(ggplot2)
}
if (!require("tidyverse")) {
   install.packages("tidyverse")
   library(tidyverse)
}

if (!require("pandoc")) {
  install.packages("pandoc")
  library(pandoc)
}

# Required packages for M20 LIBS analysis
if (!require("rmarkdown")) {
  install.packages("rmarkdown")
  library(rmarkdown)
}

if (!require("stringr")) {
  install.packages("stringr")
  library(stringr)
}

if (!require("ggbiplot")) {
  install.packages("ggbiplot")
  library(ggbiplot)
}

if (!require("knitr")) {
  install.packages("knitr")
  library(knitr)
}

if (!require("rpart")) {
  install.packages("rpart")
  library(rpart)
}

if (!require("rpart.plot")) {
  install.packages("rpart.plot")
  library(rpart.plot)
}

if (!require("caret")) {
  install.packages("caret")
  library(caret)
}

if (!require("ggrepel")) {
  install.packages("ggrepel")
  library(ggrepel)
}

if (!require("ggtern")) {
  install.packages("ggtern")
  library(ggtern)
}

if (!require("geosphere")) {
  install.packages("geosphere")
  library(geosphere)
}
```
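The repeated `if (!require(...))` blocks above can be condensed into a loop. This is a minimal sketch, not the notebook's actual setup code. The helper name `load_or_install` is hypothetical.

```r
# Hypothetical helper: install any missing packages, then attach each one.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() checks whether the package is installed without attaching it
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # character.only = TRUE lets library() take the package name from a variable
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage with the packages listed above:
# load_or_install(c("ggplot2", "tidyverse", "pandoc", "rmarkdown", "stringr",
#                   "ggbiplot", "knitr", "rpart", "rpart.plot", "caret",
#                   "ggrepel", "ggtern", "geosphere"))
```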

# 1.0 Project Introduction

The Mars Project focuses on data from the Mars 2020 Perseverance rover. The goal of the mission is to look for signs of ancient microbial life or evidence of water on Mars (conditions that could suggest life). Perseverance carries multiple instruments, including PIXL (Planetary Instrument for X-Ray Lithochemistry), SHERLOC (Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals), and SUPERCAM. SUPERCAM combines several spectroscopic techniques, including LIBS (Laser-Induced Breakdown Spectroscopy), to measure the properties of materials on Mars. This notebook focuses primarily on the PIXL and LIBS data we were given.

# 2.0 Organization of Report

This report is organized as follows:

* Section 3.0 Finding 1: LIBS and PIXL Matching. We combined the LIBS and PIXL data sets by choosing a maximum distance from a PIXL abrasion and matching every LIBS sample that lies within that distance of a PIXL abrasion.

* Section 4.0 Finding 2: Soil Composition Analysis. Using the combined LIBS and PIXL data set, I created a log-scaled plot of the weight percentages of compounds such as SiO2 and K2O. This plot compares the composition of a PIXL abrasion with the compositions of the LIBS samples within a chosen distance of that abrasion.

* Section 5.0 Finding 3: Analyzing Cation Combinations using LIBS and PIXL Matched Data. Using the combined LIBS and PIXL data set, we created a ternary plot showing the distribution of LIBS samples grouped by the PIXL abrasion they are closest to, based on a chosen distance.

* Section 6.0 Overall conclusions and suggestions.

* Section 7.0 Appendix. This section describes additional work that may be helpful in the future: additional soil composition plots of LIBS and PIXL.


# 3.0 Finding 1: PIXL and LIBS Matching

First, we look at how PIXL and LIBS correspond. Early in our research, our group found that no single feature could be used to match the two data sets. For example, PIXL is organized by latitude and longitude as well as sample number (1-16), sample name, and abrasion name. LIBS is not organized the same way; instead, it is organized by the sol on which each sample was taken. LIBS also contains many different types of samples, including Earth reference data that the rover carries for comparison with different sample sites. To match PIXL targets to their corresponding LIBS samples, Margo and I created a new data set that adds another metadata feature to PIXL (latitude and longitude coordinates), which we obtained from the Analyst's Notebook. Once the coordinates were added, we found that the latitudes and longitudes of PIXL targets and LIBS samples did not match exactly. Margo therefore created a distance function that matches LIBS samples to PIXL targets within whatever distance the user specifies. Originally, we rounded to the nearest thousandth and matched on the rounded values.

This answered the question of how to correlate the LIBS and PIXL data sets so that both can be plotted on the same axes of any given plot. I was also curious to see how close PIXL targets were to LIBS sample sites, and how many LIBS samples would be associated with a PIXL target within a radius of, for example, 7 or 10 meters.

## 3.1 Data, Code, and Resources

1. peterc_finalProjectF24.Rmd (with knit pdf and html) is this notebook.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentNotebooks/Assignment08_FinalProjectNotebook/peterc_finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentNotebooks/Assignment08_FinalProjectNotebook/peterc_finalProjectF24.Rmd)


2. v1_libs_to_sample.Rds is the combined data set of PIXL and LIBS that includes the distance from a PIXL abrasion to a LIBS sample.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs_to_sample.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs_to_sample.Rds).

First, we set the distance threshold, in meters, between a PIXL abrasion and a LIBS sample. The v1_libs_to_sample.Rds data set, which Margo and I built together, contains a Distance variable. That variable is computed by a function Margo created, which measures the distance between a PIXL abrasion and a LIBS sample from their latitude and longitude coordinates. We use 7 meters because that is the maximum distance from which the LIBS instrument can accurately collect data.
```{r}
meters <- 7
```
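For reference, the haversine (great-circle) distance behind this kind of matching can be written in a few lines of base R. This is a sketch, not Margo's actual function. It assumes coordinates in decimal degrees and the Mars radius of 3,393,169 m that the notebook passes to `geosphere::distHaversine()` in Section 4. The names `haversine_m` and `within_threshold` are hypothetical.

```r
# Hypothetical helper: great-circle (haversine) distance in meters.
# Inputs are decimal degrees; r defaults to the Mars radius used in this notebook.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 3393169) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- (lat2 - lat1) * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical: a LIBS sample is kept only if it lies within `meters` of the abrasion
within_threshold <- function(d, meters = 7) d <= meters
```

The sketch takes longitude first to mirror `geosphere::distHaversine()`, which expects points in (longitude, latitude) order.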

To prepare the data, we load v1_libs_to_sample.Rds and group it by LIBS latitude and longitude. Within each group we keep only the highest-numbered point (Point.libs), so each LIBS location appears once on the plot. We then drop every LIBS sample whose distance from its corresponding PIXL abrasion exceeds the chosen threshold (`meters`). To plot the PIXL abrasions, we also build a data frame, unique_pixl, that holds each unique PIXL abrasion and its coordinates.

```{r }
libs_to_sample <- readRDS("~/DAR-Mars-F24/StudentData/v1_libs_to_sample.Rds")
#make a filtered data frame that picks the max point out of all libs samples at a certain target
# for simplicity
df_filtered <- libs_to_sample %>%
  group_by(Lat.libs, Lon.libs) %>%
  filter(Point.libs == max(Point.libs)) %>%
  ungroup()
df_distance_filter <- df_filtered[df_filtered$Distance <= meters,]

#make a data frame with the unique pixl coordinates since they are in pairs of identical lat/lon
unique_pixl <- df_filtered %>%
  select(Lat.pixl, Lon.pixl, Abrasion.pixl) %>% distinct()
```
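The two filtering steps above (one point per LIBS location, then the distance cutoff) can be illustrated on a small data frame. The values below are invented for illustration and are not Perseverance data. The sketch uses base R `ave()` in place of `group_by()`/`filter()`, and reuses the notebook's column names only for readability.

```r
# Made-up rows: two points at the first LIBS location, one at each of the others
toy <- data.frame(
  Lat.libs = c(18.40, 18.40, 18.41, 18.42),
  Lon.libs = c(77.45, 77.45, 77.46, 77.47),
  Point.libs = c(1, 2, 1, 1),
  Distance = c(3.2, 3.2, 6.9, 12.5),
  Abrasion.pixl = c("Alfalfa", "Alfalfa", "Dourbes", "Dourbes")
)
max_dist <- 7

# Keep the highest-numbered point at each (Lat.libs, Lon.libs) location
max_point <- ave(toy$Point.libs, toy$Lat.libs, toy$Lon.libs, FUN = max)
one_per_site <- toy[toy$Point.libs == max_point, ]

# Keep only locations within max_dist meters of their PIXL abrasion
within_range <- one_per_site[one_per_site$Distance <= max_dist, ]

# Number of LIBS locations per abrasion
counts <- table(within_range$Abrasion.pixl)
```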


## 3.2 Contribution

The filtering of the original data set is my work. Previously, I had to do much more filtering to choose the distance and to get unique LIBS points so that the scatterplot would not be overcrowded. Margo and I worked together to create the data set used in this section (v1_libs_to_sample.Rds) by deciding how to match LIBS samples to PIXL abrasions. Margo created the distance function that finds the distance between PIXL abrasions and LIBS samples, and she added that column to the data set. Doña then fixed the naming conventions in the data set. This made the names consistent and made it easy to tell which data set each variable originally came from (e.g., Name.pixl, Target.libs). I then used the data to create plots and analyze them. The scatterplot in Section 3.4 shows the distribution of PIXL abrasions and their corresponding LIBS samples, based on the specified maximum distance between them.


## 3.3 Methods Description

I chose ggplot to display the LIBS and PIXL data because it makes it easy to see how many LIBS samples align with each PIXL abrasion. It was very interesting to change the maximum distance and see which samples aligned with which abrasion. In terms of execution, it took some time to organize all of the ideas Margo and I had on how to create and manage this data set. Originally, we rounded the distances to the nearest thousandth to match them and plotted on that basis. However, that left a lot of room for error and was less accurate. A distance function instead lets the scientist or other user of the Mars Mission Minder App choose whatever distance they like, which allows for much more functionality. Cleaning up the data set itself proved more efficient than making small fixes while building each plot, such as changing variable types that were not what they should be. Those repeated fixes had been very frustrating. In the end, I learned a lot about data organization: consistency and staying organized are key and save a lot of time later on.


## 3.4 Result and Discussion

To show which LIBS samples align with which abrasions, I first plotted the LIBS samples, colored by the PIXL abrasion they are closest to. I then plotted the PIXL abrasions as red stars to show where they lie relative to the LIBS samples.

```{r }
#plot of libs and pixl data by lat/lon
ggplot(data = df_distance_filter) +
  # LIBS samples, colored by their closest PIXL abrasion
  geom_point(mapping = aes(x = Lon.libs, y = Lat.libs, color = Abrasion.pixl)) +
  # PIXL abrasions as fixed red stars (shape 8)
  geom_point(mapping = aes(x = Lon.pixl, y = Lat.pixl), data = unique_pixl, color = "red", shape = 8, size = 3) +
  # Non-overlapping labels for the PIXL abrasions
  geom_text_repel(mapping = aes(x = Lon.pixl, y = Lat.pixl, label = Abrasion.pixl), data = unique_pixl,
                  vjust = 2, color = "red") +
  labs(title = paste("LIBS Samples and PIXL Abrasions within", meters, "meters"),
       x = "Longitude",
       y = "Latitude",
       color = "PIXL Abrasion",  # Label for the color legend
       caption = "Data collected using LIBS and PIXL instruments on the Perseverance rover.\nShows PIXL abrasions plotted as red stars,\nand the corresponding LIBS samples colored by their closest PIXL abrasion.") +
  theme(
    plot.caption = element_text(hjust = 0)  # Left-align the caption
  )
```
Filtering the LIBS data (df_distance_filter) to distances of 7 meters or less from a PIXL abrasion gives the following counts. There are 3 LIBS samples corresponding to Alfalfa, 2 to Bellegrade, 3 to Dourbes, 7 to Novarupta, 4 to Quartier, and 9 to Thornton Gap.

## 3.5 Conclusions, Limitations, and Future Work.

I believe these findings make it easy for researchers and scientists to visualize the PIXL and LIBS samples that fall within whatever maximum distance they focus on when examining PIXL and LIBS together. In future work, this plot can continue to be built upon as NASA releases more coordinates and data for the LIBS and PIXL data sets. Although the plot is not complicated, it provides necessary context for visualizing PIXL and LIBS. I found few limitations in this plot, since its purpose is simply to provide an easy visual of the PIXL and LIBS data together.

# 4.0 Finding 2: Soil Composition Analysis

Using the combined LIBS and PIXL data set, I created a log-scaled plot of the weight percentages of compounds such as SiO2 and K2O. This plot compares the composition of a PIXL abrasion with the compositions of the corresponding LIBS samples within a chosen distance of that abrasion. The question I was trying to answer was: how does the LIBS data of an area compare to the PIXL data of the same area? Are there notable differences between locations? In other words, which abrasions and their corresponding LIBS targets differ, in what ways do they differ, and do igneous and sedimentary rocks show a pattern? To answer this, we use a distance function to keep LIBS targets within 7 meters of a PIXL abrasion. Seven meters is the maximum distance of accuracy based on NASA's information on the LIBS instrument. We then plot the Earth quartile references (first quartile, median, and third quartile), the chemical composition of the chosen PIXL abrasion, and the chemical compositions of its corresponding LIBS targets. Only one abrasion is plotted at a time.

## 4.1 Data, Code, and Resources

1. peterc_finalProjectF24.Rmd (with knit pdf and html) is this notebook.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentNotebooks/Assignment08_FinalProjectNotebook/peterc_finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentNotebooks/Assignment08_FinalProjectNotebook/peterc_finalProjectF24.Rmd)

2. peterc_assignment05.Rmd (with knit pdf and html), which is my previous notebook.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment05/peterc_assignment05.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment05/peterc_assignment05.Rmd)

3. supercam_libs_moc_loc.Rds which is the original LIBS data given to our research group.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/supercam_libs_moc_loc.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/supercam_libs_moc_loc.Rds)

4. pixl_sol_coordinates.Rds, which is the data set containing the PIXL data, sol, and coordinates.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/pixl_sol_coordinates.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/pixl_sol_coordinates.Rds)

5. LIBS_training_set_quartiles.Rds is the data with earth quartile reference data.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/LIBS_training_set_quartiles.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/LIBS_training_set_quartiles.Rds).

To prepare the data, we first load the Earth quartile reference data and the LIBS data. We then drop the standard deviation columns, the sum of the percentages (Total), the distance, and the total frequencies (Tot.Em.). This leaves only the weight-percentage compositions as numerical data. We also remove the scct targets, which are the Earth reference samples that Perseverance carries with it. These are not relevant when plotting the LIBS data, because we are focused on the Martian soil compositions.

```{r}
#Earth quartiles
earthquartiles.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/LIBS_training_set_quartiles.Rds")
#Load in LIBS data
libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
#Drop the standard deviation features, the sum of the percentages,
#the distance, and the total frequencies
libs.df <- libs.df %>%
  select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
# Convert the points to numeric
libs.df$point <- as.numeric(libs.df$point)
libs.df[,6:13] <- sapply(libs.df[,6:13],as.numeric)
#remove the scct/reference samples
libs.df<-libs.df%>%
  filter(!(grepl("scct", target)))
#add a column to indicate the nearest pixl
libs.df<-cbind(nearestpixl=0,libs.df)
#make a dataframe of just the LIBS Lat/Long and target name and remove duplicates
libstargets.df<-libs.df[,c(1,3,4,5)]
libstargets.df<-distinct(libstargets.df)
```
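The column drop and `scct` filter above list the columns by name and position. Below is a sketch of the same idea that selects columns by name pattern instead. The toy columns and values are made up. It assumes that every standard-deviation column ends in `_stdev`, as the ones dropped above do.

```r
toy_libs <- data.frame(
  target = c("scct_target_a", "rock_a", "rock_b"),
  SiO2 = c("45.1", "50.2", "48.7"),
  SiO2_stdev = c("1.1", "2.0", "1.5"),
  MgO = c("5.0", "7.3", "6.1"),
  MgO_stdev = c("0.4", "0.6", "0.5"),
  Total = c("98.0", "99.1", "97.4"),
  stringsAsFactors = FALSE
)

# Drop the uncertainty columns (names ending in "_stdev") and the Total column
keep <- !grepl("_stdev$", names(toy_libs)) & names(toy_libs) != "Total"
toy_libs <- toy_libs[, keep]

# Convert every oxide column (all but target) from character to numeric
oxides <- setdiff(names(toy_libs), "target")
toy_libs[oxides] <- lapply(toy_libs[oxides], as.numeric)

# Remove the scct reference targets
toy_libs <- toy_libs[!grepl("scct", toy_libs$target), ]
```

The sketch uses `lapply()` rather than `sapply()` so that each column is converted independently and the result stays a list of columns, not a matrix.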

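Next, we set `meters` (the maximum distance between a PIXL abrasion and a LIBS target) and `abrasion_name` (the PIXL abrasion to plot). These act as slider inputs in the 2D app.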
```{r}
#Choose max distance variable between PIXL and LIBS data
meters = 7
#Choose PIXL abrasion you want to look at
abrasion_name = "ThorntonGap"
```

Next, we load the PIXL data and keep only its metadata columns. We then remove the atmospheric sample and select one PIXL sample per abrasion.
```{r, data02}
#read in pixl data with lat/long
pixl.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
#include only pixl metadata
pixl.df<-pixl.df %>%
  select(c(1,2,19,20,22))
#convert Lat/Long to numeric
pixl.df$Lat <- as.numeric(pixl.df$Lat)
pixl.df$Long <- as.numeric(pixl.df$Long)
#remove rows so we only have one sample per abrasion and remove atmospheric sample
pixl.df<-pixl.df[c(2,4,6,8,10,12,14,16),]
```

Next, we add a Distance column, which will hold the distance from each LIBS target to its nearest PIXL abrasion. We also add one column per PIXL abrasion, which will hold the distance from the LIBS target to that abrasion. All of these columns start at 0. The closest abrasion itself is recorded in the nearestpixl column.
```{r}
libstargets.df<-cbind(libstargets.df,"Distance"=0,"Bellegrade"=0,"Dourbes"=0,"Quartier"=0,"Alfalfa"=0,"ThorntonGap"=0,"Berry Hollow"=0,"Novarupta"=0,"Uganik Island"=0)
```

The loop below computes the haversine (great-circle) distance between each LIBS target and every PIXL abrasion, using a Mars radius of 3,393,169 m. It then records the smallest distance and the closest PIXL abrasion for that LIBS target.
```{r}
# For each LIBS target, compute the haversine distance (in meters) to each of the
# 8 PIXL abrasions, using a Mars radius of 3,393,169 m.
# Note: geosphere::distHaversine() expects points in (longitude, latitude) order.
for(i in 1:nrow(libstargets.df)) {
    libstargets.df[i,c(6:13)]<-c(distHaversine(pixl.df[1,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[2,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[3,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[4,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[5,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[6,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[7,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[8,c(1,2)],libstargets.df[i,c(2,3)],r=3393169))

    # Index (1-8) of the closest PIXL abrasion, and the distance to it
    libstargets.df[i,1]<-which.min(libstargets.df[i,c(6:13)])
    libstargets.df[i,5]<-min(libstargets.df[i,c(6:13)])
}
# Explicit levels keep the index-to-name mapping correct even if some
# abrasion is never the closest one to any LIBS target
libstargets.df$nearestpixl<-factor(libstargets.df$nearestpixl, levels = 1:8,
                                   labels = c("Bellegrade","Dourbes","Quartier","Alfalfa","ThorntonGap","Berry Hollow","Novarupta","Uganik Island"))
```
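The per-target loop can also be expressed as a distance matrix, with one row per LIBS target and one column per PIXL abrasion. The sketch below is an alternative, not the notebook's method. It is self-contained and uses invented coordinates and placeholder abrasion names. It redeclares a hypothetical `haversine_m()` helper, which is not part of the notebook.

```r
# Hypothetical helper: haversine distance in meters (decimal degrees in, Mars radius)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 3393169) {
  to_rad <- pi / 180
  a <- sin((lat2 - lat1) * to_rad / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin((lon2 - lon1) * to_rad / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Invented coordinates for illustration only
pixl_sites <- data.frame(
  name = c("AbrasionA", "AbrasionB", "AbrasionC"),
  lon  = c(77.4500, 77.4600, 77.4700),
  lat  = c(18.4400, 18.4450, 18.4500)
)
libs_sites <- data.frame(
  target = c("t1", "t2", "t3"),
  lon    = c(77.45005, 77.46002, 77.45010),
  lat    = c(18.44003, 18.44501, 18.44002)
)

# dist_mat[i, j] is the distance from LIBS target i to PIXL abrasion j
dist_mat <- outer(
  seq_len(nrow(libs_sites)), seq_len(nrow(pixl_sites)),
  function(i, j) haversine_m(libs_sites$lon[i], libs_sites$lat[i],
                             pixl_sites$lon[j], pixl_sites$lat[j])
)

nearest_idx <- apply(dist_mat, 1, which.min)
libs_sites$Distance <- apply(dist_mat, 1, min)
# Explicit levels keep unused abrasions (here AbrasionC) mapped to the right name
libs_sites$nearestpixl <- factor(nearest_idx, levels = seq_len(nrow(pixl_sites)),
                                 labels = pixl_sites$name)
```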

Next, for each PIXL abrasion, we store the names of the LIBS targets whose closest abrasion it is.
```{r}
Bellegrade<-libstargets.df[libstargets.df$nearestpixl=="Bellegrade",]$target
Dourbes<-libstargets.df[libstargets.df$nearestpixl=="Dourbes",]$target
Quartier<-libstargets.df[libstargets.df$nearestpixl=="Quartier",]$target
Alfalfa<-libstargets.df[libstargets.df$nearestpixl=="Alfalfa",]$target
ThorntonGap<-libstargets.df[libstargets.df$nearestpixl=="ThorntonGap",]$target
BerryHollow<-libstargets.df[libstargets.df$nearestpixl=="Berry Hollow",]$target
Novarupta<-libstargets.df[libstargets.df$nearestpixl=="Novarupta",]$target
UganikIsland<-libstargets.df[libstargets.df$nearestpixl=="Uganik Island",]$target
```

Next, we drop the LIBS targets that are farther than the chosen distance (`meters`). We then add an Abrasion column to the remaining LIBS data, holding the name of the PIXL abrasion closest to each LIBS target. We also add a libsorpixl column that records whether a row comes from LIBS (1) or PIXL (0).
```{r}
included.libs<-(libstargets.df%>%
  filter(Distance<meters))$target
libs.matrix <-libs.df %>%
  filter(target %in% included.libs)
libs.matrix <- libs.matrix[,c(5,7:14)]
libs.matrix<-libs.matrix[,c(1:2,4:9,3)]
libs.matrix<-cbind("Abrasion"=0,libs.matrix)
libs.matrix<-libs.matrix%>%
  mutate(Abrasion = ifelse(target%in%Alfalfa,"Alfalfa",
                    ifelse(target %in% Bellegrade, "Bellegrade",
                    ifelse(target %in% BerryHollow, "Berry Hollow",
                    ifelse(target %in% Dourbes, "Dourbes",
                    ifelse(target %in% Novarupta, "Novarupta",
                    ifelse(target %in% Quartier, "Quartier",
                    ifelse(target %in% ThorntonGap, "ThorntonGap",
                    ifelse(target %in% UganikIsland, "Uganik Island",Abrasion)))))))))
libs.matrix<-cbind(libsorpixl=1,libs.matrix)
```
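The nested `ifelse()` above assigns each LIBS target to the abrasion whose target list contains it. A named lookup vector does the same in one step. This is a sketch on made-up target names, and `abrasion_targets` is a hypothetical structure, not an object in the notebook.

```r
# Hypothetical: for each abrasion, the LIBS targets whose nearest abrasion it is
abrasion_targets <- list(
  "Alfalfa"       = c("target_1", "target_2"),
  "Berry Hollow"  = c("target_3"),
  "Uganik Island" = character(0)
)

# Flatten into a lookup vector: names are targets, values are abrasion names
lookup <- setNames(
  rep(names(abrasion_targets), lengths(abrasion_targets)),
  unlist(abrasion_targets, use.names = FALSE)
)

toy_matrix <- data.frame(target = c("target_2", "target_3", "target_9"))
# Targets with no match get NA (the notebook's version falls back to 0 instead)
toy_matrix$Abrasion <- unname(lookup[toy_matrix$target])
```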

Next, we read in the PIXL data again, this time keeping the composition columns and reordering them to match the LIBS data. We remove the atmospheric sample (the first row) and keep the remaining PIXL samples, so both samples from an abrasion can appear on the plot.

```{r, data03}
#read in pixl data with lat/long
pixl.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
pixl.df<-pixl.df %>%
  select(c(5:8,12:14,17,19,18,22))
#reorder pixl columns so that it matches libs data organization
pixl.df<-pixl.df[,c(11,10,4,3,8,2,6,1,5,7)]
#remove atmospheric sample
pixl.df<-pixl.df[2:16,]
pixl.df<-cbind(libsorpixl=0,pixl.df)
```

Finally, we merge the LIBS and PIXL data sets we have modified thus far for a combined LIBS and PIXL data frame suitable for a soil composition line plot.
```{r}
colnames(pixl.df)<-colnames(libs.matrix)
pixllibs.df<-rbind(pixl.df,libs.matrix)
```

## 4.2 Contribution

Some of the data manipulation work was Margo's, such as the distance function. Pivoting the data frame and the other preprocessing steps are my own work. So are the data manipulation and setup used below for the soil composition line plots. Margo and I worked together to create the data sets used. For our presentation, David created a facet grid to complement my single-abrasion analysis. It showed the average LIBS target chemical composition alongside the corresponding PIXL abrasion composition. He built this in R Shiny and implemented it in the app, so I felt that redoing the same plot was unnecessary. Instead, I focused on plotting one specific abrasion to provide a more in-depth analysis of that abrasion and its surrounding area. David's plot provides the broader context by showing all of the PIXL abrasions at once.

## 4.3 Methods Description

To build soil composition plots of each PIXL abrasion and its corresponding LIBS targets within a maximum distance, I decided the best approach was to start with the original data sets and modify them as needed. For the plot itself, the data must be pivoted to long format. The x axis needs to be the current column names (SiO2 and the other compositions), and the y axis needs to be the weight-percentage values. We also need an indicator of whether each row comes from PIXL or LIBS, which is also helpful for building the line plots.

Users will have to set the distance variable in order to choose the maximum distance between PIXL abrasions and LIBS targets. This can vastly change the number of lines on the plots which can help prevent overcrowded plots. Users also can set a variable to choose a specific PIXL abrasion and corresponding LIBS targets, which is easier to interpret as plotting all of the LIBS and PIXL composition information on line plots leads to very condensed graphs that are hard to read.

## 4.4 Result and Discussion

First, we keep only the first quartile, median, and third quartile rows of the Earth quartile data. We then pivot these rows into a long data frame with one row per quartile and compound.
```{r}
# Earth quartiles
filtered_rows <- earthquartiles.df %>%
  filter(`Training set Quartiles` %in% c("1st", "3rd", "Med"))
earthquartiles_long <- filtered_rows %>%
  pivot_longer(cols = starts_with("SiO2"):last_col(), names_to = "Compound", values_to = "Percentage")

earthquartiles_long <- earthquartiles_long %>% rename(Quartiles = `Training set Quartiles`)
```

Then, the data will be filtered to only include the data from a specific PIXL abrasion chosen by the user. The data is pivoted into a long format, and the columns are reordered to mimic similar plots from NASA papers.
```{r}
# Filter for the specific abrasion sample, e.g., "Alfalfa"
pixllibs_filtered <- pixllibs.df %>%
  filter(Abrasion == abrasion_name)

# Pivot the data to longer format for ggplot
pixllibs_long <- pixllibs_filtered %>%
  pivot_longer(cols = starts_with("SiO2"):last_col(), names_to = "Compound", values_to = "Percentage")

desired_order <- c("SiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O", "TiO2")  # Specify your custom order here
pixllibs_long$Compound <- factor(pixllibs_long$Compound, levels = desired_order)
```
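The `pivot_longer()` step and the factor reordering can be illustrated without tidyr. The sketch below builds the same long layout (one row per target and compound) from a two-row data frame. The values are invented and are not measurements.

```r
toy_wide <- data.frame(
  target = c("pixl_sample", "libs_target"),
  libsorpixl = c(0, 1),
  SiO2 = c(48.2, 51.0), Al2O3 = c(9.1, 8.4), FeOT = c(17.5, 16.2),
  MgO = c(8.3, 7.9), CaO = c(6.4, 5.8), Na2O = c(2.6, 2.9),
  K2O = c(0.3, 0.4), TiO2 = c(0.9, 0.7)
)
desired_order <- c("SiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O", "TiO2")

# unlist() walks the compound columns in order, so the id columns repeat
# once per compound and the compound names repeat once per row
toy_long <- data.frame(
  target     = rep(toy_wide$target, times = length(desired_order)),
  libsorpixl = rep(toy_wide$libsorpixl, times = length(desired_order)),
  Compound   = factor(rep(desired_order, each = nrow(toy_wide)), levels = desired_order),
  Percentage = unlist(toy_wide[desired_order], use.names = FALSE)
)
```

The factor levels are what make ggplot2 draw the x axis in `desired_order` rather than alphabetically.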

We then plot the pixllibs_long data frame with ggplot, coloring each line by whether it shows a PIXL abrasion's composition or a LIBS target's composition. A second layer adds the Earth quartile reference lines, drawn as dotted lines. The weight percentages on the y axis are log-scaled to make the plot more readable. This can be changed and will be added as a toggle in the app.
```{r}
# Map the PIXL/LIBS indicator to color and use target to draw one line per sample.
# Warnings are raised when the plot is drawn, so print() is wrapped as well.
suppressWarnings(print(ggplot(pixllibs_long, aes(x = Compound, y = Percentage, color = as.factor(libsorpixl), group = target)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(trans='log10') +
  # Add Earth quartile reference lines (fixed black, dotted) using earthquartiles_long
  geom_line(data = earthquartiles_long, aes(x = Compound, y = Percentage, group = Quartiles),
            color = "black", linetype = "dotted") +
  labs(title = paste("Soil Composition for PIXL",abrasion_name,"and LIBS within", meters, "meters", sep = " "),
       x = "Chemical Compound",
       y = "Weight Percentage",
       color = "Measurement Type",
       caption = "The chemical composition of a PIXL abrasion and the corresponding LIBS targets \n within specified distance of respective abrasion.") +
  scale_color_manual(values = c("0" = "blue", "1" = "red"), labels = c("PIXL", "LIBS")) +
  annotate("text", x = 5, y = .50, label = "1st Quartile", color = "black", hjust = 0) +
  annotate("text", x = 5, y = 2, label = "Median", color = "black", hjust = 0) +
  annotate("text", x = 5, y = 10, label = "3rd Quartile", color = "black", hjust = 0)+
  theme_minimal()+
  theme(
    plot.caption = element_text(hjust = 0)  # Left-align the caption
  )))

```
To save space, this notebook plots only the abrasion chosen in abrasion_name rather than every abrasion. In comparing each PIXL abrasion with its corresponding LIBS targets, I found that abrasions of igneous rock produced very similar plots to one another, as did abrasions of sedimentary rock. The igneous/sedimentary indicator is included in the PIXL data. Interestingly, for some abrasions, such as Alfalfa and ThorntonGap, the chemical compositions of both PIXL samples in the abrasion are the same. For other abrasions, such as Bellegrade, the compositions of the two PIXL samples differ. Also, many of the plots had a few points with very low K2O, which appear to be major outliers.

## 4.5 Conclusions and Future Work
Geologists can use this finding to analyze what the soil compositions around different PIXL abrasions may mean for life on Mars. The presence of an oxide does not necessarily indicate life, but it could point to biological or chemical processes. For example, CaO can indicate the presence of old biological material such as shells or fossils. In future work, I would like to investigate why certain abrasions differ in their PIXL core sample composition. Is this because scientists deliberately selected abrasions in certain areas to obtain differing samples? Is it by chance? Or does it simply reflect greater rock variety or weathering at certain abrasions?

# 5.0 Finding 3: Analyzing Cation Combinations using LIBS and PIXL matched data
Using the combined LIBS and PIXL data set, we created a ternary plot showing the distribution of LIBS samples grouped by the PIXL abrasion they are closest to, based on a chosen distance. Much of the data preprocessing repeats the steps from Finding 2, and we reproduce them here. A distance function correlates LIBS targets with PIXL samples, and each LIBS target is matched to its closest PIXL abrasion. The LIBS data (colored by corresponding PIXL abrasion) and the PIXL abrasions are then plotted on a ternary plot. The goal is to analyze how groups of LIBS samples (colored by matching PIXL abrasion) differ in cation composition. Do igneous and sedimentary rocks play a role in composition tendencies and show a pattern? Combining the LIBS and PIXL data sets lets us compare a PIXL abrasion with its related LIBS samples in a given area. As before, we keep LIBS targets within 7 meters of a PIXL abrasion, the maximum distance of accuracy based on NASA's information on the LIBS instrument.
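A ternary plot positions each sample by the relative proportions of three components. Each row is therefore rescaled so that the three values sum to 1 (closure). The sketch below shows that rescaling on made-up values. The three cation groups that the plot actually uses are not specified at this point, so the column names `A`, `B` and `C` are placeholders.

```r
# Made-up component values for three samples (placeholders, not measurements)
toy_tern <- data.frame(A = c(2, 1, 5), B = c(1, 1, 3), C = c(1, 2, 2))

# Closure: divide each row by its total so the three proportions sum to 1
close_rows <- function(df) df / rowSums(df)
toy_props <- close_rows(toy_tern)
```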

## 5.1 Data, Code, and Resources
1. peterc_finalProjectF24.Rmd (with knit pdf and html) is this notebook.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentNotebooks/Assignment08_FinalProjectNotebook/peterc_finalProjectF24.Rmd](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentNotebooks/Assignment08_FinalProjectNotebook/peterc_finalProjectF24.Rmd)

2. supercam_libs_moc_loc.Rds which is the original LIBS data given to our research group.
[https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/supercam_libs_moc_loc.Rds](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/supercam_libs_moc_loc.Rds)


First, we set a distance variable which can be used as a slider bar in the app. Changing this variable sets the maximum distance between a PIXL target and LIBS sample for them to be classified together.
```{r}
#set distance variable which can be used as a toggle tool
distance=7
```

To prepare the data, we first load the LIBS data. We then drop the standard deviation columns, the sum of the percentages (Total), the distance, and the total frequencies (Tot.Em.). This leaves only the weight-percentage compositions as numerical data. We also remove the scct targets, which are the Earth reference samples that Perseverance carries with it. These are not relevant here, because the cation combinations only require the weighted compositions of Martian targets.
```{r}
#Load in LIBS data
libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
#Drop the standard deviation features, the sum of the percentages,
#the distance, and the total frequencies
libs.df <- libs.df %>%
  select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
# Convert the points to numeric
libs.df$point <- as.numeric(libs.df$point)
libs.df[,6:13] <- sapply(libs.df[,6:13],as.numeric)
#remove the scct/reference samples
libs.df<-libs.df%>%
  filter(!(grepl("scct", target)))
#add a column to indicate the nearest pixl
libs.df<-cbind(nearestpixl=0,libs.df)
#make a dataframe of just the LIBS Lat/Long and target name and remove duplicates
libstargets.df<-libs.df[,c(1,3,4,5)]
libstargets.df<-distinct(libstargets.df)
```
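The cleaning step above names each `_stdev` column explicitly and converts columns by position (`6:13`). A minimal base-R sketch of the same idea that selects columns by name pattern instead is shown below. It runs on a small invented data frame (`libs_toy`, hypothetical) rather than the real LIBS file, and assumes every standard-deviation column ends in `_stdev`, as the eight dropped above do.

```r
# Toy stand-in for the LIBS data (hypothetical values, for illustration only)
libs_toy <- data.frame(
  target     = c("scct_ref", "target_A", "target_A"),
  SiO2       = c("50.1", "45.2", "47.0"),
  SiO2_stdev = c(1.1, 0.9, 1.0),
  FeOT       = c("10.0", "18.3", "17.9"),
  FeOT_stdev = c(0.5, 0.7, 0.6),
  Total      = c(60.1, 63.5, 64.9),
  stringsAsFactors = FALSE
)

# Drop every column whose name ends in "_stdev", plus the Total column
drop_cols  <- c(grep("_stdev$", names(libs_toy), value = TRUE), "Total")
libs_clean <- libs_toy[, setdiff(names(libs_toy), drop_cols)]

# Convert the oxide columns (everything except the target name) to numeric
oxide_cols <- setdiff(names(libs_clean), "target")
libs_clean[oxide_cols] <- lapply(libs_clean[oxide_cols], as.numeric)

# Remove the SCCT (Earth reference) targets
libs_clean <- libs_clean[!grepl("scct", libs_clean$target), ]
```

Selecting by name keeps the cleaning correct even if the column order of the source file changes.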

Next, we load the PIXL data from StudentData, since the pixl_sol_coordinates data frame includes the latitude, longitude, and sol of each PIXL sample. We keep only the metadata the distance function needs (the sample name and its coordinates), and only one PIXL sample per abrasion: both samples from an abrasion share the same latitude and longitude, so using both is unnecessary. We also remove the atmospheric sample.
```{r}
#read in pixl data with lat/long
pixl.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
#include only pixl metadata
pixl.df<-pixl.df %>%
  select(c(1,2,19,20,22))
#convert Lat/Long to numeric
pixl.df$Lat <- as.numeric(pixl.df$Lat)
pixl.df$Long <- as.numeric(pixl.df$Long)
#remove rows so we only have one sample per abrasion and remove atmospheric sample
pixl.df<-pixl.df[c(2,4,6,8,10,12,14,16),]
```
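Because both PIXL samples from an abrasion share the same latitude and longitude, one sample per abrasion can also be kept by removing duplicate coordinates rather than listing row indices. The sketch below uses an invented table (`pixl_toy`, hypothetical) and assumes, as the code above does, that the atmospheric sample is the first row. It also assumes no two distinct abrasions have identical coordinates.

```r
# Toy stand-in for pixl_sol_coordinates (hypothetical values): row 1 is the
# atmospheric sample, then two rows per abrasion sharing the same coordinates
pixl_toy <- data.frame(
  sample = c("atmo", "A1", "A2", "B1", "B2", "C1"),
  Lat    = c(18.44, 18.45, 18.45, 18.46, 18.46, 18.47),
  Long   = c(77.45, 77.44, 77.44, 77.43, 77.43, 77.42),
  stringsAsFactors = FALSE
)

# Drop the atmospheric sample (assumed to be row 1, as in the code above)
pixl_sites <- pixl_toy[-1, ]
# Keep one row per unique Lat/Long pair, i.e. one row per abrasion
pixl_sites <- pixl_sites[!duplicated(pixl_sites[, c("Lat", "Long")]), ]
```

This keeps the first sample of each pair, whereas the code above keeps the second; since the pair shares coordinates, the distances computed later are the same either way.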

Next, we add a Distance column (the distance from each LIBS target to its nearest PIXL abrasion) and one column per PIXL abrasion. The distance loop below fills each abrasion column with the distance from the LIBS target to that abrasion.
```{r}
#LIBS target data frame with distance variable as well
libstargets.df<-cbind(libstargets.df,"Distance"=0,"Bellegrade"=0,"Dourbes"=0,"Quartier"=0,"Alfalfa"=0,"ThorntonGap"=0,"BerryHollow"=0,"Novarupta"=0,"UganikIsland"=0)
```

The loop below computes the haversine distance from each LIBS target to every PIXL abrasion, using a Mars radius of 3,393,169 m, and then records the closest PIXL abrasion (nearestpixl) and the distance to it (Distance).
```{r}
# Haversine distance (m) from each LIBS target to each of the 8 PIXL abrasions (r = Mars radius in m)
for(i in 1:nrow(libstargets.df)) {
    libstargets.df[i,c(6:13)]<-c(distHaversine(pixl.df[1,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[2,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[3,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[4,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[5,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[6,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[7,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[8,c(1,2)],libstargets.df[i,c(2,3)],r=3393169))

    libstargets.df[i,1]<-which.min(libstargets.df[i,c(6:13)])
    libstargets.df[i,5]<-min(libstargets.df[i,c(6:13)])
}
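# Label the index of the closest abrasion (1-8, in pixl.df row order); fixing levels = 1:8
# keeps the labels aligned even if some abrasion is never the nearest one
libstargets.df$nearestpixl<-factor(libstargets.df$nearestpixl, levels=1:8,
  labels=c("Bellegrade","Dourbes","Quartier","Alfalfa","ThorntonGap","BerryHollow","Novarupta","UganikIsland"))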
```
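The loop above calls `geosphere::distHaversine()` once per abrasion. As a sketch of what that computation does, the hypothetical helper `haversine()` below implements the haversine great-circle formula directly (longitude and latitude in degrees, same Mars radius as above). It then finds each target's nearest abrasion from a full distance matrix, so the number of abrasions is not hard-coded. The coordinates are invented; this illustrates the technique and is not meant to replace the geosphere implementation.

```r
# Hypothetical helper: great-circle (haversine) distance in meters between
# points given in degrees; r defaults to the Mars radius used above
haversine <- function(lon1, lat1, lon2, lat2, r = 3393169) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Toy LIBS targets and PIXL abrasion sites (hypothetical coordinates)
libs_pts <- data.frame(target = c("t1", "t2", "t3"),
                       Long = c(77.4000, 77.4101, 77.4300),
                       Lat  = c(18.4000, 18.4000, 18.4500),
                       stringsAsFactors = FALSE)
pixl_pts <- data.frame(abrasion = c("SiteA", "SiteB"),
                       Long = c(77.4000, 77.4300),
                       Lat  = c(18.4000, 18.4500),
                       stringsAsFactors = FALSE)

# Distance matrix: one row per LIBS target, one column per PIXL abrasion
d <- outer(seq_len(nrow(libs_pts)), seq_len(nrow(pixl_pts)),
           function(i, j) haversine(libs_pts$Long[i], libs_pts$Lat[i],
                                    pixl_pts$Long[j], pixl_pts$Lat[j]))

# Nearest abrasion and its distance for each LIBS target
libs_pts$nearestpixl <- pixl_pts$abrasion[apply(d, 1, which.min)]
libs_pts$Distance    <- apply(d, 1, min)
```

Here `t2` lies roughly 570 m east of `SiteA`, so it is matched to `SiteA`; with the 7 m threshold used in this notebook it would later be filtered out.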

Next, for each PIXL abrasion we store the names of the LIBS targets closest to it in a vector named after that abrasion. These vectors are used later to label the LIBS data.
```{r}
#Sets each nearest PIXL variable for future use in deciding which target is closest to a LIBS sample
Bellegrade<-libstargets.df[libstargets.df$nearestpixl=="Bellegrade",]$target
Dourbes<-libstargets.df[libstargets.df$nearestpixl=="Dourbes",]$target
Quartier<-libstargets.df[libstargets.df$nearestpixl=="Quartier",]$target
Alfalfa<-libstargets.df[libstargets.df$nearestpixl=="Alfalfa",]$target
ThorntonGap<-libstargets.df[libstargets.df$nearestpixl=="ThorntonGap",]$target
BerryHollow<-libstargets.df[libstargets.df$nearestpixl=="BerryHollow",]$target
Novarupta<-libstargets.df[libstargets.df$nearestpixl=="Novarupta",]$target
UganikIsland<-libstargets.df[libstargets.df$nearestpixl=="UganikIsland",]$target
```
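The eight assignments above can also be produced in one step with base R's `split()`, which returns a named list holding one vector of target names per abrasion; for example, `targets_by_abrasion[["Alfalfa"]]` would play the role of `Alfalfa`. The sketch below uses an invented table (`targets_toy`, hypothetical) in place of `libstargets.df`.

```r
# Toy nearest-abrasion table (hypothetical names and values)
targets_toy <- data.frame(
  target      = c("t1", "t2", "t3", "t4"),
  nearestpixl = factor(c("SiteA", "SiteB", "SiteA", "SiteB"),
                       levels = c("SiteA", "SiteB", "SiteC")),
  stringsAsFactors = FALSE
)

# One list entry per abrasion level; an abrasion with no nearby
# targets (SiteC here) gets an empty character vector
targets_by_abrasion <- split(targets_toy$target, targets_toy$nearestpixl)
```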

Next, we keep only the LIBS targets within the specified distance of their nearest PIXL abrasion. We then build libs.tern, which formats the data for the ternary plot by summing the cation groupings given by Dr. Rogers (Si+Al, Fe+Mg, and Ca+Na+K) into the three ternary axes, and add an Abrasion column holding the name of the PIXL abrasion closest to each LIBS target. Finally, a table summarizes the distance threshold and the number of LIBS targets and points retained.
```{r}
included.libs<-(libstargets.df%>%
  filter(Distance<meters))$target
libs.matrix <-libs.df %>%
  filter(target %in% included.libs)
#set LIBS matrix and ternary plot by adding in cation components and mutating
libs.matrix <- libs.matrix[,c(5,7:14)]
libs.tern <- as.data.frame(libs.matrix) %>%
  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100) %>%
  select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O,TiO2))
libs.tern<-cbind("Abrasion"=0,libs.tern)
#Set what abrasion goes with the respective LIBS sample it matches with
libs.tern<-libs.tern%>%
  mutate(Abrasion = ifelse(target%in%Alfalfa,"Alfalfa",
                    ifelse(target %in% Bellegrade, "Bellegrade",
                    ifelse(target %in% BerryHollow, "BerryHollow",
                    ifelse(target %in% Dourbes, "Dourbes",
                    ifelse(target %in% Novarupta, "Novarupta",
                    ifelse(target %in% Quartier, "Quartier",
                    ifelse(target %in% ThorntonGap, "ThorntonGap",
                    ifelse(target %in% UganikIsland, "UganikIsland",Abrasion)))))))))
#summary of LIBS data including distance parameter, number of LIBS targets, and number of LIBS points
kabledf<-rbind("Distance (m)"=meters,"Targets"=length(included.libs),"Points"=nrow(libs.tern))
kable(kabledf, caption ="LIBS # of Targets and Points within Specified Distance")
```
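The nested `ifelse()` above grows by one level per abrasion. Since each LIBS target already has its nearest abrasion recorded in `libstargets.df`, the label can instead be looked up by target name with `match()`. Below is a minimal sketch on invented tables (`target_lookup` standing in for `libstargets.df`, `tern_toy` for `libs.tern`, both hypothetical), assuming each target appears once in the lookup table; if a target repeats, `match()` uses its first occurrence.

```r
# Toy tables (hypothetical values):
#   target_lookup ~ libstargets.df (one row per LIBS target)
#   tern_toy      ~ libs.tern (one row per LIBS point)
target_lookup <- data.frame(target      = c("t1", "t2", "t3"),
                            nearestpixl = c("SiteA", "SiteB", "SiteA"),
                            Distance    = c(2.5, 6.1, 9.8),
                            stringsAsFactors = FALSE)
tern_toy <- data.frame(target = c("t1", "t1", "t2", "t3"),
                       x = c(0.55, 0.56, 0.50, 0.60),
                       stringsAsFactors = FALSE)

max_dist <- 7
# Keep only targets within the distance threshold (strict <, as above)
kept     <- target_lookup[target_lookup$Distance < max_dist, ]
tern_toy <- tern_toy[tern_toy$target %in% kept$target, ]

# Look up each point's abrasion by its target name
tern_toy$Abrasion <- kept$nearestpixl[match(tern_toy$target, kept$target)]
```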
Next, we set up a ternary data frame for the PIXL data so that the PIXL samples can be overlaid on the LIBS data as reference points. This shows how the cation compositions of the PIXL and LIBS samples relate.

```{r}
# PIXL data added
pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds")
pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)],
                                                 as.factor)
pixl.df <- pixl.df[2:16,] #Excluding first, atmospheric sample
# Keep the oxide columns plus campaign and type, renaming the PIXL oxide
# columns (spelled with digit zeros or lowercase o's) to match the LIBS names
new_pixl_trim <- pixl.df %>%
  dplyr::select(c("Na20","Mgo","Al203","Si02","K20","Cao","FeO-T", campaign, type)) %>%
  rename("Na2O"="Na20","MgO"="Mgo","Al2O3"="Al203","SiO2"="Si02","K2O"="K20",
         "CaO"="Cao","FeOT"="FeO-T")
# Sum the oxides into the three ternary axes and copy type into PIXL_Rock_Type
pixl_ternary <- new_pixl_trim %>%
  mutate(x=(SiO2+Al2O3)/100,y=(FeOT+MgO)/100,z=(CaO+Na2O+K2O)/100, PIXL_Rock_Type = type) %>%
  select(-c(SiO2,Al2O3,FeOT,MgO,CaO,Na2O,K2O)) %>%
  drop_na()
# Labels for the PIXL points on the ternary plot below
pixl_ternary <- cbind(pixl_ternary, Sample_display=
                        c("2","3","4,6,7","5,8,9","","","","",
                          "10,11","","12,13","","14,15","","16"))
```
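The three ternary axes are sums of oxide weight percentages divided by 100, so for a single point they generally do not add up to 1 (TiO2 and any unreported species are left out). A ternary diagram plots proportions, so each point is rescaled by its total. As far as I understand, ggtern performs this normalization itself, which is why the division by 100 does not change where points land. The worked example below makes the step explicit for one invented composition (`comp`, hypothetical).

```r
# One hypothetical oxide composition in weight percent
comp <- c(SiO2 = 45, Al2O3 = 10, FeOT = 20, MgO = 8,
          CaO = 7, Na2O = 3, K2O = 1, TiO2 = 1)

# The three ternary axes used in this notebook
si_al   <- unname((comp["SiO2"] + comp["Al2O3"]) / 100)              # Si+Al
fe_mg   <- unname((comp["FeOT"] + comp["MgO"]) / 100)                # Fe+Mg
ca_na_k <- unname((comp["CaO"] + comp["Na2O"] + comp["K2O"]) / 100)  # Ca+Na+K

# The axes sum to 0.94 here (TiO2 is excluded), so rescale to proportions
# that sum to 1, as a ternary diagram requires
total      <- si_al + fe_mg + ca_na_k
tern_point <- c(x = si_al, y = fe_mg, z = ca_na_k) / total
```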


## 5.2 Contribution

This work was also a collaboration between Margo and me. We designed the data set together during brainstorming, since this analysis relies on the PIXL latitude and longitude data set we created. Setting up this plot was also collaborative: we worked together to debug the code and decide how best to format the data for the ternary plot.

## 5.3 Methods Description

For this ternary representation, we removed the SCCT (Earth reference) targets from the LIBS data, removed the atmospheric sample from the PIXL data, used a distance function to match each LIBS target to its nearest PIXL abrasion within 7 meters (or another specified distance), and built a ternary data frame of cation combinations. The data frame is then mutated so the PIXL abrasion serves as the key (LIBS targets are colored by their matched abrasion) and plotted on the ternary diagram. We also plot the PIXL samples, marked as igneous or sedimentary according to the PIXL data set. This can help show the mineral evolution from igneous to sedimentary rock and which PIXL abrasions correspond to each stage.

## 5.4 Result and Discussion
Using the data prepared above, we draw the ternary plot with `ggtern()`. Coloring the LIBS points by abrasion shows how composition is distributed across abrasions, which helps us judge which abrasions resemble one another and which do not. The maximum distance between a PIXL abrasion and a LIBS target can be changed through the distance variable.
```{r}
ggtern(libs.tern, ggtern::aes(x=x,y=y,z=z)) +
  geom_point(data=libs.tern,aes(color=Abrasion),alpha=0.5) +
  theme_rgbw() +
  labs(title=paste("Mars LIBS Data Within",distance,"meters of PIXL",sep=" "),
       x="Si+Al",
       y="Fe+Mg",
       z="Ca+Na+K",
       caption = paste0("LIBS samples within ", distance, " meters of a PIXL abrasion.\nEach LIBS sample is colored by the PIXL abrasion it is matched to."))+
  theme(legend.position="right",
    plot.caption = element_text(hjust = 0)  # Aligns caption to the left
  ) +
  guides(alpha="none")+
  suppressWarnings(geom_point(
    data=pixl_ternary, ggtern::aes(cluster=PIXL_Rock_Type, shape=PIXL_Rock_Type), size = 2)) +
#Add labels to PIXL data corresponding to sample number
  suppressWarnings(geom_text(data=pixl_ternary, ggtern::aes(x=x, y=y, z=z, label=Sample_display, cluster=PIXL_Rock_Type,  # Horizontal adjust to avoid overlap
    hjust = ifelse(x > 0.43, 1, -0.1),
    vjust = ifelse(x == 0.3668, 1.3,
        ifelse(x == 0.375, 1, ifelse(x > 0.43, 1.5, -0.3))),
    fontface="bold"),
    size=2.7))
```


## 5.5 Conclusions and Future Work
Based on this ternary plot, Alfalfa and Bellegrade are higher in Si+Al, and Uganik Island is an outlier. We can also see the rock evolution from igneous (black circles) to sedimentary (black triangles) as compositions move from high Si+Al (over 80%) toward low Si+Al and high Fe+Mg; Andrew Steele of the Carnegie Institution for Science commented on this conclusion. Uganik Island was the last PIXL sample in the data set and, unlike every other abrasion, has only one sample instead of two. It is included here, but until the data set is updated there is not enough context to explain why it differs so much. I suspect the difference relates to the rover's traverse, since the Uganik Island abrasion lies in a very different location from the other seven abrasions. Future work could examine the rock evolution process in more depth, from the igneous abrasions Alfalfa and Dourbes to the sedimentary abrasion Thornton Gap.

# Bibliography

* “Mars Rock Samples - NASA Science.” NASA, https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/.
* “Analyst Notebook.” https://an.rsl.wustl.edu/m20/AN/account/login.aspx.

# Appendix

This appendix shows earlier soil composition plots comparing two PIXL abrasions (Alfalfa and Quartier) with the LIBS targets within 7 meters of each.

```{r, include=FALSE}
#Earth quartiles
earthquartiles.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/LIBS_training_set_quartiles.Rds")
#Load in LIBS data
libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
#Drop the standard deviation features, the sum of the percentages,
#the distance, and the total frequencies
libs.df <- libs.df %>%
  select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
# Convert the points to numeric
libs.df$point <- as.numeric(libs.df$point)
libs.df[,6:13] <- sapply(libs.df[,6:13],as.numeric)
#remove the scct/reference samples
libs.df<-libs.df%>%
  filter(!(grepl("scct", target)))
#add a column to indicate the nearest pixl
libs.df<-cbind(nearestpixl=0,libs.df)
#make a dataframe of just the LIBS Lat/Long and target name and remove duplicates
libstargets.df<-libs.df[,c(1,3,4,5)]
libstargets.df<-distinct(libstargets.df)
```

```{r, include=FALSE}
#Choose max distance variable between PIXL and LIBS data
meters = 7
#Choose PIXL abrasion you want to look at
abrasion_name = "Alfalfa"
```

```{r, include=FALSE}
#read in pixl data with lat/long
pixl.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
#include only pixl metadata
pixl.df<-pixl.df %>%
  select(c(1,2,19,20,22))
#convert Lat/Long to numeric
pixl.df$Lat <- as.numeric(pixl.df$Lat)
pixl.df$Long <- as.numeric(pixl.df$Long)
#remove rows so we only have one sample per abrasion and remove atmospheric sample
pixl.df<-pixl.df[c(2,4,6,8,10,12,14,16),]
```

```{r, include=FALSE}
libstargets.df<-cbind(libstargets.df,"Distance"=0,"Bellegrade"=0,"Dourbes"=0,"Quartier"=0,"Alfalfa"=0,"ThorntonGap"=0,"Berry Hollow"=0,"Novarupta"=0,"Uganik Island"=0)
```

```{r, include=FALSE}
for(i in 1:nrow(libstargets.df)) {
    libstargets.df[i,c(6:13)]<-c(distHaversine(pixl.df[1,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[2,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[3,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[4,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[5,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[6,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[7,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[8,c(1,2)],libstargets.df[i,c(2,3)],r=3393169))

    libstargets.df[i,1]<-which.min(libstargets.df[i,c(6:13)])
    libstargets.df[i,5]<-min(libstargets.df[i,c(6:13)])
}
# Fixing levels = 1:8 keeps the abrasion labels aligned even if some abrasion is never the nearest one
libstargets.df$nearestpixl<-factor(libstargets.df$nearestpixl, levels=1:8,
  labels=c("Bellegrade","Dourbes","Quartier","Alfalfa","ThorntonGap","Berry Hollow","Novarupta","Uganik Island"))
```

```{r, include=FALSE}
Bellegrade<-libstargets.df[libstargets.df$nearestpixl=="Bellegrade",]$target
Dourbes<-libstargets.df[libstargets.df$nearestpixl=="Dourbes",]$target
Quartier<-libstargets.df[libstargets.df$nearestpixl=="Quartier",]$target
Alfalfa<-libstargets.df[libstargets.df$nearestpixl=="Alfalfa",]$target
ThorntonGap<-libstargets.df[libstargets.df$nearestpixl=="ThorntonGap",]$target
BerryHollow<-libstargets.df[libstargets.df$nearestpixl=="Berry Hollow",]$target
Novarupta<-libstargets.df[libstargets.df$nearestpixl=="Novarupta",]$target
UganikIsland<-libstargets.df[libstargets.df$nearestpixl=="Uganik Island",]$target
```

```{r, include=FALSE}
included.libs<-(libstargets.df%>%
  filter(Distance<meters))$target
libs.matrix <-libs.df %>%
  filter(target %in% included.libs)
libs.matrix <- libs.matrix[,c(5,7:14)]
libs.matrix<-libs.matrix[,c(1:2,4:9,3)]
libs.matrix<-cbind("Abrasion"=0,libs.matrix)
libs.matrix<-libs.matrix%>%
  mutate(Abrasion = ifelse(target%in%Alfalfa,"Alfalfa",
                    ifelse(target %in% Bellegrade, "Bellegrade",
                    ifelse(target %in% BerryHollow, "Berry Hollow",
                    ifelse(target %in% Dourbes, "Dourbes",
                    ifelse(target %in% Novarupta, "Novarupta",
                    ifelse(target %in% Quartier, "Quartier",
                    ifelse(target %in% ThorntonGap, "ThorntonGap",
                    ifelse(target %in% UganikIsland, "Uganik Island",Abrasion)))))))))
libs.matrix<-cbind(libsorpixl=1,libs.matrix)
```

```{r, include=FALSE}
#read in pixl data with lat/long
pixl.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
pixl.df<-pixl.df %>%
  select(c(5:8,12:14,17,19,18,22))
#reorder pixl columns so that it matches libs data organization
pixl.df<-pixl.df[,c(11,10,4,3,8,2,6,1,5,7)]
#remove atmospheric sample
pixl.df<-pixl.df[2:16,]
pixl.df<-cbind(libsorpixl=0,pixl.df)
```

```{r, include=FALSE}
colnames(pixl.df)<-colnames(libs.matrix)
pixllibs.df<-rbind(pixl.df,libs.matrix)
```

```{r, include=FALSE}
# Earth quartiles
filtered_rows <- earthquartiles.df %>%
  filter(`Training set Quartiles` %in% c("1st", "3rd", "Med"))
earthquartiles_long <- filtered_rows %>%
  pivot_longer(cols = starts_with("SiO2"):last_col(), names_to = "Compound", values_to = "Percentage")

earthquartiles_long <- earthquartiles_long %>% rename(Quartiles = `Training set Quartiles`)
```

```{r, include=FALSE}
# Filter for the specific abrasion sample, e.g., "Alfalfa"
pixllibs_filtered <- pixllibs.df %>%
  filter(Abrasion == abrasion_name)

# Pivot the data to longer format for ggplot
pixllibs_long <- pixllibs_filtered %>%
  pivot_longer(cols = starts_with("SiO2"):last_col(), names_to = "Compound", values_to = "Percentage")

desired_order <- c("SiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O", "TiO2")  # Specify your custom order here
pixllibs_long$Compound <- factor(pixllibs_long$Compound, levels = desired_order)
```
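The appendix reshapes the combined PIXL/LIBS table with `tidyr::pivot_longer()` and then sets the factor levels of `Compound` so the x-axis follows `desired_order`. The base-R sketch below, on an invented three-oxide table (`wide`, hypothetical), shows what that reshape produces: one row per measurement and oxide, with the factor levels (not the row order) controlling the axis order in ggplot.

```r
# Toy wide table: one row per measurement, one column per oxide (hypothetical values)
wide <- data.frame(target = c("pixlA", "libs1"),
                   libsorpixl = c(0, 1),
                   SiO2 = c(45, 48), FeOT = c(20, 18), MgO = c(8, 9),
                   stringsAsFactors = FALSE)

oxides <- c("SiO2", "FeOT", "MgO")  # desired x-axis order

# unlist() stacks the oxide columns one after another, so the metadata is
# repeated once per oxide and each oxide name once per measurement
long <- data.frame(
  target     = rep(wide$target, times = length(oxides)),
  libsorpixl = rep(wide$libsorpixl, times = length(oxides)),
  Compound   = factor(rep(oxides, each = nrow(wide)), levels = oxides),
  Percentage = unlist(wide[oxides], use.names = FALSE),
  stringsAsFactors = FALSE
)
```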

```{r, include=TRUE}
# Color by PIXL (0) vs. LIBS (1) and draw one line per target
suppressWarnings(ggplot(pixllibs_long, aes(x = Compound, y = Percentage, color = as.factor(libsorpixl), group = target)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(trans='log10') +
  # Add Earth quartile lines using earthquartiles_long
  geom_line(data = earthquartiles_long, aes(x = Compound, y = Percentage, linetype = Quartiles, group = Quartiles),
            color = "black", linetype = "dotted") +
  labs(title = paste("Soil Composition for PIXL",abrasion_name,"and LIBS within", meters, "meters", sep = " "),
       x = "Chemical Compound",
       y = "Weight Percentage",
       color = "Measurement Type",
       linetype = "Quartiles",
       caption = "The chemical composition of a PIXL abrasion and the corresponding LIBS targets \n within specified distance of respective abrasion.") +
  scale_color_manual(values = c("0" = "blue", "1" = "red"), labels = c("PIXL", "LIBS")) +
  annotate("text", x = 5, y = .50, label = "1st Quartile", color = "black", hjust = 0) +
  annotate("text", x = 5, y = 2, label = "Median", color = "black", hjust = 0) +
  annotate("text", x = 5, y = 10, label = "3rd Quartile", color = "black", hjust = 0)+
  theme_minimal()+
     # Left-align the caption
  theme(
    plot.caption = element_text(hjust = 0)  # Aligns caption to the left
  ))

```

```{r, include=FALSE}
#Earth quartiles
earthquartiles.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/LIBS_training_set_quartiles.Rds")
#Load in LIBS data
libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
#Drop the standard deviation features, the sum of the percentages,
#the distance, and the total frequencies
libs.df <- libs.df %>%
  select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
# Convert the points to numeric
libs.df$point <- as.numeric(libs.df$point)
libs.df[,6:13] <- sapply(libs.df[,6:13],as.numeric)
#remove the scct/reference samples
libs.df<-libs.df%>%
  filter(!(grepl("scct", target)))
#add a column to indicate the nearest pixl
libs.df<-cbind(nearestpixl=0,libs.df)
#make a dataframe of just the LIBS Lat/Long and target name and remove duplicates
libstargets.df<-libs.df[,c(1,3,4,5)]
libstargets.df<-distinct(libstargets.df)
```

```{r, include=FALSE}
#Choose max distance variable between PIXL and LIBS data
meters = 7
#Choose PIXL abrasion you want to look at
abrasion_name = "Quartier"
```

```{r, include=FALSE}
#read in pixl data with lat/long
pixl.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
#include only pixl metadata
pixl.df<-pixl.df %>%
  select(c(1,2,19,20,22))
#convert Lat/Long to numeric
pixl.df$Lat <- as.numeric(pixl.df$Lat)
pixl.df$Long <- as.numeric(pixl.df$Long)
#remove rows so we only have one sample per abrasion and remove atmospheric sample
pixl.df<-pixl.df[c(2,4,6,8,10,12,14,16),]
```

```{r, include=FALSE}
libstargets.df<-cbind(libstargets.df,"Distance"=0,"Bellegrade"=0,"Dourbes"=0,"Quartier"=0,"Alfalfa"=0,"ThorntonGap"=0,"Berry Hollow"=0,"Novarupta"=0,"Uganik Island"=0)
```

```{r, include=FALSE}
for(i in 1:nrow(libstargets.df)) {
    libstargets.df[i,c(6:13)]<-c(distHaversine(pixl.df[1,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[2,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[3,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[4,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[5,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[6,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[7,c(1,2)],libstargets.df[i,c(2,3)],r=3393169),
                                 distHaversine(pixl.df[8,c(1,2)],libstargets.df[i,c(2,3)],r=3393169))

    libstargets.df[i,1]<-which.min(libstargets.df[i,c(6:13)])
    libstargets.df[i,5]<-min(libstargets.df[i,c(6:13)])
}
# Fixing levels = 1:8 keeps the abrasion labels aligned even if some abrasion is never the nearest one
libstargets.df$nearestpixl<-factor(libstargets.df$nearestpixl, levels=1:8,
  labels=c("Bellegrade","Dourbes","Quartier","Alfalfa","ThorntonGap","Berry Hollow","Novarupta","Uganik Island"))
```

```{r, include=FALSE}
Bellegrade<-libstargets.df[libstargets.df$nearestpixl=="Bellegrade",]$target
Dourbes<-libstargets.df[libstargets.df$nearestpixl=="Dourbes",]$target
Quartier<-libstargets.df[libstargets.df$nearestpixl=="Quartier",]$target
Alfalfa<-libstargets.df[libstargets.df$nearestpixl=="Alfalfa",]$target
ThorntonGap<-libstargets.df[libstargets.df$nearestpixl=="ThorntonGap",]$target
BerryHollow<-libstargets.df[libstargets.df$nearestpixl=="Berry Hollow",]$target
Novarupta<-libstargets.df[libstargets.df$nearestpixl=="Novarupta",]$target
UganikIsland<-libstargets.df[libstargets.df$nearestpixl=="Uganik Island",]$target
```

```{r, include=FALSE}
included.libs<-(libstargets.df%>%
  filter(Distance<meters))$target
libs.matrix <-libs.df %>%
  filter(target %in% included.libs)
libs.matrix <- libs.matrix[,c(5,7:14)]
libs.matrix<-libs.matrix[,c(1:2,4:9,3)]
libs.matrix<-cbind("Abrasion"=0,libs.matrix)
libs.matrix<-libs.matrix%>%
  mutate(Abrasion = ifelse(target%in%Alfalfa,"Alfalfa",
                    ifelse(target %in% Bellegrade, "Bellegrade",
                    ifelse(target %in% BerryHollow, "Berry Hollow",
                    ifelse(target %in% Dourbes, "Dourbes",
                    ifelse(target %in% Novarupta, "Novarupta",
                    ifelse(target %in% Quartier, "Quartier",
                    ifelse(target %in% ThorntonGap, "ThorntonGap",
                    ifelse(target %in% UganikIsland, "Uganik Island",Abrasion)))))))))
libs.matrix<-cbind(libsorpixl=1,libs.matrix)
```

```{r, include=FALSE}
#read in pixl data with lat/long
pixl.df<-readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
pixl.df<-pixl.df %>%
  select(c(5:8,12:14,17,19,18,22))
#reorder pixl columns so that it matches libs data organization
pixl.df<-pixl.df[,c(11,10,4,3,8,2,6,1,5,7)]
#remove atmospheric sample
pixl.df<-pixl.df[2:16,]
pixl.df<-cbind(libsorpixl=0,pixl.df)
```

```{r, include=FALSE}
colnames(pixl.df)<-colnames(libs.matrix)
pixllibs.df<-rbind(pixl.df,libs.matrix)
```

```{r, include=FALSE}
# Earth quartiles
filtered_rows <- earthquartiles.df %>%
  filter(`Training set Quartiles` %in% c("1st", "3rd", "Med"))
earthquartiles_long <- filtered_rows %>%
  pivot_longer(cols = starts_with("SiO2"):last_col(), names_to = "Compound", values_to = "Percentage")

earthquartiles_long <- earthquartiles_long %>% rename(Quartiles = `Training set Quartiles`)
```

```{r, include=FALSE}
# Filter for the specific abrasion sample, e.g., "Alfalfa"
pixllibs_filtered <- pixllibs.df %>%
  filter(Abrasion == abrasion_name)

# Pivot the data to longer format for ggplot
pixllibs_long <- pixllibs_filtered %>%
  pivot_longer(cols = starts_with("SiO2"):last_col(), names_to = "Compound", values_to = "Percentage")

desired_order <- c("SiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O", "TiO2")  # Specify your custom order here
pixllibs_long$Compound <- factor(pixllibs_long$Compound, levels = desired_order)
```

```{r, include=TRUE}
# Color by PIXL (0) vs. LIBS (1) and draw one line per target
suppressWarnings(ggplot(pixllibs_long, aes(x = Compound, y = Percentage, color = as.factor(libsorpixl), group = target)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(trans='log10') +
  # Add Earth quartile lines using earthquartiles_long
  geom_line(data = earthquartiles_long, aes(x = Compound, y = Percentage, linetype = Quartiles, group = Quartiles),
            color = "black", linetype = "dotted") +
  labs(title = paste("Soil Composition for PIXL",abrasion_name,"and LIBS within", meters, "meters", sep = " "),
       x = "Chemical Compound",
       y = "Weight Percentage",
       color = "Measurement Type",
       linetype = "Quartiles",
       caption = "The chemical composition of a PIXL abrasion and the corresponding LIBS targets \n within specified distance of respective abrasion.") +
  scale_color_manual(values = c("0" = "blue", "1" = "red"), labels = c("PIXL", "LIBS")) +
  annotate("text", x = 5, y = .50, label = "1st Quartile", color = "black", hjust = 0) +
  annotate("text", x = 5, y = 2, label = "Median", color = "black", hjust = 0) +
  annotate("text", x = 5, y = 10, label = "3rd Quartile", color = "black", hjust = 0)+
  theme_minimal()+
     # Left-align the caption
  theme(
    plot.caption = element_text(hjust = 0)  # Aligns caption to the left
  ))

```