diff --git a/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.Rmd b/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.Rmd new file mode 100644 index 0000000..eae717d --- /dev/null +++ b/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.Rmd @@ -0,0 +1,538 @@ +--- +title: "Mars 2020 Mission Data Notebook:" +subtitle: "DAR Assignment 2 (Fall 2024)" +author: "Yashas Balaji" +date: "`r format(Sys.time(), '%d %B %Y')`" +output: + pdf_document: default + html_document: + toc: true + number_sections: true + df_print: paged +--- +```{r setup, include=FALSE} + +# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!! +# This section installs packages if they are not already installed. +# This block will not be shown in the knit file. +knitr::opts_chunk$set(echo = TRUE) + +# Set the default CRAN repository +local({r <- getOption("repos") + r["CRAN"] <- "http://cran.r-project.org" + options(repos=r) +}) + +if (!require("pandoc")) { + install.packages("pandoc") + library(pandoc) +} + +# Required packages for M20 LIBS analysis +if (!require("rmarkdown")) { + install.packages("rmarkdown") + library(rmarkdown) +} +if (!require("tidyverse")) { + install.packages("tidyverse") + library(tidyverse) +} +if (!require("stringr")) { + install.packages("stringr") + library(stringr) +} + +if (!require("ggbiplot")) { + install.packages("ggbiplot") + library(ggbiplot) +} + +if (!require("pheatmap")) { + install.packages("pheatmap") + library(pheatmap) +} + +``` + +# DAR ASSIGNMENT 2 (Introduction): Introductory DAR Notebook + +This notebook is broken into four main parts: + +* **Part 1:** Preparing your local repo for **DAR Assignment 2** +* **Part 2:** Loading and some analysis of the Mars 2020 (M20) Datasets + * Lithology: _Summarizes the mineral characteristics of samples collected at certain sample locations._ + * PIXL: Planetary Instrument for X-ray Lithochemistry.
_Measures the elemental chemistry of samples at sub-millimeter scales._ + * SHERLOC: Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals. _Uses cameras, a spectrometer, and a laser to search samples for organic compounds and minerals that have been altered in watery environments and may be signs of past microbial life._ + * LIBS: Laser-induced breakdown spectroscopy. _Uses a laser beam to help identify minerals in samples and other areas that are beyond the reach of the rover's robotic arm or in areas too steep for the rover to travel._ + +* **Part 3:** Individual analysis of your team's dataset +* **Part 4:** Preparation of Team Presentation + +**NOTE:** The RPI github repository for all the code and data required for this notebook may be found at: + +* https://github.rpi.edu/DataINCITE/DAR-Mars-F24 + +# DAR ASSIGNMENT 2 (Part 1): Preparing your local repo for Assignment 2 + +In this assignment you'll start by making a copy of the Assignment 2 template notebook, then you'll add to your copy with your original work. The instructions which follow explain how to accomplish this. + +**NOTE:** You already cloned the `DAR-Mars-F24` repository for Assignment 1; you **do not** need to make another clone of the repo, but you must begin by updating your copy as instructed below: + +## Updating your local clone of the `DAR-Mars-F24` repository + +* Access RStudio Server on the IDEA Cluster at http://lp01.idea.rpi.edu/rstudio-ose/ + * REMINDER: You must be on the RPI VPN!! +* Access the Linux shell on the IDEA Cluster by clicking the **Terminal** tab of RStudio Server (lower left panel). + * You now see the Linux shell on the IDEA Cluster + * `cd` (change directory) to enter your home directory using: `cd ~` + * Type `pwd` to confirm where you are +* In the Linux shell, `cd` to `DAR-Mars-F24` + * Type `git pull origin main` to pull any updates + * Always do this when you begin work; we might have added or changed something!
+* In the Linux shell, `cd` into `Assignment02` + * Type `ls -al` to list the current contents + * Don't be surprised if you see many files! +* In the Linux shell, type `git branch` to verify your current working branch + * If it is not `dar-yourrcs`, type `git checkout dar-yourrcs` (where `yourrcs` is your RCS id) + * Re-type `git branch` to confirm +* Now in the RStudio Server UI, navigate to the `DAR-Mars-F24/StudentNotebooks/Assignment02` directory via the **Files** panel (lower right panel) + * Under the **More** menu, set this to be your R working directory + * Setting the correct working directory is essential for interactive R use! + +You're now ready to start coding Assignment 2! + +## Creating your copy of the Assignment 2 notebook + +1. In RStudio, make a **copy** of `dar-f24-assignment2-template.Rmd` file using a *new, original, descriptive* filename that **includes your RCS ID!** + * Open `dar-f24-assignment2-template.Rmd` + * **Save As...** using a new filename that includes your RCS ID + * Example filename for user `erickj4`: `erickj4-assignment2-f24.Rmd` + * POINTS OFF IF: + * You don't create a new filename! + * You don't include your RCS ID! + * You include `template` in your new filename! +2. Edit your new notebook using RStudio and save + * Change the `title:` and `subtitle:` headers (at the top of the file) + * Change the `author:` + * Don't bother changing the `date:`; it should update automagically... + * **Save** your changes +3. Use the RStudio `Knit` command to create an HTML file; repeat as necessary + * Use the down arrow next to the word `Knit` and select **Knit to HTML** + * You may also knit to PDF... +4. In the Linux terminal, use `git add` to add each new file you want to add to the repository + * Type: `git add yourfilename.Rmd` + * Type: `git add yourfilename.html` (created when you knitted) + * Add your PDF if you also created one... +5. 
When you're ready, in Linux commit your changes: + * Type: `git commit -m "some comment"` where "some comment" is a useful comment describing your changes + * This commits your changes to your local repo, and sets the stage for your next operation. +6. Finally, push your commits to the RPI github repo + * Type: `git push origin dar-yourrcs` (where `dar-yourrcs` is the branch you've been working in) + * Your changes are now safely on the RPI github. +7. **REQUIRED:** On the RPI github, **submit a pull request.** + * In a web browser, navigate to https://github.rpi.edu/DataINCITE/DAR-Mars-F24 + * In the branch selector drop-down (by default says **master**), select your branch + * **Submit a pull request for your branch** + * One of the DAR instructors will merge your branch, and your new files will be added to the master branch of the repo. _Do not merge your branch yourself!_ + +# DAR ASSIGNMENT 2 (Part 2): Loading the Mars 2020 (M20) Datasets + +In this assignment there are four datasets from separate instruments on the Mars Perseverance rover available for analysis: + +* **Lithology:** Summarizes the mineral characteristics of samples collected at certain sample locations +* **PIXL:** Planetary Instrument for X-ray Lithochemistry of collected samples +* **SHERLOC:** Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals for collected samples +* **LIBS:** Laser-induced breakdown spectroscopy, measured in many areas (not just samples) + +Each dataset provides data about the mineralogy of the surface of Mars. Based on the purpose and nature of the instrument, the data is collected at different intervals along the path of Perseverance as it makes its way across the Jezero crater. Some of the data (esp. LIBS) is collected almost every Martian day, or _sol_.
Some of the data (PIXL and SHERLOC) is only collected at certain sample locations of interest. + +Your objective is to analyze your team's dataset in order to learn all you can about these Mars samples. + +NOTES: + + * All of these datasets can be found in `/academics/MATP-4910-F24/DAR-Mars-F24/Data` + * We have included a comprehensive `samples.Rds` dataset that includes useful details about the sample locations, including Martian latitude and longitude and the sol on which individual samples were collected. + * Also included is `rover.waypoints.Rds` that provides detailed location information (lat/lon) for the Perseverance rover throughout its journey, up to the present. This can be updated when necessary using the included `roverStatus-f24.R` script. + * A general guide to the available Mars 2020 data is available here: https://pds-geosciences.wustl.edu/missions/DAR-Mars2020/ + +## Data Set A: Load the Lithology Data + +The first five features of the dataset describe twenty-four (24) rover sample locations. + +The remaining features provide a simple binary (`1` or `0`) summary of presence or absence of 35 minerals at the 24 rover sample locations. + +Only the first sixteen (16) samples are retained, as the remaining samples are missing the mineral descriptors. + +The following code "cleans" the dataset to prepare for analysis. It first creates a dataframe with metadata and measurements for samples, and then creates a matrix containing only numeric measurements for later analysis.
+ +```{r} +# Load the saved lithology data with locations added +lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds") + +# Cast samples as numbers +lithology.df$sample <- as.numeric(lithology.df$sample) + +# Convert rest into factors +lithology.df[sapply(lithology.df, is.character)] <- lapply(lithology.df[sapply(lithology.df, is.character)], + as.factor) + +# Keep only first 16 samples because the data for the rest of the samples is not available yet +lithology.df<-lithology.df[1:16,] + +# Look at summary of cleaned data frame +summary(lithology.df) + +# Create a matrix containing only the numeric measurements. The remaining features are metadata about the sample. +lithology.matrix <- sapply(lithology.df[,6:40],as.numeric)-1 + +# Review the structure of our matrix +str(lithology.matrix) +``` + + +## Data Set B: Load the PIXL Data + +The PIXL data provides summaries of the mineral compositions measured at selected sample sites by the PIXL instrument. + +```{r} +# Load the saved PIXL data with locations added +pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds") + +# Convert to factors +pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], + as.factor) + +# Review our dataframe +summary(pixl.df) + +# Make the matrix of just mineral percentage measurements +pixl.matrix <- pixl.df[,2:14] + +# Review the structure +str(pixl.matrix) +``` + +## Data Set C: Load the LIBS Data + +The LIBS data provides summaries of the mineral compositions measured at selected sample sites by the LIBS instrument, part of the Perseverance SuperCam. 
+ +```{r} +# Load the saved LIBS data with locations added +libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds") + +# Drop features that are not to be used in the analysis for this notebook +libs.df <- libs.df %>% + select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev, + MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total))) + +# Convert the points to numeric +libs.df$point <- as.numeric(libs.df$point) + +# Review what we have +summary(libs.df) + +# Make a matrix containing only the LIBS measurements +libs.matrix <- as.matrix(libs.df[,6:13]) + +# Review the structure +str(libs.matrix) +``` + + + +## Data Set D: Load the SHERLOC Data + +The SHERLOC data you will be using for this lab is the result of scientists' interpretations of extensive spectral analysis of abrasion samples provided by the SHERLOC instrument. + +**NOTE:** This dataset presents minerals as rows and sample sites as columns. You'll probably want to rotate the dataset for easier analysis. + +```{r} + +# Read in data as provided. +sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds") + +# Clean up data types +sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral) +sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], + as.numeric) +# Transform NAs to 0 +sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0) + +# Reformat data so that rows are "abrasions" and columns list the presence of minerals. +# Do this by "pivoting" to a long format, and then back to the desired wide format.
+ +sherloc_long <- sherloc_abrasion_raw %>% + pivot_longer(!Mineral, names_to = "Name", values_to = "Presence") + +# Make abrasion a factor +sherloc_long$Name <- as.factor(sherloc_long$Name) + +# Make it a matrix +sherloc.matrix <- sherloc_long %>% + pivot_wider(names_from = Mineral, values_from = Presence) + +# Get sample information from PIXL and add to measurements -- assumes order is the same + +sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix) + +# Review what we have +summary(sherloc.df) + +# Measurements are everything except first column +sherloc.matrix<-sherloc.matrix[,-1] + +# SHERLOC measurement matrix +# Review the structure +str(sherloc.matrix) +``` +## Data Set E: PIXL + Sherloc +```{r} +# Combine PIXL and SHERLOC dataframes +pixl_sherloc.df <- cbind(pixl.df,sherloc.df ) + +# Review what we have +summary(pixl_sherloc.df) + +# Combine PIXL and SHERLOC matrices +pixl_sherloc.matrix<-cbind(pixl.matrix,sherloc.matrix) + +# Review the structure of our matrix +str(pixl_sherloc.matrix) + +``` + + +## Data Set F: PIXL + Lithology + +Create the data frame and matrix from the prior datasets. + +```{r} +# Combine our PIXL and Lithology dataframes +pixl_lithology.df <- cbind(pixl.df,lithology.df ) + +# Review what we have +summary(pixl_lithology.df) + +# Combine PIXL and Lithology matrices +pixl_lithology.matrix<-cbind(pixl.matrix,lithology.matrix) + +# Review the structure +str(pixl_lithology.matrix) + +``` +```{r} +# PCA and k-means clustering on the combined PIXL + Lithology data +set.seed(2) + +# Load necessary libraries +library(ggplot2) +library(ggrepel) + +# Use the combined PIXL + Lithology matrix built in the previous chunk +data <- pixl_lithology.matrix + +# Ensure data is numeric +data_numeric <- data[, sapply(data, is.numeric)] + +# Check for zero-variance columns +zero_var_cols <- sapply(data_numeric, function(x) var(x) == 0) + +# Remove zero-variance columns +if (any(zero_var_cols)) { +
data_filtered <- data_numeric[, !zero_var_cols] +} else { + data_filtered <- data_numeric +} + +# Perform PCA without scaling on filtered data +pca_result <- prcomp(data_filtered, center = TRUE, scale. = FALSE) + +# Calculate explained variance for each principal component +explained_variance <- (pca_result$sdev^2) / sum(pca_result$sdev^2) + +# Format explained variance for axis labels +pc1_var <- round(explained_variance[1] * 100, 2) +pc2_var <- round(explained_variance[2] * 100, 2) + +# Extract scores (the principal component values for observations) +scores <- as.data.frame(pca_result$x) + +# Elbow plot: total within-cluster sum of squares (WSS) for k = 1..nc +wssplot <- function(data, nc=15, seed=10){ + wss <- data.frame(cluster=1:nc, quality=c(0)) + for (i in 1:nc){ + set.seed(seed) + wss[i,2] <- kmeans(data, centers=i)$tot.withinss} + ggplot(data=wss,aes(x=cluster,y=quality)) + + geom_line() + + ggtitle("Quality of k-means by Cluster") +} + +# Apply `wssplot()` to the combined PIXL + Lithology data +wssplot(data_numeric, nc=8, seed=2) +# Choose number of clusters +k <- 4 # Change this number to the number of clusters you want + +# Perform k-means clustering on PCA scores +set.seed(2) # Ensure reproducibility +kmeans_result <- kmeans(scores[, 1:2], centers = k) # Clustering on the first two PCs + +# Add the cluster labels to the scores dataframe +scores$cluster <- as.factor(kmeans_result$cluster) + +# Extract loadings (the contribution of each variable to the components) +loadings <- as.data.frame(pca_result$rotation) +loadings <- loadings * 20 # Scale the loadings for better visibility + +# Create a biplot using ggplot2, include the explained variance and cluster labels +ggplot() + + # Plot the scores (PCs for observations) and color by cluster + geom_point(aes(x = scores$PC1, y = scores$PC2, color = scores$cluster), size = 3) + + + # Add labels for observations with cluster information + geom_text_repel(aes(x = scores$PC1, y = scores$PC2, label = paste(rownames(scores), "-", scores$cluster)), color = "black") + + + # Plot the loadings (arrows for
variables) + geom_segment(aes(x = 0, y = 0, xend = loadings$PC1, yend = loadings$PC2), + arrow = arrow(length = unit(0.3, "cm")), color = "red") + + + # Add labels for variables + geom_text_repel(aes(x = loadings$PC1, y = loadings$PC2, label = rownames(loadings)), color = "red") + + + # Add titles and labels, including explained variance for PC1 and PC2 + labs(x = paste0("PC1 (", pc1_var, "% Variance Explained)"), + y = paste0("PC2 (", pc2_var, "% Variance Explained)"), + title = paste("PCA Biplot with", k, "Clusters (Unscaled Data)")) + + + # Add a theme for better appearance + theme_minimal() + + + # Set color scale for clusters + scale_color_manual(values = rainbow(k)) # Use `rainbow(k)` for distinct cluster colors + + +``` + +## Data Set G: Sherloc + Lithology + +Create the data frame and matrix from the prior datasets by combining the appropriate data frames and matrices. + +```{r} +# Combine the Lithology and SHERLOC dataframes +sherloc_lithology.df <- cbind(sherloc.df,lithology.df ) + +# Review what we have +summary(sherloc_lithology.df) + +# Combine the Lithology and SHERLOC matrices +sherloc_lithology.matrix<-cbind(sherloc.matrix,lithology.matrix) + +# Review the resulting matrix +str(sherloc_lithology.matrix) + +``` + + +# Analysis of Data (Part 3) + +Each team has been assigned one of six datasets: + +1. Dataset B: PIXL: The PIXL team's goal is to understand and explain how scaling improves results from Assignment 1 + +2. Dataset C: LIBS (with appropriate scaling as necessary) + +3. Dataset D: Sherloc (with appropriate scaling as necessary) + +4. Dataset E: PIXL + Sherloc (with appropriate scaling as necessary) + +5. Dataset F: PIXL + Lithology (with appropriate scaling as necessary) + +6. Dataset G: Sherloc + Lithology (with appropriate scaling as necessary) + +**For each data set, perform the following steps.** Feel free to use the methods/code from Assignment 1 as desired. Communicate with your teammates.
Make sure that you are doing different variations of the analysis below so that no two team members do exactly the same analysis. If you want to share clustering results (which is okay, but then vary the rest), make sure you use the same random seeds. + +I am assigned to analyze Dataset F (PIXL + Lithology). + +1. _Describe the data set contained in the data frame and matrix:_ How many rows does it have and how many features? Which features are measurements and which features are metadata about the samples? (3 pts) + +The data set has 16 samples (rows) and 59 features. There are 48 measurements: the 13 PIXL composition columns (`pixl.df[,2:14]`) and the 35 lithology mineral presence/absence columns (`lithology.df[,6:40]`). The metadata features are sample, name, type, campaign, location, abrasion, and SampleType. + +2. _Scale this data appropriately (you can choose the scaling method):_ Explain why you chose that scaling method. (3 pts) + +I scaled the PIXL data but left the lithology data unscaled. I chose this method because it showed the greatest variation in the data. + +3. _Cluster the data using k-means or your favorite clustering method (like hierarchical clustering):_ Describe how you picked the best number of clusters. Indicate the number of points in each cluster. Coordinate with your team so you try different approaches. If you want to share results with your teammates, make sure to use the same random seeds. (6 pts) + +I chose the number of clusters using the elbow method on the total within-cluster sum of squares (WSS). I was deciding between 4 and 6 clusters, but ultimately chose 4 because it explained the data slightly better. + +4. _Perform a **creative analysis** that provides insights into what one or more of the clusters are and what they tell you about the MARS data:_ + +Cluster 4 stood out in my PCA biplot. Its samples have high concentrations of manganese oxide and iron oxide, but also high concentrations of aluminium oxide (Al2O3) and silicon dioxide (SiO2).
This likely indicates that these samples were taken in a transition zone between a volcanic region and a water-basin environment. Such a zone is an interesting place to take samples because it records many different types of geological processes. + + +# Preparation of Team Presentation (Part 4) + +Prepare a presentation of your team's results to present in class on **September 11** starting at 9am in AE217 (20 pts). +The presentation should include the following elements: + +1. A **Description** of the data set that you analyzed, including how many observations and how many features. (<= 1.5 mins) +2. Each team member gets **three minutes** to explain their analysis: + * what analysis they performed + * the results of that analysis + * a brief discussion of their interpretation of these results + * <= 18 mins _total!_ +3. A **Conclusion** slide indicating the major findings of the team (<= 1.5 mins) +4. Thoughts on **potential next steps** for the MARS team (<= 1.5 mins) + +* A template for your team presentation is included here: https://bit.ly/dar-template-f24 + +* The rubric for the presentation is here: + +https://docs.google.com/document/d/1-4o1O4h2r8aMjAplmE-ItblQnyDAKZwNs5XCnmwacjs/pub + + + + + + + +# When you're done: SAVE, COMMIT and PUSH YOUR CHANGES! + +When you are satisfied with your edits and your notebook knits successfully, remember to push your changes to the repo using the following steps: + +* `git branch` + * To double-check that you are in your working branch +* `git add ` +* `git commit -m "Some useful comments"` +* `git push origin ` + +# Prepare group presentation + +Prepare an (at most) _three-slide_ presentation of your classification results and creative analysis. Create a joint presentation with your teammates using the Google Slides template available here: https://bit.ly/45twtUP (copy the template and customize it with your content). + +Prepare a conclusion slide that summarizes all your results.
+ +Be prepared to present your results on xx Sep 2024 in class! + +# APPENDIX: Accessing RStudio Server on the IDEA Cluster + +The IDEA Cluster provides seven compute nodes (4x 48 cores, 3x 80 cores) and one storage server + +* The Cluster requires RCS credentials, enabled via registration in class + * email John Erickson for problems `erickj4@rpi.edu` +* RStudio, Jupyter, MATLAB, GPUs (on two nodes); lots of storage and compute +* Access via RPI physical network or VPN only + +# More info about RStudio on our Cluster + +## RStudio GUI Access: + +* Use: + * http://lp01.idea.rpi.edu/rstudio-ose/ + * http://lp01.idea.rpi.edu/rstudio-ose-3/ + * http://lp01.idea.rpi.edu/rstudio-ose-6/ + * http://lp01.idea.rpi.edu/rstudio-ose-7/ +* Linux terminal accessible from within RStudio "Terminal" or via ssh (below) + diff --git a/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.html b/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.html new file mode 100644 index 0000000..293fac0 --- /dev/null +++ b/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.html @@ -0,0 +1,3173 @@ +Mars 2020 Mission Data Notebook:

1 DAR ASSIGNMENT 2 (Introduction): Introductory DAR Notebook

This notebook is broken into four main parts:

  • Part 1: Preparing your local repo for DAR Assignment 2
  • Part 2: Loading and some analysis of the Mars 2020 (M20) Datasets
    • Lithology: Summarizes the mineral characteristics of samples collected at certain sample locations.
    • PIXL: Planetary Instrument for X-ray Lithochemistry. Measures the elemental chemistry of samples at sub-millimeter scales.
    • SHERLOC: Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals. Uses cameras, a spectrometer, and a laser to search samples for organic compounds and minerals that have been altered in watery environments and may be signs of past microbial life.
    • LIBS: Laser-induced breakdown spectroscopy. Uses a laser beam to help identify minerals in samples and other areas that are beyond the reach of the rover’s robotic arm or in areas too steep for the rover to travel.
  • Part 3: Individual analysis of your team’s dataset
  • Part 4: Preparation of Team Presentation

NOTE: The RPI github repository for all the code and data required for this notebook may be found at:

  • https://github.rpi.edu/DataINCITE/DAR-Mars-F24

2 DAR ASSIGNMENT 2 (Part 1): Preparing your local repo for Assignment 2

In this assignment you’ll start by making a copy of the Assignment 2 template notebook, then you’ll add to your copy with your original work. The instructions which follow explain how to accomplish this.

NOTE: You already cloned the DAR-Mars-F24 repository for Assignment 1; you do not need to make another clone of the repo, but you must begin by updating your copy as instructed below:

2.1 Updating your local clone of the DAR-Mars-F24 repository

  • Access RStudio Server on the IDEA Cluster at http://lp01.idea.rpi.edu/rstudio-ose/
    • REMINDER: You must be on the RPI VPN!!
  • Access the Linux shell on the IDEA Cluster by clicking the Terminal tab of RStudio Server (lower left panel).
    • You now see the Linux shell on the IDEA Cluster
    • cd (change directory) to enter your home directory using: cd ~
    • Type pwd to confirm where you are
  • In the Linux shell, cd to DAR-Mars-F24
    • Type git pull origin main to pull any updates
    • Always do this when you begin work; we might have added or changed something!
  • In the Linux shell, cd into Assignment02
    • Type ls -al to list the current contents
    • Don’t be surprised if you see many files!
  • In the Linux shell, type git branch to verify your current working branch
    • If it is not dar-yourrcs, type git checkout dar-yourrcs (where yourrcs is your RCS id)
    • Re-type git branch to confirm
  • Now in the RStudio Server UI, navigate to the DAR-Mars-F24/StudentNotebooks/Assignment02 directory via the Files panel (lower right panel)
    • Under the More menu, set this to be your R working directory
    • Setting the correct working directory is essential for interactive R use!

You’re now ready to start coding Assignment 2!

2.2 Creating your copy of the Assignment 2 notebook

  1. In RStudio, make a copy of the dar-f24-assignment2-template.Rmd file using a new, original, descriptive filename that includes your RCS ID!
    • Open dar-f24-assignment2-template.Rmd
    • Save As… using a new filename that includes your RCS ID
    • Example filename for user erickj4: erickj4-assignment2-f24.Rmd
    • POINTS OFF IF:
      • You don’t create a new filename!
      • You don’t include your RCS ID!
      • You include template in your new filename!
  2. Edit your new notebook using RStudio and save
    • Change the title: and subtitle: headers (at the top of the file)
    • Change the author:
    • Don’t bother changing the date:; it should update automagically…
    • Save your changes
  3. Use the RStudio Knit command to create an HTML file; repeat as necessary
    • Use the down arrow next to the word Knit and select Knit to HTML
    • You may also knit to PDF…
  4. In the Linux terminal, use git add to add each new file you want to add to the repository
    • Type: git add yourfilename.Rmd
    • Type: git add yourfilename.html (created when you knitted)
    • Add your PDF if you also created one…
  5. When you’re ready, in Linux commit your changes:
    • Type: git commit -m "some comment" where “some comment” is a useful comment describing your changes
    • This commits your changes to your local repo, and sets the stage for your next operation.
  6. Finally, push your commits to the RPI github repo
    • Type: git push origin dar-yourrcs (where dar-yourrcs is the branch you’ve been working in)
    • Your changes are now safely on the RPI github.
  7. REQUIRED: On the RPI github, submit a pull request.
    • In a web browser, navigate to https://github.rpi.edu/DataINCITE/DAR-Mars-F24
    • In the branch selector drop-down (by default says master), select your branch
    • Submit a pull request for your branch
    • One of the DAR instructors will merge your branch, and your new files will be added to the master branch of the repo. Do not merge your branch yourself!

3 DAR ASSIGNMENT 2 (Part 2): Loading the Mars 2020 (M20) Datasets

In this assignment there are four datasets from separate instruments on the Mars Perseverance rover available for analysis:

  • Lithology: Summarizes the mineral characteristics of samples collected at certain sample locations
  • PIXL: Planetary Instrument for X-ray Lithochemistry of collected samples
  • SHERLOC: Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals for collected samples
  • LIBS: Laser-induced breakdown spectroscopy, measured in many areas (not just samples)

Each dataset provides data about the mineralogy of the surface of Mars. Based on the purpose and nature of the instrument, the data is collected at different intervals along the path of Perseverance as it makes its way across the Jezero crater. Some of the data (esp. LIBS) is collected almost every Martian day, or sol. Some of the data (PIXL and SHERLOC) is only collected at certain sample locations of interest.

Your objective is to analyze your team’s dataset in order to learn all you can about these Mars samples.

NOTES:

  • All of these datasets can be found in /academics/MATP-4910-F24/DAR-Mars-F24/Data
  • We have included a comprehensive samples.Rds dataset that includes useful details about the sample locations, including Martian latitude and longitude and the sol on which individual samples were collected.
  • Also included is rover.waypoints.Rds that provides detailed location information (lat/lon) for the Perseverance rover throughout its journey, up to the present. This can be updated when necessary using the included roverStatus-f24.R script.
  • A general guide to the available Mars 2020 data is available here: https://pds-geosciences.wustl.edu/missions/DAR-Mars2020/
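Because all of these datasets are stored as `.Rds` files in a single directory, you can also read them in one step. The sketch below is a hypothetical convenience helper, `load_rds_dir()`, and is not part of the course code. It assumes every `.Rds` file in the directory was written with `saveRDS()` and that loading all of them together fits comfortably in memory.

```r
# Hypothetical helper (not part of the DAR course code): read every .Rds file
# in a directory into a named list, one element per file.
load_rds_dir <- function(dir) {
  # Match .Rds/.rds files; full.names = TRUE gives paths readRDS() can open
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE,
                      ignore.case = TRUE)
  datasets <- lapply(files, readRDS)
  # Name each element after its file without the extension,
  # e.g. "samples" for samples.Rds
  names(datasets) <- tools::file_path_sans_ext(basename(files))
  datasets
}

# Usage on the IDEA Cluster (directory taken from the notes above):
# m20 <- load_rds_dir("/academics/MATP-4910-F24/DAR-Mars-F24/Data")
# str(m20$samples)
```

One list avoids a separate `readRDS()` call per file, but you still need to clean each element individually, as the chunks in this notebook do.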

3.1 Data Set A: Load the Lithology Data

The first five features of the dataset describe twenty-four (24) rover sample locations.

The remaining features provide a simple binary (1 or 0) summary of presence or absence of 35 minerals at the 24 rover sample locations.

Only the first sixteen (16) samples are retained, as the remaining samples are missing the mineral descriptors.
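The notebook keeps the first 16 samples by row position (`lithology.df[1:16,]`). If the unavailable samples are stored as `NA` in the mineral columns, the same filter can be written in a data-driven way that keeps working if more samples receive descriptors later. This is an assumption: the encoding used in `mineral_data_static.Rds` is not described here. `keep_complete_samples()` is a hypothetical helper, shown below on made-up data.

```r
# Hypothetical helper: keep only the rows whose mineral columns are all
# non-missing. Assumes unavailable descriptors are stored as NA.
keep_complete_samples <- function(df, mineral_cols) {
  complete <- stats::complete.cases(df[, mineral_cols, drop = FALSE])
  df[complete, , drop = FALSE]
}

# Toy example with made-up values (not the real lithology data)
toy_lith <- data.frame(
  sample  = 1:4,
  olivine = c("1", "0", NA, NA),
  quartz  = c("0", "1", NA, "1")
)
kept <- keep_complete_samples(toy_lith, c("olivine", "quartz"))
kept$sample  # 1 2
```

On the real data the call would be something like `keep_complete_samples(lithology.df, 6:40)`, using the same column range the notebook uses to build `lithology.matrix`. Before relying on it, check that the missing samples really are `NA` and not empty strings.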

The following code “cleans” the dataset to prepare for analysis. It first creates a dataframe with metadata and measurements for samples, and then creates a matrix containing only numeric measurements for later analysis.
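The cleaning code builds the 0/1 matrix with `sapply(lithology.df[,6:40],as.numeric)-1`. This works because `as.numeric()` on a factor returns the level index, not the label. For a column with levels `"0"` and `"1"`, those indices are 1 and 2. The sketch below uses made-up data to show where that shortcut breaks: a column in which only the level `"1"` occurs. It then shows a conversion through the labels, which does not depend on which levels are present.

```r
# Toy presence/absence data (made-up values, not the real lithology file)
toy_minerals <- data.frame(
  olivine = factor(c("0", "1", "1")),  # levels "0" and "1"
  quartz  = factor(c("1", "1", "1"))   # only the level "1" exists
)

# Shortcut used in the notebook: factor level index minus one.
# quartz has a single level "1" with index 1, so every value becomes 0.
via_index <- sapply(toy_minerals, as.numeric) - 1

# Converting through the level labels keeps the intended 0/1 values.
via_labels <- sapply(toy_minerals, function(x) as.integer(as.character(x)))

via_index[, "quartz"]   # 0 0 0 (wrong)
via_labels[, "quartz"]  # 1 1 1 (correct)
```

Every mineral column shown in the `summary()` output below has both levels `0` and `1`, so the shortcut gives the intended values there. The label-based version simply avoids depending on that.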

+
# Load the saved lithology data with locations added
+lithology.df<- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/mineral_data_static.Rds")
+
+# Cast samples as numbers
+lithology.df$sample <- as.numeric(lithology.df$sample)
+
+# Convert the remaining character columns into factors
+lithology.df[sapply(lithology.df, is.character)] <- lapply(lithology.df[sapply(lithology.df, is.character)], 
+                                       as.factor)
+
+# Keep only first 16 samples because the data for the rest of the samples is not available yet
+lithology.df<-lithology.df[1:16,]
+
+# Look at summary of cleaned data frame
+summary(lithology.df)
+
##      sample              name          SampleType         campaign
+##  Min.   : 1.00   Atsah     : 1   atmospheric: 1   Crater Floor:9  
+##  1st Qu.: 4.75   Bearwallow: 1   regolith   : 0   Delta Front :7  
+##  Median : 8.50   Coulettes : 1   rock core  :15   Margin Unit :0  
+##  Mean   : 8.50   Hahonih   : 1                                    
+##  3rd Qu.:12.25   Hazeltop  : 1                                    
+##  Max.   :16.00   Kukaklek  : 1                                    
+##                  (Other)   :10                                    
+##          abrasion feldspar plagioclase pyroxene olivine quartz apatite
+##  Alfalfa     :2   0:14     0:13        0: 5     0: 6    0:14   0:13   
+##  Bellegarde  :2   1: 2     1: 3        1:11     1:10    1: 2   1: 3   
+##  Berry Hollow:2                                                       
+##  Dourbes     :2                                                       
+##  Novarupta   :2                                                       
+##  Quartier    :2                                                       
+##  (Other)     :4                                                       
+##  FeTi_Oxides Iron_Oxide Sulfate Perchlorates Phosphate Ca_Sulfate Carbonate
+##  0:13        0:9        0: 4    0:15         0:11      0:10       0: 1     
+##  1: 3        1:7        1:12    1: 1         1: 5      1: 6       1:15     
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##  Fe_Mg_clay Fe_Mg_carbonate Mg_sulfate Phyllosilicates Chlorite Halite
+##  0:13       0:14            0:13       0:12            0:14     0:13  
+##  1: 3       1: 2            1: 3       1: 4            1: 2     1: 3  
+##                                                                       
+##                                                                       
+##                                                                       
+##                                                                       
+##                                                                       
+##  Organic_matter Hydrated_Ca_Sulfate Hydrated_Sulfates Hydrated_Mg_Fe_Sulfate
+##  0: 5           0:14                0:14              0:13                  
+##  1:11           1: 2                1: 2              1: 3                  
+##                                                                             
+##                                                                             
+##                                                                             
+##                                                                             
+##                                                                             
+##  Na_Perchlorate Amorphous_Silicate Hydrated_Carbonates Disordered_Silicates
+##  0:15           0:9                0:16                0:14                
+##  1: 1           1:7                                    1: 2                
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##  Hydrated_Iron_Oxide Sulfate+Organic_Matter Other_hydrated_phases Kaolinite
+##  0:15                0:11                   0:8                   0:13     
+##  1: 1                1: 5                   1:8                   1: 3     
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##  Chromite Ilmenite Zircon/Baddeleyite Spinels
+##  0:14     0:14     0:14               0:14   
+##  1: 2     1: 2     1: 2               1: 2   
+##                                              
+##                                              
+##                                              
+##                                              
+## 
+
# Keep the 35 mineral indicators (columns 6-40); columns 1-5 are sample metadata.
+# as.numeric() on a factor returns level codes, so convert via the labels instead
+lithology.matrix <- sapply(lithology.df[,6:40], function(x) as.numeric(as.character(x)))
+
+# Review the structure of our matrix
+str(lithology.matrix)
+
##  num [1:16, 1:35] 0 0 0 0 0 0 0 1 1 0 ...
+##  - attr(*, "dimnames")=List of 2
+##   ..$ : NULL
+##   ..$ : chr [1:35] "feldspar" "plagioclase" "pyroxene" "olivine" ...
+
+
+
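Converting the 0/1 factor columns to numbers is easy to get subtly wrong, because `as.numeric()` on a factor returns the internal level codes rather than the labels. The sketch below illustrates this on a small made-up data frame (`toy.df`, its columns and the helper `to01` are all hypothetical), then shows one way to summarize a 0/1 presence matrix: column means give the fraction of samples containing each mineral, and `rowsum()` gives counts per campaign.

```r
# Made-up stand-in for lithology.df: metadata columns followed by 0/1
# mineral indicators stored as factors.
toy.df <- data.frame(
  sample    = 1:4,
  campaign  = c("Crater Floor", "Crater Floor", "Delta Front", "Delta Front"),
  olivine   = factor(c("1", "1", "0", "0")),
  carbonate = factor(c("1", "1", "1", "1"))  # only the level "1" occurs
)

# Pitfall: as.numeric() on a factor returns level codes, not labels.
# carbonate has a single level ("1") with code 1, so subtracting 1 gives 0s.
naive <- as.numeric(toy.df$carbonate) - 1

# Safer: convert the labels, not the codes.
to01 <- function(f) as.numeric(as.character(f))
toy.matrix <- sapply(toy.df[, c("olivine", "carbonate")], to01)

# Fraction of samples in which each mineral is present.
prevalence <- colMeans(toy.matrix)

# Counts per campaign (rowsum sorts the groups), then divide by group sizes.
grp <- toy.df$campaign
counts <- rowsum(toy.matrix, group = grp)
sizes <- as.vector(table(grp)[rownames(counts)])
by.campaign <- counts / sizes

print(naive)        # all zeros, even though carbonate is present everywhere
print(prevalence)
print(by.campaign)
```

On the notebook's data the same summaries would presumably be `colMeans(lithology.matrix)` and `rowsum(lithology.matrix, as.character(lithology.df$campaign))`; this has not been run against the real dataset.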

3.2 Data Set B: Load the PIXL Data

The PIXL data provides summaries of the elemental chemistry, reported as oxide abundances, measured at selected sample sites by the PIXL instrument.

+
# Load the saved PIXL data with locations added
+pixl.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds")
+
+# Convert character columns to factors
+pixl.df[sapply(pixl.df, is.character)] <- lapply(pixl.df[sapply(pixl.df, is.character)], 
+                                       as.factor)
+
+# Review our dataframe
+summary(pixl.df)
+
##      sample           Na20            Mgo             Al203       
+##  Min.   : 1.00   Min.   :1.000   Min.   : 0.730   Min.   : 1.700  
+##  1st Qu.: 4.75   1st Qu.:1.853   1st Qu.: 2.533   1st Qu.: 2.220  
+##  Median : 8.50   Median :1.900   Median :12.800   Median : 3.710  
+##  Mean   : 8.50   Mean   :2.672   Mean   :11.682   Mean   : 5.072  
+##  3rd Qu.:12.25   3rd Qu.:4.500   3rd Qu.:19.100   3rd Qu.: 7.117  
+##  Max.   :16.00   Max.   :5.550   Max.   :22.700   Max.   :11.600  
+##                                                                   
+##       Si02            P205             S03               Cl       
+##  Min.   :22.60   Min.   :0.1000   Min.   : 0.780   Min.   :0.400  
+##  1st Qu.:31.22   1st Qu.:0.2350   1st Qu.: 1.495   1st Qu.:0.940  
+##  Median :38.85   Median :0.5250   Median : 2.600   Median :1.740  
+##  Mean   :38.55   Mean   :0.6512   Mean   : 5.562   Mean   :1.846  
+##  3rd Qu.:41.17   3rd Qu.:0.8400   3rd Qu.: 3.800   3rd Qu.:2.080  
+##  Max.   :57.10   Max.   :2.7600   Max.   :21.530   Max.   :4.500  
+##                                                                   
+##       K20              Cao             Ti02            Cr203      
+##  Min.   :0.0000   Min.   :1.500   Min.   :0.2000   Min.   :0.000  
+##  1st Qu.:0.1600   1st Qu.:2.655   1st Qu.:0.5900   1st Qu.:0.025  
+##  Median :0.2000   Median :3.120   Median :0.7000   Median :0.155  
+##  Mean   :0.5800   Mean   :3.688   Mean   :0.8194   Mean   :0.355  
+##  3rd Qu.:0.8275   3rd Qu.:4.310   3rd Qu.:0.9900   3rd Qu.:0.290  
+##  Max.   :1.9000   Max.   :7.770   Max.   :2.4900   Max.   :1.900  
+##                                                                   
+##       Mno             FeO-T               name             type  
+##  Min.   :0.1000   Min.   :13.24   Atsah     : 1   Igneous    :8  
+##  1st Qu.:0.2800   1st Qu.:16.71   Bearwallow: 1   N/A        :1  
+##  Median :0.4000   Median :23.86   Coulettes : 1   Sedimentary:7  
+##  Mean   :0.3812   Mean   :21.45   Hahonih   : 1                  
+##  3rd Qu.:0.4900   3rd Qu.:25.70   Hazeltop  : 1                  
+##  Max.   :0.6900   Max.   :30.05   Kukaklek  : 1                  
+##                                   (Other)   :10                  
+##          campaign    location          abrasion
+##  Crater Floor:9   01     : 1   Alfalfa     :2  
+##  Delta Front :7   02     : 1   Bellegrade  :2  
+##                   03     : 1   Berry Hollow:2  
+##                   04     : 1   Dourbes     :2  
+##                   05     : 1   Novarupta   :2  
+##                   06     : 1   Quartier    :2  
+##                   (Other):10   (Other)     :4
+
# Keep only the oxide measurements (columns 2-14); the result is still a tibble, not a matrix
+pixl.matrix <- pixl.df[,2:14]
+
+# Review the structure
+str(pixl.matrix)
+
## tibble [16 × 13] (S3: tbl_df/tbl/data.frame)
+##  $ Na20 : num [1:16] 5.55 4.67 1.93 1.87 4.5 1.87 1.87 4.5 4.5 1.8 ...
+##  $ Mgo  : num [1:16] 2.64 2.21 19.24 12.8 0.73 ...
+##  $ Al203: num [1:16] 7.56 6.97 2.42 2.36 11.6 2.36 2.36 11.6 11.6 1.7 ...
+##  $ Si02 : num [1:16] 38.3 43.8 39.4 40.3 57.1 ...
+##  $ P205 : num [1:16] 1.65 2.76 0.48 0.28 0.84 0.28 0.28 0.84 0.84 0.1 ...
+##  $ S03  : num [1:16] 2.69 3.21 0.78 1.66 1 1.66 1.66 1 1 2.6 ...
+##  $ Cl   : num [1:16] 3.4 1.48 0.66 0.94 2.08 0.94 0.94 2.08 2.08 4.5 ...
+##  $ K20  : num [1:16] 0.75 1.06 0.18 0.2 1.9 0.2 0.2 1.9 1.9 0.3 ...
+##  $ Cao  : num [1:16] 7.77 7.62 2.94 2.94 4.31 2.94 2.94 4.31 4.31 1.8 ...
+##  $ Ti02 : num [1:16] 1.47 2.49 0.37 0.99 0.59 0.99 0.99 0.59 0.59 0.2 ...
+##  $ Cr203: num [1:16] 0.03 0.01 0.26 0.29 0 0.29 0.29 0 0 0.2 ...
+##  $ Mno  : num [1:16] 0.46 0.44 0.69 0.58 0.28 0.58 0.58 0.28 0.28 0.4 ...
+##  $ FeO-T: num [1:16] 18.7 23.2 30.1 25.7 13.2 ...
+
+
+
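The PIXL column names above appear to use the digit 0 where the letter O belongs (`Si02`, `Al203`) and a lowercase o in a few oxides (`Mgo`, `Cao`, `Mno`), while LIBS uses standard formulas (`SiO2`, `MgO`). If you plan to compare the two instruments, a hypothetical helper like `standardize_oxide_names` below (not part of the course code) could bring the names into line. It assumes no legitimate column name contains the digit 0.

```r
# Hypothetical helper: map PIXL-style oxide names onto standard formulas.
standardize_oxide_names <- function(x) {
  x <- gsub("0", "O", x, fixed = TRUE)  # digit zero -> letter O ("Si02" -> "SiO2")
  x <- gsub("-", "", x, fixed = TRUE)   # "FeO-T" -> "FeOT" (LIBS spelling)
  sub("o$", "O", x)                     # trailing lowercase o ("Mgo" -> "MgO")
}

# Column names as they appear in the PIXL and LIBS summaries above
pixl.names <- c("Na20", "Mgo", "Al203", "Si02", "P205", "S03", "Cl",
                "K20", "Cao", "Ti02", "Cr203", "Mno", "FeO-T")
libs.names <- c("SiO2", "TiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O")

fixed <- standardize_oxide_names(pixl.names)
print(fixed)

# Oxides that both instruments now report under the same name
shared <- intersect(fixed, libs.names)
print(shared)
```

Applied to the notebook's data this would be `names(pixl.matrix) <- standardize_oxide_names(names(pixl.matrix))`; later chunks that refer to the original spellings would need the same change.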

3.3 Data Set C: Load the LIBS Data

The LIBS data provides summaries of the elemental compositions, reported as oxide abundances, measured at individual points on targets along the rover's path by the LIBS instrument, part of the Perseverance SuperCam.

+
# Load the saved LIBS data with locations added
+libs.df <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+
+# Drop features that will not be used in this notebook's analysis
+libs.df <- libs.df %>% 
+  select(!(c(distance_mm,Tot.Em.,SiO2_stdev,TiO2_stdev,Al2O3_stdev,FeOT_stdev,
+             MgO_stdev,Na2O_stdev,CaO_stdev,K2O_stdev,Total)))
+
+# Convert the points to numeric
+libs.df$point <- as.numeric(libs.df$point)
+
+# Review what we have
+summary(libs.df)
+
##       sol              lat             lon           target         
+##  Min.   :  15.0   Min.   :18.43   Min.   :77.34   Length:1932       
+##  1st Qu.: 281.0   1st Qu.:18.44   1st Qu.:77.36   Class :character  
+##  Median : 557.0   Median :18.46   Median :77.40   Mode  :character  
+##  Mean   : 565.1   Mean   :18.46   Mean   :77.40                     
+##  3rd Qu.: 872.0   3rd Qu.:18.48   3rd Qu.:77.44                     
+##  Max.   :1019.0   Max.   :18.50   Max.   :77.45                     
+##      point             SiO2            TiO2            Al2O3       
+##  Min.   : 1.000   Min.   : 0.00   Min.   :0.0000   Min.   : 0.000  
+##  1st Qu.: 3.000   1st Qu.:42.04   1st Qu.:0.0300   1st Qu.: 3.080  
+##  Median : 5.000   Median :45.80   Median :0.3200   Median : 4.925  
+##  Mean   : 5.776   Mean   :43.47   Mean   :0.3719   Mean   : 6.246  
+##  3rd Qu.: 8.000   3rd Qu.:49.23   3rd Qu.:0.6400   3rd Qu.: 8.533  
+##  Max.   :28.000   Max.   :76.12   Max.   :2.4000   Max.   :38.350  
+##       FeOT            MgO             CaO              Na2O       
+##  Min.   : 0.29   Min.   : 0.29   Min.   : 0.080   Min.   :0.0000  
+##  1st Qu.:13.27   1st Qu.: 5.72   1st Qu.: 1.830   1st Qu.:0.9775  
+##  Median :20.21   Median :12.78   Median : 3.625   Median :1.5200  
+##  Mean   :20.07   Mean   :16.47   Mean   : 4.726   Mean   :1.7600  
+##  3rd Qu.:25.45   3rd Qu.:27.83   3rd Qu.: 4.622   3rd Qu.:2.4000  
+##  Max.   :82.68   Max.   :45.21   Max.   :52.130   Max.   :7.5200  
+##       K2O         
+##  Min.   : 0.0000  
+##  1st Qu.: 0.0000  
+##  Median : 0.3000  
+##  Mean   : 0.5909  
+##  3rd Qu.: 0.7800  
+##  Max.   :34.8700
+
# Make a matrix containing only the LIBS oxide measurements (columns 6-13)
+libs.matrix <- as.matrix(libs.df[,6:13])
+
+# Review the structure
+str(libs.matrix)
+
##  num [1:1932, 1:8] 49.7 55.8 61.2 51 48 ...
+##  - attr(*, "dimnames")=List of 2
+##   ..$ : NULL
+##   ..$ : chr [1:8] "SiO2" "TiO2" "Al2O3" "FeOT" ...
+
+
+
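Because LIBS records several laser points per target (the `point` column above runs from 1 to 28), you may want one row per target before comparing LIBS with the sample-level PIXL and SHERLOC tables. A minimal sketch with base R's `aggregate()`, on made-up data (`libs.toy` and its values are hypothetical). It groups by `sol` as well as `target` in case a target was measured on more than one sol:

```r
# Made-up stand-in for libs.df: several laser points per target.
libs.toy <- data.frame(
  sol    = c(15, 15, 15, 20, 20),
  target = c("A", "A", "A", "B", "B"),
  point  = c(1, 2, 3, 1, 2),
  SiO2   = c(48, 50, 52, 40, 44),
  MgO    = c(10, 12, 14, 30, 26)
)

# Average each oxide column within every (sol, target) pair.
per.target <- aggregate(cbind(SiO2, MgO) ~ sol + target,
                        data = libs.toy, FUN = mean)

# Count how many points went into each average (same group order).
n.points <- aggregate(point ~ sol + target, data = libs.toy, FUN = length)
per.target$n_points <- n.points$point

print(per.target)
```

On the real data the left-hand side would list all eight oxides, e.g. `cbind(SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O) ~ sol + target` with `data = libs.df`. Averaging hides point-to-point variability, so keeping `n_points` (or a standard deviation) alongside the means is worth considering.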

3.4 Data Set D: Load the SHERLOC Data

The SHERLOC data you will be using for this lab is the result of scientists’ interpretations of extensive spectral analysis of abrasion samples provided by the SHERLOC instrument.

NOTE: This dataset presents minerals as rows and sample sites as columns. You’ll probably want to rotate (transpose) the dataset for easier analysis.

+
# Read in data as provided.  
+sherloc_abrasion_raw <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/abrasions_sherloc_samples.Rds")
+
+# Clean up data types
+sherloc_abrasion_raw$Mineral<-as.factor(sherloc_abrasion_raw$Mineral)
+sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)] <- lapply(sherloc_abrasion_raw[sapply(sherloc_abrasion_raw, is.character)], 
+                                       as.numeric)
+# Replace NAs with 0
+sherloc_abrasion_raw <- sherloc_abrasion_raw %>% replace(is.na(.), 0)
+
+# Reformat data so that rows are "abrasions" and columns list the presence of minerals. 
+# Do this by "pivoting" to a long format, and then back to the desired wide format.  
+
+sherloc_long <- sherloc_abrasion_raw %>%
+  pivot_longer(!Mineral, names_to = "Name", values_to = "Presence")
+
+# Make the abrasion name a factor
+sherloc_long$Name <- as.factor(sherloc_long$Name)
+
+# Pivot back to wide format: one row per abrasion, one column per mineral (a tibble, despite the name)
+sherloc.matrix <- sherloc_long %>%
+  pivot_wider(names_from = Mineral, values_from = Presence)
+
+# Get sample information from PIXL and add to measurements -- assumes order is the same
+
+sherloc.df <- cbind(pixl.df[,c("sample","type","campaign","abrasion")],sherloc.matrix)
+
+# Review what we have
+summary(sherloc.df)
+
##      sample               type           campaign         abrasion
+##  Min.   : 1.00   Igneous    :8   Crater Floor:9   Alfalfa     :2  
+##  1st Qu.: 4.75   N/A        :1   Delta Front :7   Bellegrade  :2  
+##  Median : 8.50   Sedimentary:7                    Berry Hollow:2  
+##  Mean   : 8.50                                    Dourbes     :2  
+##  3rd Qu.:12.25                                    Novarupta   :2  
+##  Max.   :16.00                                    Quartier    :2  
+##                                                   (Other)     :4  
+##          Name     Plagioclase        Sulfate         Ca-sulfate    
+##  Atsah     : 1   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
+##  Bearwallow: 1   1st Qu.:0.0000   1st Qu.:0.1875   1st Qu.:0.0000  
+##  Coulettes : 1   Median :0.0000   Median :1.0000   Median :0.0000  
+##  Hahonih   : 1   Mean   :0.1875   Mean   :0.6562   Mean   :0.3438  
+##  Hazeltop  : 1   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
+##  Kukaklek  : 1   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
+##  (Other)   :10                                                     
+##  Hydrated Ca-sulfate   Mg-sulfate     Hydrated Sulfates Hydrated Mg-Fe sulfate
+##  Min.   :0.000       Min.   :0.0000   Min.   :0.000     Min.   :0.0000        
+##  1st Qu.:0.000       1st Qu.:0.0000   1st Qu.:0.000     1st Qu.:0.0000        
+##  Median :0.000       Median :0.0000   Median :0.000     Median :0.0000        
+##  Mean   :0.125       Mean   :0.1875   Mean   :0.125     Mean   :0.1875        
+##  3rd Qu.:0.000       3rd Qu.:0.0000   3rd Qu.:0.000     3rd Qu.:0.0000        
+##  Max.   :1.000       Max.   :1.0000   Max.   :1.000     Max.   :1.0000        
+##                                                                               
+##   Perchlorates    Na-perchlorate    Amorphous Silicate   Phosphate     
+##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000     Min.   :0.0000  
+##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000     1st Qu.:0.0000  
+##  Median :0.0000   Median :0.00000   Median :0.0000     Median :0.0000  
+##  Mean   :0.0625   Mean   :0.03125   Mean   :0.1406     Mean   :0.2031  
+##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.2500     3rd Qu.:0.3125  
+##  Max.   :1.0000   Max.   :0.50000   Max.   :0.5000     Max.   :1.0000  
+##                                                                        
+##     Pyroxene         Olivine         Carbonate      Fe-Mg carbonate
+##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
+##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.4375   1st Qu.:0.000  
+##  Median :1.0000   Median :0.6250   Median :1.0000   Median :0.000  
+##  Mean   :0.6875   Mean   :0.5312   Mean   :0.7344   Mean   :0.125  
+##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.000  
+##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
+##                                                                    
+##  Hydrated Carbonates Disordered Silicates    Feldspar         Quartz       
+##  Min.   :0           Min.   :0.000        Min.   :0.000   Min.   :0.00000  
+##  1st Qu.:0           1st Qu.:0.000        1st Qu.:0.000   1st Qu.:0.00000  
+##  Median :0           Median :0.000        Median :0.000   Median :0.00000  
+##  Mean   :0           Mean   :0.125        Mean   :0.125   Mean   :0.03125  
+##  3rd Qu.:0           3rd Qu.:0.000        3rd Qu.:0.000   3rd Qu.:0.00000  
+##  Max.   :0           Max.   :1.000        Max.   :1.000   Max.   :0.25000  
+##                                                                            
+##     Apatite        FeTi oxides         Halite          Iron oxide    
+##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
+##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
+##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000  
+##  Mean   :0.1406   Mean   :0.1406   Mean   :0.04688   Mean   :0.2812  
+##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.5000  
+##  Max.   :1.0000   Max.   :1.0000   Max.   :0.25000   Max.   :1.0000  
+##                                                                      
+##  Hydrated Iron oxide Organic matter   Sulfate+Organic matter
+##  Min.   :0.00000     Min.   :0.0000   Min.   :0.0000        
+##  1st Qu.:0.00000     1st Qu.:0.0000   1st Qu.:0.0000        
+##  Median :0.00000     Median :1.0000   Median :0.0000        
+##  Mean   :0.01562     Mean   :0.5938   Mean   :0.2188        
+##  3rd Qu.:0.00000     3rd Qu.:1.0000   3rd Qu.:0.2500        
+##  Max.   :0.25000     Max.   :1.0000   Max.   :1.0000        
+##                                                             
+##  Other hydrated phases Phyllosilicates      Chlorite     
+##  Min.   :0.0000        Min.   :0.00000   Min.   :0.0000  
+##  1st Qu.:0.0000        1st Qu.:0.00000   1st Qu.:0.0000  
+##  Median :0.2500        Median :0.00000   Median :0.0000  
+##  Mean   :0.4375        Mean   :0.09375   Mean   :0.0625  
+##  3rd Qu.:1.0000        3rd Qu.:0.06250   3rd Qu.:0.0000  
+##  Max.   :1.0000        Max.   :0.50000   Max.   :0.5000  
+##                                                          
+##  Kaolinite (hydrous Al-clay)    Chromite        Ilmenite     Zircon/Baddeleyite
+##  Min.   :0.0000              Min.   :0.000   Min.   :0.000   Min.   :0.000     
+##  1st Qu.:0.0000              1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000     
+##  Median :0.0000              Median :0.000   Median :0.000   Median :0.000     
+##  Mean   :0.1875              Mean   :0.125   Mean   :0.125   Mean   :0.125     
+##  3rd Qu.:0.0000              3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:0.000     
+##  Max.   :1.0000              Max.   :1.000   Max.   :1.000   Max.   :1.000     
+##                                                                                
+##  Fe-Mg-clay minerals    Spinels      
+##  Min.   :0.0000      Min.   :0.0000  
+##  1st Qu.:0.0000      1st Qu.:0.0000  
+##  Median :0.0000      Median :0.0000  
+##  Mean   :0.1875      Mean   :0.0625  
+##  3rd Qu.:0.0000      3rd Qu.:0.0000  
+##  Max.   :1.0000      Max.   :0.5000  
+## 
+
# Drop the first column (Name); the remaining columns are the mineral measurements
+sherloc.matrix<-sherloc.matrix[,-1]
+
+# SHERLOC measurement matrix
+# Review the structure 
+str(sherloc.matrix)
+
## tibble [16 × 35] (S3: tbl_df/tbl/data.frame)
+##  $ Plagioclase                : num [1:16] 1 1 1 0 0 0 0 0 0 0 ...
+##  $ Sulfate                    : num [1:16] 1 1 1 1 1 1 1 0 0 0 ...
+##  $ Ca-sulfate                 : num [1:16] 1 1 1 0 0 0 0 0 0 0 ...
+##  $ Hydrated Ca-sulfate        : num [1:16] 0 1 1 0 0 0 0 0 0 0 ...
+##  $ Mg-sulfate                 : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Hydrated Sulfates          : num [1:16] 0 0 0 0 0 1 1 0 0 0 ...
+##  $ Hydrated Mg-Fe sulfate     : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Perchlorates               : num [1:16] 1 0 0 0 0 0 0 0 0 0 ...
+##  $ Na-perchlorate             : num [1:16] 0.5 0 0 0 0 0 0 0 0 0 ...
+##  $ Amorphous Silicate         : num [1:16] 0.25 0.25 0.25 0.5 0.5 0.25 0.25 0 0 0 ...
+##  $ Phosphate                  : num [1:16] 0.25 1 1 0 0 0 0 0 0 0 ...
+##  $ Pyroxene                   : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
+##  $ Olivine                    : num [1:16] 0 0 0 1 1 1 1 0.25 0.25 1 ...
+##  $ Carbonate                  : num [1:16] 0 1 1 1 1 1 1 0.5 0.5 1 ...
+##  $ Fe-Mg carbonate            : num [1:16] 0 0 0 0 0 0 0 0 0 1 ...
+##  $ Hydrated Carbonates        : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Disordered Silicates       : num [1:16] 0 0 0 0 0 0 0 1 1 0 ...
+##  $ Feldspar                   : num [1:16] 0 0 0 0 0 0 0 1 1 0 ...
+##  $ Quartz                     : num [1:16] 0 0 0 0 0 0 0 0.25 0.25 0 ...
+##  $ Apatite                    : num [1:16] 0.25 0 0 0 0 0 0 0 0 0 ...
+##  $ FeTi oxides                : num [1:16] 0.25 1 1 0 0 0 0 0 0 0 ...
+##  $ Halite                     : num [1:16] 0.25 0 0 0 0 0 0 0 0 0.25 ...
+##  $ Iron oxide                 : num [1:16] 1 1 1 0 0 0 0 0.5 0.5 0.25 ...
+##  $ Hydrated Iron oxide        : num [1:16] 0.25 0 0 0 0 0 0 0 0 0 ...
+##  $ Organic matter             : num [1:16] 0 0 0 1 1 1 1 1 1 0 ...
+##  $ Sulfate+Organic matter     : num [1:16] 0 0 0 0 0 1 1 0 0 0 ...
+##  $ Other hydrated phases      : num [1:16] 0 0 0 1 1 1 1 0.5 0.5 1 ...
+##  $ Phyllosilicates            : num [1:16] 0 0 0 0 0 0 0 0.5 0.5 0.25 ...
+##  $ Chlorite                   : num [1:16] 0 0 0 0 0 0 0 0.5 0.5 0 ...
+##  $ Kaolinite (hydrous Al-clay): num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Chromite                   : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Ilmenite                   : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Zircon/Baddeleyite         : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Fe-Mg-clay minerals        : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+##  $ Spinels                    : num [1:16] 0 0 0 0 0 0 0 0 0 0 ...
+
+
+
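The `pivot_longer()`/`pivot_wider()` round trip above is one way to rotate the table; for an all-numeric table, base R's `t()` gives the same orientation. A minimal sketch on a made-up three-mineral table (`raw` and `wide` are hypothetical names, and the values are invented):

```r
# Made-up stand-in for sherloc_abrasion_raw: minerals as rows, abrasions as
# columns; NA means "not reported".
raw <- data.frame(
  Mineral = c("Olivine", "Carbonate", "Sulfate"),
  Atsah   = c(1, 0.5, NA),
  Dourbes = c(0.25, 1, 1)
)

# Replace missing values with 0, as the notebook does.
raw[is.na(raw)] <- 0

# Transpose the numeric part: rows become abrasions, columns become minerals.
wide <- t(as.matrix(raw[, -1]))
colnames(wide) <- raw$Mineral

print(wide)
```

The pivot route keeps a tibble with the abrasion name as a regular column, which is convenient for binding metadata; `t()` returns a plain numeric matrix with the abrasion names as row names.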

3.5 Data Set E: PIXL + SHERLOC

+
# Combine PIXL and SHERLOC dataframes
+# (sherloc.df repeats sample, type, campaign and abrasion, so these columns appear twice)
+pixl_sherloc.df <- cbind(pixl.df, sherloc.df)
+
+# Review what we have
+summary(pixl_sherloc.df)
+
##      sample           Na20            Mgo             Al203       
+##  Min.   : 1.00   Min.   :1.000   Min.   : 0.730   Min.   : 1.700  
+##  1st Qu.: 4.75   1st Qu.:1.853   1st Qu.: 2.533   1st Qu.: 2.220  
+##  Median : 8.50   Median :1.900   Median :12.800   Median : 3.710  
+##  Mean   : 8.50   Mean   :2.672   Mean   :11.682   Mean   : 5.072  
+##  3rd Qu.:12.25   3rd Qu.:4.500   3rd Qu.:19.100   3rd Qu.: 7.117  
+##  Max.   :16.00   Max.   :5.550   Max.   :22.700   Max.   :11.600  
+##                                                                   
+##       Si02            P205             S03               Cl       
+##  Min.   :22.60   Min.   :0.1000   Min.   : 0.780   Min.   :0.400  
+##  1st Qu.:31.22   1st Qu.:0.2350   1st Qu.: 1.495   1st Qu.:0.940  
+##  Median :38.85   Median :0.5250   Median : 2.600   Median :1.740  
+##  Mean   :38.55   Mean   :0.6512   Mean   : 5.562   Mean   :1.846  
+##  3rd Qu.:41.17   3rd Qu.:0.8400   3rd Qu.: 3.800   3rd Qu.:2.080  
+##  Max.   :57.10   Max.   :2.7600   Max.   :21.530   Max.   :4.500  
+##                                                                   
+##       K20              Cao             Ti02            Cr203      
+##  Min.   :0.0000   Min.   :1.500   Min.   :0.2000   Min.   :0.000  
+##  1st Qu.:0.1600   1st Qu.:2.655   1st Qu.:0.5900   1st Qu.:0.025  
+##  Median :0.2000   Median :3.120   Median :0.7000   Median :0.155  
+##  Mean   :0.5800   Mean   :3.688   Mean   :0.8194   Mean   :0.355  
+##  3rd Qu.:0.8275   3rd Qu.:4.310   3rd Qu.:0.9900   3rd Qu.:0.290  
+##  Max.   :1.9000   Max.   :7.770   Max.   :2.4900   Max.   :1.900  
+##                                                                   
+##       Mno             FeO-T               name             type  
+##  Min.   :0.1000   Min.   :13.24   Atsah     : 1   Igneous    :8  
+##  1st Qu.:0.2800   1st Qu.:16.71   Bearwallow: 1   N/A        :1  
+##  Median :0.4000   Median :23.86   Coulettes : 1   Sedimentary:7  
+##  Mean   :0.3812   Mean   :21.45   Hahonih   : 1                  
+##  3rd Qu.:0.4900   3rd Qu.:25.70   Hazeltop  : 1                  
+##  Max.   :0.6900   Max.   :30.05   Kukaklek  : 1                  
+##                                   (Other)   :10                  
+##          campaign    location          abrasion     sample               type  
+##  Crater Floor:9   01     : 1   Alfalfa     :2   Min.   : 1.00   Igneous    :8  
+##  Delta Front :7   02     : 1   Bellegrade  :2   1st Qu.: 4.75   N/A        :1  
+##                   03     : 1   Berry Hollow:2   Median : 8.50   Sedimentary:7  
+##                   04     : 1   Dourbes     :2   Mean   : 8.50                  
+##                   05     : 1   Novarupta   :2   3rd Qu.:12.25                  
+##                   06     : 1   Quartier    :2   Max.   :16.00                  
+##                   (Other):10   (Other)     :4                                  
+##          campaign         abrasion         Name     Plagioclase    
+##  Crater Floor:9   Alfalfa     :2   Atsah     : 1   Min.   :0.0000  
+##  Delta Front :7   Bellegrade  :2   Bearwallow: 1   1st Qu.:0.0000  
+##                   Berry Hollow:2   Coulettes : 1   Median :0.0000  
+##                   Dourbes     :2   Hahonih   : 1   Mean   :0.1875  
+##                   Novarupta   :2   Hazeltop  : 1   3rd Qu.:0.0000  
+##                   Quartier    :2   Kukaklek  : 1   Max.   :1.0000  
+##                   (Other)     :4   (Other)   :10                   
+##     Sulfate         Ca-sulfate     Hydrated Ca-sulfate   Mg-sulfate    
+##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000       Min.   :0.0000  
+##  1st Qu.:0.1875   1st Qu.:0.0000   1st Qu.:0.000       1st Qu.:0.0000  
+##  Median :1.0000   Median :0.0000   Median :0.000       Median :0.0000  
+##  Mean   :0.6562   Mean   :0.3438   Mean   :0.125       Mean   :0.1875  
+##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.000       3rd Qu.:0.0000  
+##  Max.   :1.0000   Max.   :1.0000   Max.   :1.000       Max.   :1.0000  
+##                                                                        
+##  Hydrated Sulfates Hydrated Mg-Fe sulfate  Perchlorates    Na-perchlorate   
+##  Min.   :0.000     Min.   :0.0000         Min.   :0.0000   Min.   :0.00000  
+##  1st Qu.:0.000     1st Qu.:0.0000         1st Qu.:0.0000   1st Qu.:0.00000  
+##  Median :0.000     Median :0.0000         Median :0.0000   Median :0.00000  
+##  Mean   :0.125     Mean   :0.1875         Mean   :0.0625   Mean   :0.03125  
+##  3rd Qu.:0.000     3rd Qu.:0.0000         3rd Qu.:0.0000   3rd Qu.:0.00000  
+##  Max.   :1.000     Max.   :1.0000         Max.   :1.0000   Max.   :0.50000  
+##                                                                             
+##  Amorphous Silicate   Phosphate         Pyroxene         Olivine      
+##  Min.   :0.0000     Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
+##  1st Qu.:0.0000     1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
+##  Median :0.0000     Median :0.0000   Median :1.0000   Median :0.6250  
+##  Mean   :0.1406     Mean   :0.2031   Mean   :0.6875   Mean   :0.5312  
+##  3rd Qu.:0.2500     3rd Qu.:0.3125   3rd Qu.:1.0000   3rd Qu.:1.0000  
+##  Max.   :0.5000     Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
+##                                                                       
+##    Carbonate      Fe-Mg carbonate Hydrated Carbonates Disordered Silicates
+##  Min.   :0.0000   Min.   :0.000   Min.   :0           Min.   :0.000       
+##  1st Qu.:0.4375   1st Qu.:0.000   1st Qu.:0           1st Qu.:0.000       
+##  Median :1.0000   Median :0.000   Median :0           Median :0.000       
+##  Mean   :0.7344   Mean   :0.125   Mean   :0           Mean   :0.125       
+##  3rd Qu.:1.0000   3rd Qu.:0.000   3rd Qu.:0           3rd Qu.:0.000       
+##  Max.   :1.0000   Max.   :1.000   Max.   :0           Max.   :1.000       
+##                                                                           
+##     Feldspar         Quartz           Apatite        FeTi oxides    
+##  Min.   :0.000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
+##  1st Qu.:0.000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000  
+##  Median :0.000   Median :0.00000   Median :0.0000   Median :0.0000  
+##  Mean   :0.125   Mean   :0.03125   Mean   :0.1406   Mean   :0.1406  
+##  3rd Qu.:0.000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.0000  
+##  Max.   :1.000   Max.   :0.25000   Max.   :1.0000   Max.   :1.0000  
+##                                                                     
+##      Halite          Iron oxide     Hydrated Iron oxide Organic matter  
+##  Min.   :0.00000   Min.   :0.0000   Min.   :0.00000     Min.   :0.0000  
+##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000     1st Qu.:0.0000  
+##  Median :0.00000   Median :0.0000   Median :0.00000     Median :1.0000  
+##  Mean   :0.04688   Mean   :0.2812   Mean   :0.01562     Mean   :0.5938  
+##  3rd Qu.:0.00000   3rd Qu.:0.5000   3rd Qu.:0.00000     3rd Qu.:1.0000  
+##  Max.   :0.25000   Max.   :1.0000   Max.   :0.25000     Max.   :1.0000  
+##                                                                         
+##  Sulfate+Organic matter Other hydrated phases Phyllosilicates  
+##  Min.   :0.0000         Min.   :0.0000        Min.   :0.00000  
+##  1st Qu.:0.0000         1st Qu.:0.0000        1st Qu.:0.00000  
+##  Median :0.0000         Median :0.2500        Median :0.00000  
+##  Mean   :0.2188         Mean   :0.4375        Mean   :0.09375  
+##  3rd Qu.:0.2500         3rd Qu.:1.0000        3rd Qu.:0.06250  
+##  Max.   :1.0000         Max.   :1.0000        Max.   :0.50000  
+##                                                                
+##     Chlorite      Kaolinite (hydrous Al-clay)    Chromite        Ilmenite    
+##  Min.   :0.0000   Min.   :0.0000              Min.   :0.000   Min.   :0.000  
+##  1st Qu.:0.0000   1st Qu.:0.0000              1st Qu.:0.000   1st Qu.:0.000  
+##  Median :0.0000   Median :0.0000              Median :0.000   Median :0.000  
+##  Mean   :0.0625   Mean   :0.1875              Mean   :0.125   Mean   :0.125  
+##  3rd Qu.:0.0000   3rd Qu.:0.0000              3rd Qu.:0.000   3rd Qu.:0.000  
+##  Max.   :0.5000   Max.   :1.0000              Max.   :1.000   Max.   :1.000  
+##                                                                              
+##  Zircon/Baddeleyite Fe-Mg-clay minerals    Spinels      
+##  Min.   :0.000      Min.   :0.0000      Min.   :0.0000  
+##  1st Qu.:0.000      1st Qu.:0.0000      1st Qu.:0.0000  
+##  Median :0.000      Median :0.0000      Median :0.0000  
+##  Mean   :0.125      Mean   :0.1875      Mean   :0.0625  
+##  3rd Qu.:0.000      3rd Qu.:0.0000      3rd Qu.:0.0000  
+##  Max.   :1.000      Max.   :1.0000      Max.   :0.5000  
+## 
+
# Combine PIXL and SHERLOC matrices
+pixl_sherloc.matrix<-cbind(pixl.matrix,sherloc.matrix)
+
+# Review the structure of our matrix
+str(pixl_sherloc.matrix)
+
## 'data.frame':    16 obs. of  48 variables:
+##  $ Na20                       : num  5.55 4.67 1.93 1.87 4.5 1.87 1.87 4.5 4.5 1.8 ...
+##  $ Mgo                        : num  2.64 2.21 19.24 12.8 0.73 ...
+##  $ Al203                      : num  7.56 6.97 2.42 2.36 11.6 2.36 2.36 11.6 11.6 1.7 ...
+##  $ Si02                       : num  38.3 43.8 39.4 40.3 57.1 ...
+##  $ P205                       : num  1.65 2.76 0.48 0.28 0.84 0.28 0.28 0.84 0.84 0.1 ...
+##  $ S03                        : num  2.69 3.21 0.78 1.66 1 1.66 1.66 1 1 2.6 ...
+##  $ Cl                         : num  3.4 1.48 0.66 0.94 2.08 0.94 0.94 2.08 2.08 4.5 ...
+##  $ K20                        : num  0.75 1.06 0.18 0.2 1.9 0.2 0.2 1.9 1.9 0.3 ...
+##  $ Cao                        : num  7.77 7.62 2.94 2.94 4.31 2.94 2.94 4.31 4.31 1.8 ...
+##  $ Ti02                       : num  1.47 2.49 0.37 0.99 0.59 0.99 0.99 0.59 0.59 0.2 ...
+##  $ Cr203                      : num  0.03 0.01 0.26 0.29 0 0.29 0.29 0 0 0.2 ...
+##  $ Mno                        : num  0.46 0.44 0.69 0.58 0.28 0.58 0.58 0.28 0.28 0.4 ...
+##  $ FeO-T                      : num  18.7 23.2 30.1 25.7 13.2 ...
+##  $ Plagioclase                : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Sulfate                    : num  1 1 1 1 1 1 1 0 0 0 ...
+##  $ Ca-sulfate                 : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Hydrated Ca-sulfate        : num  0 1 1 0 0 0 0 0 0 0 ...
+##  $ Mg-sulfate                 : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Hydrated Sulfates          : num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Hydrated Mg-Fe sulfate     : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Perchlorates               : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Na-perchlorate             : num  0.5 0 0 0 0 0 0 0 0 0 ...
+##  $ Amorphous Silicate         : num  0.25 0.25 0.25 0.5 0.5 0.25 0.25 0 0 0 ...
+##  $ Phosphate                  : num  0.25 1 1 0 0 0 0 0 0 0 ...
+##  $ Pyroxene                   : num  1 1 1 1 1 1 1 1 1 1 ...
+##  $ Olivine                    : num  0 0 0 1 1 1 1 0.25 0.25 1 ...
+##  $ Carbonate                  : num  0 1 1 1 1 1 1 0.5 0.5 1 ...
+##  $ Fe-Mg carbonate            : num  0 0 0 0 0 0 0 0 0 1 ...
+##  $ Hydrated Carbonates        : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Disordered Silicates       : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Feldspar                   : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Quartz                     : num  0 0 0 0 0 0 0 0.25 0.25 0 ...
+##  $ Apatite                    : num  0.25 0 0 0 0 0 0 0 0 0 ...
+##  $ FeTi oxides                : num  0.25 1 1 0 0 0 0 0 0 0 ...
+##  $ Halite                     : num  0.25 0 0 0 0 0 0 0 0 0.25 ...
+##  $ Iron oxide                 : num  1 1 1 0 0 0 0 0.5 0.5 0.25 ...
+##  $ Hydrated Iron oxide        : num  0.25 0 0 0 0 0 0 0 0 0 ...
+##  $ Organic matter             : num  0 0 0 1 1 1 1 1 1 0 ...
+##  $ Sulfate+Organic matter     : num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Other hydrated phases      : num  0 0 0 1 1 1 1 0.5 0.5 1 ...
+##  $ Phyllosilicates            : num  0 0 0 0 0 0 0 0.5 0.5 0.25 ...
+##  $ Chlorite                   : num  0 0 0 0 0 0 0 0.5 0.5 0 ...
+##  $ Kaolinite (hydrous Al-clay): num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Chromite                   : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Ilmenite                   : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Zircon/Baddeleyite         : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Fe-Mg-clay minerals        : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Spinels                    : num  0 0 0 0 0 0 0 0 0 0 ...
+
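`cbind()` places PIXL oxide percentages next to SHERLOC scores that lie between 0 and 1. As a result, the oxide columns are likely to carry almost all of the variance in the combined matrix. The sketch below illustrates this with the first five values of four columns as printed by `str()` above. The column names are simplified, and the values are only a toy subset, not the full matrix. Unless the columns are standardized, distance-based methods such as unscaled PCA or k-means are driven mainly by the oxides. Standardizing is only possible for non-constant columns.

```r
# Toy subset: first five printed values of two PIXL oxide columns (wt%)
# and two SHERLOC mineral columns (0 to 1 scale); names simplified
toy <- data.frame(
  SiO2      = c(38.3, 43.8, 39.4, 40.3, 57.1),
  FeO_T     = c(18.7, 23.2, 30.1, 25.7, 13.2),
  Pyroxene  = c(1, 1, 1, 1, 1),
  Carbonate = c(0, 1, 1, 1, 1)
)

# Per-column variance and each column's share of the total variance
col_var <- sapply(toy, var)
share <- col_var / sum(col_var)
print(round(share, 4))

# Standardizing gives every non-constant column unit variance;
# constant columns (here Pyroxene) have zero variance and must be dropped first
keep <- col_var > 0
toy_scaled <- scale(toy[, keep])
print(round(apply(toy_scaled, 2, var), 4))
```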
+
+

3.6 Data Set F: PIXL + Lithology

+

Create the combined data frame and matrix from the prior PIXL and Lithology datasets.

+
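# Combine our PIXL and Lithology data frames
+# NOTE: cbind() pairs rows by position, so both frames must list the samples in the same order;
+# shared columns such as `sample` and `name` appear twice in the result
+pixl_lithology.df <- cbind(pixl.df, lithology.df)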
+
+# Review what we have
+summary(pixl_lithology.df)
+
##      sample           Na20            Mgo             Al203       
+##  Min.   : 1.00   Min.   :1.000   Min.   : 0.730   Min.   : 1.700  
+##  1st Qu.: 4.75   1st Qu.:1.853   1st Qu.: 2.533   1st Qu.: 2.220  
+##  Median : 8.50   Median :1.900   Median :12.800   Median : 3.710  
+##  Mean   : 8.50   Mean   :2.672   Mean   :11.682   Mean   : 5.072  
+##  3rd Qu.:12.25   3rd Qu.:4.500   3rd Qu.:19.100   3rd Qu.: 7.117  
+##  Max.   :16.00   Max.   :5.550   Max.   :22.700   Max.   :11.600  
+##                                                                   
+##       Si02            P205             S03               Cl       
+##  Min.   :22.60   Min.   :0.1000   Min.   : 0.780   Min.   :0.400  
+##  1st Qu.:31.22   1st Qu.:0.2350   1st Qu.: 1.495   1st Qu.:0.940  
+##  Median :38.85   Median :0.5250   Median : 2.600   Median :1.740  
+##  Mean   :38.55   Mean   :0.6512   Mean   : 5.562   Mean   :1.846  
+##  3rd Qu.:41.17   3rd Qu.:0.8400   3rd Qu.: 3.800   3rd Qu.:2.080  
+##  Max.   :57.10   Max.   :2.7600   Max.   :21.530   Max.   :4.500  
+##                                                                   
+##       K20              Cao             Ti02            Cr203      
+##  Min.   :0.0000   Min.   :1.500   Min.   :0.2000   Min.   :0.000  
+##  1st Qu.:0.1600   1st Qu.:2.655   1st Qu.:0.5900   1st Qu.:0.025  
+##  Median :0.2000   Median :3.120   Median :0.7000   Median :0.155  
+##  Mean   :0.5800   Mean   :3.688   Mean   :0.8194   Mean   :0.355  
+##  3rd Qu.:0.8275   3rd Qu.:4.310   3rd Qu.:0.9900   3rd Qu.:0.290  
+##  Max.   :1.9000   Max.   :7.770   Max.   :2.4900   Max.   :1.900  
+##                                                                   
+##       Mno             FeO-T               name             type  
+##  Min.   :0.1000   Min.   :13.24   Atsah     : 1   Igneous    :8  
+##  1st Qu.:0.2800   1st Qu.:16.71   Bearwallow: 1   N/A        :1  
+##  Median :0.4000   Median :23.86   Coulettes : 1   Sedimentary:7  
+##  Mean   :0.3812   Mean   :21.45   Hahonih   : 1                  
+##  3rd Qu.:0.4900   3rd Qu.:25.70   Hazeltop  : 1                  
+##  Max.   :0.6900   Max.   :30.05   Kukaklek  : 1                  
+##                                   (Other)   :10                  
+##          campaign    location          abrasion     sample              name   
+##  Crater Floor:9   01     : 1   Alfalfa     :2   Min.   : 1.00   Atsah     : 1  
+##  Delta Front :7   02     : 1   Bellegrade  :2   1st Qu.: 4.75   Bearwallow: 1  
+##                   03     : 1   Berry Hollow:2   Median : 8.50   Coulettes : 1  
+##                   04     : 1   Dourbes     :2   Mean   : 8.50   Hahonih   : 1  
+##                   05     : 1   Novarupta   :2   3rd Qu.:12.25   Hazeltop  : 1  
+##                   06     : 1   Quartier    :2   Max.   :16.00   Kukaklek  : 1  
+##                   (Other):10   (Other)     :4                   (Other)   :10  
+##        SampleType         campaign         abrasion feldspar plagioclase
+##  atmospheric: 1   Crater Floor:9   Alfalfa     :2   0:14     0:13       
+##  regolith   : 0   Delta Front :7   Bellegarde  :2   1: 2     1: 3       
+##  rock core  :15   Margin Unit :0   Berry Hollow:2                       
+##                                    Dourbes     :2                       
+##                                    Novarupta   :2                       
+##                                    Quartier    :2                       
+##                                    (Other)     :4                       
+##  pyroxene olivine quartz apatite FeTi_Oxides Iron_Oxide Sulfate Perchlorates
+##  0: 5     0: 6    0:14   0:13    0:13        0:9        0: 4    0:15        
+##  1:11     1:10    1: 2   1: 3    1: 3        1:7        1:12    1: 1        
+##                                                                             
+##                                                                             
+##                                                                             
+##                                                                             
+##                                                                             
+##  Phosphate Ca_Sulfate Carbonate Fe_Mg_clay Fe_Mg_carbonate Mg_sulfate
+##  0:11      0:10       0: 1      0:13       0:14            0:13      
+##  1: 5      1: 6       1:15      1: 3       1: 2            1: 3      
+##                                                                      
+##                                                                      
+##                                                                      
+##                                                                      
+##                                                                      
+##  Phyllosilicates Chlorite Halite Organic_matter Hydrated_Ca_Sulfate
+##  0:12            0:14     0:13   0: 5           0:14               
+##  1: 4            1: 2     1: 3   1:11           1: 2               
+##                                                                    
+##                                                                    
+##                                                                    
+##                                                                    
+##                                                                    
+##  Hydrated_Sulfates Hydrated_Mg_Fe_Sulfate Na_Perchlorate Amorphous_Silicate
+##  0:14              0:13                   0:15           0:9               
+##  1: 2              1: 3                   1: 1           1:7               
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##  Hydrated_Carbonates Disordered_Silicates Hydrated_Iron_Oxide
+##  0:16                0:14                 0:15               
+##                      1: 2                 1: 1               
+##                                                              
+##                                                              
+##                                                              
+##                                                              
+##                                                              
+##  Sulfate+Organic_Matter Other_hydrated_phases Kaolinite Chromite Ilmenite
+##  0:11                   0:8                   0:13      0:14     0:14    
+##  1: 5                   1:8                   1: 3      1: 2     1: 2    
+##                                                                          
+##                                                                          
+##                                                                          
+##                                                                          
+##                                                                          
+##  Zircon/Baddeleyite Spinels
+##  0:14               0:14   
+##  1: 2               1: 2   
+##                            
+##                            
+##                            
+##                            
+## 
+
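The summary above lists `sample`, `name`, `campaign` and `abrasion` twice. This happens because `cbind()` pastes the two data frames side by side and pairs rows purely by position. A key-based join avoids both issues. The sketch below uses two tiny hypothetical frames (`pixl_toy` and `lith_toy`) in place of `pixl.df` and `lithology.df`. It contrasts `cbind()` with base R's `merge()` on the shared `sample` column.

```r
# Hypothetical two-row stand-ins for pixl.df and lithology.df;
# note that lith_toy lists the samples in the opposite order
pixl_toy <- data.frame(sample = c(1, 2), SiO2 = c(38.3, 43.8))
lith_toy <- data.frame(sample = c(2, 1), name = c("B", "A"), olivine = c(0, 1))

# cbind() pairs rows by position: `sample` appears twice and
# PIXL sample 1 ends up next to Lithology sample 2
bad <- cbind(pixl_toy, lith_toy)
print(names(bad))
print(bad)

# merge() joins on the shared key, so each sample keeps its own values
good <- merge(pixl_toy, lith_toy, by = "sample")
print(good)
```

When the rows are already aligned, as they appear to be here, `cbind()` gives the same pairing. `merge()` makes that assumption explicit and keeps a single copy of the key.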
# Combine PIXL and Lithology matrices
+pixl_lithology.matrix<-cbind(pixl.matrix,lithology.matrix)
+
+# Review the structure
+str(pixl_lithology.matrix)
+
## 'data.frame':    16 obs. of  48 variables:
+##  $ Na20                  : num  5.55 4.67 1.93 1.87 4.5 1.87 1.87 4.5 4.5 1.8 ...
+##  $ Mgo                   : num  2.64 2.21 19.24 12.8 0.73 ...
+##  $ Al203                 : num  7.56 6.97 2.42 2.36 11.6 2.36 2.36 11.6 11.6 1.7 ...
+##  $ Si02                  : num  38.3 43.8 39.4 40.3 57.1 ...
+##  $ P205                  : num  1.65 2.76 0.48 0.28 0.84 0.28 0.28 0.84 0.84 0.1 ...
+##  $ S03                   : num  2.69 3.21 0.78 1.66 1 1.66 1.66 1 1 2.6 ...
+##  $ Cl                    : num  3.4 1.48 0.66 0.94 2.08 0.94 0.94 2.08 2.08 4.5 ...
+##  $ K20                   : num  0.75 1.06 0.18 0.2 1.9 0.2 0.2 1.9 1.9 0.3 ...
+##  $ Cao                   : num  7.77 7.62 2.94 2.94 4.31 2.94 2.94 4.31 4.31 1.8 ...
+##  $ Ti02                  : num  1.47 2.49 0.37 0.99 0.59 0.99 0.99 0.59 0.59 0.2 ...
+##  $ Cr203                 : num  0.03 0.01 0.26 0.29 0 0.29 0.29 0 0 0.2 ...
+##  $ Mno                   : num  0.46 0.44 0.69 0.58 0.28 0.58 0.58 0.28 0.28 0.4 ...
+##  $ FeO-T                 : num  18.7 23.2 30.1 25.7 13.2 ...
+##  $ feldspar              : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ plagioclase           : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ pyroxene              : num  1 1 1 1 1 1 1 1 1 1 ...
+##  $ olivine               : num  0 0 0 1 1 1 1 1 1 1 ...
+##  $ quartz                : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ apatite               : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ FeTi_Oxides           : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Iron_Oxide            : num  1 1 1 0 0 0 0 1 1 1 ...
+##  $ Sulfate               : num  1 1 1 1 1 1 1 0 0 0 ...
+##  $ Perchlorates          : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Phosphate             : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Ca_Sulfate            : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Carbonate             : num  0 1 1 1 1 1 1 1 1 1 ...
+##  $ Fe_Mg_clay            : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Fe_Mg_carbonate       : num  0 0 0 0 0 0 0 0 0 1 ...
+##  $ Mg_sulfate            : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Phyllosilicates       : num  0 0 0 0 0 0 0 1 1 1 ...
+##  $ Chlorite              : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Halite                : num  1 0 0 0 0 0 0 0 0 1 ...
+##  $ Organic_matter        : num  0 0 0 1 1 1 1 1 1 0 ...
+##  $ Hydrated_Ca_Sulfate   : num  0 1 1 0 0 0 0 0 0 0 ...
+##  $ Hydrated_Sulfates     : num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Hydrated_Mg_Fe_Sulfate: num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Na_Perchlorate        : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Amorphous_Silicate    : num  1 1 1 1 1 1 1 0 0 0 ...
+##  $ Hydrated_Carbonates   : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Disordered_Silicates  : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Hydrated_Iron_Oxide   : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Sulfate+Organic_Matter: num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Other_hydrated_phases : num  0 0 0 1 1 1 1 1 1 1 ...
+##  $ Kaolinite             : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Chromite              : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Ilmenite              : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Zircon/Baddeleyite    : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Spinels               : num  0 0 0 0 0 0 0 0 0 0 ...
+
# PCA and k-means clustering on the combined PIXL + Lithology matrix
+# Load necessary libraries (ggrepel must be installed; the setup chunk does not install it explicitly)
+library(ggplot2)
+library(ggrepel)
+
+# Work on the combined PIXL + Lithology matrix
+data <- pixl_lithology.matrix
+
+# Ensure data is numeric
+data_numeric <- data[, sapply(data, is.numeric)]
+
+# Identify zero-variance columns (they carry no information for PCA)
+zero_var_cols <- sapply(data_numeric, function(x) var(x) == 0)
+
+# Remove zero-variance columns (keeps all columns if there are none)
+data_filtered <- data_numeric[, !zero_var_cols, drop = FALSE]
+
+# Perform PCA without scaling on the filtered data
+# NOTE: without scaling, the oxide columns (wt%) dominate the 0/1 mineral indicators
+pca_result <- prcomp(data_filtered, center = TRUE, scale. = FALSE)
+
+# Calculate the proportion of variance explained by each principal component
+explained_variance <- (pca_result$sdev^2) / sum(pca_result$sdev^2)
+
+# Format explained variance (as a percentage) for axis labels
+pc1_var <- round(explained_variance[1] * 100, 2)
+pc2_var <- round(explained_variance[2] * 100, 2)
+
+# Extract scores (the principal component values for observations)
+scores <- as.data.frame(pca_result$x)
+
+# Elbow plot: total within-cluster sum of squares for k = 1..nc
+wssplot <- function(data, nc = 15, seed = 10) {
+  wss <- data.frame(cluster = 1:nc, quality = 0)
+  for (i in 1:nc) {
+    set.seed(seed)
+    wss[i, 2] <- kmeans(data, centers = i)$tot.withinss
+  }
+  ggplot(data = wss, aes(x = cluster, y = quality)) +
+    geom_line() +
+    ggtitle("Quality of k-means by Cluster")
+}
+
+# Apply `wssplot()` to the PIXL + Lithology data
+# NOTE: this uses all numeric columns, whereas the k-means below clusters only the first two PC scores
+wssplot(data_numeric, nc = 8, seed = 2)
+

+
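The elbow plot records `tot.withinss` (the total within-cluster sum of squares) for each candidate number of clusters. The heuristic is to pick the `k` beyond which adding clusters stops reducing that quantity sharply. The sketch below reproduces the computation on R's built-in `iris` measurements, which stand in for the M20 data here. It uses `nstart = 10` so that each `k` keeps the best of several random starts, which is a design choice rather than what `wssplot()` does. It also checks one property of the curve: with a single cluster, the within-cluster sum of squares equals the total sum of squares around the grand mean.

```r
# Elbow-method sketch on built-in iris measurements (stand-in data)
x <- as.matrix(iris[, 1:4])

set.seed(2)
# Best-of-10 random starts for each k to reduce poor local optima
wss <- sapply(1:6, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
print(round(wss, 1))

# Total sum of squares around the column means
total_ss <- sum(scale(x, center = TRUE, scale = FALSE)^2)
print(round(total_ss, 1))
```

For the elbow to guide the choice of `k`, the curve is best computed on the same matrix that is later clustered.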
# Choose number of clusters
+k <- 4  # Change this number to the number of clusters you want
+
+# Perform k-means clustering on the PCA scores
+set.seed(2)  # Ensure reproducibility
+kmeans_result <- kmeans(scores[, 1:2], centers = k)  # Cluster on the first two PCs
+
+# Add the cluster labels (and a text label per observation) to the scores data frame
+scores$cluster <- as.factor(kmeans_result$cluster)
+scores$label <- paste(rownames(scores), "-", scores$cluster)
+
+# Extract loadings (the contribution of each variable to the components)
+loadings <- as.data.frame(pca_result$rotation)
+loadings <- loadings * 20  # Scale the loadings for better visibility
+loadings$variable <- rownames(loadings)
+
+# Create a biplot using ggplot2, including explained variance and cluster labels
+ggplot() +
+  # Plot the scores (PCs for observations), colored by cluster
+  geom_point(data = scores, aes(x = PC1, y = PC2, color = cluster), size = 3) +
+
+  # Label each observation with its row name and cluster
+  geom_text_repel(data = scores, aes(x = PC1, y = PC2, label = label), color = "black") +
+
+  # Plot the loadings (arrows for variables)
+  geom_segment(data = loadings, aes(x = 0, y = 0, xend = PC1, yend = PC2),
+               arrow = arrow(length = unit(0.3, "cm")), color = "red") +
+
+  # Add labels for variables
+  geom_text_repel(data = loadings, aes(x = PC1, y = PC2, label = variable), color = "red") +
+
+  # Axis labels include the variance explained by PC1 and PC2
+  labs(x = paste0("PC1 (", pc1_var, "% Variance Explained)"),
+       y = paste0("PC2 (", pc2_var, "% Variance Explained)"),
+       title = paste("PCA Biplot with", k, "Clusters (Unscaled Data)")) +
+
+  # Add a theme for better appearance
+  theme_minimal() +
+
+  # Use `rainbow(k)` for distinct cluster colors
+  scale_color_manual(values = rainbow(k))
+
## Warning: ggrepel: 42 unlabeled data points (too many overlaps). Consider
+## increasing max.overlaps
+

+
+
+
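The axis labels of the biplot above use the share of variance explained, `sdev^2 / sum(sdev^2)`, from `prcomp()`. Because this PCA is unscaled, the columns with the largest raw variance are likely to dominate PC1. The sketch below shows the effect on R's built-in `USArrests` data, used only as a stand-in, by running `prcomp()` with and without `scale. = TRUE`. One caveat: `prcomp()` refuses to rescale a constant column. That is one reason the zero-variance filter above is needed before any scaled PCA.

```r
# Unscaled vs. scaled PCA on built-in stand-in data
pca_raw    <- prcomp(USArrests, center = TRUE, scale. = FALSE)
pca_scaled <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
explained <- function(p) p$sdev^2 / sum(p$sdev^2)

print(round(100 * explained(pca_raw), 2))     # PC1 dominated by the largest-variance column
print(round(100 * explained(pca_scaled), 2))  # variance spread more evenly after scaling
```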

3.7 Data Set G: SHERLOC + Lithology

+

Create the combined data frame and matrix from the prior SHERLOC and Lithology datasets.

+
# Combine the SHERLOC and Lithology data frames
+sherloc_lithology.df <- cbind(sherloc.df, lithology.df)
+
+# Review what we have
+summary(sherloc_lithology.df)
+
##      sample               type           campaign         abrasion
+##  Min.   : 1.00   Igneous    :8   Crater Floor:9   Alfalfa     :2  
+##  1st Qu.: 4.75   N/A        :1   Delta Front :7   Bellegrade  :2  
+##  Median : 8.50   Sedimentary:7                    Berry Hollow:2  
+##  Mean   : 8.50                                    Dourbes     :2  
+##  3rd Qu.:12.25                                    Novarupta   :2  
+##  Max.   :16.00                                    Quartier    :2  
+##                                                   (Other)     :4  
+##          Name     Plagioclase        Sulfate         Ca-sulfate    
+##  Atsah     : 1   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
+##  Bearwallow: 1   1st Qu.:0.0000   1st Qu.:0.1875   1st Qu.:0.0000  
+##  Coulettes : 1   Median :0.0000   Median :1.0000   Median :0.0000  
+##  Hahonih   : 1   Mean   :0.1875   Mean   :0.6562   Mean   :0.3438  
+##  Hazeltop  : 1   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
+##  Kukaklek  : 1   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
+##  (Other)   :10                                                     
+##  Hydrated Ca-sulfate   Mg-sulfate     Hydrated Sulfates Hydrated Mg-Fe sulfate
+##  Min.   :0.000       Min.   :0.0000   Min.   :0.000     Min.   :0.0000        
+##  1st Qu.:0.000       1st Qu.:0.0000   1st Qu.:0.000     1st Qu.:0.0000        
+##  Median :0.000       Median :0.0000   Median :0.000     Median :0.0000        
+##  Mean   :0.125       Mean   :0.1875   Mean   :0.125     Mean   :0.1875        
+##  3rd Qu.:0.000       3rd Qu.:0.0000   3rd Qu.:0.000     3rd Qu.:0.0000        
+##  Max.   :1.000       Max.   :1.0000   Max.   :1.000     Max.   :1.0000        
+##                                                                               
+##   Perchlorates    Na-perchlorate    Amorphous Silicate   Phosphate     
+##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000     Min.   :0.0000  
+##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000     1st Qu.:0.0000  
+##  Median :0.0000   Median :0.00000   Median :0.0000     Median :0.0000  
+##  Mean   :0.0625   Mean   :0.03125   Mean   :0.1406     Mean   :0.2031  
+##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.2500     3rd Qu.:0.3125  
+##  Max.   :1.0000   Max.   :0.50000   Max.   :0.5000     Max.   :1.0000  
+##                                                                        
+##     Pyroxene         Olivine         Carbonate      Fe-Mg carbonate
+##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
+##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.4375   1st Qu.:0.000  
+##  Median :1.0000   Median :0.6250   Median :1.0000   Median :0.000  
+##  Mean   :0.6875   Mean   :0.5312   Mean   :0.7344   Mean   :0.125  
+##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.000  
+##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
+##                                                                    
+##  Hydrated Carbonates Disordered Silicates    Feldspar         Quartz       
+##  Min.   :0           Min.   :0.000        Min.   :0.000   Min.   :0.00000  
+##  1st Qu.:0           1st Qu.:0.000        1st Qu.:0.000   1st Qu.:0.00000  
+##  Median :0           Median :0.000        Median :0.000   Median :0.00000  
+##  Mean   :0           Mean   :0.125        Mean   :0.125   Mean   :0.03125  
+##  3rd Qu.:0           3rd Qu.:0.000        3rd Qu.:0.000   3rd Qu.:0.00000  
+##  Max.   :0           Max.   :1.000        Max.   :1.000   Max.   :0.25000  
+##                                                                            
+##     Apatite        FeTi oxides         Halite          Iron oxide    
+##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
+##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
+##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000  
+##  Mean   :0.1406   Mean   :0.1406   Mean   :0.04688   Mean   :0.2812  
+##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.5000  
+##  Max.   :1.0000   Max.   :1.0000   Max.   :0.25000   Max.   :1.0000  
+##                                                                      
+##  Hydrated Iron oxide Organic matter   Sulfate+Organic matter
+##  Min.   :0.00000     Min.   :0.0000   Min.   :0.0000        
+##  1st Qu.:0.00000     1st Qu.:0.0000   1st Qu.:0.0000        
+##  Median :0.00000     Median :1.0000   Median :0.0000        
+##  Mean   :0.01562     Mean   :0.5938   Mean   :0.2188        
+##  3rd Qu.:0.00000     3rd Qu.:1.0000   3rd Qu.:0.2500        
+##  Max.   :0.25000     Max.   :1.0000   Max.   :1.0000        
+##                                                             
+##  Other hydrated phases Phyllosilicates      Chlorite     
+##  Min.   :0.0000        Min.   :0.00000   Min.   :0.0000  
+##  1st Qu.:0.0000        1st Qu.:0.00000   1st Qu.:0.0000  
+##  Median :0.2500        Median :0.00000   Median :0.0000  
+##  Mean   :0.4375        Mean   :0.09375   Mean   :0.0625  
+##  3rd Qu.:1.0000        3rd Qu.:0.06250   3rd Qu.:0.0000  
+##  Max.   :1.0000        Max.   :0.50000   Max.   :0.5000  
+##                                                          
+##  Kaolinite (hydrous Al-clay)    Chromite        Ilmenite     Zircon/Baddeleyite
+##  Min.   :0.0000              Min.   :0.000   Min.   :0.000   Min.   :0.000     
+##  1st Qu.:0.0000              1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000     
+##  Median :0.0000              Median :0.000   Median :0.000   Median :0.000     
+##  Mean   :0.1875              Mean   :0.125   Mean   :0.125   Mean   :0.125     
+##  3rd Qu.:0.0000              3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:0.000     
+##  Max.   :1.0000              Max.   :1.000   Max.   :1.000   Max.   :1.000     
+##                                                                                
+##  Fe-Mg-clay minerals    Spinels           sample              name   
+##  Min.   :0.0000      Min.   :0.0000   Min.   : 1.00   Atsah     : 1  
+##  1st Qu.:0.0000      1st Qu.:0.0000   1st Qu.: 4.75   Bearwallow: 1  
+##  Median :0.0000      Median :0.0000   Median : 8.50   Coulettes : 1  
+##  Mean   :0.1875      Mean   :0.0625   Mean   : 8.50   Hahonih   : 1  
+##  3rd Qu.:0.0000      3rd Qu.:0.0000   3rd Qu.:12.25   Hazeltop  : 1  
+##  Max.   :1.0000      Max.   :0.5000   Max.   :16.00   Kukaklek  : 1  
+##                                                       (Other)   :10  
+##        SampleType         campaign         abrasion feldspar plagioclase
+##  atmospheric: 1   Crater Floor:9   Alfalfa     :2   0:14     0:13       
+##  regolith   : 0   Delta Front :7   Bellegarde  :2   1: 2     1: 3       
+##  rock core  :15   Margin Unit :0   Berry Hollow:2                       
+##                                    Dourbes     :2                       
+##                                    Novarupta   :2                       
+##                                    Quartier    :2                       
+##                                    (Other)     :4                       
+##  pyroxene olivine quartz apatite FeTi_Oxides Iron_Oxide Sulfate Perchlorates
+##  0: 5     0: 6    0:14   0:13    0:13        0:9        0: 4    0:15        
+##  1:11     1:10    1: 2   1: 3    1: 3        1:7        1:12    1: 1        
+##                                                                             
+##                                                                             
+##                                                                             
+##                                                                             
+##                                                                             
+##  Phosphate Ca_Sulfate Carbonate Fe_Mg_clay Fe_Mg_carbonate Mg_sulfate
+##  0:11      0:10       0: 1      0:13       0:14            0:13      
+##  1: 5      1: 6       1:15      1: 3       1: 2            1: 3      
+##                                                                      
+##                                                                      
+##                                                                      
+##                                                                      
+##                                                                      
+##  Phyllosilicates Chlorite Halite Organic_matter Hydrated_Ca_Sulfate
+##  0:12            0:14     0:13   0: 5           0:14               
+##  1: 4            1: 2     1: 3   1:11           1: 2               
+##                                                                    
+##                                                                    
+##                                                                    
+##                                                                    
+##                                                                    
+##  Hydrated_Sulfates Hydrated_Mg_Fe_Sulfate Na_Perchlorate Amorphous_Silicate
+##  0:14              0:13                   0:15           0:9               
+##  1: 2              1: 3                   1: 1           1:7               
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##                                                                            
+##  Hydrated_Carbonates Disordered_Silicates Hydrated_Iron_Oxide
+##  0:16                0:14                 0:15               
+##                      1: 2                 1: 1               
+##                                                              
+##                                                              
+##                                                              
+##                                                              
+##                                                              
+##  Sulfate+Organic_Matter Other_hydrated_phases Kaolinite Chromite Ilmenite
+##  0:11                   0:8                   0:13      0:14     0:14    
+##  1: 5                   1:8                   1: 3      1: 2     1: 2    
+##                                                                          
+##                                                                          
+##                                                                          
+##                                                                          
+##                                                                          
+##  Zircon/Baddeleyite Spinels
+##  0:14               0:14   
+##  1: 2               1: 2   
+##                            
+##                            
+##                            
+##                            
+## 
+
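Several minerals are recorded in both the SHERLOC and Lithology matrices under exactly the same name. The `str()` listing that follows shows `Sulfate`, `Perchlorates`, `Phosphate`, `Carbonate`, `Phyllosilicates`, `Chlorite` and `Halite` twice. `cbind()` keeps such duplicate names, and `$` then silently returns only the first match. Names that differ only in case (such as `Feldspar` and `feldspar`) are distinct to R, but they likely encode overlapping information. The sketch below uses hypothetical one-row stand-ins. It shows one option: prefix each column with its source before combining. Base R's `make.unique()` is another.

```r
# Hypothetical one-row stand-ins for sherloc.matrix and lithology.matrix,
# both containing a column called "Sulfate"
sherloc_toy   <- data.frame(Sulfate = 1, Olivine = 0.25)
lithology_toy <- data.frame(Sulfate = 0, olivine = 1)

combined <- cbind(sherloc_toy, lithology_toy)
print(names(combined))   # "Sulfate" appears twice
print(combined$Sulfate)  # `$` returns only the first (SHERLOC) column

# Prefix each column with its source so every name is unique
names(sherloc_toy)   <- paste0("sherloc_", names(sherloc_toy))
names(lithology_toy) <- paste0("lith_", names(lithology_toy))
combined_named <- cbind(sherloc_toy, lithology_toy)
print(names(combined_named))
```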
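# Combine the SHERLOC and Lithology matrices
+# NOTE: minerals named identically in both sources (e.g. `Sulfate`) appear as duplicate columns
+sherloc_lithology.matrix <- cbind(sherloc.matrix, lithology.matrix)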
+
+# Review the resulting matrix
+str(sherloc_lithology.matrix)
+
## 'data.frame':    16 obs. of  70 variables:
+##  $ Plagioclase                : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Sulfate                    : num  1 1 1 1 1 1 1 0 0 0 ...
+##  $ Ca-sulfate                 : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Hydrated Ca-sulfate        : num  0 1 1 0 0 0 0 0 0 0 ...
+##  $ Mg-sulfate                 : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Hydrated Sulfates          : num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Hydrated Mg-Fe sulfate     : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Perchlorates               : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Na-perchlorate             : num  0.5 0 0 0 0 0 0 0 0 0 ...
+##  $ Amorphous Silicate         : num  0.25 0.25 0.25 0.5 0.5 0.25 0.25 0 0 0 ...
+##  $ Phosphate                  : num  0.25 1 1 0 0 0 0 0 0 0 ...
+##  $ Pyroxene                   : num  1 1 1 1 1 1 1 1 1 1 ...
+##  $ Olivine                    : num  0 0 0 1 1 1 1 0.25 0.25 1 ...
+##  $ Carbonate                  : num  0 1 1 1 1 1 1 0.5 0.5 1 ...
+##  $ Fe-Mg carbonate            : num  0 0 0 0 0 0 0 0 0 1 ...
+##  $ Hydrated Carbonates        : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Disordered Silicates       : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Feldspar                   : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Quartz                     : num  0 0 0 0 0 0 0 0.25 0.25 0 ...
+##  $ Apatite                    : num  0.25 0 0 0 0 0 0 0 0 0 ...
+##  $ FeTi oxides                : num  0.25 1 1 0 0 0 0 0 0 0 ...
+##  $ Halite                     : num  0.25 0 0 0 0 0 0 0 0 0.25 ...
+##  $ Iron oxide                 : num  1 1 1 0 0 0 0 0.5 0.5 0.25 ...
+##  $ Hydrated Iron oxide        : num  0.25 0 0 0 0 0 0 0 0 0 ...
+##  $ Organic matter             : num  0 0 0 1 1 1 1 1 1 0 ...
+##  $ Sulfate+Organic matter     : num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Other hydrated phases      : num  0 0 0 1 1 1 1 0.5 0.5 1 ...
+##  $ Phyllosilicates            : num  0 0 0 0 0 0 0 0.5 0.5 0.25 ...
+##  $ Chlorite                   : num  0 0 0 0 0 0 0 0.5 0.5 0 ...
+##  $ Kaolinite (hydrous Al-clay): num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Chromite                   : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Ilmenite                   : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Zircon/Baddeleyite         : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Fe-Mg-clay minerals        : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Spinels                    : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ feldspar                   : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ plagioclase                : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ pyroxene                   : num  1 1 1 1 1 1 1 1 1 1 ...
+##  $ olivine                    : num  0 0 0 1 1 1 1 1 1 1 ...
+##  $ quartz                     : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ apatite                    : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ FeTi_Oxides                : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Iron_Oxide                 : num  1 1 1 0 0 0 0 1 1 1 ...
+##  $ Sulfate                    : num  1 1 1 1 1 1 1 0 0 0 ...
+##  $ Perchlorates               : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Phosphate                  : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Ca_Sulfate                 : num  1 1 1 0 0 0 0 0 0 0 ...
+##  $ Carbonate                  : num  0 1 1 1 1 1 1 1 1 1 ...
+##  $ Fe_Mg_clay                 : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Fe_Mg_carbonate            : num  0 0 0 0 0 0 0 0 0 1 ...
+##  $ Mg_sulfate                 : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Phyllosilicates            : num  0 0 0 0 0 0 0 1 1 1 ...
+##  $ Chlorite                   : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Halite                     : num  1 0 0 0 0 0 0 0 0 1 ...
+##  $ Organic_matter             : num  0 0 0 1 1 1 1 1 1 0 ...
+##  $ Hydrated_Ca_Sulfate        : num  0 1 1 0 0 0 0 0 0 0 ...
+##  $ Hydrated_Sulfates          : num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Hydrated_Mg_Fe_Sulfate     : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Na_Perchlorate             : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Amorphous_Silicate         : num  1 1 1 1 1 1 1 0 0 0 ...
+##  $ Hydrated_Carbonates        : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Disordered_Silicates       : num  0 0 0 0 0 0 0 1 1 0 ...
+##  $ Hydrated_Iron_Oxide        : num  1 0 0 0 0 0 0 0 0 0 ...
+##  $ Sulfate+Organic_Matter     : num  0 0 0 0 0 1 1 0 0 0 ...
+##  $ Other_hydrated_phases      : num  0 0 0 1 1 1 1 1 1 1 ...
+##  $ Kaolinite                  : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Chromite                   : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Ilmenite                   : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Zircon/Baddeleyite         : num  0 0 0 0 0 0 0 0 0 0 ...
+##  $ Spinels                    : num  0 0 0 0 0 0 0 0 0 0 ...

4 Analysis of Data (Part 3)


Each team has been assigned one of six datasets:

  1. Dataset B: PIXL. The PIXL team's goal is to understand and explain how scaling improves results from Assignment 1.
  2. Dataset C: LIBS (with appropriate scaling as necessary)
  3. Dataset D: SHERLOC (with appropriate scaling as necessary)
  4. Dataset E: PIXL + SHERLOC (with appropriate scaling as necessary)
  5. Dataset F: PIXL + Lithology (with appropriate scaling as necessary)
  6. Dataset G: SHERLOC + Lithology (with appropriate scaling as necessary)

For each dataset, perform the following steps. Feel free to use the methods/code from Assignment 1 as desired. Communicate with your teammates. Make sure that you are doing different variations of the analysis below so that no two team members do the exact same analysis. If you want to share clustering (which is okay, but then vary the rest), make sure you use the same random seeds.


I am assigned to analyze Dataset F (PIXL + Lithology).

  1. Describe the data set contained in the data frame and matrix: How many rows does it have and how many features? Which features are measurements and which features are metadata about the samples? (3 pts)

The dataset has 16 samples and 59 features. There are 48 measurement features, consisting of chemical compounds and minerals. The metadata features are sample, name, type, campaign, location, abrasion, and SampleType.
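The counts above can be obtained with base R by separating the metadata columns from everything else. This is a minimal sketch: `dataset_f` is a small hypothetical stand-in for the Dataset F data frame (its values are made up), only the seven metadata column names come from the text, and `SiO2` and `Olivine` are placeholder measurement columns.

```r
# Hypothetical toy stand-in for the Dataset F data frame (NOT the real data).
dataset_f <- data.frame(
  sample     = c("s1", "s2", "s3"),
  name       = c("a", "b", "c"),
  type       = c("rock", "rock", "rock"),
  campaign   = c("c1", "c1", "c2"),
  location   = c("l1", "l2", "l3"),
  abrasion   = c("ab1", "ab2", "ab3"),
  SampleType = c("t1", "t1", "t2"),
  SiO2       = c(42.1, 45.3, 39.8),  # placeholder PIXL measurement
  Olivine    = c(0, 1, 0.25)         # placeholder lithology measurement
)

# Metadata columns as listed in the text.
meta_cols <- c("sample", "name", "type", "campaign",
               "location", "abrasion", "SampleType")

# Every column that is not metadata is treated as a measurement.
measure_cols <- setdiff(names(dataset_f), meta_cols)

n_rows     <- nrow(dataset_f)        # number of samples
n_features <- ncol(dataset_f)        # total number of columns
n_measures <- length(measure_cols)   # measurement columns only

cat("Rows:", n_rows, " Features:", n_features,
    " Measurements:", n_measures, "\n")
```

On the real data frame, `n_features - n_measures` should equal the number of metadata columns, which gives a quick consistency check on the reported counts.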

  2. Scale this data appropriately (you can choose the scaling method): Explain why you chose that scaling method. (3 pts)

I scaled the PIXL data but left the lithology data unscaled, since the lithology columns are already coded on a common 0 to 1 scale. I chose this approach because it brought out the most variation in the data.
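Scaling only the PIXL columns while passing the lithology columns through unchanged might look like the sketch below. The text does not name the scaling method, so this assumes z-score standardization with base R's `scale()`; the data frame `pixl_litho` and all of its column names and values are hypothetical placeholders.

```r
# Hypothetical toy data: two PIXL oxide columns on different ranges and two
# lithology columns already coded on a 0-1 scale (NOT the real data).
pixl_litho <- data.frame(
  SiO2     = c(42.1, 45.3, 39.8, 48.0),
  MnO      = c(0.35, 0.42, 0.28, 0.51),
  Olivine  = c(0, 1, 0.25, 1),
  Pyroxene = c(1, 1, 1, 0.5)
)

pixl_cols  <- c("SiO2", "MnO")          # assumed PIXL measurement columns
litho_cols <- c("Olivine", "Pyroxene")  # assumed lithology columns

scaled <- pixl_litho
# Z-score each PIXL column (mean 0, sd 1). scale() returns a matrix,
# so convert it back to a data frame before assigning.
scaled[pixl_cols] <- as.data.frame(scale(pixl_litho[pixl_cols]))

# Lithology columns are left exactly as they were.
print(scaled)
```

Without this step, a wide-range column such as `SiO2` would dominate distance-based methods like k-means over a narrow-range column such as `MnO`.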

  3. Cluster the data using k-means or your favorite clustering method (like hierarchical clustering): Describe how you picked the best number of clusters. Indicate the number of points in each cluster. Coordinate with your team so you try different approaches. If you want to share results with your teammates, make sure to use the same random seeds. (6 pts)

I chose the number of clusters using the elbow method on the within-cluster sum of squares (WSS). I was deciding between 4 and 6 clusters, and ultimately chose 4 because it explained the data slightly better.
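The elbow method can be sketched with base R's `kmeans()`: compute the total within-cluster sum of squares (`tot.withinss`) for a range of k, look for the point where the curve flattens, then fit the chosen k and count the points in each cluster. The matrix `x` is random toy data standing in for the scaled Dataset F measurements, and the seed value is an arbitrary assumption; teammates sharing clusters would reuse the same seed.

```r
set.seed(42)  # arbitrary seed; reuse the same value to reproduce clusters

# Hypothetical toy stand-in for the scaled measurements: 16 samples x 5
# features drawn around four well-separated means.
x <- do.call(rbind, lapply(c(0, 5, 10, 15), function(m)
  matrix(rnorm(4 * 5, mean = m), nrow = 4)))

# Total within-cluster sum of squares for k = 1..8.
ks  <- 1:8
wss <- sapply(ks, function(k)
  kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Plot WSS against k and look for the "elbow" where the curve flattens.
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Fit the chosen k (4 in the text) and report the size of each cluster.
km <- kmeans(x, centers = 4, nstart = 25)
print(table(km$cluster))
```

Using `nstart = 25` reruns k-means from several random starts and keeps the best fit, which makes the WSS curve less sensitive to an unlucky initialization.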

  4. Perform a creative analysis that provides insights into what one or more of the clusters are and what they tell you about the Mars data:

Cluster 4 of my data stood out in my PCA. Its samples have high concentrations of manganese oxide and iron oxide, but also high concentrations of aluminium oxide (Al2O3) and silicon dioxide (SiO2). This likely indicates that these samples were taken in a transition zone between a volcanic region and a water-basin type area. This makes it an interesting place to sample, as it records several different kinds of geological processes.
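One way to characterize a cluster like this is to run a PCA with base R's `prcomp()` and compare per-cluster feature means on the standardized scale. This is a minimal sketch on toy data: the oxide column names echo those in the text, but the values, the cluster labels, and the object names are hypothetical, and the notebook's actual PCA plot may have been produced differently (for example, with `ggbiplot`).

```r
set.seed(1)

# Hypothetical toy data: 16 samples, 4 oxide features, 4 clusters of 4
# samples each (NOT real PIXL data).
oxides <- data.frame(
  MnO   = rnorm(16),
  FeO   = rnorm(16),
  Al2O3 = rnorm(16),
  SiO2  = rnorm(16)
)
cluster <- rep(1:4, each = 4)

# Make cluster 4 high in every oxide so its profile stands out.
oxides[cluster == 4, ] <- oxides[cluster == 4, ] + 3

# PCA on standardized features.
pca <- prcomp(oxides, center = TRUE, scale. = TRUE)
print(summary(pca))  # proportion of variance per component

# Loadings show which oxides drive the first two components.
print(round(pca$rotation[, 1:2], 2))

# Per-cluster mean of each standardized feature: positive values mean the
# cluster is above the overall average for that oxide.
profile <- aggregate(as.data.frame(scale(oxides)),
                     by = list(cluster = cluster), FUN = mean)
print(profile)
```

Reading the cluster 4 row of `profile` alongside the PC1/PC2 loadings is what supports statements such as "high in both iron-bearing and silica/alumina oxides".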


5 Preparation of Team Presentation (Part 4)


Prepare a presentation of your team's results to present in class on September 11 starting at 9am in AE217 (20 pts). The presentation should include the following elements:

  1. A description of the data set that you analyzed, including how many observations and how many features. (<= 1.5 mins)
  2. Each team member gets three minutes to explain their analysis:
    • what analysis they performed
    • the results of that analysis
    • a brief discussion of their interpretation of these results
    • <= 18 mins total!
  3. A conclusion slide indicating the major findings of the team (<= 1.5 mins)
  4. Thoughts on potential next steps for the Mars team (<= 1.5 mins)

https://docs.google.com/document/d/1-4o1O4h2r8aMjAplmE-ItblQnyDAKZwNs5XCnmwacjs/pub


6 When you’re done: SAVE, COMMIT and PUSH YOUR CHANGES!


When you are satisfied with your edits and your notebook knits successfully, remember to push your changes to the repo using the following steps:

  • git branch
    • To double-check that you are in your working branch
  • git add <your changed files>
  • git commit -m "Some useful comments"
  • git push origin <your branch name>

7 Prepare group presentation


Prepare a presentation of at most three slides covering your classification results and creative analysis. Create a joint presentation with your teammates using the Google Slides template available here: https://bit.ly/45twtUP (copy the template and customize it with your content).


Prepare a conclusion slide that summarizes all your results.


Be prepared to present your results on xx Sep 2024 in class!


8 APPENDIX: Accessing RStudio Server on the IDEA Cluster


The IDEA Cluster provides seven compute nodes (4x 48 cores, 3x 80 cores) plus one storage server.

+
  • The Cluster requires RCS credentials, enabled via registration in class
    • Email John Erickson for problems: erickj4@rpi.edu
  • RStudio, Jupyter, MATLAB, GPUs (on two nodes); lots of storage and compute
  • Access via RPI physical network or VPN only

9 More info about RStudio on our Cluster


9.1 RStudio GUI Access:

diff --git a/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.pdf b/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.pdf
new file mode 100644
index 0000000..dc78519
Binary files /dev/null and b/StudentNotebooks/Assignment02/dar-f24-assignment2-balajy.pdf differ