
Dar roberd10, Notebook 5 + Addition to v1 #169

Merged
merged 5 commits into from Nov 8, 2024
2 changes: 2 additions & 0 deletions StudentData/v1_Data_Introduction.md
@@ -12,6 +12,8 @@ There is both meta data and feature data included in LIBS, since their are no ot
- *Sol*: Numeric, integers > 0. The Mars day (since start of mission) on which the rover took the LIBS point.
- *Lat*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken.
- *Lon*: Numeric. Part of the location data. Where the *rover* was when the LIBS point was taken.
- *Type*: Character. What is it a scan of? (Earth calibration, Sample, other?)
- *Earth_Sample*: Binary, indicating whether the target is an earth sample (1) or not an earth sample (0).

**Feature data** (numeric) is the same as PIXL, concentration of elemental compounds, though without a few of the elemental compounds PIXL includes.

7 changes: 5 additions & 2 deletions StudentData/v1_consistent_data_naming.Rmd
@@ -21,6 +21,7 @@ pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")

# Importing LIBS
libs.df <- readRDS("~/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
+libs_type.df <- readRDS("~/DAR-Mars-F24/StudentData/libs_typed.Rds")

# Importing Lithology
lithology.df<- readRDS("~/DAR-Mars-F24/Data/mineral_data_static.Rds")
@@ -52,9 +53,11 @@ colnames(pixl.df) <- c("Lat","Lon","Sol","Sample",
"Na2O","MgO","Al2O3","SiO2","P2O5","SO3","Cl","K2O","CaO","TiO2","Cr2O3","MnO","FeOT",
"Name","Type","Campaign","Location","Abrasion")
# Renaming LIBS
+libs.df <- cbind(libs.df,libs_type.df$"type",libs_type.df$"earthsample?")
colnames(libs.df) <- c("Sol","Lat","Lon","Target","Point",
"SiO2","SiO2_stdev","TiO2","TiO2_stdev","Al2O3","Al2O3-stdev","FeOT","FeOT_stdev","MgO","MgO_stdev","CaO","CaO_stdev","Na2O","Na2O_stdev","K2O","K2O_stdev",
-"Total","distance_mm","Tot.Em.")
+"Total","distance_mm","Tot.Em.",
+"Type","Earth_Sample")
# Renaming Lithology
colnames(lithology.df) <- c("Sample","Name","SampleType","Campaign","Abrasion",
"Feldspar","Plagioclase","Pyroxene","Olivine","Quartz",
@@ -100,7 +103,7 @@ pixl.df <- pixl.df[,c("Sample",
"P2O5","SO3","Cl","Cr2O3","MnO" #These ones don't show up in LIBS
)]
# Resorting LIBS columns
-libs.df <- libs.df[,c("Target","Point","Sol","Lat","Lon",
+libs.df <- libs.df[,c("Target","Point","Sol","Lat","Lon","Type","Earth_Sample",
"SiO2","SiO2_stdev","TiO2","TiO2_stdev","Al2O3","Al2O3-stdev","FeOT","FeOT_stdev","MgO","MgO_stdev","CaO","CaO_stdev","Na2O","Na2O_stdev","K2O","K2O_stdev",
"Total","distance_mm","Tot.Em.")]
# Resorting Lithology columns
Binary file modified StudentData/v1_libs.Rds
258 changes: 258 additions & 0 deletions StudentNotebooks/Assignment05/roberd10_assignment05.Rmd
@@ -0,0 +1,258 @@
---
title: "DAR F24 Assignment 5 Notebook"
author: "Doña Roberts"
date: "`r Sys.Date()`"
output:
  pdf_document:
    toc: yes
  html_document:
    toc: yes
subtitle: "MARS"
---
```{r setup, include=FALSE}
# Required R package installation; RUN THIS BLOCK BEFORE ATTEMPTING TO KNIT THIS NOTEBOOK!!!
# This section installs packages if they are not already installed.
# This block will not be shown in the knit file.
knitr::opts_chunk$set(echo = TRUE)

# Set the default CRAN repository
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.r-project.org"
  options(repos = r)
})

# Required packages for M20 LIBS analysis (installed in this order if missing)
required_packages <- c("pandoc", "rmarkdown", "tidyverse", "stringr",
                       "ggbiplot", "pheatmap", "knitr")
for (pkg in required_packages) {
  # require() returns FALSE instead of stopping when the package is missing
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
}
```

## Weekly Work Summary

* RCS ID: Roberd10
* Project Name: MARS
* Summary of work since last week

    * Creating an Rmd file that creates an Rds file containing the minerals and various aspects of them
        * It turns out David made a different version of this with more information. Once he has merged it, we should use his version instead.
    * Creating an Rmd file that compares samples that have been separated by whether or not a type of mineral is present
    * Creating an Rmd file that creates Rds files for PIXL, LIBS, LITHOLOGY, & SHERLOC with a consistent naming scheme
    * Starting to create a wireframe

* Summary of GitHub issues added and worked

    * Make the data frames have the same naming format (#138)
        * Issue created and completed by me
    * Forgot to make issues for the rest...

* Summary of GitHub commits

    * roberd10
        * Student Data merge 1
            * v1_consistent_data_naming.Rmd : "Created file to make Rds with consistent naming schemes" (10/16/24)
            * v1_libs.Rds : "Libs data: reordered and renamed" (10/16/24)
            * v1_lithology.Rds : "Lithology data: reordered, renamed, and without meta data" (10/16/24)
            * v1_pixl.Rds : "Pixl data: reordered, renamed, and without meta data" (10/16/24)
            * v1_sample_meta.Rds : "Samples meta data Rds created" (10/16/24)
            * v1_sherloc.Rds : "Sherloc data: reordered, renamed, and without meta data" (10/16/24)
            * README.md : "Added descriptions of v1 rds files" (10/16/24)
            * v1_Data_Introduction.md : "Created a markdown that explains what the v1 Rds files are" (10/17/24)
        * Student Data merge 2
            * v1_consistent_data_naming.Rmd : "Fixed columns classes" (10/22/24)
            * v1_libs.Rds : "Fixed columns classes" (10/22/24)
            * v1_lithology.Rds : "Fixed columns classes" (10/22/24)
            * v1_pixl.Rds : "Fixed columns classes" (10/22/24)
            * v1_sample_meta.Rds : "Fixed columns classes" (10/22/24)
            * v1_sherloc.Rds : "Fixed columns classes" (10/22/24)
            * v1_Data_Introduction.md : "Fixed columns classes" (10/22/24)
        * Student Data merge 3
            * README.md : "Fixed error: v1 was written as v2" (10/23/24)
            * v1_consistent_data_naming.Rmd : "Added pixl and libs connection data frame to v1" (10/23/24)
            * v1_Data_Introduction.md : "Added description of libs_to_pixl.Rds" (10/23/24)
            * v1_libs_to_sample.Rds : "Created v1 for PIXL_LIBS_Combined.Rds" (10/23/24)
            * v1_consistent_data_naming.Rmd : "Fixed error where lithology sample numbers were shuffled" (10/23/24)
            * v1_lithology.Rds : "Fixed error where lithology sample numbers were shuffled" (10/23/24)
        * Mineral Classes merge 4
            * mineral_classes.Rmd : "Creating Rmd that creates Rds containing minerals and their features" (11/5/24)
            * v1_mineral_classes.Rds : "Creating Rds that contains minerals and their features" (11/5/24)
            * README.md : "Adding description of mineral_classes" (11/5/24)
        * Assignment 5 merge 5
            * roberd10_assignment05.Rmd : "Assignment 5" (11/6/24)
            * roberd10_assignment05.pdf : "Assignment 5" (11/6/24)

## Personal Contribution

* Creating an Rmd file that creates Rds for PIXL, LIBS, LITHOLOGY, & SHERLOC with a consistent naming scheme
* Creating an Rmd file that compares samples that have been selected based on type of minerals present
* Creating an Rmd file that creates Rds with Mineral names and classification
* Starting on a potential wireframe for an "about" section

## Analysis: Mineral Classes

### Question being asked

Which minerals are grouped together by shared features (Carbonates, Oxides, Aqueous, etc.)?
How would we analyze the samples based on this?

(All the code in this section is also in an Rmd file called "Mineral_Selection.Rmd" that I'll push to GitHub once I know what folder it should be in)

### Data Preparation

Created an Rmd file in [StudentData](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentData) ("mineral_classes.Rmd") that creates an Rds file ("v1_mineral_classes.Rds") with the mineral names and a selectable (and extendable!) list of their features.

The following two sections (Importing and Selecting) are taken from "Mineral_Selection.Rmd".

#### Importing
First, we import the relevant data. Note that we are importing the new dataset "mineral_classes", which connects minerals based on various features. It turns out that David made his own version of this that includes Aqueous information, so once he has pushed it, we should replace my "mineral_classes.df" with his version (whatever it is named).

```{r}
# Importing
sample_meta.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_sample_meta.Rds")
sample_meta.df <- sample_meta.df[1:16,]
pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds")
pixl.df[,-1] <- as.data.frame(scale(pixl.df[,-1]))
lithology.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_lithology.Rds")
lithology.df <- lithology.df[1:16,]
mineral_classes.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_mineral_classes.Rds")
```

#### Selecting
In this section, I select a feature to split the minerals by. In this case, I've chosen "Oxide" vs. "Not Oxide". To select by a different feature, just change the value of "Selector".
```{r}
# Select by
Selector <- "Oxide"
# Possible things to select by at this time:
# Ates: "Sulfate", "Perchlorate", "Silicate", "Phosphate", "Carbonate"
# Ites: "Apatite", "Halite", "Chlorite", "Kaolinite", "Ilmenite"
# Other: "Oxide"
```

Once I've decided *what* to select by, I actually *select* by it.
```{r}
minerals <- mineral_classes.df$Type == Selector
```

Now that I have the list of minerals that have this feature, I select only those columns in Lithology.

```{r}
minerals <- append(TRUE,minerals)
lithology.df <- lithology.df[,minerals]
```
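The logical mask `minerals` selects columns by position, so it only lines up correctly if the rows of `mineral_classes.df` are in the same order as the mineral columns of `lithology.df`. A minimal sketch of matching by name instead, on toy data frames; the `Mineral` column name and all values are hypothetical, not the actual layout of `v1_mineral_classes.Rds`:

```{r}
# Toy stand-ins (hypothetical) for mineral_classes.df and lithology.df
toy_mineral_classes <- data.frame(
  Mineral = c("Hematite", "Calcite", "Magnetite"),
  Type    = c("Oxide", "Carbonate", "Oxide")
)
toy_lithology <- data.frame(
  Sample    = c("S1", "S2", "S3"),
  Magnetite = c(0, 2, 0),  # columns deliberately in a different order
  Hematite  = c(1, 0, 0),
  Calcite   = c(0, 0, 3)
)

toy_selector <- "Oxide"
# Names of the minerals that have the selected feature
toy_selected <- toy_mineral_classes$Mineral[toy_mineral_classes$Type == toy_selector]
# Keep Sample plus the matching columns, looked up by name rather than position;
# intersect() skips any selected mineral that has no column in the lithology table
toy_keep <- c("Sample", intersect(toy_selected, names(toy_lithology)))
toy_lithology_selected <- toy_lithology[, toy_keep, drop = FALSE]
print(names(toy_lithology_selected))
```

Here the oxide columns are found even though `Magnetite` comes before `Hematite` in the lithology table, which a positional mask would not handle.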

With this limited version of Lithology, I sum the rows for each sample and keep only the samples whose sum is greater than 0, that is, the samples with at least one mineral with this feature present. This is my "Present" cluster of samples. The rest of the samples are put in the "Absent" cluster.

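```{r}
# Resulting Samples
# Convert via character first so factor columns keep their printed values;
# drop = FALSE keeps a data frame even if only one mineral column was selected
lithology.df[,-1] <- lapply(lithology.df[,-1,drop = FALSE],as.character)
lithology.df[,-1] <- lapply(lithology.df[,-1,drop = FALSE],as.numeric)
# Total amount of the selected minerals in each sample
quantity <- rowSums(lithology.df[,-1,drop = FALSE])
# 1 if at least one selected mineral is present, 0 otherwise
clustering <- as.integer(quantity > 0)
Presence <- recode_factor(as.factor(clustering), "0" = "Absent", "1" = "Present")
```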
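The conversion above goes through `as.character()` before `as.numeric()` because calling `as.numeric()` directly on a factor returns the factor's internal level codes rather than the values it prints. A toy demonstration (the values are made up):

```{r}
# A factor whose labels look like numbers (toy values)
toy_abundance <- factor(c("10", "0", "2.5", "10"))

# Direct conversion returns level codes, i.e. positions in levels(toy_abundance)
toy_codes <- as.numeric(toy_abundance)
# Converting to character first recovers the numbers as printed
toy_values <- as.numeric(as.character(toy_abundance))

print(levels(toy_abundance))
print(toy_codes)
print(toy_values)
```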

### Analysis: Methods and results

Below is the analysis I've already put in "Mineral_Selection.Rmd". I will continue adding analysis to it; this is just the bare-bones version I've done so far.

#### Mapping
First of all, we'll map the samples based on their coordinates and color based on whether (in this example) Oxides are present or absent.

Note that the title has the variable "Selector" in it, so if you change what you are selecting by it changes the title.

Also note that the samples are labeled by their *Abrasion*, because samples from the same abrasion have identical locations and thus overlap. Luckily, at least when selecting by Oxide, samples from the same abrasion fall in the same cluster. I will need to figure out what to do when that's not the case.

```{r}
map_title <- paste("Map of samples based on location and colored by presence of",Selector)
ggplot(data = sample_meta.df, aes(Lon,Lat)) +
ggtitle(label = map_title) +
geom_point(aes(colour = Presence)) +
geom_text(aes(label = Abrasion), nudge_x = 0.004, size = 2)
```
For Oxides there doesn't appear to be an obvious locational clustering, but that might be different when you select by other things.
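To check whether overlapping labels hide anything, one option is to cross-tabulate abrasion against group and flag any abrasion whose samples fall in both groups, since those would be drawn on top of each other with different colors. A sketch on toy data (labels and groups are hypothetical); with the real data, `sample_meta.df$Abrasion` and `Presence` would take their place:

```{r}
# Toy abrasion labels and presence groups (hypothetical values)
toy_abrasion <- c("A1", "A1", "A2", "A2", "A3")
toy_presence <- factor(c("Present", "Present", "Absent", "Present", "Absent"),
                       levels = c("Absent", "Present"))

# One row per abrasion, one column per group, cells are sample counts
toy_counts <- table(toy_abrasion, toy_presence)
# An abrasion is mixed if it has samples in more than one group
toy_mixed <- rownames(toy_counts)[rowSums(toy_counts > 0) > 1]
print(toy_counts)
print(toy_mixed)
```

An empty result would confirm that each abrasion sits entirely in one group, so the map is not hiding a mixed abrasion.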

#### Elemental Compound Concentration Difference

Next, we of course have to look at how the mean concentration of each elemental compound differs when minerals with our feature (in this example, Oxides) are present or absent.

First we calculate the mean elemental compound concentrations for each group. Then we plot them on a heatmap.
Note that we scaled the PIXL data by column when we imported it. This is because we only want to see how each individual elemental compound's concentration varies between samples with our selected feature and samples without it, *not* how the concentrations of different elemental compounds compare within a sample.

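```{r}
# clustering is a 0/1 integer vector; used directly as an index, R would treat
# its values as row numbers. A logical vector selects rows by group instead.
# Assumes the rows of pixl.df are in the same sample order as lithology.df.
present <- clustering == 1
Present <- colMeans(pixl.df[present, -1])
Absent <- colMeans(pixl.df[!present, -1])
means <- rbind(Present, Absent)
heat_title <- paste("Mean concentrations when grouped by presence of", Selector)
pheatmap(means, cluster_rows = FALSE, cluster_cols = FALSE, main = heat_title)
```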

For Oxides, the heatmap shows samples with Oxides present having much higher concentrations of Na2O and CaO and much lower concentrations of MgO than samples without Oxides.

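I'm currently looking into why this heatmap looks exactly the same no matter what I select by. A likely cause is the row indexing: `clustering` is a 0/1 integer vector, and R treats integer indices as row numbers, so `pixl.df[clustering,-1]` just repeats row 1 and `pixl.df[-clustering,-1]` drops only row 1. Indexing with a logical vector (`clustering == 1` or `quantity > 0`) selects the intended samples.

That would also explain the "Absent" row being all about 0: it is the mean of almost every row of the column-scaled data, which is centered at 0.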
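When a 0/1 indicator is used directly as a row index, R treats the values as row numbers: zeros are dropped, each `1` selects row 1, and a negated vector such as `-clustering` removes only row 1. That makes the "Present" and "Absent" means essentially independent of `Selector`. A toy illustration on a hypothetical 4-row data frame, scaled by column as in the import step:

```{r}
# Hypothetical data: 4 samples, 2 compounds, scaled by column
toy_pixl <- data.frame(Sample = c("S1", "S2", "S3", "S4"),
                       CaO = c(1, 2, 3, 4),
                       MgO = c(8, 6, 4, 2))
toy_pixl[, -1] <- as.data.frame(scale(toy_pixl[, -1]))

# 0/1 group indicator: samples 3 and 4 are "Present"
toy_clustering <- c(0L, 0L, 1L, 1L)

# Integer indexing: zeros are ignored and every 1 selects row 1
wrong_present <- colMeans(toy_pixl[toy_clustering, -1])
# Negative indexing: -1 drops row 1, so rows 2-4 are averaged
wrong_absent <- colMeans(toy_pixl[-toy_clustering, -1])

# Logical indexing selects the intended groups
toy_is_present <- toy_clustering == 1
right_present <- colMeans(toy_pixl[toy_is_present, -1])
right_absent <- colMeans(toy_pixl[!toy_is_present, -1])

print(rbind(wrong_present, wrong_absent))
print(rbind(right_present, right_absent))
```

With many samples, the "wrong" Absent row averages all but one row of centered data, which is why it would sit close to 0.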

### Discussion of results

This is just the start of the analysis that could be done after splitting sample sites into those that contain a type of mineral and those that don't. The hope is to include some of the more in-depth analysis done by the others here as well, but I haven't gotten to that yet. Essentially, this is a work in progress.

## Fix column names of the data frames

Created an Rmd file in StudentData ("v1_consistent_data_naming.Rmd") that creates a new Rds file for the four data sets as well as creating an Rds file of the meta data for the sample data.

Created a markdown file ("v1_Data_Introduction.md") that describes what the new Rds files contain.

These Rds files are the same data as the original data sets, but reorganized, renamed, and properly classed (numeric/factor/character) so that they are consistent and accurate.
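One way to keep the classing step declarative is to list each column's intended class once and coerce in a loop. A minimal sketch; `set_column_classes` is a hypothetical helper and the toy LIBS-like values are made up, so this is not the code in `v1_consistent_data_naming.Rmd`:

```{r}
# Hypothetical helper: coerce each listed column to its intended class
set_column_classes <- function(df, classes) {
  for (col in names(classes)) {
    x <- df[[col]]
    df[[col]] <- switch(classes[[col]],
      # go through character so factor columns keep their printed values
      numeric   = as.numeric(as.character(x)),
      factor    = as.factor(x),
      character = as.character(x),
      stop("unsupported class: ", classes[[col]])
    )
  }
  df
}

# Toy LIBS-like rows (hypothetical values)
toy_libs <- data.frame(
  Sol    = factor(c("100", "205")),
  Target = factor(c("target_a", "target_b")),
  Type   = c("Sample", "Earth calibration"),
  stringsAsFactors = FALSE
)
toy_libs <- set_column_classes(
  toy_libs,
  c(Sol = "numeric", Target = "character", Type = "factor")
)
str(toy_libs)
```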

I pushed multiple iterations to GitHub as I added the data sets made by Margo and Charlotte and fixed issues brought up during discussion.

Link to [Github StudentData](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentData)

## Creating a Wireframe

### Question being asked

How should we display information describing the sample and target data?

### Wireframe

Created a possible wireframe for the description of the data. It is incomplete because my attention shifted to the presentation for Karen Rogers and to building a way to compare samples grouped by the presence of types of minerals. Additionally, it was brought up that since the Mission Minder is for people who already have the background knowledge, providing that background isn't a priority.

[*Wireframe Slides*](https://docs.google.com/presentation/d/1QVX61d2LxmP8Fj_M0L-ZLxQBjvYRjSS_rnh4Fkt6z2g/edit#slide=id.p)

## Summary and next steps

Most of my work was on data cleaning / organizing, and a little bit on the Mission Minder.

My attention will be shifting towards creating pages for the Mission Minder, as was instructed during recent meetings.