diff --git a/StudentData/v1_consistent_data_naming.Rmd b/StudentData/v1_consistent_data_naming.Rmd
index ba7f8e5..1aedf17 100644
--- a/StudentData/v1_consistent_data_naming.Rmd
+++ b/StudentData/v1_consistent_data_naming.Rmd
@@ -24,7 +24,7 @@ if (!require("qpcR")) {
 # Importing data frames
 ```{r}
 # Importing PIXL
-#pixl.df <- readRDS("~/DAR-Mars-F24/Data/pixl.Rds") #Old PIXL, missing Lat and Lon
+#pixl.df <- readRDS("~/DAR-Mars-F24/Data/samples_pixl_wide.Rds") #Old PIXL, missing Lat and Lon
 pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/pixl_sol_coordinates.Rds")
 
 # Importing LIBS
diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.Rmd b/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.Rmd
new file mode 100755
index 0000000..a3e81a5
--- /dev/null
+++ b/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.Rmd
@@ -0,0 +1,475 @@
+---
+title: "Mars Final Project Report"
+author: "Doña Roberts"
+date: "December 12th, 2024"
+output:
+  html_document:
+    toc: yes
+    toc_depth: 3
+    toc_float: yes
+    number_sections: yes
+    theme: united
+  html_notebook: default
+  pdf_document:
+    toc: yes
+    toc_depth: '3'
+---
+
+
+# DAR Project and Group Members
+
+* Project name: MARS
+* Project team members:
+    - Ashton Compton
+        - dar-compta
+    - Aadi Lahiri
+        - dar-lahira
+    - CJ Marino
+        - dar-marinc8
+    - Nicolas Morawski
+        - dar-morawn
+    - Dante Mwatibo
+        - dar-mwatid
+    - Charlotte Peterson
+        - dar-peterc
+    - **Doña Roberts**
+        - [**dar-roberd10**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/dar-roberd10)
+    - Margo VanEsselstyn
+        - dar-vanesm
+    - David Walczyk
+        - dar-walczd3
+
+# 0.0 Preliminaries.
+
+This report is generated from an R Markdown file that includes all the R code necessary to produce the results described and embedded in the report. Code blocks can be suppressed from output for readability. If `show <- FALSE` the code block will be suppressed; if `show <- TRUE` then the code will be shown.
+
+```{r}
+# Set to TRUE to expand R code blocks; set to FALSE to collapse R code blocks
+show <- TRUE
+```
+
+Executing this R notebook requires the following packages:
+
+* `ggplot2`
+* `tidyverse`
+* `pheatmap`
+* `reshape2`
+
+These will be installed and loaded as necessary.
+
+```{r, include=FALSE}
+# This code will install required packages if they are not already installed
+if (!require("ggplot2")) {
+  install.packages("ggplot2")
+  library(ggplot2)
+}
+if (!require("tidyverse")) {
+  install.packages("tidyverse")
+  library(tidyverse)
+}
+if (!require("pheatmap")) {
+  install.packages("pheatmap")
+  library(pheatmap)
+}
+if (!require("reshape2")){
+  install.packages("reshape2")
+  library(reshape2)
+}
+```
+
+# 1.0 Project Introduction
+
+This report is for the Mars 2024 Data Analytics Research group. Our goal is to create an app ("Mission Minder") that provides unique, dynamic analysis of the Mars Perseverance data.
+
+This particular report mainly focuses on how the data analyzed in "Mission Minder" is cleaned and organized. It also takes a brief look at how the two types of features in the sample data (mineral presence and elemental compound concentration) relate, and at why we won't be using Lithology going forward.
+
+# 2.0 Organization of Report
+
+This report is organized as follows:
+
+* Section 3.0: "Finding 1: Data Organization"
+    - Details of how we reformatted and cleaned the data for consistency and ease of use across multiple analysis methods.
+    - Notes on decisions made in the cleaning process, their consequences, and potential alternative choices.
+    - Instructions on how to integrate new samples/targets into the data sets.
+
+* Section 4.0: "Finding 2: Correlation Between Elemental Compounds and Minerals"
+    - Calculating the correlation between the two types of numeric features (PIXL & SHERLOC) of the sample data.
+
+* Section 5.0: "Finding 3: Lithology and SHERLOC match"
+    - Showing that Lithology is just the binary representation of SHERLOC.
+
+* Section 6.0: "Overall conclusions and suggestions"
+
+* Section 7.0: "Bibliography"
+
+* Section 8.0: "Appendix"
+    - Instructions on how to convert Lithology/SHERLOC data from factor to the correct numeric values.
+
+
+# 3.0 Finding 1: Data Organization
+
+The original data for Perseverance was inconsistently organized and not always in a usable form. This section describes how the data was reorganized, re-classed, and relabeled to be more intuitive and to minimize the data manipulation needed for each individual analysis.
+
+## 3.1 Data, Code, and Resources
+
+Here is a list of data sets, code, and resources that relate to this section.
+Note that clicking on a file name will take you to that file on GitHub.
+
+a. Code altering the data sets
+    1. [**v1_consistent_data_naming.Rmd**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_consistent_data_naming.Rmd) is the notebook that cleans the data and outputs all the v1 data sets. All the cleaning done to the data is described in the notebook and should be easily editable.
+
+b. Markdowns describing data sets
+    1. [**v1_Data_Introduction.md**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_Data_Introduction.md) is the markdown file describing each of the new v1 data sets, what they contain, and how they have been changed.
+    2. [**"Dataset-Description"**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/wiki/Dataset-Description) is the GitHub wiki page describing each of the v1 data sets. As of December 14th, 2024, this is more up to date than `v1_Data_Introduction.md`.
+
+c. Data sets
+    1. [**v1_sample_meta.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sample_meta.Rds) is the data set containing all the meta data for the physical samples Perseverance collected.
+    2. [**v1_pixl.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds) is the data set containing all the elemental compound concentration data for the physical samples Perseverance collected.
+    3. [**v1_sherloc.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sherloc.Rds) is the data set indicating confidence in mineral presence (NOT binary) in the physical samples Perseverance collected.
+    4. [**v1_lithology.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds) is the data set indicating whether minerals were present or not present (binary) in the physical samples Perseverance collected.
+    5. [**v1_libs.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs.Rds) is the data set containing all the meta data and the elemental compound concentration data for the LIBS scans Perseverance did on Mars rock.
+    6. [**v1_libs_to_sample.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs_to_sample.Rds) is the data set containing the nearest sample to each LIBS target/point and the distance between them.
+    7. [**v1_libs_earth_references.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_libs_earth_references.Rds) is the data set containing all the meta data and the elemental compound concentration data for the LIBS scans Perseverance did on the Earth reference rocks it carries with it. That is, the "scct" targets.
+
+d. [**Final Notebooks**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook)
+    1. [**peterc_finalProjectF24.Rmd**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/peterc_finalProjectF24.Rmd) is Charlotte's final notebook. Referenced in parts 3.2, 3.3, and 3.5.
+    2. [**vanesm_finalProjectF24.Rmd**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/vanesm_finalProjectF24.Rmd) is Margo's final notebook. Referenced in parts 3.2, 3.3, and 3.5.
+    3. [**walczd3_finalProjectF24.Rmd**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/tree/main/StudentNotebooks/Assignment08_FinalProjectNotebook/walczd3_finalProjectF24.Rmd) is David's final notebook. Referenced in part 3.2.
+
+## 3.2 Contribution
+
+The reordering and data cleaning was done by me, Doña Roberts.
+
+The data set `v1_libs_to_sample.Rds`, the Type meta data for `v1_libs.Rds`, and the Sol, Lat, and Lon meta data for `v1_pixl.Rds` came from data sets created by Margo VanEsselstyn and Charlotte Peterson.
+Their final notebooks, which document where they got the new data, can be found on GitHub by following links 3.1.d.1 and 3.1.d.2.
+
+Additionally, David created a data set, [**`aqueous.Rds`**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/dar-roberd10/StudentData/aqueous.Rds), that describes the details of each of the minerals in SHERLOC.
+His final notebook, which documents how he created this data set, can also be found on GitHub by following link 3.1.d.3.
+
+## 3.3 Data Description
+
+For the description of the resulting data sets, see the wiki page: [**https://github.rpi.edu/DataINCITE/DAR-Mars-F24/wiki/Dataset-Description**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/wiki/Dataset-Description)
+
+Since the wiki doesn't go into detail on what changes were made (just what the end result is), we'll summarize that now. Note that the actual file implementing these changes (with comments) is `v1_consistent_data_naming.Rmd`, which can be found on GitHub by following link 3.1.a.1.
+
+1) Imported data sets. These are the data I started with and then manipulated to create the v1s.
+    - [**pixl_sol_coordinates.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/dar-roberd10/StudentData/pixl_sol_coordinates.Rds), which is the PIXL data set including Sol/Lat/Lon created by Charlotte and Margo. See their final notebooks for how they created this file.
+    - [**supercam_libs_moc_loc.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/dar-roberd10/Data/supercam_libs_moc_loc.Rds), which is the LIBS data. This comes from a csv downloaded from NASA.
+    - [**libs_typed.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/dar-roberd10/StudentData/libs_typed.Rds), which is the LIBS data set including a column stating what type of scan it was. This was also created by Charlotte and Margo. See their final notebooks for how they created this file.
+    - [**mineral_data_static.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/dar-roberd10/Data/mineral_data_static.Rds), which is the Lithology data. This comes from the "Lithology" paragraph at the start of each sample's [sample report](https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/).
+    - [**abrasions_sherloc_samples.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/Data/abrasions_sherloc_samples.Rds), which is the SHERLOC data set that comes from the "SHERLOC" tables in the [sample reports](https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/).
+        - Some manipulation was done to this data set to make its format match the others'.
+    - [**PIXL_LIBS_Combined.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/PIXL_LIBS_Combined.Rds), which is the data set listing each LIBS target's nearest abrasion and the distance to it. This was created by Charlotte and Margo. See their final notebooks for how they created this file.
+
+2) Renamed Columns
+    - Capitalized all column names for consistency
+    - Changed PIXL elemental compound naming convention to match LIBS's
+    - Created a logical vector from the "earthsample?" column of `libs_typed.Rds` (to be used later in separating out the earth references)
+    - Changed Lithology & SHERLOC's mineral naming convention to be consistent between minerals and with each other
+    - Changed names of `PIXL_LIBS_Combined.Rds` (now called "libs_to_sample") so that it matches the naming convention of PIXL/LIBS and to make explicit whether a column came from PIXL or came from LIBS
+
+3) Separated out Sample Meta data
+    - See section 3.4's table of decisions, row 2, for details.
+
+4) Separated out LIBS earth reference data from rest of data
+    - See section 3.4's table of decisions, row 3, for details.
+
+5) Created the LIBS clusters used in most LIBS analysis and added a column to the LIBS data frame indicating said clustering
+    - See section 3.4's table of decisions, row 6, for details.
+
+6) Reordered Columns
+    - Reordered LIBS columns such that the meta data is all at the front
+    - Reordered PIXL columns such that the elemental compounds also found in LIBS data are ordered the same, with elemental compound data unique to PIXL at the end
+    - Reordered Lithology and SHERLOC so that they match
+    - Reordered `libs_to_sample` columns such that LIBS meta data comes first, then the distance between target and sample, then PIXL meta data, and LIBS elemental compound data last.
+        - See section 3.4's table of decisions, row 4, for details on the inclusion of the LIBS elemental compound data and all of the meta data.
+
+7) Changed data types
+    - New data types are listed in the Wiki linked at the start of this section.
+    - Several `character` types were changed to `factor`
+        - See section 3.4's table of decisions, row 1, for details on one such change.
+    - Sample Meta was originally all `character`; numbers were changed to `numeric` or `integer` and labels were changed to `factor`
+
+8) Saved as new Rds files
+    - All new Rds file names are prefixed with `v1_`, with the intention that updates to these data sets will be saved with the prefix `vn_`, where "n" denotes which edition they are. This way, if any errors occur in the updating process, the Mission Minder can still run on the older data sets while the new ones are fixed.
+
+## 3.4 Limitations
+
+While I attempted to organize the data in the most useful ways, some of the changes may be less than ideal, inconvenient, or have a valid alternative.
+
+Here is a table of the most notable ones:
+
+**Table of Decisions**
+
+| File: | Part in question: | Decision: | Pros: | Cons: | Alternative: |
+|-----------|-------------------|-----------|------------|------------|--------------|
+| `v1_sherloc` | The class of mineral presences | Setting the class as `factor` | Accurate to what it is | Inconvenient to use, see 8.1 for how to convert to numeric | 1) Leaving class as `character` 2) Setting class as `numeric` |
+|_________|_______________|_________|__________|__________|______________|
+| `v1_sample_meta` | The existence of a separate meta data set | Separating meta data out | Cleaner, Neater, Less repetitive | One more data set to import | Repeating meta data in each sample data set |
+|_________|_______________|_________|__________|__________|______________|
+| `v1_libs_earth_references` | The existence of a separate earth reference data set | Separating `scct` (Earth reference) targets out | We aren't accidentally looking at Earth rocks when trying to analyze Mars ones | One more data set to import, harder to compare `scct` data to other types of LIBS data | Keeping `scct` targets in the `v1_libs` data set |
+|_________|_______________|_________|__________|__________|______________|
+| `v1_libs_to_sample` | The inclusion of non-identifier PIXL and LIBS data | Include the extra meta and elemental compound data | Convenience. Change would be too late in process, you'd have to go through and change any code depending on the "fluff's" inclusion | Makes the data set more complex, more repetitive, and less clean | Removing non-identifier data from `v1_libs_to_sample` |
+|_________|_______________|_________|__________|__________|______________|
+| `v1_consistent_data_naming` | Line 153, the inclusion of all samples in `v1_lithology` | Limiting Lithology to only first 16 samples (by several requests) | Don't need to remove partially received samples for every analysis including PIXL data | **Problematic for when we receive new samples!!!** | 1) Don't limit it but people need to manually remove extra rows for analysis 2) Define a variable "n" at start of file, limit to "n" samples. Increment "n" every time we get a new sample |
+|_________|_______________|_________|__________|__________|______________|
+| `v1_libs` | The inclusion of a "Cluster" column | Include cluster column | For analysis using the 4 clusters, it's easier. Unifies clustering between people's analyses | Doesn't really make sense with the app, where the number of clusters is changeable and the seed is already consistent between analyses | Don't include cluster column, instead rely on a "Cluster" variable in the app which is dependent on the number of clusters requested |
+
+There are many more decisions made while cleaning the data, but the above table details the most important (questionable) ones.
+
+## 3.5 Future Work
+
+In the future, we expect to receive more Sample and LIBS data from NASA. When this happens, there needs to be a way to integrate it.
+
+Below are the steps required, and the files to reference, in order to add and clean the new data. For some steps, the code needed isn't public in the Data or StudentData GitHub folders, so the person responsible for that code, or the place where it is documented, is identified instead.
+
+For new Sample data:
+
+1) Get the new data from NASA to update `mineral_data_static.Rds`, `abrasions_sherloc_samples.Rds`, and `sample_pixl_wide.Rds` (which is used in the creation of `pixl_sol_coordinates.Rds`).
+    - For updating `mineral_data_static.Rds`, look at the "Lithology" paragraph at the start of each new sample's [Sample Report](https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/). As of December 14th, 2024, the specific code to create the Rds file isn't public, so contact Dr. Erickson.
+    - For updating `abrasions_sherloc_samples.Rds`, look at the "SHERLOC" table in each new sample's [Sample Report](https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/). Talk to Karen Rogers for details.
+    - For updating `sample_pixl_wide.Rds`, as of December 14th, 2024, there is no documentation available on GitHub explaining how this `Rds` file was created. Check with Dr. Erickson to see where this data came from and how to update it.
+
+2) Follow instructions in Margo and Charlotte's notebooks to update `pixl_sol_coordinates.Rds`. The Sol/Lat/Lon information they added comes from the [Analyst Notebook](https://an.rsl.wustl.edu/m20/AN/account/login.aspx).
+
+3) Follow instructions in Margo and Charlotte's notebooks to update `PIXL_LIBS_Combined.Rds` for the new LIBS points.
+
+4) Change line 153 of `v1_consistent_data_naming.Rmd` from `lithology.df <- lithology.df[1:16,~~~]` to `lithology.df <- lithology.df[1:n,~~~]` where `n` is the new number of samples.
+
+5) Run `v1_consistent_data_naming.Rmd` to update `v1_sample_meta.Rds`, `v1_pixl.Rds`, `v1_sherloc.Rds`, and `v1_libs_to_sample.Rds`.
+    - Note that since Lithology has become redundant, there is no need to update `v1_lithology.Rds`.
+
+For new LIBS data:
+
+1) Download the new LIBS moc csv file from NASA, and update `supercam_libs_moc_loc.Rds`. As of December 14th, 2024, the specific code to create the Rds file isn't public, so contact Dr. Erickson.
+
+2) Follow instructions in Margo and Charlotte's notebooks to create `libs_typed.Rds` for the new LIBS points.
+
+3) Follow instructions in Margo and Charlotte's notebooks to update `PIXL_LIBS_Combined.Rds` for the new LIBS points.
+
+4) Run `v1_consistent_data_naming.Rmd` to update `v1_libs.Rds`, `v1_libs_earth_references.Rds`, and `v1_libs_to_sample.Rds`.
+
+Some recommendations for updating the data sets:
+
+- Instead of updating "v1" to contain the new data, create a copy of `v1_consistent_data_naming.Rmd` and replace all references to "v1" with "v2" (including the title of the file). This way, instead of updating the "v1" data sets you are creating new "v2" data sets. This means that if something goes wrong and it breaks, you can always fall back to the previous version while you figure out how to fix the break.
+
+- When creating the `v2_consistent_data_naming.Rmd`, go through and find Margo and Charlotte's code to update their data sets and add it to the start of the new `v2` code so that in the future there is only one file to run when updating. I was going to do this but, as they were still improving and updating it, the code wasn't stable enough for the transfer to make sense.
+
+# 4.0 Finding 2: Correlation Between Elemental Compounds and Minerals
+
+This section looks at how the concentration of elemental compounds (PIXL data) relates to/indicates the presence of minerals (SHERLOC data) at a sample location.
+
+We are looking at the question of how PIXL and SHERLOC relate to each other. This corresponds to the [Exploration] --> [PIXL vs Sherloc] section of the Mission Minder.
+
+## 4.1 Data, Code, and Resources
+
+Here is a list of data sets, code, and resources that are used in this work. Click on the file name to be sent to the file on GitHub.
+
+c. Data sets (code for importation is hidden)
+    1. [**v1_pixl.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_pixl.Rds)
+    2. [**v1_sherloc.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sherloc.Rds) with the exception of "Hydrated Carbonates", which is the same for all samples.
+
+First, we need to import the data sets.
+
+```{r, include=FALSE}
+# Importing needed data sets
+pixl.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_pixl.Rds")
+sherloc.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_sherloc.Rds")
+```
+
+Then, before we can actually calculate the correlations we need, we must combine the two data sets into a matrix with only numeric data.
+
+```{r, include=show}
+## Creating pixl matrix
+# removing the sample number column
+pixl.matrix <- as.matrix(pixl.df[,-1])
+## Creating sherloc matrix
+# removing the sample number column
+# converting from factor to numeric
+sherloc.matrix <- as.matrix(as.data.frame(lapply(lapply(
+  sherloc.df[,-1],as.character),as.numeric)))
+## Combining pixl and sherloc into one matrix
+# removing the column(s) with no deviation, in this case "Hydrated Carbonates"
+combined.matrix <- cbind(pixl.matrix,sherloc.matrix[,-16])
+```
+
+Using the combined matrix, we calculate the Pearson correlation between every pair of features, and then keep only the correlations between features in PIXL and features in SHERLOC.
+
+```{r, include=show}
+# Calculating Pearson feature correlation
+combined.cormat <- round(cor(combined.matrix),2)
+# Selecting only the correlations between pixl features and sherloc features
+combined.cormat <- combined.cormat[colnames(pixl.matrix),colnames(sherloc.matrix[,-16])]
+```
+
+## 4.2 Contribution
+
+This section is solely my own work.
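
A side note on the correlation step in section 4.1: the sketch below is not the notebook's code, only an illustration on made-up toy matrices (`pixl.toy` and `sherloc.toy` are hypothetical stand-ins, and their values are invented). It suggests that base R's `cor(x, y)` should give the same PIXL-by-SHERLOC block as correlating the combined matrix and subsetting, and that a zero-variance column can be dropped by testing its standard deviation rather than by a fixed position such as `[,-16]`.

```r
# Toy stand-ins for the PIXL and SHERLOC matrices (hypothetical values, 16 samples)
pixl.toy <- cbind(SO3   = 1:16,
                  Cr2O3 = (1:16)^2 %% 7,
                  MgO   = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3))
sherloc.toy <- cbind(Chromite = rep(c(0, 0.25, 0.5, 1), times = 4),
                     Halite   = rep(c(1, 0, 0.75, 0.25), each = 4),
                     Constant = rep(0, 16))

# Drop any zero-variance column by test instead of by position
keep <- apply(sherloc.toy, 2, sd) > 0
sherloc.kept <- sherloc.toy[, keep, drop = FALSE]

# Approach from section 4.1: correlate everything, then keep the cross block
full.cormat <- cor(cbind(pixl.toy, sherloc.kept))
block.a <- full.cormat[colnames(pixl.toy), colnames(sherloc.kept)]

# Shortcut: cor(x, y) returns only the correlations between columns of x and y
block.b <- cor(pixl.toy, sherloc.kept)

# Both routes should agree up to floating point error
all.equal(block.a, block.b)
```

Either route works; the two-argument form simply skips computing the within-PIXL and within-SHERLOC correlations that are thrown away afterwards.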
+
+## 4.3 Methods Description
+
+To get from the SHERLOC and PIXL data to their correlation, I used the `cor` function from base R to calculate the correlation between every pair of features in PIXL (elemental compounds) and SHERLOC (minerals). I then selected only the correlations between a PIXL feature and a SHERLOC feature (instead of correlations within PIXL or within SHERLOC) to find my desired correlations.
+
+In the next section, I use `pheatmap` to hierarchically cluster and output a heatmap of the correlations.
+
+## 4.4 Result and Discussion
+
+```{r,include=show}
+# Creating title for heatmap
+heatmap.title <- "Correlation between Elemental Concentration & Mineral Presence"
+# Printing heatmap of correlation
+pheatmap(combined.cormat,
+         scale="none",
+         treeheight_row = 10,treeheight_col = 10,
+         main = heatmap.title)
+```
+Caption: Vertical axis is PIXL elemental compounds, horizontal axis is SHERLOC minerals. Colored squares indicate the correlation between the elemental compound in that row and the mineral in that column. That is to say, a red square means that a higher concentration of that elemental compound tends to go with higher confidence that the mineral is present, and a dark blue square means that a higher concentration tends to go with the mineral being absent.
+
+Here we can clearly see which elemental compounds correlate to which minerals.
+
+For example, high SO3 concentration strongly correlates to the presence of Hydrated Mg-Fe sulfate, Mg-sulfate, Kaolinite, & Fe-Mg-clay minerals.
+
+- Here the sulfates make sense, as sulfate is SO4, or, O + __SO3__
+- Interestingly though, __SO3__ doesn't have a strong correlation with **Sulfate** itself.
+
+Similarly, high Cr2O3 concentration strongly correlates to the presence of Apatite, Spinels, Zircon/Baddeleyite, Chromite, & Ilmenite.
+
+- Here Chromite makes sense, since it's FeCr2O4, or, FeO + __Cr2O3__
+
+Overall we see that:
+
+- MgO: Lightly suggests presence of Fe-Mg carbonate, *very* lightly correlated to a bunch of minerals but not strongly correlated to the presence of any of the minerals
+- Cr2O3: **Strongly** suggests the presence of Apatite, Spinels, Zircon/Baddeleyite, Chromite, & Ilmenite.
+- Cl: **Strongly** suggests the presence of Fe-Mg carbonate, & Halite. Very lightly correlated to a few other minerals.
+- MnO: Mildly suggests the presence of Carbonate and lightly correlated to presence of Pyroxene. Very lightly correlated to a bunch of other minerals, but nothing strong.
+- FeO-T: Mildly suggests the presence of Carbonate, and very lightly correlated to a bunch of minerals.
+- SO3: **Strongly** suggests the presence of Hydrated Mg-Fe sulfate, Mg-sulfate, Kaolinite, & Fe-Mg-clay minerals. Lightly suggests the presence of Ca-Sulfate.
+- TiO2: Lightly suggests presence of Plagioclase, and very lightly suggests the presence of several minerals. Again, no strong correlations though.
+- P2O5: Mildly suggests the presence of Plagioclase, Iron oxide, Ca-sulfate, & FeTi oxides. Lightly suggests several other minerals.
+- CaO: Mildly suggests the presence of Hydrated Iron oxide, Perchlorates, Na-perchlorate, Plagioclase, & Iron oxide. A bunch of light correlations to other minerals as well.
+- Na2O: Mildly suggests the presence of Iron oxide. Lightly suggests the presence of a bunch of other minerals as well.
+- Al2O3: Mildly suggests the presence of Chlorite, Quartz, Disordered Silicates, & Feldspar. Very lightly suggests the presence of a bunch of other minerals.
+- SiO2: Mildly suggests the presence of Chlorite, Quartz, Disordered Silicates, & Feldspar. Very lightly suggests the presence of several other minerals.
+- K2O: Strongly suggests the presence of Chlorite, Quartz, Disordered Silicates, & Feldspar. Very lightly suggests the presence of a few other minerals.
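
The summary list above was read off the heatmap by eye. As a hedged sketch of how such a list could be drafted programmatically, the snippet below reshapes a correlation block into one row per (compound, mineral) pair and keeps the pairs above a cutoff. The matrix `toy.cormat`, its values, and the cutoff of 0.6 are all invented for illustration; they are not results from this report, and the words "strongly", "mildly", and "lightly" in the list are not tied to any particular threshold in the source.

```r
# Toy correlation block (hypothetical values): rows = compounds, columns = minerals
toy.cormat <- matrix(c( 0.82, -0.10,
                        0.05,  0.71,
                       -0.30,  0.12),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("SO3", "Cr2O3", "MgO"),
                                     c("Mg-sulfate", "Chromite")))

# Reshape to one row per (compound, mineral) pair
pairs.df <- as.data.frame(as.table(toy.cormat), stringsAsFactors = FALSE)
names(pairs.df) <- c("Compound", "Mineral", "Correlation")

# Keep pairs at or above an illustrative cutoff, strongest first
cutoff <- 0.6
strong.df <- pairs.df[abs(pairs.df$Correlation) >= cutoff, ]
strong.df <- strong.df[order(-abs(strong.df$Correlation)), ]
print(strong.df)
```

Applied to `combined.cormat`, the same reshaping would give a sortable table to check the hand-written summary against.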
+
+## 4.5 Conclusions, Limitations, and Future Work.
+
+There is likely a chemical reason for the strong correlations found above. Some correlations are obvious even to a non-geologist, such as Chromite & Cr2O3 or several Sulfates & SO3, but other correlations aren't immediately obvious, such as the low correlation between Sulfate & SO3 or Perchlorates & Cl. There are likely many more connections to be seen here by someone with more geological insight.
+
+It would also be interesting to look at this heatmap in conjunction with the chemical formulas for the minerals listed in David's [**`aqueous.Rds`**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/dar-roberd10/StudentData/aqueous.Rds) file and see if that provides any new insights.
+
+Sadly, since we don't have data for mineral presence at LIBS targets, we can't do a similar heatmap for them. However, we may be able to use the relation between concentration of elemental compounds and presence of minerals found in the sample data to hypothesize which minerals are likely present at the LIBS targets, based on the elemental compound concentrations found there.
+
+# 5.0 Finding 3: Lithology and SHERLOC match
+
+This section looks at the relationship between the Lithology data and the SHERLOC data.
+
+We started with two data sets, Lithology and SHERLOC, that both describe the presence of minerals at each sample site.
+
+Lithology is made up of binary numeric values representing present or not present, and SHERLOC is made up of discrete numeric values representing the level of confidence in that presence.
+
+If these two match, then we only really need SHERLOC, as Lithology can be derived by checking whether a SHERLOC value is "= 0" (then "0") or "> 0" (then "1").
+
+For this reason, we want to confirm that SHERLOC and Lithology match.
+
+## 5.1 Data, Code, and Resources
+
+Here is a list of data sets, code, and resources that are used in this work. The data sets below can be found on GitHub by clicking on them.
+
+c. Data sets (code for importation is hidden)
+    1. [**v1_lithology.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_lithology.Rds)
+    2. [**v1_sherloc.Rds**](https://github.rpi.edu/DataINCITE/DAR-Mars-F24/blob/main/StudentData/v1_sherloc.Rds)
+
+First we import the Lithology data set. Note that we don't need to import SHERLOC, since we imported that in section 4.1.
+```{r, include=FALSE}
+## Importing needed data sets
+lithology.df <- readRDS("~/DAR-Mars-F24/StudentData/v1_lithology.Rds")
+```
+
+Then, we convert Lithology into a matrix, excluding the "Sample" column. Again, note that we don't need to do this for SHERLOC, as we also did this in section 4.1.
+```{r, include=show}
+## Creating lithology matrix
+# removing the sample number column
+# converting from factor to numeric
+lithology.matrix <- as.matrix(as.data.frame(lapply(lapply(
+  lithology.df[,-1],as.character),as.numeric)))
+```
+
+## 5.2 Contribution
+
+This section is solely my own work.
+
+## 5.3 Methods Description
+
+There are many ways to confirm that Lithology and SHERLOC match, but the simplest one is probably to just subtract the SHERLOC matrix from the Lithology matrix and look at the minimum and maximum of the result.
+
+```{r, include=show}
+## Calculating Difference matrix
+diff <- lithology.matrix - sherloc.matrix
+## Finding minimum, should be 0
+min(diff)
+## Finding maximum, should be 0.75
+max(diff)
+```
+
+After doing this, we get that the minimum of the difference is "0" and the maximum of the difference is "0.75".
+
+## 5.4 Result and Discussion
+
+Since the minimum of the difference between them is "0", we know that if Lithology is "0" then SHERLOC is also "0", since if SHERLOC were > "0" then the difference would be < "0".
+
+Since the maximum of the difference between them is "0.75", we know that if Lithology is "1" then SHERLOC is $\geq$ "0.25", since if SHERLOC were < "0.25" then the difference would be > "0.75".
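
The two bounds above can be tried out on a small made-up example. The sketch below is only an illustration: `sherloc.toy` and `lithology.toy` are hypothetical matrices with invented values, not the project data. It also shows a more direct cell-by-cell check, binarizing SHERLOC with `> 0` and comparing it to Lithology, which assumes the two matrices share the same row and column order.

```r
# Toy matrices (hypothetical values): 3 samples x 2 minerals, filled column by column
sherloc.toy   <- matrix(c(0, 0.25, 1,
                          0.5, 0, 0.75), nrow = 3)
lithology.toy <- matrix(c(0, 1, 1,
                          1, 0, 1), nrow = 3)

# Bounds used in the text: minimum 0 and maximum 0.75 when the two agree
diff.toy <- lithology.toy - sherloc.toy
range(diff.toy)

# Direct check: binarize SHERLOC and compare to Lithology cell by cell
binary.toy <- (sherloc.toy > 0) * 1
all(binary.toy == lithology.toy)
```

The direct comparison has the advantage of pointing at the exact cells that disagree (via `which(binary.toy != lithology.toy, arr.ind = TRUE)`) if a future sample breaks the match.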
+
+This means that if Lithology claims a mineral is present then SHERLOC also claims the mineral is present, and that if Lithology claims that a mineral is absent then SHERLOC also claims that the mineral is absent.
+
+## 5.5 Conclusions, Limitations, and Future Work.
+
+Since we can show that SHERLOC and Lithology match, we have decided to only use the SHERLOC data set in Mars Mission Minder, since Lithology is simply a less detailed version of SHERLOC.
+
+If there are any carry-over references to "Lithology" within the Mars Mission Minder, they actually refer to the binary version of SHERLOC.
+
+# 6.0 Overall conclusions and suggestions.
+
+In the future, as more samples are added, I would suggest streamlining the process of integrating new data. That is, I would recommend creating one master file that pulls in all the data in their raw form (CSVs or manually added data from sample reports), cleans and organizes it, and then outputs the finalized data sets. The current system has many intermediary steps, some of which aren't available for viewing and editing, making it hard to integrate new data.
+
+When the Mission Minder gets extended to the other Mars Missions, I would recommend (at the start!) sitting down and finding some uniform naming, organizing, and updating system that can work for **ALL** the missions. This way, when analysis *between* missions starts to be created, the data will already line up neatly without extra manipulation required. I would recommend making this a priority, done before analysis starts being created, so that the data is ready and finalized before it starts being used. Many of the things that make our current cleaning process and our finalized data sets less than ideal were caused by people using and changing the pre-existing (uncleaned) data.
+
+# 7.0 Bibliography
+
+* Citations from literature.
+ * “Mars Rock Samples - NASA Science.” NASA, [science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/](https://science.nasa.gov/mission/mars-2020-perseverance/mars-rock-samples/). Accessed 12 Dec. 2024. + * "Analyst Notebook", [https://an.rsl.wustl.edu/m20/AN/account/login.aspx](https://an.rsl.wustl.edu/m20/AN/account/login.aspx). Note that an account is required to access the notebook. +* Significant R packages used + * reshape2, Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20. URL . + * pheatmap, Kolde R (2019). _pheatmap: Pretty Heatmaps_. R package version 1.0.12, . + +# 8.0 Appendix + +## 8.1 Converting Lithology/SHERLOC factor data to the correct numeric + +If you attempt to directly convert Lithology or SHERLOC from factor to numeric using `as.numeric` you run into a problem. The "number" factor data gets changed into the wrong "number" numeric data, as seen in the table below. + +| Factor # | Lithology Basic Conversion | SHERLOC Basic Conversion | Desired Conversion | +|----------|------------------------------|----------------------------|--------------------| +| "0" | 1 | 1 | 0 | +| "0.25" | | 2 | 0.25 | +| "0.5" | | 3 | 0.5 | +| "0.75" | | 4 | 0.75 | +| "1" | 2 | 5 | 1 | + +The code chunk displayed below shows my method for getting around this. + +```{r, include=TRUE} +# Converting factor "numbers" to equivalent numeric data +lithology.df[,-1] <- as.data.frame(lapply(lapply( + lithology.df[,-1],as.character),as.numeric)) +sherloc.df[,-1] <- as.data.frame(lapply(lapply( + sherloc.df[,-1],as.character),as.numeric)) +``` + +Code explained: + +- We don't change the 1st column (`[,-1]`), since that is "Sample" ID and that should remain as `integer`. +- Each class change is done using `lapply(dataset,as.newclass)` since we are working with a data frame and are trying to individually change the class of each element of the data frame. 
The lapply function (part of base r) is a convenient way to do this. +- We first convert to `character` before converting to `numeric` because `factor` smoothly converts to the identical `character` (I.e., if you have the string "Blarg" as a factor, the character version will be the identical string "Blarg") but it doesn't smoothly convert to `numeric` (I.e., if it is the string "Blarg" as a factor, the numeric version would be `1`, assuming "Blarg" is the first factor listed. Similarily if the string "0" is the first factor, it will convert to the numeric "1"). Then, since `character` smoothly converts to `numeric` **if** the character *is a number*, we can now convert without complications. +- We say `as.data.frame`, since lapply's normal output is a list. This isn't *really* needed here, because r interprets the `[,-1]` to mean we want a dataframe not a list and will give us that on it's own, but for safety it's best to include this, as there may be another case where you are trying to convert the *entire* data frame, and you'll thus need to use `as.data.frame`. diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.html b/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.html new file mode 100644 index 0000000..d8334d9 --- /dev/null +++ b/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.html @@ -0,0 +1,2467 @@ + + + + + + + + + + + + + + +Mars Final Project Report + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+
+ +
+ + + + + + + + +
+

1 DAR Project and Group +Members

+
    +
  • Project name: MARS
  • +
  • Project team members: +
      +
    • Ashton Compton +
        +
      • dar-compta
      • +
    • +
    • Aadi Lahiri +
        +
      • dar-lahira
      • +
    • +
    • CJ Marino +
        +
      • dar-marinc8
      • +
    • +
    • Nicolas Morawski +
        +
      • dar-morawn
      • +
    • +
    • Dante Mwatibo +
        +
      • dar-mwatid
      • +
    • +
    • Charlotte Peterson +
        +
      • dar-peterc
      • +
    • +
    • Doña Roberts +
    • +
    • Margo VanEsselstyn +
        +
      • dar-vanesm
      • +
    • +
    • David Walczyk +
        +
      • dar-walczd3
      • +
    • +
  • +
+
+
+

2 0.0 Preliminaries.

+

This report is generated from an R Markdown file that includes all +the R code necessary to produce the results described and embedded in +the report. Code blocks can be suppressed from output for readability. +If show <- FALSE the code block will be suppressed; if +show <- TRUE then the code will be shown.

+
# Set to TRUE to expand R code blocks; set to FALSE to collapse R code blocks 
+show <- TRUE
+

Executing this R notebook requires the following packages:

+
    +
  • ggplot2
  • +
  • tidyverse
  • +
  • pheatmap
  • +
  • reshape2
  • +
+

These will be installed and loaded as necessary.
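The install-and-load chunk is hidden in the rendered report. A minimal sketch of how that step could be written as one loop is given below; `load_or_install` is a hypothetical helper name that does not appear in the notebook, and the demonstration call uses packages that ship with R so that nothing is downloaded.

```r
# Hypothetical helper (not part of the original notebook):
# install each package only if it is missing, then attach it.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# In this notebook the call would presumably be:
# load_or_install(c("ggplot2", "tidyverse", "pheatmap", "reshape2"))

# Demonstration with packages that are always available
loaded <- load_or_install(c("stats", "utils"))
```

Compared with one `if (!require(...))` block per package, the loop keeps the package list in a single place, which is easier to extend.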

+
+
+

3 1.0 Project +Introduction

+

This report is for the Mars 2024 Data Analytics Research group. Our +goal is to create an app (“Mission Minder”) that shows unique dynamic +analysis of the Mars Perseverance data.

+

This particular report mainly focuses on how the data analyzed in +“Mission Minder” is cleaned and organized, with a brief look at how the +two types of features in sample data (mineral presence and elemental +compound concentration) relate and why we won’t be using Lithology going +forward.

+
+
+

4 2.0 Organization of +Report

+

This report is organized as follows:

+
    +
  • Section 3.0. “Finding 1: Data Organization”

    +
      +
    • Details of how we reformatted and cleaned the data for consistency +and ease of use over multiple analysis methods.
    • +
    • Notes on decisions made in the cleaning process, their consequences, +and potential alternative choices.
    • +
    • Instructions on how to integrate new samples/targets into the data +sets.
    • +
  • +
  • Section 4.0: “Finding 2: Correlation Between Elemental Compounds +and Minerals”

    +
      +
    • Calculating the correlation between the two types of numeric +features (PIXL & SHERLOC) of the sample data.
    • +
  • +
  • Section 5.0: “Lithology and SHERLOC match”

    +
      +
    • Showing that Lithology is just the binary representation of +SHERLOC.
    • +
  • +
  • Section 6.0 “Overall conclusions and suggestions”

  • +
  • Section 7.0 “Bibliography”

  • +
  • Section 8.0 “Appendix”

    +
      +
    • Instructions on how to convert Lithology/SHERLOC data from factor to +the accurate numeric.
    • +
  • +
+
+
+

5 3.0 Finding 1: Data +Organization

+

The original data for Perseverance was inconsistently organized and +not always in a usable form. This section describes how the data was +reorganized, re-classed, and relabeled to be more intuitive and to +minimize the data manipulation needed for each individual analysis.

+
+

5.1 3.1 Data, Code, and +Resources

+

Here is a list of data sets, code, and resources that relate to this +section. Clicking a file name opens +that file on GitHub.

+
    +
  1. Code altering the data sets +
      +
v1_consistent_data_naming.Rmd +is the notebook that cleans the data and outputs all the v1 data sets. +All the cleaning done to the data is described in the notebook, and the code +should be easily editable.
    2. +
  2. +
  3. Markdowns describing data sets +
      +
    1. v1_Data_Introduction.md +is the markdown file describing each of the new v1 data sets, what they +contain, and how they have been changed.
    2. +
“Dataset-Description” +is the GitHub wiki page describing each of the v1 data sets. As of +December 14th, 2024, this is more up to date than +v1_Data_Introduction.md.
    4. +
  4. +
  5. Data sets +
      +
    1. v1_sample_meta.Rds +is the data set containing all the meta data for the physical samples +Perseverance collected.
    2. +
    3. v1_pixl.Rds +is the data set containing all the elemental compound concentration data +for the physical samples Perseverance collected.
    4. +
v1_sherloc.Rds +is the data set indicating confidence in mineral presence (NOT binary) +in the physical samples Perseverance collected.
    6. +
    7. v1_lithology.Rds +is the data set indicating whether minerals were present or not present +(binary) in the physical samples Perseverance collected.
    8. +
    9. v1_libs.Rds +is the data set containing all the meta data and the elemental compound +concentration data for the LIBS scans Perseverance did on Mars +rock.
    10. +
    11. v1_libs_to_sample.Rds +is the data set containing the nearest sample to each LIBS target/point +and the distance between them.
    12. +
    13. v1_libs_earth_references.Rds +is the data set containing all the meta data and the elemental compound +concentration data for the LIBS scans Perseverance did on the Earth +reference rocks it carries with it. That is, the “scct” targets.
    14. +
  6. +
  7. Final +Notebooks +
      +
    1. peterc_finalProjectF24.Rmd +is Charlotte’s final notebook. Referenced in part 3.2, 3.3, and +3.5.
    2. +
    3. vanesm_finalProjectF24.Rmd +is Margo’s final notebook. Referenced in part 3.2, 3.3, and 3.5.
    4. +
    5. walczd3_finalProjectF24.Rmd +is David’s final notebook. Referenced in part 3.2.
    6. +
  8. +
+
+
+

5.2 3.2 Contribution

+

The reordering and data cleaning was done by me, Doña Roberts.

+

The data set v1_libs_to_sample.Rds, the Type meta data +for v1_libs.Rds, and the Sol, Lat, and Lon meta data for +v1_pixl.Rds came from data sets created by Margo +VanEsselstyn and Charlotte Peterson. Their final notebooks, which +document where they got the new data, can be found on GitHub +by following links 3.1.d.1 and 3.1.d.2.

+

Additionally, David created a data set, aqueous.Rds +that describes the details of each of the minerals in SHERLOC. His final +notebook, which documents how he created this data set, can also be +found on GitHub following link 3.1.d.3.

+
+
+

5.3 3.3 Data +Description

+

For the description of the resulting data sets, see the wiki page: https://github.rpi.edu/DataINCITE/DAR-Mars-F24/wiki/Dataset-Description

+

Since the wiki doesn’t go into detail on what changes were made (just +what the end result is) we’ll summarize that now. Note that the actual +file implementing these changes (with comments) is +v1_consistent_data_naming.Rmd and can be found on GitHub +by following link 3.1.a.1.

+
    +
  1. Imported data sets. These are the data I started with and then +manipulated to create the v1s. +
      +
    • pixl_sol_coordinates.Rds, +which is the PIXL data set including Sol/Lat/Lon created by Charlotte +and Margo. See their final notebooks for how they created this +file.
    • +
    • supercam_libs_moc_loc.Rds, +which is the LIBS data. This comes from a csv downloaded from NASA.
    • +
    • libs_typed.Rds, +which is the LIBS data set including a column stating what type of scan +it was. This was also created by Charlotte and Margo. See their final +notebooks for how they created this file.
    • +
    • mineral_data_static.Rds, +which is the Lithology data. This comes from the “Lithology” paragraph +at the start of each sample’s sample +report.
    • +
    • abrasions_sherloc_samples.Rds, +which is the SHERLOC data set that comes from the “SHERLOC” tables in +the sample +reports. +
        +
Some manipulation was done to this data set to make its format +match the others’.
      • +
    • +
PIXL_LIBS_Combined.Rds, +which is the data set listing each LIBS target’s nearest abrasion and +the distance to it. This was created by Charlotte and Margo. See their +final notebooks for how they created this file.
    • +
  2. +
  3. Renamed Columns +
      +
    • Capitalized all column names for consistency
    • +
    • Changed PIXL elemental compound naming convention to match +LIBS’s
    • +
    • Created a logical vector from the “earthsample?” column of +libs_typed.Rds (to be used later in separating out the +earth references)
    • +
    • Changed Lithology & SHERLOC’s mineral naming convention to be +consistent between minerals and with each other
    • +
    • Changed names of PIXL_LIBS_Combined.Rds (now called +“libs_to_sample”) so that it matches the naming convention of PIXL/LIBS +and to make explicit whether a column came from PIXL or came from +LIBS
    • +
  4. +
  5. Separated out Sample Meta data +
      +
    • See section 3.4’s table of decisions, row 2, for details.
    • +
  6. +
  7. Separated out LIBS earth reference data from rest of data +
      +
    • See section 3.4’s table of decisions, row 3, for details.
    • +
  8. +
Created the LIBS clusters used in most LIBS analysis and created a +column in the LIBS data frame indicating said clustering
      +
    • See section 3.4’s table of decisions, row 6, for details.
    • +
  10. +
  11. Reordered Columns +
      +
    • Reordered LIBS columns such that the meta data is all at the +front
    • +
Reordered PIXL columns such that the elemental compounds also found +in LIBS data are ordered the same, with elemental compound data unique +to PIXL at the end
    • +
    • Reordered Lithology and Sherloc so that they match
    • +
Reordered libs_to_sample columns such that LIBS meta +data comes first, then the distance between, then PIXL meta data, and LIBS +elemental compound data last.
        +
      • See section 3.4’s table of decisions, row 4, for details on the +inclusion of the LIBS elemental compound data and all of the meta +data.
      • +
    • +
  12. +
  13. Changed data types +
      +
    • New data types are listed in the Wiki linked at the start of this +section.
    • +
    • Several character types were changed to +factor +
        +
      • See section 3.4’s table of decisions, row 1, for details on one such +change.
      • +
    • +
    • Sample Meta was originally all character, numbers were changed to +numeric or integer and labels were changed to +factor
    • +
  14. +
  15. Saved as new Rds files +
      +
All new Rds file names are prefixed with v1_ with the +intention that updates to these data sets will be saved with the prefix +vn_ where “n” denotes the version number. This way, if +any errors occur in the updating process, the Mission Minder can still +run on the older data sets while the new ones are fixed.
    • +
  16. +
+
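To illustrate the versioned-prefix idea from the last step, here is a small sketch under stated assumptions: `versioned_name` is a hypothetical helper, the data frame is toy data, and files are written to a temporary directory rather than to the project folders.

```r
# Hypothetical helper: build a versioned file name such as "v2_pixl.Rds"
versioned_name <- function(base, version) {
  paste0("v", version, "_", base, ".Rds")
}

out_dir <- tempdir()  # stand-in for the real StudentData folder
toy.df <- data.frame(Sample = 1:3, SiO2 = c(40.1, 38.7, 45.2))  # toy values

v1_path <- file.path(out_dir, versioned_name("pixl", 1))
v2_path <- file.path(out_dir, versioned_name("pixl", 2))

# Version 1 is written once and never overwritten
saveRDS(toy.df, v1_path)

# An update (here: one extra toy sample) is saved under the next prefix
updated.df <- rbind(toy.df, data.frame(Sample = 4L, SiO2 = 41.0))
saveRDS(updated.df, v2_path)
```

Because the v1 file is left untouched, anything that reads it keeps working while the v2 file is being checked.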
+
+

5.4 3.4 Limitations

+

While I attempted to organize the data in the most useful ways, there +were some changes that could have been un-ideal, inconvenient, or had a +valid alternative.

+

Here is a table of the most notable ones:

+

Table of Decisions

| File: | Part in question: | Decision: | Pros: | Cons: | Alternative: |
|-------|-------------------|-----------|-------|-------|--------------|
| v1_sherloc | The class of mineral presences | Setting the class as factor | Accurate to what it is | Inconvenient to use, see 8.1 for how to convert to numeric | 1) Leaving class as character 2) Setting class as numeric |
| v1_sample_meta | The existence of a separate meta data set | Separating meta data out | Cleaner, Neater, Less repetitive | One more data set to import | Repeating meta data in each sample data set |
| v1_libs_earth_references | The existence of a separate earth reference data set | Separating scct (Earth reference) targets out | We aren’t accidentally looking at Earth rocks when trying to analyze Mars ones | One more data set to import, harder to compare scct data to other types of libs data | Keeping scct targets in the v1_libs data set |
| v1_libs_to_sample | The inclusion of non-identifier PIXL and LIBS data | Include the extra meta and elemental compound data | Convenience. Change would be too late in process, you’d have to go through and change any code depending on the “fluff’s” inclusion | Makes the data set more complex, more repetitive, and less clean | Removing non-identifier data from v1_libs_to_sample |
| v1_consistent_data_naming | Line 153, the inclusion of all samples in v1_lithology | Limiting Lithology to only first 16 samples (by several requests) | Don’t need to remove partially received samples for every analysis including PIXL data | Problematic for when we receive new samples!!! | 1) Don’t limit it but people need to manually remove extra rows for analysis 2) Define a variable “n” at start of file, limit to “n” samples. Increment “n” every time we get a new sample |
| v1_libs | The inclusion of a “Cluster” column | Include cluster column | For analysis using the 4 clusters, it’s easier. Unifies clustering between people’s analysis | Doesn’t really make sense with the app where number of clusters is changeable and seed is already consistent between analysis | Don’t include cluster column, instead rely on a “Cluster” variable in the app which is dependent on the number of clusters requested |

There are many more decisions made while cleaning the data, but the +above table details the most important (questionable) ones.

+
+
+

5.5 3.5 Future Work

+

In the future, we expect to receive more Sample and LIBS data from +NASA. When this happens, there needs to be a way to integrate it.

+

Below are the steps required, and the files to reference, in order to +add and clean the new data. For some steps, the code needed +isn’t public in the Data or StudentData GitHub folders, so the person +responsible for that code, or the place where it is documented, is +identified.

+

For new Sample data:

+
    +
Get the new data from NASA to update +mineral_data_static.Rds, +abrasions_sherloc_samples.Rds, and +samples_pixl_wide.Rds (which is used in the creation of +pixl_sol_coordinates.Rds)

    +
      +
    • For updating mineral_data_static.Rds, look at the +“Lithology” report at the start of each new sample’s Sample +Report. As of December 14th 2024, the specific code to create the +Rds file isn’t public, so contact Dr. Erickson.
    • +
    • For updating abrasions_sherloc_samples.Rds, look at the +“SHERLOC” table in each new sample’s Sample +Report. Talk to Karen Rogers for details.
    • +
For updating samples_pixl_wide.Rds, as of December 14th +2024, there is no documentation available on GitHub explaining how this +Rds file was created. Check with Dr. Erickson to see where +this data came from and how to update it.
    • +
  2. +
  3. Follow instructions in Margo and Charlotte’s notebooks to update +pixl_sol_coordinates.Rds. The Sol/Lat/Lon information they +added comes from the Analyst +Notebook.

  4. +
  5. Follow instructions in Margo and Charlotte’s notebooks to update +PIXL_LIBS_Combined.Rds for the new LIBS points.

  6. +
  7. Change line 153 of v1_consistent_data_naming.Rmd +from lithology.df <- lithology.df[1:16,~~~] to +lithology.df <- lithology.df[1:n,~~~] where +n is the new number of samples.

  8. +
  9. Run v1_consistent_data_naming.Rmd to update +v1_sample_meta.Rds, v1_pixl.Rds, +v1_sherloc.Rds, and v1_libs_to_sample.Rds.

    +
      +
    • Note that since Lithology has become redundant, there is no need to +update v1_lithology.Rds.
    • +
  10. +
+
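Step 4 above edits a hard-coded row count by hand. As a hedged sketch of the alternative mentioned in the Table of Decisions (define “n” once at the start of the file), the idea could look like the following. The data frame here is a toy stand-in, `n_samples` is an assumed variable name, and the column selection used on the real line 153 is not reproduced.

```r
# Assumed variable name; in the real notebook this would sit at the top of the file
n_samples <- 16

# Toy stand-in for the Lithology data frame (20 rows, one mineral column)
toy_lithology.df <- data.frame(
  Sample  = 1:20,
  Olivine = factor(rep(c("0", "1"), times = 10))
)

# Keep only the first n_samples rows; min() guards against n_samples
# being larger than the number of rows actually available
rows_to_keep <- seq_len(min(n_samples, nrow(toy_lithology.df)))
limited.df <- toy_lithology.df[rows_to_keep, , drop = FALSE]
```

When a new sample arrives, only `n_samples` needs to be incremented.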

For new LIBS data:

+
    +
  1. Download new moc csv file from LIBS, and update +supercam_libs_moc_loc.Rds. As of December 14th 2024, the +specific code to create the Rds file isn’t public, so contact +Dr. Erickson.

  2. +
  3. Follow instructions in Margo and Charlotte’s notebooks to create +libs_typed.Rds for the new LIBS points.

  4. +
  5. Follow instructions in Margo and Charlotte’s notebooks to update +PIXL_LIBS_Combined.Rds for the new LIBS points.

  6. +
  7. Run v1_consistent_data_naming.Rmd to update +v1_libs.Rds, v1_libs_earth_references.Rds, and +v1_libs_to_sample.Rds.

  8. +
+

Some recommendations for updating the data sets:

+
    +
  • Instead of updating “v1” to contain the new data, create a copy +of v1_consistent_data_naming.Rmd and replace all references +to “v1” with “v2” (including the title of the file). This way, instead +of updating the “v1” data sets you are creating new “v2” data sets. This +means that if something goes wrong and it breaks, you can always fall +back to the previous version while you figure out how to fix the +break.

  • +
  • When creating the v2_consistent_data_naming.Rmd, go +through and find Margo and Charlotte’s code to update their data sets +and add it to the start of the new v2 code so that in the +future there is only one file to run when updating. I was going to do +this but, as they were still improving and updating it, the code wasn’t +stable enough for the transfer to make sense.

  • +
+
+
+
+

6 4.0 Finding 2: +Correlation Between Elemental Compounds and Minerals

+

This section looks at how the concentration of elemental compounds +(PIXL data) relates to/indicates the presence of minerals (SHERLOC data) +at a sample location.

+

We are looking at the question of how PIXL and SHERLOC relate to each +other, and thus the [Exploration] –> [PIXL vs Sherloc] section of the +Mission Minder.

+
+

6.1 4.1 Data, Code, and +Resources

+

Here is a list of data sets, codes, and resources that are used in +this work. Click on the file name to be sent to the file on GitHub.

+
    +
  1. Data sets (code for importation is hidden) +
      +
    1. v1_pixl.Rds
    2. +
    3. v1_sherloc.Rds +with the exception of “Hydrated Carbonates” which is the same for all +samples.
    4. +
  2. +
+

First, we need to import the data sets.

+

Then, before we can calculate the correlations, we must +combine the two data sets into a matrix with only numeric data.

+
## Creating pixl matrix
+# removing the sample number column
+pixl.matrix <- as.matrix(pixl.df[,-1])
+## Creating sherloc matrix
+# removing the sample number column
+# converting from factor to numeric
+sherloc.matrix <- as.matrix(as.data.frame(lapply(lapply(
+                  sherloc.df[,-1],as.character),as.numeric))) 
+## Combining pixl and sherloc into one matrix
+# removing the column(s) with no deviation, in this case "Hydrated Carbonates"
+combined.matrix <- cbind(pixl.matrix,sherloc.matrix[,-16])
+

With the combined matrix, we calculate the Pearson +correlation between every pair of features, and then keep only the +correlations between features in PIXL and features in SHERLOC.

+
# Calculating pearson feature correlation
+combined.cormat <- round(cor(combined.matrix),2)
+# Selecting only the correlations between pixl features and sherloc features
+combined.cormat <- combined.cormat[colnames(pixl.matrix),colnames(sherloc.matrix[,-16])]
+
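The chunk above drops the constant column by position (`-16`). As a sketch of a position-independent variant, the toy example below detects constant columns programmatically and also shows that calling `cor` with two matrices gives the same PIXL-by-SHERLOC block directly. All matrices and values here are made up for illustration, and the variable names are assumptions.

```r
# Toy stand-ins for the PIXL and SHERLOC matrices (values are made up)
set.seed(1)
pixl_toy <- matrix(rnorm(40), nrow = 10,
                   dimnames = list(NULL, c("SiO2", "FeO", "MgO", "CaO")))
sherloc_toy <- cbind(
  Olivine  = c(0, 0.25, 0.5, 0.75, 1, 0, 0.25, 0.5, 0.75, 1),
  Sulfate  = c(0, 0, 1, 1, 0.5, 0.5, 0, 1, 0, 1),
  Constant = rep(1, 10)   # plays the role of a column with no deviation
)

# Keep only the columns that actually vary
varies <- apply(sherloc_toy, 2, function(x) length(unique(x)) > 1)
sherloc_var <- sherloc_toy[, varies, drop = FALSE]

# Option 1: correlate the two matrices directly
cross_direct <- cor(pixl_toy, sherloc_var)

# Option 2: correlate the combined matrix, then select the cross block
full_cormat <- cor(cbind(pixl_toy, sherloc_var))
cross_block <- full_cormat[colnames(pixl_toy), colnames(sherloc_var)]
```

Both options give the same values; the direct call just skips computing the within-PIXL and within-SHERLOC correlations.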
+
+

6.2 4.2 Contribution

+

This section is solely my own work.

+
+
+

6.3 4.3 Methods +Description

+

To get from raw SHERLOC and PIXL data to their correlation, +I used the cor function (from the stats package that ships with R) to calculate the +Pearson correlation between every pair of features in the combined PIXL (elemental compounds) and +SHERLOC (minerals) matrix, and then kept only the correlations between +a PIXL feature and a SHERLOC feature (rather than correlations within +PIXL or within SHERLOC).

+

In the next section, I use pheatmap to +hierarchically cluster and output a heatmap of correlations.

+
+
+

6.4 4.4 Result and +Discussion

+
# Creating title for heatmap
+heatmap.title <- "Correlation between Elemental Concentration & Mineral Presence"
+# Printing heatmap of correlation
+pheatmap(combined.cormat,
+         scale="none",
+         treeheight_row = 10,treeheight_col = 10,
+         main = heatmap.title)
+

+Caption: The vertical axis shows PIXL elemental compounds and the horizontal axis shows +SHERLOC minerals. Each colored square shows the correlation between the +elemental compound in that row and the mineral in that column. A red square means that +higher concentrations of that compound tend to coincide with higher SHERLOC confidence that +the mineral is present; a dark blue square means that higher concentrations tend to coincide +with lower confidence that the mineral is present.

+

Here we can clearly see which elemental compounds correlate with +which minerals.

+

For example, high SO3 concentration strongly correlates to the +presence of Hydrated Mg-Fe sulfate, Mg-sulfate, Kaolinite, & +Fe-Mg-clay minerals.

+
    +
  • Here the sulfates make sense, as sulfate is SO4, or, O + +SO3
  • +
  • Interestingly though, SO3 doesn’t have a strong +correlation with Sulfate itself.
  • +
+

Similarly, high Cr2O3 concentration strongly correlates to the +presence of Apatite, Spinels, Zircon/Baddeleyite, Chromite, & +Ilmenite.

+
    +
  • Here Chromite makes sense, since it’s FeCr2O4, or, FeO + +Cr2O3
  • +
+

Overall we see that:

+
    +
  • MgO: Lightly suggests presence of Fe-Mg carbonate, very +lightly correlated to a bunch of minerals but not strongly correlated to +the presence of any of the minerals
  • +
  • Cr2O3: Strongly suggests the presence of Apatite, +Spinels, Zircon/Baddeleyite, Chromite, & Ilmenite.
  • +
  • Cl: Strongly suggests the presence of Fe-Mg +carbonate, & Halite. Very lightly correlated to a few other +minerals.
  • +
MnO: Mildly suggests the presence of Carbonate and lightly correlated +to presence of Pyroxene. Very lightly correlated to a bunch of other +minerals, but nothing strong.
  • +
  • FeO-T: Mildly suggests the presence of Carbonate, and very lightly +correlated to a bunch of minerals.
  • +
  • SO3: Strongly suggests the presence of Hydrated +Mg-Fe sulfate, Mg-sulfate, Kaolinite, & Fe-Mg-clay minerals. Lightly +suggests the presence of Ca-Sulfate.
  • +
  • TiO2: Lightly suggests presence of Plagioclase, and very lightly +suggests the presence of several minerals. Again, no strong correlations +though.
  • +
  • P2O5: Mildly suggests the presence of Plagioclase, Iron oxide, +Ca-sulfate, & FeTi oxides. Lightly suggests several other +minerals.
  • +
  • CaO: Mildly suggests the presence of Hydrated Iron oxide, +Perchlorates, Na-perchlorate, Plagioclase, & Iron oxide. A bunch of +light correlations to other minerals as well.
  • +
Na2O: Mildly suggests the presence of Iron oxide. Lightly suggests the +presence of a bunch of other minerals as well.
  • +
  • Al2O3: Mildly suggests the presence of Chlorite, Quartz, Disordered +Silicates, & Feldspar. Very lightly suggests the presence of a bunch +of other minerals.
  • +
  • SiO2: Mildly suggests the presence of Chlorite, Quartz, Disordered +Silicates, & Feldspar. Very lightly suggests the presence of several +other minerals.
  • +
  • K2O: Strongly suggests the presence of Chlorite, Quartz, Disordered +Silicates, & Feldspar. Very lightly suggests the presence of a few +other minerals.
  • +
+
+
+

6.5 4.5 Conclusions, +Limitations, and Future Work.

+

There is likely a chemical reason for the strong correlations found +above. Some correlations are obvious even to a non-geologist, such as +Chromite & Cr2O3 or several Sulfates & SO3, but others +are surprising, such as the low correlation +between Sulfate & SO3 or Perchlorates & Cl. There are likely +many more connections to be seen here by someone with more geological +insight.

+

It would also be interesting to look at this heatmap in conjunction +with the chemical formulas for the minerals listed in David’s aqueous.Rds +file and see if that provides any new insights.

+

Sadly, since we don’t have data for mineral presence at LIBS +targets, we can’t do a similar heatmap for that, but we may be able to +look at the relation between concentration of elemental compounds and +presence of minerals found through sample data to hypothesize what +minerals are likely present at the LIBS targets based on the elemental +compound concentrations found there.

+
+
+
+

7 5.0 Finding 3: +Lithology and SHERLOC match

+

This section looks at the relationship between the Lithology data and +the SHERLOC data.

+

We started with looking at two data sets, Lithology and SHERLOC, that +both describe the presence of minerals at each sample site.

+

Lithology is made up of binary numeric values representing present or +not present, and SHERLOC is made up of discrete numeric values +representing the level of presence.

+

If these two match, then we only really need SHERLOC, as Lithology +can be derived by checking whether a SHERLOC value is “= 0” (then “0”) or +“> 0” (then “1”).

+

For this reason, we want to confirm that SHERLOC and Lithology +match.
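The rule described above (“= 0” gives “0”, “> 0” gives “1”) can be sketched on a toy matrix. The mineral names and values below are placeholders, not the real SHERLOC data.

```r
# Toy SHERLOC-style confidence matrix: 2 samples by 3 minerals (made-up values)
sherloc_toy <- matrix(c(0, 0.25, 1, 0.75, 0, 0.5), nrow = 2,
                      dimnames = list(NULL, c("Olivine", "Carbonate", "Sulfate")))

# Any confidence above 0 counts as "present" (1), exactly 0 as "absent" (0);
# multiplying the logical matrix by 1 turns TRUE/FALSE into 1/0
binary_toy <- (sherloc_toy > 0) * 1
```

If Lithology and SHERLOC match, applying this to the real SHERLOC matrix should reproduce the Lithology matrix.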

+
+

7.1 5.1 Data, Code, and +Resources

+

Here is a list of data sets, codes, and resources that are used in +this work. The data sets below can be found on GitHub by clicking on +them.

+
    +
  1. Data sets (code for importation is hidden) +
      +
    1. v1_lithology.Rds
    2. +
    3. v1_sherloc.Rds
    4. +
  2. +
+

First we import the Lithology data set. Note that we don’t need to +import SHERLOC, since we imported that in section 4.1.

+

Then, we convert Lithology into a matrix, excluding the “Sample” +column. Again, note that we don’t need to do this for SHERLOC, as we +also did this in section 4.1.

+
## Creating lithology matrix
+# removing the sample number column
+# converting from factor to numeric
+lithology.matrix <- as.matrix(as.data.frame(lapply(lapply(
+                    lithology.df[,-1],as.character),as.numeric))) 
+
+
+

7.2 5.2 Contribution

+

This section is solely my own work.

+
+
+

7.3 5.3 Methods +Description

+

There are many ways to confirm that Lithology and SHERLOC match, but +the simplest one is probably to just subtract the SHERLOC matrix from +the Lithology matrix and look at the minimum and maximum outputs.

+
## Calculating Difference matrix
+diff <- lithology.matrix - sherloc.matrix
+## Finding minimum, should be 0
+min(diff)
+
## [1] 0
+
## Finding maximum, should be 0.75
+max(diff)
+
## [1] 0.75
+

After doing this, we get that the minimum of the difference is “0” +and the maximum of the difference is “0.75”.

+
+
+

7.4 5.4 Result and +Discussion

+

Since the minimum of the difference between them is “0”, we know that +if Lithology is “0” then SHERLOC is also “0”, since if SHERLOC were > +“0” then difference would be < “0”.

+

Since the maximum of the difference between them is “0.75”, we know +that if Lithology is “1” then SHERLOC is \(\geq\) “0.25”, since if SHERLOC were < +“0.25” then difference would be > “0.75”.

+

This means that we know if Lithology claims a mineral is present then +SHERLOC also claims the mineral is present, and that if Lithology claims +that a mineral is absent then SHERLOC also claims that the mineral is +absent.
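The minimum/maximum argument can also be cross-checked with a direct element-wise comparison. The sketch below uses small toy matrices rather than the real data, and names the difference `delta` (an assumed name) so that it does not mask R's built-in `diff` function.

```r
# Toy matrices standing in for lithology.matrix and sherloc.matrix
lith_toy <- matrix(c(0, 1, 1, 0), nrow = 2)
sher_toy <- matrix(c(0, 0.25, 1, 0), nrow = 2)

# Same check as in 5.3: the difference should lie between 0 and 0.75
delta <- lith_toy - sher_toy
delta_range <- range(delta)

# Direct check: a mineral is marked present in Lithology
# exactly when SHERLOC confidence is above 0
matches <- all((sher_toy > 0) == (lith_toy == 1))
```

The direct comparison has the advantage of reading the same way as the claim being tested.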

+
+
+

7.5 5.5 Conclusions, +Limitations, and Future Work.

+

Because SHERLOC and Lithology match, we have decided +to use only the SHERLOC data set in Mars Mission Minder; Lithology +is simply a less detailed version of SHERLOC.

+

If there are any carry over references to “Lithology” within the Mars +Mission Minder, this is actually referring to the binary version of +SHERLOC.

+
+
+
+

8 6.0 Overall conclusions +and suggestions.

+

In the future, as more samples are added, I would suggest +streamlining the process of integrating new data. That is, I would +recommend creating one master file that pulls in all the data in their +raw form (csv’s or manually added data from sample reports), cleans and +organizes it, and then outputs the finalized data sets. The current +system has many intermediary steps, some of which aren’t available for +viewing and editing, making it hard to integrate new data.

+

When the Mission Minder gets extended to the other Mars Missions, I +would recommend (at the start!) sitting down and finding some uniform +naming, organizing, and updating system that can work for +ALL the missions. This way, when analysis +between missions starts to be created, the data will already +line up neatly without extra manipulation required. I would make +this a priority, done before analysis starts being created, so +that the data is ready and finalized before it starts being used. Many +of the things that make our current cleaning process and our finalized +data sets less than ideal were caused by people using and changing the +pre-existing (uncleaned) data.

+
+
+

9 7.0 Bibliography

+ +
+
+

10 8.0 Appendix

+
+

10.1 8.1 Converting Lithology/SHERLOC factor data to the correct numeric

+

If you attempt to convert Lithology or SHERLOC directly from factor to numeric using as.numeric, you run into a problem: the “number” factor data gets changed into the wrong “number” numeric data, as seen in the table below.

| Factor # | Lithology Basic Conversion | SHERLOC Basic Conversion | Desired Conversion |
|----------|----------------------------|--------------------------|--------------------|
| “0”      | 1                          | 1                        | 0                  |
| “0.25”   |                            | 2                        | 0.25               |
| “0.5”    |                            | 3                        | 0.5                |
| “0.75”   |                            | 4                        | 0.75               |
| “1”      | 2                          | 5                        | 1                  |
+

The code chunk displayed below shows my method for getting around this.

+
# Converting factor "numbers" to equivalent numeric data
+lithology.df[,-1] <- as.data.frame(lapply(lapply(
+                    lithology.df[,-1],as.character),as.numeric))
+sherloc.df[,-1] <- as.data.frame(lapply(lapply(
+                    sherloc.df[,-1],as.character),as.numeric))
+

Code explained:

+
    +
  • We don’t change the 1st column ([,-1]), since that is the “Sample” ID and it should remain an integer.
  • +
  • Each class change is done using lapply(dataset,as.newclass), since we are working with a data frame and are trying to change the class of each column of the data frame individually. The lapply function (part of base R) is a convenient way to do this.
  • +
  • We first convert to character before converting to numeric because factor converts smoothly to the identical character (i.e., if you have the string “Blarg” as a factor, the character version will be the identical string “Blarg”) but it doesn’t convert smoothly to numeric (i.e., if it is the string “Blarg” as a factor, the numeric version would be 1, assuming “Blarg” is the first factor level listed. Similarly, if the string “0” is the first factor level, it will convert to the numeric “1”). Then, since character converts smoothly to numeric if the character is a number, we can now convert without complications.
  • +
  • We say as.data.frame since lapply’s normal output is a list. This isn’t strictly needed here, because R interprets the [,-1] assignment to mean we want a data frame, not a list, and will give us that on its own. For safety it’s best to include it anyway, as there may be another case where you are trying to convert the entire data frame, and you’ll thus need to use as.data.frame.
  • +
+
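The pitfall explained in the list above can be reproduced on a small toy factor. This is a minimal sketch; the data frame and its column names (`Quartz`, `Olivine`) are made up for illustration and are not taken from the project's data. It also shows `as.numeric(levels(f))[f]`, an alternative that R's `?factor` help page recommends as slightly more efficient than going through `as.character`.

```r
# A factor whose labels look like numbers (toy example)
f <- factor(c("0", "0.25", "1", "0.25"))

codes      <- as.numeric(f)                 # internal level codes, not the labels
via_char   <- as.numeric(as.character(f))   # labels converted to numbers
via_levels <- as.numeric(levels(f))[f]      # same values, converting each level once

# Applied column-wise to a hypothetical data frame; assigning into df[-1]
# keeps the data frame structure even though lapply returns a list
df <- data.frame(Sample = 1:2,
                 Quartz = factor(c("0", "1")),
                 Olivine = factor(c("0.5", "0.25")))
df[-1] <- lapply(df[-1], function(col) as.numeric(as.character(col)))
```

Here `codes` holds the position of each value among the sorted levels, while `via_char` and `via_levels` hold the values the labels actually spell out.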
+
+ + + +
+
+ +
diff --git a/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.pdf b/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.pdf
new file mode 100644
index 0000000..f0c9dbf
Binary files /dev/null and b/StudentNotebooks/Assignment08_FinalProjectNotebook/roberd10_final.pdf differ