From f240a91261c9660dca0c7c3b396f03f5bb5e9c8c Mon Sep 17 00:00:00 2001
From: roberd10
Date: Fri, 30 Aug 2024 13:51:32 -0400
Subject: [PATCH] Edited some of the answers to the questions and added graphics

---
 .../roberd10-dar-assignment1-f24.Rmd | 45 ++++++++++++++++---
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/StudentNotebooks/Assignment01/roberd10-dar-assignment1-f24.Rmd b/StudentNotebooks/Assignment01/roberd10-dar-assignment1-f24.Rmd
index 1fedc12..9c10be8 100644
--- a/StudentNotebooks/Assignment01/roberd10-dar-assignment1-f24.Rmd
+++ b/StudentNotebooks/Assignment01/roberd10-dar-assignment1-f24.Rmd
@@ -271,7 +271,7 @@ wssplot(pixl_trim.mat, nc=8, seed=2)
 ```
 
 
-Based on where the "elbow" occurs, it looks like `3` might be a good `k` choice for k-means clustering.
+Based on where the "elbow" occurs, it looks like `5` might be a good `k` choice for k-means clustering. However, to keep this assignment simple, and because there are few data points, we examine the solution with `k = 3` clusters instead.
 
 ## k-means Clustering
 
@@ -336,21 +336,56 @@ ggbiplot::ggbiplot(pixl_trim.mat.pca,
 
 ## ANSWER THESE QUESTIONS!
 
-Add a description of each cluster here in your own words.
+Add a description of each cluster here in your own words. 
+
+**Important note about the heatmaps below:** The color scales are not the same across the heatmaps. The same shade of red corresponds to 50 in the first heatmap, 30 in the second, and 40 in the third.
 
 Describe Cluster 1: Cluster 1 is igneous rock that has a lot higher concentration of Si02 and mildly higher concentration Al203 than the rest of the igneous rock and a lower concentration of FeO-T and Mgo. All 3 samples in Cluster 1 have identical collected data, so they overlap on the plot above.
 
-Describe Cluster 2: Cluster 2 is all the Sedimentary rock. This cluster has a *much* lower concentration of Si02 and a mildly higher concentration of S03 Mgo. Interestingly enough, even though this cluster is made up of a different kind of rock than Cluster 1 and Cluster 3 are, the detected amount of FeO-T is in between those two clusters.
+```{r}
+# Heatmap of the samples in cluster 1 (raw values, no scaling)
+pheatmap(subset(pixl_trim.mat, km$cluster == 1), scale = "none", cluster_cols = FALSE, cluster_rows = FALSE)
+```
+
+Describe Cluster 2: Cluster 2 contains all of the sedimentary rock. This cluster has a *much* lower concentration of Si02 and mildly higher concentrations of S03 and Mgo. Interestingly, even though this cluster is made up of a different kind of rock than Clusters 1 and 3, its detected amount of FeO-T falls between those of the other two clusters.
+
+```{r}
+# Heatmap of the samples in cluster 2 (raw values, no scaling)
+pheatmap(subset(pixl_trim.mat, km$cluster == 2), scale = "none", cluster_cols = FALSE, cluster_rows = FALSE)
+```
 
 Describe Cluster 3: Cluster 3 is again igneous rock, but this time with *lower* concentration of Si02 and Al203 and *higher* concentration of FeO-T and Mgo. Additionally, this cluster is a *lot* more varied than the igneous rock in cluster 1 is. For example, while the range of sampled FeO-T is 0 in cluster 1, it is 16.81 in cluster 3.
 
-What do the clustering and PCA results tell us about the data detected by the M20 PIXL experiment? _Feel free to add graphs or analyses to support your conclusions._
+```{r}
+# Heatmap of the samples in cluster 3 (raw values, no scaling)
+pheatmap(subset(pixl_trim.mat, km$cluster == 3), scale = "none", cluster_cols = FALSE, cluster_rows = FALSE)
+```
+
+What do the clustering and PCA results tell us about the data detected by the M20 PIXL experiment?
+
+The clustering results tell us which sample locations were similar in terms of which molecules were present and in what quantities; the cluster descriptions above discuss this in more depth.
+
+The PCA results tell us in what ways the amounts of the molecules tended to vary, both relative to each other and overall.
+
+By "relative to each other" I mean that molecules with similar coefficients in PC1 or PC2 tend to vary in similar ways. For example, Mgo and FeO-T both have large positive coefficients in PC1, which suggests that when one increases the other tends to increase as well, whereas Si02 tends to vary inversely with Mgo because its PC1 coefficient is strongly negative. In other words, PC1 suggests that samples with higher Mgo tend to have higher FeO-T and lower Si02. The heatmap below, which is scaled by *column* to show how each molecule's presence varies between sample sites, indicates that this does generally appear to be the case. Note that these molecules don't *always* vary as the coefficients would suggest, because this example looks *only* at the PC1 coefficients; the other PCs describe the rest of the variation.
 
 ```{r}
-# Student's code for graphs and analysis here!
+# Compare how Mgo, FeO-T and Si02 vary across samples (each column centered and scaled)
+pheatmap(pixl_trim.mat[, colnames(pixl_trim.mat) %in% c("Mgo", "FeO-T", "Si02")], scale = "column", cluster_cols = FALSE, cluster_rows = FALSE)
 ```
 
+By "overall" I mean that molecules whose coefficients in the early PCs are larger in magnitude (farther from 0) are more important for telling the samples apart than molecules whose coefficients are smaller in magnitude (closer to 0). If you look at how much the amount of each molecule ranges across sample sites, you'll notice that molecules with a larger range have a larger-magnitude coefficient in PC1. Below we calculate the range of the molecule with the largest-magnitude coefficient in PC1 and of the molecule with the smallest.
+
+```{r}
+# Range (max - min) of the molecules with the largest- and smallest-magnitude PC1 coefficients
+print(diff(range(pixl_trim.mat[, "Si02"])))
+print(diff(range(pixl_trim.mat[, "Mno"])))
+
+```
+
+We can see that Si02, whose PC1 coefficient (-0.747) is the farthest from 0, has a range of 34.5, while Mno, whose PC1 coefficient (0.001) is the closest to 0, has a range of only 0.59.
+
 
 ## SAVE, COMMIT and PUSH YOUR CHANGES!
 
 When you are satisfied with your edits and your notebook knits successfully, remember to push your changes to the repo using **steps 4-8** in **Section 2.2**, summarized here:
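The note about mismatched heatmap colors in this patch could be avoided by giving every per-cluster heatmap the same color mapping. Below is a minimal sketch using the `color` and `breaks` arguments of `pheatmap()`: one palette of 100 colors plus 101 break points spanning the range of the whole matrix, so a given value gets the same shade in every plot. The matrix `mat` and vector `cluster` are hypothetical, synthetic stand-ins for `pixl_trim.mat` and `km$cluster`, and the plotting loop runs only if the `pheatmap` package is installed.

```r
# Sketch: give every per-cluster heatmap the same color scale by passing
# identical `breaks` and `color` vectors to pheatmap().
# ASSUMPTION: `mat` and `cluster` are synthetic stand-ins (hypothetical
# names) for the notebook's pixl_trim.mat and km$cluster.
set.seed(1)
mat <- matrix(runif(24, min = 0, max = 50), nrow = 6,
              dimnames = list(paste0("site", 1:6), c("A", "B", "C", "D")))
cluster <- c(1, 1, 2, 2, 3, 3)

# One shared palette (100 colors) and 101 break points covering the range
# of the whole matrix, so each value maps to the same color in every plot.
n_colors <- 100
shared_colors <- grDevices::colorRampPalette(c("navy", "white", "firebrick3"))(n_colors)
shared_breaks <- seq(min(mat), max(mat), length.out = n_colors + 1)

if (requireNamespace("pheatmap", quietly = TRUE)) {
  for (k in sort(unique(cluster))) {
    # subset() on a matrix keeps the rows where the condition is TRUE
    pheatmap::pheatmap(subset(mat, cluster == k),
                       scale = "none",
                       color = shared_colors,
                       breaks = shared_breaks,
                       cluster_cols = FALSE, cluster_rows = FALSE,
                       main = paste("Cluster", k))
  }
}
```

Computing the breaks from the full matrix, rather than from each cluster's subset, is what makes the shades comparable across plots; a single legend then applies to all three heatmaps.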
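The range argument in this patch compares only the two extreme molecules. A sketch that tabulates every column's range next to its PC1 loading might look like the following. It assumes `pixl_trim.mat.pca` is a `prcomp()` fit on the columns of `pixl_trim.mat`, since the diff does not show that call. The link between range and loading size is expected when the PCA is centered but not scaled (`scale. = FALSE`), because high-variance columns then dominate PC1; with `scale. = TRUE` every column has unit variance and the pattern need not hold. The helper `loading_vs_range()` is hypothetical, and the demo data are synthetic, not PIXL measurements.

```r
# Sketch: list each column's range next to its PC1 loading, sorted by
# loading magnitude, to check the range/loading claim for every column.
# ASSUMPTION: `pca` is a prcomp() result fitted on the columns of `mat`.
loading_vs_range <- function(mat, pca) {
  pc1 <- pca$rotation[colnames(mat), "PC1"]
  out <- data.frame(
    variable = colnames(mat),
    range = apply(mat, 2, function(x) diff(range(x))),
    pc1_loading = pc1,
    row.names = NULL
  )
  out[order(-abs(out$pc1_loading)), ]
}

# Synthetic demo data (NOT the PIXL measurements): three independent
# columns with very different spreads.
set.seed(42)
demo <- cbind(wide = rnorm(30, sd = 10),
              medium = rnorm(30, sd = 1),
              narrow = rnorm(30, sd = 0.1))

# Centered but unscaled PCA, so the high-variance column dominates PC1.
demo_pca <- prcomp(demo, center = TRUE, scale. = FALSE)
tab <- loading_vs_range(demo, demo_pca)
print(tab)

# Rank agreement between |PC1 loading| and range across all columns.
print(cor(abs(tab$pc1_loading), tab$range, method = "spearman"))
```

In the notebook this would presumably be called as `loading_vs_range(pixl_trim.mat, pixl_trim.mat.pca)`; a rank correlation close to 1 would support the claim for all molecules, not just Si02 and Mno.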