---
title: "DAR F24 Assignment 5 Notebook"
author: "Xuanting Wang RIN :662016667"
date: "`r Sys.Date()`"
output:
  pdf_document:
    toc: true
    latex_engine: xelatex
  word_document:
    toc: true
  html_document:
    toc: true
subtitle: 'DAR Project Name: Mars'
---
## PIXL Data Analysis | |
1. **Classification by Cation Groups**: | |
- **Cation Composition Counts**: The PIXL dataset showed that the majority of samples were classified as "Si-Al rich" with 11 samples, followed by "Fe-Mg rich" with 5 samples. This classification indicates a composition with predominant Si and Al in most samples.
2. **ANOVA Results for Cation Groups by Campaign**: | |
- **Si_Al**: The ANOVA test for `Si_Al` showed a significant difference across campaigns (\(p = 0.0014\)), indicating that the `Si_Al` composition varies meaningfully between campaigns. | |
- **Fe_Mg**: The `Fe_Mg` composition did not show significant variation across campaigns (\(p = 0.0791\)), suggesting similar levels of Fe and Mg in the different campaigns. | |
- **Ca_Na_K**: For `Ca_Na_K`, a significant difference was found across campaigns (\(p = 0.0136\)), indicating some compositional variance based on campaign location. | |
3. **Density Plots of Cation Groups**: | |
- The density plots of `Si_Al`, `Fe_Mg`, and `Ca_Na_K` highlighted the distribution patterns within campaigns for PIXL data, showing that `Si_Al` tends to dominate, followed by `Fe_Mg` and less concentration in `Ca_Na_K`. | |
4. **Dunn's Post-Hoc Test**: | |
- For `Si_Al`, significant differences were found between Crater Floor and Delta Front (\(p = 0.0004\)), supporting the variance in `Si_Al` composition across campaigns. | |
- **Fe_Mg**: A marginal significance (\(p = 0.0491\)) between Crater Floor and Delta Front suggests moderate differences, while `Ca_Na_K` showed a clearer significance (\(p = 0.0017\)), indicating stronger compositional changes across these locations. | |
5. **Single-Sample t-Test**: | |
- **Si_Al**: The mean concentration of `Si_Al` was significantly greater than the hypothetical value of 10 (\(p < 0.001\)), with a mean around 43.63. | |
- **Fe_Mg**: Similarly, `Fe_Mg` was significantly higher than the benchmark of 10 (\(p < 0.001\)), averaging 33.13. | |
- **Ca_Na_K**: The mean `Ca_Na_K` concentration was significantly lower than 10 (\(p = 0.0049\)), at an average of 6.94. | |
6. **Logistic Regression**: | |
- Logistic regression with `Si_Al`, `Fe_Mg`, and `Ca_Na_K` as predictors did not yield statistically significant results, indicating limited predictive power in differentiating campaigns based on these cation groups in PIXL data. | |
--- | |
## LIBS Data Analysis | |
1. **Classification by Cation Groups**: | |
- **Cation Composition Counts**: For LIBS, `Si-Al rich` samples were predominant (1257 samples), followed by `Fe-Mg rich` (645 samples) and a smaller number of `Ca-Na-K rich` samples (30), demonstrating a general trend toward higher Si and Al compositions.
2. **ANOVA Results for Cation Groups by Campaign**: | |
- **Si_Al**: The ANOVA for `Si_Al` by campaign was highly significant (\(p < 0.0001\)), indicating substantial variation between campaigns, especially between Campaign 3 and the others. | |
- **Fe_Mg**: This group also showed significant differences (\(p < 0.0001\)), suggesting that Fe and Mg levels vary notably by campaign. | |
- **Ca_Na_K**: Similar to the other cation groups, `Ca_Na_K` showed significant differences across campaigns (\(p < 0.0001\)), with Campaign 3 distinctively lower than Campaign 1 and 2. | |
3. **Density Plots of Cation Groups**: | |
- Density plots of `Si_Al`, `Fe_Mg`, and `Ca_Na_K` for LIBS data revealed high densities for `Si_Al` and moderate densities for `Fe_Mg`, with lower densities in `Ca_Na_K` across campaigns, aligning with the observed classification trends. | |
4. **Dunn's Post-Hoc Test**: | |
- **Si_Al**: Dunn’s test indicated significant differences between Campaign 3 and both Campaigns 1 and 2 (\(p < 0.001\)), corroborating the ANOVA results. | |
- **Fe_Mg**: The test also highlighted significant differences between all campaigns for `Fe_Mg` (\(p < 0.001\)), supporting variability across locations. | |
- **Ca_Na_K**: Similar patterns were observed with significant differences across all campaign comparisons (\(p < 0.001\)), indicating that `Ca_Na_K` concentrations are not consistent across campaigns in the LIBS data. | |
5. **Single-Sample t-Test**: | |
- **Si_Al**: LIBS data for `Si_Al` showed a mean significantly above 10 (\(p < 0.001\)), with an average of 49.71. | |
- **Fe_Mg**: The mean concentration was also significantly greater than 10 (\(p < 0.001\)), at 36.55. | |
- **Ca_Na_K**: The mean `Ca_Na_K` concentration was significantly lower than 10 (\(p < 0.001\)), averaging around 7.08, similar to PIXL data. | |
6. **Logistic Regression**: | |
- Multinomial logistic regression for campaign prediction using `Si_Al`, `Fe_Mg`, and `Ca_Na_K` showed that `Fe_Mg` was significant (\(p = 0.0134\)) in predicting campaign differences, unlike `Si_Al` and `Ca_Na_K`, which were not. This indicates some predictive strength in `Fe_Mg` for campaign classification in the LIBS dataset. | |
--- | |
## Conclusion | |
Both PIXL and LIBS data reveal distinct composition patterns across campaigns, with high `Si_Al` concentrations dominating in both datasets. ANOVA and Dunn’s tests consistently highlight significant campaign-based compositional differences, especially in `Si_Al` and `Fe_Mg`. However, logistic regression showed limited predictive power in differentiating campaigns based on cation compositions alone, though `Fe_Mg` in the LIBS data showed some promise as a predictor. The single-sample t-tests show that, in both datasets, mean `Si_Al` and `Fe_Mg` concentrations lie above the hypothetical benchmark of 10, while mean `Ca_Na_K` lies below it. This analysis suggests substantial consistency between PIXL and LIBS in terms of cation group trends across Martian campaigns, with some variability captured in individual cation groups.
**Calculating Cation Group Sums**: | |
- Created new columns to represent grouped sums: | |
- `Si_Al`: sum of SiO₂ and Al₂O₃. | |
- `Fe_Mg`: sum of FeO-T and MgO. | |
- `Ca_Na_K`: sum of CaO, Na₂O, and K₂O. | |
**Initial Classification**: Based on the sums of these groups, each sample was assigned a class label:
- If `Si_Al` was larger than both other sums, the sample is classified as "Si-Al rich".
- Otherwise, if `Fe_Mg` was larger than `Ca_Na_K`, it is classified as "Fe-Mg rich".
- Otherwise, it is classified as "Ca-Na-K rich".
**Ranking with Quantiles**: Each of `Si_Al`, `Fe_Mg`, and `Ca_Na_K` was assigned a quantile rank (three levels, via `ntile`), and samples were reclassified into the same three categories based on the highest rank. Because the rank comparisons use `>=`, ties are resolved in favor of "Si-Al rich" first and "Fe-Mg rich" second.
## PIXL Data Analysis | |
```{r}
libs_data <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
pixl_data <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/samples_pixl_wide.Rds")
# Include 'campaign' column in the subset
cation_data <- pixl_data[, c("Si02", "Al203", "FeO-T", "Mgo", "Cao", "Na20", "K20", "campaign")]
# Calculate the cation group sums using backticks for column names with special characters
cation_data$Si_Al <- cation_data$Si02 + cation_data$Al203
cation_data$Fe_Mg <- cation_data$`FeO-T` + cation_data$Mgo
cation_data$Ca_Na_K <- cation_data$Cao + cation_data$Na20 + cation_data$K20
# Classify each sample by its largest cation group sum
cation_data$class <- ifelse(cation_data$Si_Al > cation_data$Fe_Mg &
                              cation_data$Si_Al > cation_data$Ca_Na_K,
                            "Si-Al rich",
                            ifelse(cation_data$Fe_Mg > cation_data$Ca_Na_K,
                                   "Fe-Mg rich",
                                   "Ca-Na-K rich"))
# Check the classifications
table(cation_data$class)
library(dplyr)
# Create quantiles for each group (Si_Al, Fe_Mg, Ca_Na_K)
cation_data$Si_Al_rank <- ntile(cation_data$Si_Al, 3)     # Divides Si_Al into 3 quantiles
cation_data$Fe_Mg_rank <- ntile(cation_data$Fe_Mg, 3)     # Divides Fe_Mg into 3 quantiles
cation_data$Ca_Na_K_rank <- ntile(cation_data$Ca_Na_K, 3) # Divides Ca_Na_K into 3 quantiles
# Reclassify based on which group has the highest rank (this overwrites 'class')
cation_data$class <- ifelse(cation_data$Si_Al_rank >= cation_data$Fe_Mg_rank &
                              cation_data$Si_Al_rank >= cation_data$Ca_Na_K_rank,
                            "Si-Al rich",
                            ifelse(cation_data$Fe_Mg_rank >= cation_data$Ca_Na_K_rank,
                                   "Fe-Mg rich",
                                   "Ca-Na-K rich"))
# Load plotting libraries
library(ggplot2)
# install.packages("ggtern")
library(ggtern)
# Prepare the data for the ternary plot:
# express the three components as proportions of their total
cation_data$total <- cation_data$Si_Al + cation_data$Fe_Mg + cation_data$Ca_Na_K
cation_data$Si_Al_prop <- cation_data$Si_Al / cation_data$total
cation_data$Fe_Mg_prop <- cation_data$Fe_Mg / cation_data$total
cation_data$Ca_Na_K_prop <- cation_data$Ca_Na_K / cation_data$total
```
```{r}
# Create the ternary plot
ggtern(data = cation_data,
       aes(x = Si_Al_prop, y = Fe_Mg_prop, z = Ca_Na_K_prop, color = class)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("Si-Al rich" = "blue",
                                "Fe-Mg rich" = "green",
                                "Ca-Na-K rich" = "orange")) +
  labs(title = "Ternary Plot of Si+Al, Fe+Mg, and Ca+Na+K for PIXL Data",
       x = "Si + Al",
       y = "Fe + Mg",
       z = "Ca + Na + K",
       color = "Rock Type") +
  theme_minimal()
```
```{r}
# Load necessary libraries
library(dplyr)
# Calculate the count of each campaign and class combination
campaign_composition_summary <- cation_data %>%
  group_by(campaign, class) %>%
  summarize(count = n(), .groups = "drop")
# Calculate the proportion within each campaign
campaign_composition_summary <- campaign_composition_summary %>%
  group_by(campaign) %>%
  mutate(proportion = count / sum(count)) %>%
  ungroup()
# Display the summary table
print(campaign_composition_summary)
```
```{r}
# Convert 'campaign' to a factor for ANOVA
cation_data$campaign <- as.factor(cation_data$campaign)
# Perform ANOVA for each cation group across campaigns
anova_Si_Al <- aov(Si_Al ~ campaign, data = cation_data)
anova_Fe_Mg <- aov(Fe_Mg ~ campaign, data = cation_data)
anova_Ca_Na_K <- aov(Ca_Na_K ~ campaign, data = cation_data)
# Summary of ANOVA results
summary(anova_Si_Al)
summary(anova_Fe_Mg)
summary(anova_Ca_Na_K)
# If significant, run a post-hoc Tukey test to determine where the differences lie
TukeyHSD(anova_Si_Al)
TukeyHSD(anova_Fe_Mg)
TukeyHSD(anova_Ca_Na_K)
```
### ANOVA Results | |
1. **Si-Al Group (Si_Al ~ campaign)**: | |
- The ANOVA test for the `Si_Al` group across campaigns showed a significant effect, with a p-value of **2.47e-15** (p < 0.001), indicating that the mean `Si_Al` values vary significantly between campaigns. | |
2. **Fe-Mg Group (Fe_Mg ~ campaign)**: | |
- Similarly, the ANOVA test for `Fe_Mg` showed a very strong significant effect, with a p-value of **<2e-16** (p < 0.001). This suggests that `Fe_Mg` values also differ significantly between campaigns. | |
3. **Ca-Na-K Group (Ca_Na_K ~ campaign)**: | |
- For the `Ca_Na_K` group, the ANOVA test was significant with a p-value of **4.03e-11** (p < 0.001), meaning that `Ca_Na_K` values also vary significantly across campaigns. | |
Overall, these results indicate that each cation group (Si-Al, Fe-Mg, Ca-Na-K) has statistically significant differences in composition across the different campaigns. | |
### Tukey Post-hoc Tests | |
1. **Si-Al Group**: | |
- Significant differences were found between: | |
- **Campaign 3 and Campaign 1** (p = 0.0000191): Campaign 3 has lower `Si_Al` values than Campaign 1. | |
- **Campaign 3 and Campaign 2** (p < 0.0001): Campaign 3 has lower `Si_Al` values than Campaign 2. | |
- No significant difference was found between Campaigns 1 and 2. | |
2. **Fe-Mg Group**: | |
- Significant differences were found between all campaign pairs: | |
- **Campaign 2 and Campaign 1** (p = 0.0017): Campaign 2 has higher `Fe_Mg` values than Campaign 1. | |
- **Campaign 3 and Campaign 1** (p < 0.0001): Campaign 3 has much higher `Fe_Mg` values than Campaign 1. | |
- **Campaign 3 and Campaign 2** (p < 0.0001): Campaign 3 also has higher `Fe_Mg` values than Campaign 2. | |
3. **Ca-Na-K Group**: | |
- Significant differences were found between: | |
- **Campaign 3 and Campaign 1** (p = 0.0001): Campaign 3 has lower `Ca_Na_K` values than Campaign 1. | |
- **Campaign 3 and Campaign 2** (p < 0.0001): Campaign 3 has lower `Ca_Na_K` values than Campaign 2. | |
- No significant difference was found between Campaigns 1 and 2. | |
```{r}
# Density plot for each cation group
ggplot(cation_data, aes(color = class)) +
  geom_density(aes(x = Si_Al), fill = "blue", alpha = 0.3) +
  geom_density(aes(x = Fe_Mg), fill = "green", alpha = 0.3) +
  geom_density(aes(x = Ca_Na_K), fill = "orange", alpha = 0.3) +
  labs(title = "Density Plot of Cation Groups for PIXL Data",
       x = "Cation Group Concentrations",
       color = "Composition Class") +
  theme_minimal()
# Load necessary libraries
library(ggplot2)
library(gridExtra)
# Box plot for Si_Al by campaign
plot_Si_Al <- ggplot(cation_data, aes(x = campaign, y = Si_Al, fill = campaign)) +
  geom_boxplot() +
  labs(title = "Si_Al Distribution Across Campaigns",
       x = "Campaign",
       y = "Si + Al Concentration") +
  theme_minimal() +
  theme(legend.position = "none")
# Box plot for Fe_Mg by campaign
plot_Fe_Mg <- ggplot(cation_data, aes(x = campaign, y = Fe_Mg, fill = campaign)) +
  geom_boxplot() +
  labs(title = "Fe_Mg Distribution Across Campaigns",
       x = "Campaign",
       y = "Fe + Mg Concentration") +
  theme_minimal() +
  theme(legend.position = "none")
# Box plot for Ca_Na_K by campaign
plot_Ca_Na_K <- ggplot(cation_data, aes(x = campaign, y = Ca_Na_K, fill = campaign)) +
  geom_boxplot() +
  labs(title = "Ca_Na_K Distribution Across Campaigns",
       x = "Campaign",
       y = "Ca + Na + K Concentration") +
  theme_minimal() +
  theme(legend.position = "none")
# Arrange the plots in a single layout
grid.arrange(plot_Si_Al, plot_Fe_Mg, plot_Ca_Na_K, nrow = 1)
```
1. **Density Plot of Cation Groups**: | |
- Created a density plot to visualize the distribution of concentrations for each cation group (Si-Al, Fe-Mg, Ca-Na-K).
- Each cation group concentration was assigned a different color: blue for Si-Al, green for Fe-Mg, and orange for Ca-Na-K. | |
- The densities were overlaid with transparency (alpha = 0.3) to allow for easy comparison across groups. | |
2. **Box Plots for Cation Groups Across Campaigns**: | |
- Created three separate box plots to show the distribution of each cation group (Si-Al, Fe-Mg, Ca-Na-K) across different campaigns.
- Each plot includes: | |
- Si-Al box plot: displays concentration differences across campaigns. | |
- Fe-Mg box plot: displays Fe and Mg concentration across campaigns. | |
- Ca-Na-K box plot: displays Ca, Na, and K concentration across campaigns. | |
- The plots were arranged in a single row using `grid.arrange` for easy comparison. | |
### Analysis | |
1. **Density Plot**: | |
- The density plot shows distinct distribution peaks for each cation group, indicating that each group has a unique concentration range within the PIXL data. | |
- For instance, the Si-Al group (blue) has a prominent peak on the left, suggesting a concentration mode in lower values, while the Fe-Mg (green) and Ca-Na-K (orange) groups have more spread-out distributions. | |
- Overlapping regions between density curves suggest some samples may have balanced compositions of multiple cation groups, while isolated peaks highlight group-specific characteristics. | |
2. **Box Plots Across Campaigns**: | |
- **Si-Al Distribution**: The box plot shows that Campaign 1 has a generally higher median Si-Al concentration compared to Campaigns 2 and 3, suggesting Campaign 1 samples are richer in Si and Al. | |
- **Fe-Mg Distribution**: Fe-Mg concentrations show a trend of increasing from Campaign 1 to Campaign 3, with Campaign 3 showing the highest median concentration. This aligns with previous findings that Campaign 3 has significant Fe-Mg richness. | |
- **Ca-Na-K Distribution**: Ca-Na-K concentrations are relatively low across all campaigns, but Campaign 3 has slightly lower median values compared to Campaigns 1 and 2, consistent with previous analyses. | |
```{r}
# Filter data for two specific campaigns and remove NA values.
# "A" and "B" are placeholder labels: if they do not occur in 'campaign',
# both subsets are empty and the else branch below runs.
campaign_a_data <- na.omit(subset(cation_data, campaign == "A")$Si_Al)
campaign_b_data <- na.omit(subset(cation_data, campaign == "B")$Si_Al)
# Check if both campaigns have enough data points
# (&& is the scalar logical operator intended for if conditions)
if (length(campaign_a_data) > 1 && length(campaign_b_data) > 1) {
  # Perform Mann-Whitney test
  mann_whitney_test <- wilcox.test(campaign_a_data, campaign_b_data)
  print(mann_whitney_test)
} else {
  print("Insufficient data for Mann-Whitney test between selected campaigns.")
}
```
```{r}
# install.packages("dunn.test")
library(dunn.test)
# Perform Dunn's test for each cation group
# Example for Si_Al across campaigns
dunn_test_Si_Al <- dunn.test(cation_data$Si_Al, cation_data$campaign, method = "bonferroni")
print(dunn_test_Si_Al)
# Repeat for other cation groups
dunn_test_Fe_Mg <- dunn.test(cation_data$Fe_Mg, cation_data$campaign, method = "bonferroni")
dunn_test_Ca_Na_K <- dunn.test(cation_data$Ca_Na_K, cation_data$campaign, method = "bonferroni")
# Print the results
print(dunn_test_Fe_Mg)
print(dunn_test_Ca_Na_K)
```
```{r}
# Hypothetical mean values for each cation group to test against
test_value_Si_Al <- 10
test_value_Fe_Mg <- 10
test_value_Ca_Na_K <- 10
# Single-sample t-test for Si_Al
t_test_Si_Al <- t.test(cation_data$Si_Al, mu = test_value_Si_Al)
print(t_test_Si_Al)
# Single-sample t-test for Fe_Mg
t_test_Fe_Mg <- t.test(cation_data$Fe_Mg, mu = test_value_Fe_Mg)
print(t_test_Fe_Mg)
# Single-sample t-test for Ca_Na_K
t_test_Ca_Na_K <- t.test(cation_data$Ca_Na_K, mu = test_value_Ca_Na_K)
print(t_test_Ca_Na_K)
```
### Kruskal-Wallis Test Results | |
The Kruskal-Wallis test was performed for each cation group (Si-Al, Fe-Mg, Ca-Na-K) across campaigns. For each test, the chi-squared values were large with p-values essentially zero, indicating significant differences in cation group concentrations across campaigns. | |
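The Kruskal-Wallis call itself is not shown in the chunks of this notebook. As a minimal sketch of how such a test could be run with base R's `kruskal.test`, the example below uses a small simulated data frame (`toy`, with made-up values) as a stand-in for `cation_data`; the means and sample sizes are assumptions chosen only for illustration.

```r
# Simulated stand-in for cation_data; values are illustrative only
set.seed(42)
toy <- data.frame(
  campaign = factor(rep(c("Campaign 1", "Campaign 2", "Campaign 3"), each = 30)),
  Si_Al    = c(rnorm(30, mean = 52, sd = 5),
               rnorm(30, mean = 51, sd = 5),
               rnorm(30, mean = 44, sd = 5))
)

# Rank-based test of whether Si_Al follows the same distribution in every campaign
kw_Si_Al <- kruskal.test(Si_Al ~ campaign, data = toy)
print(kw_Si_Al)

# Degrees of freedom: number of groups minus one
kw_Si_Al$parameter
```

The same call can be repeated with `Fe_Mg` or `Ca_Na_K` on the left-hand side of the formula.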
### Dunn’s Test (Post-hoc Analysis) | |
Since the Kruskal-Wallis test showed significant differences, Dunn’s test was applied to perform pairwise comparisons between campaigns for each cation group with Bonferroni correction: | |
1. **Si-Al Group**: | |
- **Significant Differences**: | |
- Campaign 3 vs. Campaign 1: \(p = 7.85 \times 10^{-5}\) (significant) | |
- Campaign 3 vs. Campaign 2: \(p = 1.28 \times 10^{-13}\) (significant) | |
- **Non-significant Difference**: | |
- Campaign 1 vs. Campaign 2: \(p = 0.34\) (not significant) | |
- **Analysis**: Campaign 3 has significantly different Si-Al levels compared to Campaigns 1 and 2, suggesting unique geological composition in that region. | |
2. **Fe-Mg Group**: | |
- **Significant Differences**: | |
- Campaign 2 vs. Campaign 1: \(p = 0.0004\) | |
- Campaign 3 vs. Campaign 1: \(p < 0.0001\) | |
- Campaign 3 vs. Campaign 2: \(p < 0.0001\) | |
- **Analysis**: All pairwise comparisons are significant, with Campaign 3 showing the highest Fe-Mg levels. This points to distinct Fe-Mg enrichment in Campaign 3 samples. | |
3. **Ca-Na-K Group**: | |
- **Significant Differences**: | |
- Campaign 1 vs. Campaign 3: \(p = 0.0054\) | |
- Campaign 2 vs. Campaign 3: \(p < 0.0001\) | |
- **Non-significant Difference**: | |
- Campaign 1 vs. Campaign 2: \(p = 0.20\) (not significant) | |
- **Analysis**: Campaign 3 differs significantly from the other campaigns, with lower Ca-Na-K levels compared to Campaigns 1 and 2. | |
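To make the Bonferroni correction concrete, the following sketch applies base R's `p.adjust` to three hypothetical unadjusted p-values (they are not taken from the results above). `dunn.test` performs its own adjustment when `method = "bonferroni"` is set, so this is only an illustration of the arithmetic.

```r
# Hypothetical unadjusted p-values for three pairwise comparisons
raw_p <- c("1 vs 2" = 0.01, "1 vs 3" = 0.04, "2 vs 3" = 0.30)

# Bonferroni: multiply each p-value by the number of comparisons, capped at 1
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(adj_p)   # 0.03, 0.12, 0.90

# Comparisons that remain below 0.05 after adjustment
names(adj_p)[adj_p < 0.05]
```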
**Regression** | |
1. **Convert Campaign to Factor**:
- Ensured that `campaign` is treated as a categorical variable by converting it to a factor, which is necessary for logistic regression.
2. **Binary Logistic Regression**:
- Ran a binary logistic regression model assuming `campaign` had two levels, using `Si_Al`, `Fe_Mg`, and `Ca_Na_K` as predictors.
- The `glm` function with `family = "binomial"` fits the model, and `summary(logistic_model)` displays the coefficients and p-values, which indicate the influence of each predictor on the likelihood of being in a particular campaign category.
3. **Multinomial Logistic Regression**:
- Used the `nnet` package’s `multinom` function to perform multinomial logistic regression, which is suitable for cases where `campaign` has more than two levels.
- The `summary(multinom_model)` output shows the estimated coefficients for each predictor, indicating how `Si_Al`, `Fe_Mg`, and `Ca_Na_K` concentrations influence the probability of each campaign classification.
4. **Predict Campaigns and Probabilities**:
- `predict(multinom_model, type = "class")` returns the most likely campaign class for each observation.
- `predict(multinom_model, type = "probs")` returns the predicted probabilities for each campaign, showing the likelihood of each sample belonging to each campaign.
- The `head(predicted_campaigns)` and `head(predicted_probabilities)` calls display the first few rows of these predictions.
```{r}
# Convert campaign to a factor if it’s not already
cation_data$campaign <- as.factor(cation_data$campaign)
# Run logistic regression (binary outcome assumed).
# With a factor response, glm() models the first level against all other levels.
logistic_model <- glm(campaign ~ Si_Al + Fe_Mg + Ca_Na_K,
                      data = cation_data, family = "binomial")
summary(logistic_model)
# Install the nnet package if not already installed
# install.packages("nnet")
library(nnet)
# Run multinomial logistic regression
multinom_model <- multinom(campaign ~ Si_Al + Fe_Mg + Ca_Na_K, data = cation_data)
summary(multinom_model)
# Predict the most likely campaign and the per-campaign probabilities
predicted_campaigns <- predict(multinom_model, type = "class")
predicted_probabilities <- predict(multinom_model, type = "probs")
# View the predictions
head(predicted_campaigns)
head(predicted_probabilities)
```
### Binary Logistic Regression | |
The binary logistic regression was run to see how `Si_Al`, `Fe_Mg`, and `Ca_Na_K` influence campaign classification. With `family = "binomial"` and a factor response, `glm` treats the first level of `campaign` as the reference outcome and every other level as the event, so the model contrasts the first campaign level with the rest:
- **Intercept**: Significant with a p-value of 0.0106; this is the baseline log-odds when all predictors are zero.
- **Si_Al**: Not significant (p = 0.2657), indicating that `Si_Al` does not have a strong influence in distinguishing between the two campaign categories in this binary model.
- **Fe_Mg**: Significant (p = 0.0134), suggesting that higher `Fe_Mg` concentrations are associated with a higher probability of one of the campaign classifications.
- **Ca_Na_K**: Not significant (p = 0.5575), indicating little impact on the binary classification of campaigns.
**Interpretation**: In this binary logistic model, only `Fe_Mg` shows a significant effect, which suggests it may be a key differentiator between the reference campaign level and the remaining levels.
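When a factor with more than two levels is passed as the response to `glm(..., family = "binomial")`, R models the first level against all other levels combined. The sketch below checks this on simulated data (`toy`, with made-up `Fe_Mg` values, is an assumption for illustration) by comparing the implicit coding with an explicit 0/1 recode; the column name `not_campaign_1` is hypothetical.

```r
# Simulated stand-in for cation_data; values are illustrative only
set.seed(1)
toy <- data.frame(
  campaign = factor(rep(c("Campaign 1", "Campaign 2", "Campaign 3"), each = 40)),
  Fe_Mg    = c(rnorm(40, 30, 6), rnorm(40, 34, 6), rnorm(40, 40, 6))
)

# Implicit coding: first factor level vs. every other level
fit_factor <- glm(campaign ~ Fe_Mg, data = toy, family = "binomial")

# Explicit coding of the same contrast as a 0/1 response
toy$not_campaign_1 <- as.integer(toy$campaign != "Campaign 1")
fit_explicit <- glm(not_campaign_1 ~ Fe_Mg, data = toy, family = "binomial")

# Both fits produce the same coefficients
all.equal(coef(fit_factor), coef(fit_explicit))
```

Writing the 0/1 response explicitly makes the intended contrast visible in the code, which helps when reading the coefficient signs.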
### Multinomial Logistic Regression | |
The multinomial logistic regression was conducted to model campaign classification as a multi-level factor: | |
- **Campaign 2 (vs. Campaign 1)**: | |
- **Si_Al**: Non-significant, showing minimal impact on distinguishing Campaign 2 from Campaign 1. | |
- **Fe_Mg**: Positive coefficient (0.0295), suggesting that higher `Fe_Mg` values increase the likelihood of being in Campaign 2 relative to Campaign 1. | |
- **Ca_Na_K**: Positive but non-significant, implying it doesn’t strongly differentiate Campaign 2 from Campaign 1. | |
- **Campaign 3 (vs. Campaign 1)**: | |
- **Si_Al**: Negative coefficient (-0.0323), suggesting that lower `Si_Al` values may be associated with Campaign 3, though it is not statistically significant. | |
- **Fe_Mg**: Positive coefficient (0.0340), indicating that higher `Fe_Mg` values are associated with Campaign 3 compared to Campaign 1. | |
- **Ca_Na_K**: Negative coefficient, though non-significant, suggesting lower `Ca_Na_K` may be associated with Campaign 3 relative to Campaign 1. | |
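`summary()` on a `multinom` fit reports coefficients and standard errors but no p-values, so significance statements like those above are commonly derived from Wald z-tests. A minimal sketch of that calculation, assuming simulated data (`toy`, with made-up values) in place of `cation_data`:

```r
library(nnet)

# Simulated stand-in for cation_data; values are illustrative only
set.seed(7)
n <- 100
toy <- data.frame(
  campaign = factor(rep(c("Campaign 1", "Campaign 2", "Campaign 3"), each = n)),
  Si_Al    = c(rnorm(n, 52, 5), rnorm(n, 51, 5), rnorm(n, 46, 5)),
  Fe_Mg    = c(rnorm(n, 30, 5), rnorm(n, 33, 5), rnorm(n, 38, 5))
)

fit <- multinom(campaign ~ Si_Al + Fe_Mg, data = toy, trace = FALSE)
fit_summary <- summary(fit)

# Wald z-statistic: coefficient divided by its standard error
z_values <- fit_summary$coefficients / fit_summary$standard.errors

# Two-sided p-values from the normal approximation
p_values <- 2 * pnorm(abs(z_values), lower.tail = FALSE)
round(p_values, 4)
```

Each row of `p_values` corresponds to one non-reference campaign, and each column to the intercept or a predictor.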
### Predicted Probabilities | |
The predicted probabilities show the likelihood of each sample belonging to each campaign based on `Si_Al`, `Fe_Mg`, and `Ca_Na_K`. The probabilities indicate the model’s confidence in its predictions for each campaign classification. | |
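One way to judge how well such predictions match the recorded campaigns is a confusion table of actual versus predicted classes. The sketch below uses simulated data (`toy`, with made-up values, is an assumption for illustration), and the accuracy it reports is in-sample, so it is likely optimistic.

```r
library(nnet)

# Simulated stand-in for cation_data; values are illustrative only
set.seed(11)
n <- 100
toy <- data.frame(
  campaign = factor(rep(c("Campaign 1", "Campaign 2", "Campaign 3"), each = n)),
  Si_Al    = c(rnorm(n, 52, 5), rnorm(n, 51, 5), rnorm(n, 46, 5)),
  Fe_Mg    = c(rnorm(n, 30, 5), rnorm(n, 33, 5), rnorm(n, 38, 5))
)

fit <- multinom(campaign ~ Si_Al + Fe_Mg, data = toy, trace = FALSE)
predicted <- predict(fit, type = "class")
probs     <- predict(fit, type = "probs")

# Rows: actual campaign; columns: predicted campaign
confusion <- table(actual = toy$campaign, predicted = predicted)
print(confusion)

# Share of samples assigned to their recorded campaign (in-sample)
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy

# Each row of predicted probabilities sums to 1
range(rowSums(probs))
```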
**LIBS Data** | |
1. **Load and Prepare LIBS Data**: | |
- Loaded the `supercam_libs_moc_loc.Rds` file and converted it into a data frame. | |
- Ensured specific columns (`SiO2`, `Al2O3`, `FeOT`, `MgO`, `CaO`, `Na2O`, `K2O`) are numeric to facilitate numerical analysis. | |
2. **Select Relevant Cation Data**: | |
- Created a subset `cation_data` containing only the cation columns. | |
3. **Calculate Cation Group Sums**: | |
- Calculated the sums of certain cation groups: | |
- `Si_Al` (SiO₂ + Al₂O₃) | |
- `Fe_Mg` (FeO-T + MgO) | |
- `Ca_Na_K` (CaO + Na₂O + K₂O) | |
4. **Initial Classification Based on Sums**: | |
- Classified each sample based on the highest group sum: | |
- "Si-Al rich" if `Si_Al` was the highest. | |
- "Fe-Mg rich" if `Fe_Mg` was the highest. | |
- "Ca-Na-K rich" if `Ca_Na_K` was the highest. | |
- Verified the classification distribution with `table(cation_data$class)`. | |
```{r}
# Load the LIBS data
libs_data <- readRDS("/academics/MATP-4910-F24/DAR-Mars-F24/Data/supercam_libs_moc_loc.Rds")
# Make sure the object is a plain data frame
libs_data <- as.data.frame(libs_data)
# Ensure all relevant columns are numeric by converting them
cols_to_convert <- c("SiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O")
libs_data[cols_to_convert] <- lapply(libs_data[cols_to_convert], as.numeric)
# Review the structure to ensure columns are now numeric
str(libs_data)
# Select valid columns for cation analysis
cation_data <- libs_data[, cols_to_convert]
# Inspect the first few rows to ensure data is selected correctly
head(cation_data)
# Calculate the cation group sums
cation_data$Si_Al <- cation_data$SiO2 + cation_data$Al2O3
cation_data$Fe_Mg <- cation_data$FeOT + cation_data$MgO
cation_data$Ca_Na_K <- cation_data$CaO + cation_data$Na2O + cation_data$K2O
# Classify each sample by its largest cation group sum
cation_data$class <- ifelse(cation_data$Si_Al > cation_data$Fe_Mg &
                              cation_data$Si_Al > cation_data$Ca_Na_K,
                            "Si-Al rich",
                            ifelse(cation_data$Fe_Mg > cation_data$Ca_Na_K,
                                   "Fe-Mg rich",
                                   "Ca-Na-K rich"))
# Check the classifications
table(cation_data$class)
# Create quantiles for each group (Si_Al, Fe_Mg, Ca_Na_K)
library(dplyr)
cation_data$Si_Al_rank <- ntile(cation_data$Si_Al, 3)     # Divides Si_Al into 3 quantiles
cation_data$Fe_Mg_rank <- ntile(cation_data$Fe_Mg, 3)     # Divides Fe_Mg into 3 quantiles
cation_data$Ca_Na_K_rank <- ntile(cation_data$Ca_Na_K, 3) # Divides Ca_Na_K into 3 quantiles
# Reclassify based on which group has the highest rank (this overwrites 'class')
cation_data$class <- ifelse(cation_data$Si_Al_rank >= cation_data$Fe_Mg_rank &
                              cation_data$Si_Al_rank >= cation_data$Ca_Na_K_rank,
                            "Si-Al rich",
                            ifelse(cation_data$Fe_Mg_rank >= cation_data$Ca_Na_K_rank,
                                   "Fe-Mg rich",
                                   "Ca-Na-K rich"))
# Check the updated classification distribution
table(cation_data$class)
# Prepare the data for the ternary plot:
# express the three components as proportions of their total
cation_data$total <- cation_data$Si_Al + cation_data$Fe_Mg + cation_data$Ca_Na_K
cation_data$Si_Al_prop <- cation_data$Si_Al / cation_data$total
cation_data$Fe_Mg_prop <- cation_data$Fe_Mg / cation_data$total
cation_data$Ca_Na_K_prop <- cation_data$Ca_Na_K / cation_data$total
# Check the structure of the final data
str(cation_data)
# Create the ternary plot
ggtern(data = cation_data,
       aes(x = Si_Al_prop, y = Fe_Mg_prop, z = Ca_Na_K_prop, color = class)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("Si-Al rich" = "blue",
                                "Fe-Mg rich" = "green",
                                "Ca-Na-K rich" = "orange")) +
  labs(title = "Ternary Plot of Si+Al, Fe+Mg, and Ca+Na+K for LIBS Data",
       x = "Si + Al",
       y = "Fe + Mg",
       z = "Ca + Na + K",
       color = "Rock Type") +
  theme_minimal()
```
```{r}
# Assign campaigns from sol ranges: < 100, 100 to 499, and >= 500
cation_data$campaign <- ifelse(libs_data$sol < 100, "Campaign 1",
                               ifelse(libs_data$sol < 500, "Campaign 2", "Campaign 3"))
# Convert the 'campaign' column to a factor
cation_data$campaign <- as.factor(cation_data$campaign)
# Check the distribution of the campaign column
table(cation_data$campaign)
# Perform ANOVA for each cation group across campaigns
anova_Si_Al <- aov(Si_Al ~ campaign, data = cation_data)
anova_Fe_Mg <- aov(Fe_Mg ~ campaign, data = cation_data)
anova_Ca_Na_K <- aov(Ca_Na_K ~ campaign, data = cation_data)
# Summary of ANOVA results
summary(anova_Si_Al)
summary(anova_Fe_Mg)
summary(anova_Ca_Na_K)
# If significant, run a post-hoc Tukey test to determine where the differences lie
TukeyHSD(anova_Si_Al)
TukeyHSD(anova_Fe_Mg)
TukeyHSD(anova_Ca_Na_K)
```
### Campaign Distribution | |
- The data has been divided into three campaigns based on the sol values: | |
- **Campaign 1**: Sol < 100 | |
- **Campaign 2**: 100 <= Sol < 500 | |
- **Campaign 3**: Sol >= 500 | |
- Distribution of samples by campaign: | |
- Campaign 1: 70 samples | |
- Campaign 2: 804 samples | |
- Campaign 3: 1058 samples | |
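The same sol-based binning can also be written with base R's `cut`, which states the interval boundaries in one place. This is an alternative sketch rather than the code used in the notebook; the `sol` vector holds hypothetical values chosen to exercise both boundaries.

```r
# Hypothetical sol values, including both boundary cases (100 and 500)
sol <- c(0, 99, 100, 499, 500, 900)

# right = FALSE makes the intervals closed on the left: [-Inf,100), [100,500), [500,Inf)
campaign <- cut(sol,
                breaks = c(-Inf, 100, 500, Inf),
                labels = c("Campaign 1", "Campaign 2", "Campaign 3"),
                right  = FALSE)

data.frame(sol, campaign)
```

`cut` returns a factor directly, so no separate `as.factor` step is needed.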
### ANOVA Results | |
ANOVA tests were conducted to determine if there are significant differences in `Si_Al`, `Fe_Mg`, and `Ca_Na_K` concentrations across the three campaigns. | |
1. **Si-Al Group**: | |
- The ANOVA for `Si_Al` shows a highly significant difference across campaigns (p < 0.001). | |
- **Interpretation**: There is a statistically significant variation in `Si_Al` concentrations between campaigns. | |
2. **Fe-Mg Group**: | |
- The ANOVA for `Fe_Mg` is also highly significant (p < 0.001). | |
- **Interpretation**: This indicates strong differences in `Fe_Mg` concentrations across the campaigns, suggesting that some campaigns are richer in Fe and Mg. | |
3. **Ca-Na-K Group**: | |
- The ANOVA for `Ca_Na_K` is significant as well (p < 0.001). | |
- **Interpretation**: There are notable differences in `Ca_Na_K` concentrations across campaigns. | |
### Tukey Post-hoc Test Results
To identify which specific campaign pairs have significant differences, Tukey’s test was applied.
1. **Si-Al Group**:
   - **Campaign 3 vs. Campaign 1**: Significant difference (p = 0.0000191), with Campaign 3 having lower `Si_Al` concentrations.
   - **Campaign 3 vs. Campaign 2**: Highly significant (p < 0.0001), with Campaign 3 showing lower `Si_Al` than Campaign 2.
   - **Campaign 2 vs. Campaign 1**: No significant difference.
   - **Interpretation**: Campaign 3 has distinctly lower `Si_Al` concentrations compared to the other campaigns.
2. **Fe-Mg Group**:
   - **Campaign 2 vs. Campaign 1**: Significant (p = 0.0017), with Campaign 2 having higher `Fe_Mg` concentrations.
   - **Campaign 3 vs. Campaign 1**: Highly significant (p < 0.0001), with Campaign 3 showing much higher `Fe_Mg` than Campaign 1.
   - **Campaign 3 vs. Campaign 2**: Highly significant (p < 0.0001), with Campaign 3 also having higher `Fe_Mg` than Campaign 2.
   - **Interpretation**: Both Campaigns 2 and 3 are richer in Fe and Mg compared to Campaign 1, with Campaign 3 having the highest concentrations.
3. **Ca-Na-K Group**:
   - **Campaign 3 vs. Campaign 1**: Significant (p = 0.0001), with Campaign 3 having lower `Ca_Na_K` concentrations.
   - **Campaign 3 vs. Campaign 2**: Highly significant (p < 0.0001), with Campaign 3 showing lower `Ca_Na_K` than Campaign 2.
   - **Campaign 2 vs. Campaign 1**: No significant difference.
   - **Interpretation**: Campaign 3 has lower `Ca_Na_K` concentrations compared to Campaigns 1 and 2, which do not differ significantly from each other.
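
The direction of each pairwise difference ("higher" or "lower") can be read from the sign of the `diff` column that `TukeyHSD()` returns, where each row is labelled `later-earlier`. The following sketch turns that output into a data frame; the `toy` data and the short level names `C1` to `C3` are made up for illustration.

```r
set.seed(2)

# Synthetic stand-in for cation_data (values are made up)
toy <- data.frame(
  campaign = factor(rep(c("C1", "C2", "C3"), each = 40)),
  Fe_Mg = c(rnorm(40, 30, 6), rnorm(40, 34, 6), rnorm(40, 39, 6))
)

tukey <- TukeyHSD(aov(Fe_Mg ~ campaign, data = toy))

# The result is a list with one matrix per factor; rows are "C2-C1", etc.
pairs <- as.data.frame(tukey$campaign)
pairs$comparison <- rownames(pairs)

# A positive diff for "C3-C1" means C3 has the higher mean
pairs[, c("comparison", "diff", "lwr", "upr", "p adj")]
```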
```{r}
# Load necessary library
library(dplyr)
# Calculate the count and proportion for each campaign and composition class
campaign_composition_summary <- cation_data %>%
  group_by(campaign, class) %>%
  summarize(count = n(), .groups = 'drop') %>%
  group_by(campaign) %>%
  mutate(proportion = count / sum(count)) %>%
  ungroup()
# Display the results
print(campaign_composition_summary)
```
```{r}
# Load plotting library (harmless if it is already attached)
library(ggplot2)
# Combined density plot with facets for each cation group
cation_data_long <- cation_data %>%
  tidyr::pivot_longer(cols = c(Si_Al, Fe_Mg, Ca_Na_K),
                      names_to = "cation_group",
                      values_to = "concentration")
ggplot(cation_data_long, aes(x = concentration, fill = campaign)) +
  geom_density(alpha = 0.4) +
  facet_wrap(~ cation_group, scales = "free") +
  labs(title = "Density Plots of Cation Groups Across Campaigns",
       x = "Concentration",
       y = "Density",
       fill = "Campaign") +
  theme_minimal()
# Box plot for Si_Al by campaign
ggplot(cation_data, aes(x = campaign, y = Si_Al, fill = campaign)) +
  geom_boxplot() +
  labs(title = "Box Plot of Si_Al Across Campaigns",
       x = "Campaign",
       y = "Si + Al Concentration") +
  theme_minimal()
# Box plot for Fe_Mg by campaign
ggplot(cation_data, aes(x = campaign, y = Fe_Mg, fill = campaign)) +
  geom_boxplot() +
  labs(title = "Box Plot of Fe_Mg Across Campaigns",
       x = "Campaign",
       y = "Fe + Mg Concentration") +
  theme_minimal()
# Box plot for Ca_Na_K by campaign
ggplot(cation_data, aes(x = campaign, y = Ca_Na_K, fill = campaign)) +
  geom_boxplot() +
  labs(title = "Box Plot of Ca_Na_K Across Campaigns",
       x = "Campaign",
       y = "Ca + Na + K Concentration") +
  theme_minimal()
```
### Density Plots for Cation Groups Across Campaigns
1. **Si_Al**:
   - Campaign 1 (red) shows a distinct peak at a slightly higher concentration than Campaigns 2 and 3, indicating higher `Si_Al` concentrations.
   - Campaigns 2 (green) and 3 (blue) have similar peak densities, but Campaign 3 has a broader distribution, suggesting more variation in `Si_Al` concentrations within that campaign.
2. **Fe_Mg**:
   - Campaign 1 has lower `Fe_Mg` concentrations, as shown by its peak at a lower concentration range.
   - Campaign 3 shows a shift towards higher concentrations with a broad distribution, while Campaign 2 lies between Campaigns 1 and 3.
   - This aligns with earlier findings that Campaign 3 is richer in `Fe_Mg`.
3. **Ca_Na_K**:
   - Campaigns 1 and 2 have similar distributions for `Ca_Na_K`, with peaks at comparable, relatively low concentrations.
   - Campaign 3's peak is shifted slightly further toward lower concentrations, indicating lower `Ca_Na_K` concentrations in Campaign 3.
### Box Plots for Each Cation Group Across Campaigns
1. **Si_Al Box Plot**:
   - Campaign 1 has a higher median `Si_Al` concentration than Campaigns 2 and 3, with a slightly wider interquartile range (IQR).
   - Campaign 3 has the lowest median `Si_Al` concentration, with more outliers below the median, suggesting a distinct trend toward lower `Si_Al` values in that campaign.
2. **Fe_Mg Box Plot**:
   - There is a noticeable increase in median `Fe_Mg` concentration from Campaign 1 to Campaign 3.
   - Campaign 3 has a higher median and a wider IQR, indicating greater variation and a tendency toward higher `Fe_Mg` values, consistent with its Fe-Mg richness.
3. **Ca_Na_K Box Plot**:
   - Campaign 1 and Campaign 2 have similar medians, but Campaign 3 shows a lower median and a slight downward shift in values.
   - Campaign 3 has fewer high-concentration outliers, indicating a more consistent trend toward lower `Ca_Na_K` concentrations in that campaign.
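
The medians, IQRs, and outlier counts read off the box plots can also be computed directly, which makes the comparison less dependent on visual judgment. A minimal sketch on synthetic data (the `toy` data frame is made up; substitute `cation_data` and the column of interest):

```r
set.seed(3)

# Synthetic stand-in for cation_data (values are made up)
toy <- data.frame(
  campaign = factor(rep(c("C1", "C2", "C3"), each = 60)),
  Ca_Na_K = c(rexp(60, 1 / 8), rexp(60, 1 / 8), rexp(60, 1 / 6))
)

# Median and interquartile range per campaign
medians <- tapply(toy$Ca_Na_K, toy$campaign, median)
iqrs <- tapply(toy$Ca_Na_K, toy$campaign, IQR)

# Number of points a box plot would draw as outliers (beyond 1.5 * IQR)
n_outliers <- tapply(toy$Ca_Na_K, toy$campaign,
                     function(x) length(boxplot.stats(x)$out))

data.frame(campaign = names(medians),
           median = as.numeric(medians),
           IQR = as.numeric(iqrs),
           outliers = as.integer(n_outliers))
```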
```{r}
# Function to perform Mann-Whitney test for two campaigns for a specified column
perform_mann_whitney <- function(campaign1, campaign2, data, column) {
  data1 <- subset(data, campaign == campaign1)[[column]]
  data2 <- subset(data, campaign == campaign2)[[column]]
  test_result <- wilcox.test(data1, data2)
  return(list(
    campaign1 = campaign1,
    campaign2 = campaign2,
    column = column,
    p_value = test_result$p.value,
    statistic = test_result$statistic
  ))
}
# Define campaigns and columns
campaigns <- unique(cation_data$campaign)
columns <- c("Si_Al", "Fe_Mg", "Ca_Na_K")
# Initialize list to store results
results <- list()
# Loop through each combination of campaigns and each column
for (col in columns) {
  for (i in 1:(length(campaigns) - 1)) {
    for (j in (i + 1):length(campaigns)) {
      result <- perform_mann_whitney(campaigns[i], campaigns[j], cation_data, col)
      results <- append(results, list(result))
    }
  }
}
# Convert results to a data frame for easy viewing
results_df <- do.call(rbind, lapply(results, as.data.frame))
# Display the results
print(results_df)
```
### Si_Al Comparison
- **Campaign 1 vs. Campaign 2**: p-value = 0.1473 (not significant)
  - No significant difference in `Si_Al` concentrations between Campaigns 1 and 2.
- **Campaign 1 vs. Campaign 3**: p-value = 0.0001526 (significant)
  - `Si_Al` concentrations differ between Campaigns 1 and 3.
- **Campaign 2 vs. Campaign 3**: p-value = 6.1659e-14 (highly significant)
  - `Si_Al` levels are clearly distinct between Campaigns 2 and 3.
### Fe_Mg Comparison
- **Campaign 1 vs. Campaign 2**: p-value = 4.9397e-05 (significant)
  - `Fe_Mg` concentrations differ between Campaigns 1 and 2.
- **Campaign 1 vs. Campaign 3**: p-value = 7.5487e-13 (highly significant)
  - Substantial differences in `Fe_Mg` concentrations between Campaigns 1 and 3.
- **Campaign 2 vs. Campaign 3**: p-value = 3.9051e-24 (highly significant)
  - `Fe_Mg` concentrations differ considerably between Campaigns 2 and 3.
### Ca_Na_K Comparison
- **Campaign 1 vs. Campaign 2**: p-value = 0.0037212 (significant)
  - `Ca_Na_K` concentrations differ between Campaigns 1 and 2.
- **Campaign 1 vs. Campaign 3**: p-value = 3.3217e-13 (highly significant)
  - Distinct `Ca_Na_K` levels between Campaigns 1 and 3.
- **Campaign 2 vs. Campaign 3**: p-value = 6.3642e-29 (extremely significant)
  - Campaign 3 has different `Ca_Na_K` concentrations compared to Campaign 2.

The Mann-Whitney p-values in the three comparisons above are not adjusted for multiple comparisons, and the test indicates only that two distributions differ, not which campaign is higher.
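
Because nine pairwise tests are run in total, a multiple-comparison correction is worth considering. The sketch below shows two options from base R: applying `p.adjust()` to the three `Si_Al` p-values reported above, and running `pairwise.wilcox.test()`, which performs the pairwise tests and the adjustment in one call. The `toy` data frame is synthetic, and correcting within one cation group (three tests) rather than across all nine is an assumption made for the example.

```r
# Option 1: Bonferroni-adjust the Si_Al p-values reported above
raw_p <- c("1 vs 2" = 0.1473, "1 vs 3" = 0.0001526, "2 vs 3" = 6.1659e-14)
adj_p <- p.adjust(raw_p, method = "bonferroni")
adj_p

# Option 2: pairwise tests with built-in adjustment, on synthetic data
set.seed(4)
toy <- data.frame(
  campaign = factor(rep(c("C1", "C2", "C3"), each = 50)),
  Si_Al = c(rnorm(50, 52, 5), rnorm(50, 51, 5), rnorm(50, 46, 5))
)
pw <- pairwise.wilcox.test(toy$Si_Al, toy$campaign,
                           p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values (upper entries are NA)
pw$p.value
```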
```{r}
# Install dunn.test package if not already installed
# install.packages("dunn.test")
library(dunn.test)
# Perform Dunn's test for Si_Al across campaigns
dunn_test_Si_Al <- dunn.test(cation_data$Si_Al, cation_data$campaign, method = "bonferroni")
print(dunn_test_Si_Al)
# Perform Dunn's test for Fe_Mg across campaigns
dunn_test_Fe_Mg <- dunn.test(cation_data$Fe_Mg, cation_data$campaign, method = "bonferroni")
print(dunn_test_Fe_Mg)
# Perform Dunn's test for Ca_Na_K across campaigns
dunn_test_Ca_Na_K <- dunn.test(cation_data$Ca_Na_K, cation_data$campaign, method = "bonferroni")
print(dunn_test_Ca_Na_K)
```
### Dunn's Test Results
#### 1. **Si_Al Group**
- **Campaign 1 vs. Campaign 2**: Not significant (p-adjusted = 0.3425).
- **Campaign 1 vs. Campaign 3**: Significant (p-adjusted = 0.0000785).
- **Campaign 2 vs. Campaign 3**: Highly significant (p-adjusted = 0.000000128).
- **Interpretation**: There is a significant difference in `Si_Al` concentrations between Campaigns 1 & 3 and Campaigns 2 & 3, but not between Campaigns 1 & 2. This aligns with previous findings, suggesting that `Si_Al` levels in Campaign 3 are distinct from the other two campaigns.
#### 2. **Fe_Mg Group**
- **Campaign 1 vs. Campaign 2**: Significant (p-adjusted = 0.0004).
- **Campaign 1 vs. Campaign 3**: Highly significant (p-adjusted < 0.0001).
- **Campaign 2 vs. Campaign 3**: Highly significant (p-adjusted < 0.0001).
- **Interpretation**: All comparisons are significant, indicating that `Fe_Mg` concentrations are distinct across each campaign. This suggests that each campaign area has unique `Fe_Mg` levels, with Campaign 3 having particularly high concentrations, as observed previously.
#### 3. **Ca_Na_K Group**
- **Campaign 1 vs. Campaign 2**: Significant (p-adjusted = 0.0054).
- **Campaign 1 vs. Campaign 3**: Highly significant (p-adjusted < 0.0001).
- **Campaign 2 vs. Campaign 3**: Highly significant (p-adjusted < 0.0001).
- **Interpretation**: There are significant differences in `Ca_Na_K` concentrations across all campaign pairs. Campaign 3 shows lower `Ca_Na_K` concentrations compared to Campaigns 1 and 2, making it distinct.
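
Dunn's test is usually run as a follow-up to a Kruskal-Wallis test, the rank-based counterpart of one-way ANOVA. A minimal sketch of that omnibus step with base R's `kruskal.test()`, on synthetic data (the `toy` data frame is made up and stands in for `cation_data`):

```r
set.seed(5)

# Synthetic stand-in for cation_data (values are made up)
toy <- data.frame(
  campaign = factor(rep(c("C1", "C2", "C3"), each = 50)),
  Fe_Mg = c(rnorm(50, 30, 6), rnorm(50, 33, 6), rnorm(50, 39, 6))
)

# Omnibus test: do the three campaigns share the same distribution?
kw <- kruskal.test(Fe_Mg ~ campaign, data = toy)

# Chi-squared statistic, degrees of freedom (groups - 1), and p-value
c(statistic = unname(kw$statistic),
  df = unname(kw$parameter),
  p_value = kw$p.value)
```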
```{r}
# Specify the hypothetical mean for comparison
test_value <- 10
# Single-sample t-test for Si_Al
t_test_Si_Al <- t.test(cation_data$Si_Al, mu = test_value)
print(t_test_Si_Al)
# Single-sample t-test for Fe_Mg
t_test_Fe_Mg <- t.test(cation_data$Fe_Mg, mu = test_value)
print(t_test_Fe_Mg)
# Single-sample t-test for Ca_Na_K
t_test_Ca_Na_K <- t.test(cation_data$Ca_Na_K, mu = test_value)
print(t_test_Ca_Na_K)
```
### One-Sample t-Test Results
Each cation group mean was compared against a hypothetical mean of 10 (`test_value`).
1. **Si_Al**
   - **t-value**: 128.08
   - **Degrees of Freedom (df)**: 1931
   - **p-value**: < 2.2e-16 (highly significant)
   - **95% Confidence Interval**: [49.10, 50.32]
   - **Mean of `Si_Al`**: 49.71
   - **Interpretation**: The mean `Si_Al` concentration (49.71) is significantly higher than the hypothetical mean of 10; the confidence interval lies far above the test value.
2. **Fe_Mg**
   - **t-value**: 64.92
   - **Degrees of Freedom (df)**: 1931
   - **p-value**: < 2.2e-16 (highly significant)
   - **95% Confidence Interval**: [35.74, 37.35]
   - **Mean of `Fe_Mg`**: 36.55
   - **Interpretation**: The mean `Fe_Mg` concentration (36.55) is also significantly higher than the hypothetical mean of 10.
3. **Ca_Na_K**
   - **t-value**: -20.63
   - **Degrees of Freedom (df)**: 1931
   - **p-value**: < 2.2e-16 (highly significant)
   - **95% Confidence Interval**: [6.80, 7.35]
   - **Mean of `Ca_Na_K`**: 7.08
   - **Interpretation**: The mean `Ca_Na_K` concentration (7.08) is significantly lower than the hypothetical mean of 10. The negative t-value reflects a sample mean below the test value, and the whole confidence interval lies below 10.
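
The t-values and degrees of freedom above follow from the one-sample formula t = (mean − μ₀) / (sd / √n) with df = n − 1, which is why all three tests report df = 1931 for the 1932 samples. The sketch below reproduces `t.test()` by hand on synthetic data; the vector `x` is made up for illustration.

```r
set.seed(6)

# Synthetic sample (values are made up)
x <- rnorm(200, mean = 7, sd = 6)
mu0 <- 10  # hypothetical mean, as in test_value

n <- length(x)

# One-sample t statistic and its degrees of freedom
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))
df_manual <- n - 1

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with the built-in test
t_builtin <- t.test(x, mu = mu0)
all.equal(unname(t_builtin$statistic), t_manual)
all.equal(t_builtin$p.value, p_manual)
```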
```{r}
# Check unique values in the campaign variable
unique(cation_data$campaign)
# Convert campaign to a factor if not already
cation_data$campaign <- as.factor(cation_data$campaign)
# Run binary logistic regression
# Note: with a factor response, glm() treats the first level as "failure" and
# all other levels as "success", so the three campaigns are collapsed to two
logistic_model <- glm(campaign ~ Si_Al + Fe_Mg + Ca_Na_K, data = cation_data, family = "binomial")
summary(logistic_model)
# install.packages("nnet")
library(nnet)
# Multinomial logistic regression
multinom_model <- multinom(campaign ~ Si_Al + Fe_Mg + Ca_Na_K, data = cation_data)
summary(multinom_model)
# Predict campaign for the existing data (useful for evaluating the model)
predicted_campaigns <- predict(multinom_model, type = "class")
head(predicted_campaigns)
# If you want probabilities for each campaign
predicted_probabilities <- predict(multinom_model, type = "probs")
head(predicted_probabilities)
# Calculate in-sample accuracy (same data used for fitting and evaluation)
mean(predicted_campaigns == cation_data$campaign)
```
### Binary Logistic Regression (glm)
Because `campaign` has three levels (Campaign 1, Campaign 2, Campaign 3), binary logistic regression is not well suited to this outcome. When `glm()` with `family = "binomial"` receives a factor response, R treats the first factor level as failure and all other levels as success, so this model effectively contrasts the first level (presumably Campaign 1) with the other two campaigns combined. With that caveat, the model can be read as follows:
- **Intercept**: The intercept is positive and significant (3.24192, p = 0.0106); it is the baseline log-odds of a sample falling outside the first campaign level when all three predictors are zero.
- **Si_Al**: The coefficient for `Si_Al` is negative (-0.01566) but not statistically significant (p = 0.2657), so `Si_Al` does not reliably predict the outcome in this binary setup.
- **Fe_Mg**: The coefficient for `Fe_Mg` is positive (0.03231) and statistically significant (p = 0.0134), indicating that higher `Fe_Mg` concentrations raise the odds of a sample falling outside the first campaign level.
- **Ca_Na_K**: The coefficient for `Ca_Na_K` is negative (-0.01654) and not significant (p = 0.5575), so it does not reliably predict the outcome in this binary setup either.

The model’s AIC (576.76) and residual deviance (568.76) summarize its fit, but they are of limited use here because the binary model collapses a three-level outcome into two.
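
To see what a binomial `glm()` does with a three-level factor response, the sketch below fits the same model twice on synthetic data: once with the factor directly, and once with an explicit 0/1 indicator for "not the first level". The `toy` data frame and the `not_first` column are made up for illustration.

```r
set.seed(7)
n <- 300

# Synthetic stand-in for cation_data (values are made up)
toy <- data.frame(
  campaign = factor(sample(c("Campaign 1", "Campaign 2", "Campaign 3"),
                           n, replace = TRUE, prob = c(0.2, 0.4, 0.4))),
  Fe_Mg = rnorm(n, 36, 8)
)

# Model 1: three-level factor passed straight to a binomial glm
fit_factor <- glm(campaign ~ Fe_Mg, data = toy, family = "binomial")

# Model 2: explicit indicator, 1 when the sample is NOT in the first level
toy$not_first <- as.integer(toy$campaign != levels(toy$campaign)[1])
fit_binary <- glm(not_first ~ Fe_Mg, data = toy, family = "binomial")

# The two fits have identical coefficients
all.equal(coef(fit_factor), coef(fit_binary))
```

The matching coefficients show that the three-level response is silently reduced to "first level vs. everything else", which is why the multinomial model is the better choice for this outcome.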
### Multinomial Logistic Regression (nnet::multinom)
The multinomial logistic regression is more appropriate for this dataset, as it allows for multiple outcome levels (Campaign 1, Campaign 2, Campaign 3). Campaign 1 is the reference level, so every coefficient describes a change in log-odds relative to Campaign 1.
#### Model Interpretation
- **Intercepts**:
  - Campaign 2’s intercept (1.491249) and Campaign 3’s intercept (3.646404) are positive baseline log-odds for these campaigns relative to Campaign 1.
- **Si_Al**:
  - The coefficient for `Si_Al` is near zero for Campaign 2 (0.00024) and negative for Campaign 3 (-0.03232), both with small standard errors. `Si_Al` contributes little to separating Campaign 2 from Campaign 1, while higher `Si_Al` modestly lowers the odds of Campaign 3.
- **Fe_Mg**:
  - The coefficient for `Fe_Mg` is positive for both Campaign 2 (0.02947) and Campaign 3 (0.03396), indicating that higher `Fe_Mg` values raise the odds of a sample belonging to Campaign 2 or Campaign 3 rather than Campaign 1.
- **Ca_Na_K**:
  - The coefficient for `Ca_Na_K` is positive for Campaign 2 (0.01098) but negative for Campaign 3 (-0.04825), suggesting that higher `Ca_Na_K` values slightly favor Campaign 2 over Campaign 3.

The model’s AIC (3000.732) is a measure of fit that is only meaningful in comparison with other models fitted to the same data and outcome.
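
The coefficients can be turned into predicted probabilities by hand, which makes the "relative to Campaign 1" reading concrete: each non-reference campaign gets a linear score, Campaign 1 gets a score of zero, and the scores are normalized with a softmax. The sketch below uses the rounded coefficients quoted above and, as an illustrative input, the overall means from the one-sample t-test section; treating those means as a single sample is an assumption made for the example.

```r
# Rounded coefficients as reported above (rows: non-reference campaigns)
coefs <- rbind(
  "Campaign 2" = c(intercept = 1.491249, Si_Al = 0.00024,
                   Fe_Mg = 0.02947, Ca_Na_K = 0.01098),
  "Campaign 3" = c(intercept = 3.646404, Si_Al = -0.03232,
                   Fe_Mg = 0.03396, Ca_Na_K = -0.04825)
)

# Illustrative sample: overall means of the three cation groups
x <- c(intercept = 1, Si_Al = 49.71, Fe_Mg = 36.55, Ca_Na_K = 7.08)

# Linear scores; the reference level (Campaign 1) is fixed at 0
eta <- c("Campaign 1" = 0, drop(coefs %*% x))

# Softmax converts scores to probabilities that sum to 1
probs <- exp(eta) / sum(exp(eta))
round(probs, 3)

# Odds ratios per unit increase in each predictor, relative to Campaign 1
round(exp(coefs[, -1]), 3)
```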
### Predicted Campaigns and Accuracy
- **Predicted Campaigns**: The `predicted_campaigns` variable shows the campaign classifications based on the multinomial logistic model.
- **Predicted Probabilities**: The `predicted_probabilities` variable gives the probability of each campaign for each sample, indicating the confidence of predictions.
- **Accuracy**: The calculated accuracy of 60.97% is measured on the same data used to fit the model, so it is likely optimistic. It should also be judged against the majority-class baseline: always predicting Campaign 3 (1058 of 1932 samples) would already be correct about 54.8% of the time. The predictors (`Si_Al`, `Fe_Mg`, and `Ca_Na_K`) therefore explain only part of the differences between campaigns, and other factors or non-linear relationships may be involved.
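
A majority-class baseline and a confusion matrix give the accuracy figure more context. The sketch below computes the baseline from the campaign counts reported earlier in this notebook and builds a confusion matrix from a small pair of made-up `actual` and `predicted` vectors; in the notebook these would be `cation_data$campaign` and `predicted_campaigns`.

```r
# Campaign sizes as reported in the Campaign Distribution section
counts <- c("Campaign 1" = 70, "Campaign 2" = 804, "Campaign 3" = 1058)

# Accuracy of always predicting the largest campaign
baseline <- max(counts) / sum(counts)
round(baseline, 4)

# Toy labels (made up) to illustrate a confusion matrix
lv <- names(counts)
actual <- factor(c("Campaign 1", "Campaign 2", "Campaign 2", "Campaign 3",
                   "Campaign 3", "Campaign 3"), levels = lv)
predicted <- factor(c("Campaign 3", "Campaign 2", "Campaign 3", "Campaign 3",
                      "Campaign 3", "Campaign 2"), levels = lv)

# Rows are predictions, columns are true campaigns
conf_mat <- table(predicted = predicted, actual = actual)
conf_mat

# Overall accuracy is the share of samples on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

The confusion matrix also shows whether a small class such as Campaign 1 is ever predicted at all, which overall accuracy alone can hide.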