Ecological Diversity

Interactive alpha and beta diversity intuition from bioBakery outputs

This page shows a beginner-friendly way to calculate alpha diversity (within-sample diversity) and beta diversity (between-sample dissimilarity) from the tables produced by the bioBakery ecosystem.

The same workflow works for both:

MetaPhlAn taxonomic profiles if you want organism-level diversity
HUMAnN gene family or pathway tables if you want functional diversity

Note

For taxonomic diversity, most people start with a species-level MetaPhlAn table. For functional diversity, use an unstratified HUMAnN pathway or gene-family table.

Interactive ecological diversity explainer

The mini-lab below is interactive and intentionally synthetic: it is there to help you build intuition before you run the R code further down the page on your own tables.

What files to start from

A practical minimal input set is:

File	Typical source	What it contains
`metaphlan_merged.tsv`	`merge_metaphlan_tables.py`	Taxa x samples relative abundance matrix
`pathabundance.tsv` or joined HUMAnN table	`humann_join_tables`	Pathways or gene families x samples
`metadata.tsv`	Your study design	Sample IDs, group labels, time points, etc.

For diversity calculations you want a matrix with:

rows = samples
columns = taxa or pathways
values = non-negative abundances

R packages

install.packages(c("vegan", "dplyr", "tidyr", "tibble", "ggplot2", "Rtsne", "uwot"))

library(vegan)
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(Rtsne)
library(uwot)

Step 1: Read a MetaPhlAn table

A merged MetaPhlAn table usually has one taxon per row and one sample per column. The example below keeps only species-level rows.

metaphlan_raw <- read.delim("metaphlan_merged.tsv", check.names = FALSE, comment.char = "#")
metadata <- read.delim("metadata.tsv", check.names = FALSE)

species_mat <- metaphlan_raw |>
  rename(feature = clade_name) |>
  filter(grepl("\\|s__[^|]+$", feature)) |>
  column_to_rownames("feature") |>
  t() |>
  as.data.frame() |>
  rownames_to_column("sample_id") |>
  left_join(metadata, by = "sample_id")

sample_info <- species_mat |> select(sample_id, group, timepoint)
abund <- species_mat |> select(-sample_id, -group, -timepoint) |> as.matrix()
mode(abund) <- "numeric"
rownames(abund) <- sample_info$sample_id

# MetaPhlAn is already relative abundance, but re-scaling is a safe habit.
abund_rel <- decostand(abund, method = "total")

Tip

If you prefer family-level diversity, change the filter to grepl("\\|f__[^|]+$", feature) instead of species-level rows.

HUMAnN variant

For HUMAnN, use the same structure but usually remove stratified rows first so that each feature is a pathway or gene family rather than a pathway-with-species combination.

humann_raw <- read.delim("pathabundance.tsv", check.names = FALSE)

pathway_mat <- humann_raw |>
  rename(feature = `# Pathway`) |>
  filter(!grepl("\\|", feature)) |>
  column_to_rownames("feature") |>
  t() |>
  as.matrix()

mode(pathway_mat) <- "numeric"
pathway_rel <- decostand(pathway_mat, method = "total")

Alpha diversity

Alpha diversity asks: how diverse is each sample on its own?

The two basics: richness and evenness

Richness = how many taxa/features are detected
Evenness = how evenly abundance is spread across those taxa/features

In practice, beginners usually report richness and evenness alongside three very common alpha-diversity metrics:

Shannon diversity
Simpson diversity
Inverse Simpson diversity

When to use each metric

Metric	What it emphasizes	Intuition
Richness	Number of detected taxa	“How many things are here?”
Pielou evenness	Balance of abundances	“Are a few taxa dominating?”
Shannon	Both richness and evenness	Sensitive to common taxa, but still notices rare taxa
Simpson	Dominance	Drops when one or a few taxa take over
Inverse Simpson	Effective number of dominant taxa	Easier to explain to beginners than raw Simpson

Calculate alpha diversity with `vegan`

alpha_div <- tibble(
  sample_id = rownames(abund_rel),
  richness = specnumber(abund_rel),
  shannon = diversity(abund_rel, index = "shannon"),
  simpson = diversity(abund_rel, index = "simpson"),
  inverse_simpson = diversity(abund_rel, index = "invsimpson")
) |>
  mutate(
    pielou_evenness = if_else(richness > 1, shannon / log(richness), NA_real_)
  ) |>
  left_join(sample_info, by = "sample_id")

alpha_div

Quick interpretation guide

High richness + high evenness → many taxa and none dominates too strongly
High richness + low evenness → many taxa are present, but only a few are doing most of the work
Low Shannon → either few taxa, or strong dominance by one/few taxa
Low Simpson / inverse Simpson → especially strong dominance structure

Plot alpha diversity

ggplot(alpha_div, aes(x = group, y = shannon, fill = group)) +
  geom_boxplot(width = 0.7, alpha = 0.85, outlier.shape = NA) +
  geom_jitter(width = 0.12, size = 2, alpha = 0.8) +
  labs(
    title = "Shannon alpha diversity by group",
    x = NULL,
    y = "Shannon diversity"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

Statistical testing for alpha diversity

For two groups, a Wilcoxon rank-sum test is a common first pass. For more complex designs, use a linear model or mixed model.

wilcox.test(shannon ~ group, data = alpha_div)
wilcox.test(inverse_simpson ~ group, data = alpha_div)

Warning

Avoid over-interpreting tiny differences in alpha diversity. A statistically significant result can still be biologically small, especially when sequencing depth, filtering, or zero handling differs between groups.

Beta diversity

Beta diversity asks: how different are samples from each other?

The answer depends on the dissimilarity metric you choose.

Three very common beta-diversity metrics

Metric	`vegan` implementation	Best for	Notes
Bray-Curtis	`vegdist(..., method = "bray")`	Relative abundance data	Usually the first choice for MetaPhlAn/HUMAnN profiles
Jaccard	`vegdist(..., method = "jaccard")` after presence/absence conversion	Presence/absence questions	Ignores abundance magnitude
Aitchison	CLR transform + `dist()`	Compositional data	Very useful when you want a composition-aware analysis

Bray-Curtis dissimilarity

Bray-Curtis is popular because it reflects abundance shifts between samples.

bray_dist <- vegdist(abund_rel, method = "bray")

Jaccard dissimilarity

Jaccard is useful when you only care whether a taxon/pathway is present.

abund_pa <- decostand(abund_rel, method = "pa")
jaccard_dist <- vegdist(abund_pa, method = "jaccard")

Aitchison distance

Aitchison distance is built for compositional data. It requires a centered log-ratio (CLR) transform, which means adding a small pseudocount first.

abund_clr <- decostand(abund_rel + 1e-06, method = "clr")
aitchison_dist <- dist(abund_clr)

Tip

A simple beginner workflow is: Bray-Curtis for your main ecological analysis, then Aitchison as a sensitivity check if you want a composition-aware view.

Ordination and visualization

A distance matrix is hard to read directly, so you usually project it into 2 dimensions.

A useful plotting helper

plot_ordination <- function(df, x, y, title) {
  ggplot(df, aes(x = .data[[x]], y = .data[[y]], color = group)) +
    geom_point(size = 3, alpha = 0.9) +
    labs(title = title, x = x, y = y, color = "Group") +
    theme_minimal(base_size = 12)
}

PCA

PCA works on a transformed abundance matrix, not directly on a distance matrix. For ecological data, a Hellinger transform is a good beginner-friendly choice.

abund_hellinger <- decostand(abund_rel, method = "hellinger")
pca_fit <- prcomp(abund_hellinger, center = TRUE, scale. = FALSE)

pca_df <- as.data.frame(pca_fit$x[, 1:2]) |>
  rownames_to_column("sample_id") |>
  left_join(sample_info, by = "sample_id")

plot_ordination(pca_df, "PC1", "PC2", "PCA of Hellinger-transformed abundances")

PCoA

PCoA is a natural companion to Bray-Curtis because it starts from a dissimilarity matrix.

pcoa_fit <- cmdscale(bray_dist, eig = TRUE, k = 2)

pcoa_df <- as.data.frame(pcoa_fit$points) |>
  setNames(c("PCoA1", "PCoA2")) |>
  rownames_to_column("sample_id") |>
  left_join(sample_info, by = "sample_id")

plot_ordination(pcoa_df, "PCoA1", "PCoA2", "PCoA of Bray-Curtis distances")

NMDS

NMDS is one of the most widely used ordination methods in ecology because it focuses on preserving rank-order distances.

set.seed(123)
nmds_fit <- metaMDS(abund_rel, distance = "bray", k = 2, trymax = 100)

nmds_df <- as.data.frame(scores(nmds_fit, display = "sites")) |>
  rownames_to_column("sample_id") |>
  left_join(sample_info, by = "sample_id")

plot_ordination(nmds_df, "NMDS1", "NMDS2", "NMDS on Bray-Curtis distances")

t-SNE

t-SNE is mainly for visualization. It can reveal local neighborhood structure, but distances between distant clusters are not always easy to interpret biologically.

set.seed(123)
tsne_fit <- Rtsne(abund_hellinger, dims = 2, perplexity = 10, check_duplicates = FALSE)

tsne_df <- as.data.frame(tsne_fit$Y) |>
  setNames(c("tSNE1", "tSNE2")) |>
  mutate(sample_id = rownames(abund_hellinger)) |>
  left_join(sample_info, by = "sample_id")

plot_ordination(tsne_df, "tSNE1", "tSNE2", "t-SNE of Hellinger-transformed abundances")

UMAP

UMAP is another visualization-first method that often gives cleaner-looking clusters than t-SNE while preserving more global structure.

set.seed(123)
umap_fit <- uwot::umap(abund_hellinger, n_neighbors = 15, min_dist = 0.2)

umap_df <- as.data.frame(umap_fit) |>
  setNames(c("UMAP1", "UMAP2")) |>
  mutate(sample_id = rownames(abund_hellinger)) |>
  left_join(sample_info, by = "sample_id")

plot_ordination(umap_df, "UMAP1", "UMAP2", "UMAP of Hellinger-transformed abundances")

Which ordination should you show?

A sensible beginner combination is:

PCoA + Bray-Curtis for the main ecological result
NMDS + Bray-Curtis as a robust ecological cross-check
PCA if you want a linear view of transformed abundances
t-SNE / UMAP as exploratory visualizations, not your only evidence

Statistical testing for beta diversity

Visualization is helpful, but it is not the statistical test.

PERMANOVA

Use adonis2() to test whether groups differ in their community composition.

adonis2(bray_dist ~ group, data = sample_info, permutations = 999)

Check dispersion before over-interpreting PERMANOVA

PERMANOVA can be significant either because group centroids differ or because one group is much more dispersed.

bray_dispersion <- betadisper(bray_dist, sample_info$group)
anova(bray_dispersion)
permutest(bray_dispersion)

Important

A common reporting pattern is: Bray-Curtis PCoA plot + PERMANOVA p-value + dispersion check.

Recommended beginner workflow

Start with a species-level MetaPhlAn or unstratified HUMAnN table.
Re-scale to proportions with decostand(..., "total").
Calculate richness, Shannon, Simpson, inverse Simpson, and Pielou evenness.
Use Bray-Curtis as your main beta-diversity metric.
Visualize beta diversity with PCoA and NMDS.
Use PERMANOVA to test group separation.
Re-run beta diversity with Aitchison distance if you want a composition-aware sensitivity analysis.

--- title: "Ecological Diversity" subtitle: "Interactive alpha and beta diversity intuition from bioBakery outputs" --- This page shows a beginner-friendly way to calculate **alpha diversity** (within-sample diversity) and **beta diversity** (between-sample dissimilarity) from the tables produced by the bioBakery ecosystem. The same workflow works for both: - **MetaPhlAn** taxonomic profiles if you want organism-level diversity - **HUMAnN** gene family or pathway tables if you want functional diversity ::: {.callout-note} For taxonomic diversity, most people start with a **species-level MetaPhlAn table**. For functional diversity, use an **unstratified HUMAnN pathway or gene-family table**. ::: ## Interactive ecological diversity explainer The mini-lab below is **interactive and intentionally synthetic**: it is there to help you build intuition before you run the R code further down the page on your own tables. ::: {=html} <style> #ecological-diversity-lab { width: min(100%, 1100px); margin: 1.5rem auto 2rem; } .eco-baseline-bridge { border: 1px solid #d8dee4; border-radius: 0.9rem; padding: 1rem 1.1rem 1.15rem; background: linear-gradient(180deg, #ffffff 0%, #f8fafc 100%); box-shadow: 0 10px 24px rgba(15, 23, 42, 0.06); margin-bottom: 1.25rem; } .eco-baseline-bridge h3 { margin-top: 0; margin-bottom: 0.45rem; } .eco-diversity-grid { display: grid; gap: 1.25rem; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); } .eco-card { border: 1px solid #d8dee4; border-radius: 0.9rem; padding: 1rem 1.1rem 1.2rem; background: #fff; box-shadow: 0 10px 24px rgba(15, 23, 42, 0.06); min-width: 0; } .eco-card h3 { margin-top: 0; margin-bottom: 0.5rem; } .eco-control-grid { display: grid; gap: 0.9rem; margin: 1rem 0; } .eco-control-grid label, .eco-select-label { display: grid; gap: 0.35rem; font-weight: 600; } .eco-control-grid input[type="range"] { width: 100%; } .eco-control-value { font-size: 0.82rem; color: #475569; font-weight: 500; } .eco-select-label select { max-width: 13rem; } .eco-metric-grid { display: grid; gap: 0.75rem; grid-template-columns: repeat(auto-fit, minmax(110px, 1fr)); margin-top: 1rem; } .eco-metric-card { border-radius: 0.8rem; background: #f8fafc; padding: 0.75rem 0.85rem; border: 1px solid #e2e8f0; } .eco-metric-label { display: block; font-size: 0.8rem; color: #475569; } .eco-metric-value { display: block; font-size: 1.15rem; font-weight: 700; color: #0f172a; } .eco-chart-frame { border: 1px solid #e2e8f0; border-radius: 0.8rem; padding: 0.75rem; background: linear-gradient(180deg, #ffffff 0%, #f8fafc 100%); max-width: 32rem; margin: 0 auto; } #alpha-composition-chart svg, #eco-baseline-chart svg, #beta-ordination-chart svg { display: block; width: 100%; height: auto; max-width: 100%; margin: 0 auto; } .eco-beta-summary { margin: 0.8rem 0 0; color: #334155; } .eco-inline-legend { display: flex; flex-wrap: wrap; gap: 0.8rem 1rem; margin-top: 0.75rem; color: #475569; font-size: 0.9rem; } .eco-legend-item { display: inline-flex; align-items: center; gap: 0.45rem; } .eco-legend-swatch { width: 0.85rem; height: 0.85rem; display: inline-block; border: 2px solid currentColor; background: currentColor; } .eco-legend-swatch--baseline { color: #64748b; border-radius: 0.12rem; transform: rotate(45deg); } .eco-legend-swatch--control { color: #1976d2; border-radius: 999px; } .eco-legend-swatch--intervention { color: #c62828; border-radius: 0.15rem; } .eco-caption { font-size: 0.9rem; color: #475569; } </style> <div id="ecological-diversity-lab"></div> <script> document.addEventListener("DOMContentLoaded", function () { const diversityLab = document.getElementById("ecological-diversity-lab"); diversityLab.innerHTML = ` <section class="eco-baseline-bridge" aria-labelledby="eco-baseline-heading"> <h3 id="eco-baseline-heading">Linked alpha-beta source sample</h3> <p class="eco-caption">The alpha controls now build the exact source sample used by the beta-diversity panel. In return, the beta shift control updates the intervention-like preview back inside the alpha panel, so both graphics stay linked instead of behaving like separate demos.</p> <div id="eco-baseline-chart" class="eco-chart-frame" aria-live="polite"></div> <p id="eco-link-summary" class="eco-beta-summary" aria-live="polite"></p> </section> <div class="eco-diversity-grid"> <section class="eco-card" aria-labelledby="alpha-diversity-heading"> <h3 id="alpha-diversity-heading">Alpha diversity explorer</h3> <p class="eco-caption">Change the source sample richness and dominance, then choose which alpha-diversity metric to emphasize. The intervention-like preview below is driven by the same beta shift strength used in the ordination panel.</p> <div class="eco-control-grid"> <label for="alpha-richness"> Observed taxa <span id="alpha-richness-display" class="eco-control-value"></span> <input id="alpha-richness" type="range" min="3" max="12" step="1" value="6"> </label> <label for="alpha-dominance"> Dominance of the most abundant taxon <span id="alpha-dominance-display" class="eco-control-value"></span> <input id="alpha-dominance" type="range" min="0" max="0.85" step="0.01" value="0.35"> </label> <label class="eco-select-label" for="alpha-focus-metric"> Alpha metric to emphasize <select id="alpha-focus-metric"> <option value="shannon" selected>Shannon diversity</option> <option value="simpson">Simpson diversity</option> <option value="inverse">Inverse Simpson</option> <option value="evenness">Pielou evenness</option> </select> </label> </div> <div id="alpha-composition-chart" class="eco-chart-frame" aria-live="polite"></div> <div class="eco-metric-grid" aria-live="polite"> <div class="eco-metric-card"> <span class="eco-metric-label">Richness</span> <span class="eco-metric-value" id="alpha-richness-value"></span> </div> <div class="eco-metric-card"> <span class="eco-metric-label">Shannon</span> <span class="eco-metric-value" id="alpha-shannon-value"></span> </div> <div class="eco-metric-card"> <span class="eco-metric-label">Simpson</span> <span class="eco-metric-value" id="alpha-simpson-value"></span> </div> <div class="eco-metric-card"> <span class="eco-metric-label">Inverse Simpson</span> <span class="eco-metric-value" id="alpha-inverse-value"></span> </div> <div class="eco-metric-card"> <span class="eco-metric-label">Pielou evenness</span> <span class="eco-metric-value" id="alpha-evenness-value"></span> </div> </div> <p id="alpha-link-summary" class="eco-beta-summary" aria-live="polite"></p> </section> <section class="eco-card" aria-labelledby="beta-diversity-heading"> <h3 id="beta-diversity-heading">Beta diversity explorer</h3> <p class="eco-caption">Use the same alpha-defined source sample, then control how far the intervention group moves, how dispersed each group is, how many samples to simulate, and how Jaccard presence/absence is thresholded.</p> <div class="eco-control-grid"> <label for="beta-shift"> Intervention shift strength <span id="beta-shift-display" class="eco-control-value"></span> <input id="beta-shift" type="range" min="0" max="0.55" step="0.01" value="0.22"> </label> <label for="beta-variability"> Within-group variability <span id="beta-variability-display" class="eco-control-value"></span> <input id="beta-variability" type="range" min="0.01" max="0.18" step="0.01" value="0.06"> </label> <label for="beta-samples"> Samples per group <span id="beta-samples-display" class="eco-control-value"></span> <input id="beta-samples" type="range" min="2" max="5" step="1" value="3"> </label> <label for="beta-presence-threshold"> Jaccard presence threshold <span id="beta-presence-display" class="eco-control-value"></span> <input id="beta-presence-threshold" type="range" min="0.02" max="0.20" step="0.01" value="0.08"> </label> <label class="eco-select-label" for="beta-metric"> Distance metric <select id="beta-metric"> <option value="bray">Bray-Curtis</option> <option value="jaccard">Jaccard</option> <option value="aitchison">Aitchison-like CLR Euclidean</option> </select> </label> <label class="eco-select-label" for="beta-ordination"> Ordination and visualization <select id="beta-ordination"> <option value="pcoa">PCoA</option> <option value="nmds">NMDS</option> <option value="tsne">t-SNE</option> <option value="umap">UMAP</option> </select> </label> </div> <div id="beta-ordination-chart" class="eco-chart-frame"></div> <div class="eco-inline-legend" aria-hidden="true"> <span class="eco-legend-item"><span class="eco-legend-swatch eco-legend-swatch--baseline"></span>Source sample</span> <span class="eco-legend-item"><span class="eco-legend-swatch eco-legend-swatch--control"></span>Control samples</span> <span class="eco-legend-item"><span class="eco-legend-swatch eco-legend-swatch--intervention"></span>Intervention samples</span> </div> <p id="beta-diversity-summary" class="eco-beta-summary" aria-live="polite"></p> </section> </div> `; const baselineChart = document.getElementById("eco-baseline-chart"); const linkSummary = document.getElementById("eco-link-summary"); const alphaRichnessInput = document.getElementById("alpha-richness"); const alphaDominanceInput = document.getElementById("alpha-dominance"); const alphaFocusMetricInput = document.getElementById("alpha-focus-metric"); const alphaRichnessDisplay = document.getElementById("alpha-richness-display"); const alphaDominanceDisplay = document.getElementById("alpha-dominance-display"); const alphaCompositionChart = document.getElementById("alpha-composition-chart"); const alphaRichnessValue = document.getElementById("alpha-richness-value"); const alphaShannonValue = document.getElementById("alpha-shannon-value"); const alphaSimpsonValue = document.getElementById("alpha-simpson-value"); const alphaInverseValue = document.getElementById("alpha-inverse-value"); const alphaEvennessValue = document.getElementById("alpha-evenness-value"); const alphaLinkSummary = document.getElementById("alpha-link-summary"); const betaShiftInput = document.getElementById("beta-shift"); const betaVariabilityInput = document.getElementById("beta-variability"); const betaSamplesInput = document.getElementById("beta-samples"); const betaPresenceInput = document.getElementById("beta-presence-threshold"); const betaShiftDisplay = document.getElementById("beta-shift-display"); const betaVariabilityDisplay = document.getElementById("beta-variability-display"); const betaSamplesDisplay = document.getElementById("beta-samples-display"); const betaPresenceDisplay = document.getElementById("beta-presence-display"); const betaMetricInput = document.getElementById("beta-metric"); const betaOrdinationInput = document.getElementById("beta-ordination"); const betaOrdinationChart = document.getElementById("beta-ordination-chart"); const betaSummary = document.getElementById("beta-diversity-summary"); const palette = [ "#1976d2", "#2e7d32", "#c62828", "#7e57c2", "#ef6c00", "#00897b", "#5d4037", "#546e7a", "#f9a825", "#8e24aa", "#3949ab", "#6d4c41" ]; const AITCHISON_DISPLAY_SCALE = 5; const COORDINATE_JITTER = 0.02; const NMDS_TARGET_OFFSET = 0.25; const NMDS_TARGET_SPAN = 1.15; const metricLabels = { bray: "Bray-Curtis", jaccard: "Jaccard", aitchison: "Aitchison-like CLR Euclidean" }; const ordinationLabels = { pcoa: "PCoA", nmds: "NMDS", tsne: "t-SNE", umap: "UMAP" }; const ordinationDescriptions = { pcoa: "PCoA keeps the selected distance metric as faithful as possible in two dimensions.", nmds: "NMDS emphasizes the rank-order of dissimilarities, so relative separation matters more than exact geometry.", tsne: "t-SNE emphasizes local neighborhoods; compact clusters are visual hints rather than effect-size estimates.", umap: "UMAP balances local neighborhoods with broader structure, which often keeps group trends visible while preserving nearby samples." }; const focusMetricLabels = { shannon: "Shannon diversity", simpson: "Simpson diversity", inverse: "Inverse Simpson", evenness: "Pielou evenness" }; const formatNumber = (value) => Number(value).toFixed(2); const shannon = (proportions) => -proportions.reduce((sum, value) => sum + (value > 0 ? value * Math.log(value) : 0), 0); const simpson = (proportions) => 1 - proportions.reduce((sum, value) => sum + value * value, 0); function inverseSimpson(proportions) { const denominator = proportions.reduce((sum, value) => sum + value * value, 0); return denominator > 0 ? 1 / denominator : 0; } function getLabelColor(hexColor) { if (!hexColor || typeof hexColor !== "string") { return "white"; } const hex = hexColor.replace("#", ""); if (hex.length !== 6) { return "white"; } const r = parseInt(hex.slice(0, 2), 16); const g = parseInt(hex.slice(2, 4), 16); const b = parseInt(hex.slice(4, 6), 16); const brightness = (r * 299 + g * 587 + b * 114) / 1000; return brightness > 150 ? "#111827" : "white"; } function alphaComposition(richness, dominance) { const maxShare = (1 / richness) + dominance * (1 - 1 / richness); const otherShare = (1 - maxShare) / (richness - 1); return Array.from({ length: richness }, (_, index) => index === 0 ? maxShare : otherShare); } function normalize(values) { const total = values.reduce((sum, value) => sum + value, 0); return values.map((value) => value / total); } function currentSourceProfile() { return alphaComposition(Number(alphaRichnessInput.value), Number(alphaDominanceInput.value)); } function stackedBarSegments(values, options = {}) { const { x = 0, y = 0, width = 100, height = 16, labelPrefix = "T" } = options; let offset = x; return values.map((value, index) => { const segmentWidth = value * width; const segmentColor = palette[index % palette.length]; const rect = `<rect x="${offset}" y="${y}" width="${segmentWidth}" height="${height}" rx="4" ry="4" fill="${segmentColor}"></rect>`; const labelColor = getLabelColor(segmentColor); const label = segmentWidth > 13 ? `<text x="${offset + segmentWidth / 2}" y="${y + height / 2 + 3}" text-anchor="middle" font-size="8" fill="${labelColor}" font-weight="700">${labelPrefix}${index + 1}</text>` : ""; offset += segmentWidth; return rect + label; }).join(""); } function metricsFor(proportions) { const richness = proportions.length; const alphaShannon = shannon(proportions); return { richness, shannon: alphaShannon, simpson: simpson(proportions), inverse: inverseSimpson(proportions), evenness: richness > 1 ? alphaShannon / Math.log(richness) : 0 }; } function metricValue(metrics, key) { if (key === "richness") { return metrics.richness; } return metrics[key]; } function buildInterventionSignal(length) { const raw = Array.from({ length }, (_, index) => { const fraction = length === 1 ? 0 : index / (length - 1); if (fraction < 0.34) { return -0.15 + 0.05 * fraction; } if (fraction > 0.66) { return 0.12 + 0.05 * (fraction - 0.66); } return 0.03 * (fraction - 0.5); }); const mean = raw.reduce((sum, value) => sum + value, 0) / raw.length; return raw.map((value) => value - mean); } function patternedOffset(length, sampleIndex, amplitude, phase) { const raw = Array.from({ length }, (_, index) => Math.sin((index + 1 + phase) * (sampleIndex + 1) * 0.9) + 0.45 * Math.cos((index + 1) * 1.7 + phase) ); const mean = raw.reduce((sum, value) => sum + value, 0) / raw.length; return raw.map((value) => (value - mean) * amplitude / length); } function shiftedProfile(base, signal, strength, offset) { return normalize(base.map((value, index) => Math.max(0.005, value + signal[index] * strength + offset[index]) )); } function interventionPreviewProfile(baseProfile) { const shiftStrength = Number(betaShiftInput.value); const variability = Number(betaVariabilityInput.value); return shiftedProfile( baseProfile, buildInterventionSignal(baseProfile.length), shiftStrength, patternedOffset(baseProfile.length, 0, variability * 0.7, 1.1) ); } function buildBetaSamples(baseProfile, shiftStrength, variability, sampleCount) { const interventionSignal = buildInterventionSignal(baseProfile.length); const controlSignal = interventionSignal.map((value) => value * 0.14); const samples = [{ name: "Source", group: "Baseline", values: baseProfile }]; for (let index = 0; index < sampleCount; index += 1) { samples.push({ name: `Control ${index + 1}`, group: "Control", values: shiftedProfile( baseProfile, controlSignal, variability * 0.45, patternedOffset(baseProfile.length, index, 0.02 + variability * 0.42, 0.4) ) }); } for (let index = 0; index < sampleCount; index += 1) { samples.push({ name: `Intervention ${index + 1}`, group: "Intervention", values: shiftedProfile( baseProfile, interventionSignal, shiftStrength, patternedOffset(baseProfile.length, index, 0.025 + variability * 0.5, 1.6) ) }); } return samples; } function brayCurtis(a, b) { let numerator = 0; let denominator = 0; for (let i = 0; i < a.length; i += 1) { numerator += Math.abs(a[i] - b[i]); denominator += a[i] + b[i]; } return denominator === 0 ? 0 : numerator / denominator; } function jaccard(a, b, presenceThreshold) { let intersection = 0; let union = 0; for (let i = 0; i < a.length; i += 1) { const presentA = a[i] > presenceThreshold ? 1 : 0; const presentB = b[i] > presenceThreshold ? 1 : 0; if (presentA || presentB) { union += 1; } if (presentA && presentB) { intersection += 1; } } return union === 0 ? 0 : 1 - (intersection / union); } function clrEuclidean(a, b) { const pseudo = 1e-6; const clr = (values) => { const logs = values.map((value) => Math.log(value + pseudo)); const meanLog = logs.reduce((sum, value) => sum + value, 0) / logs.length; return logs.map((value) => value - meanLog); }; const clrA = clr(a); const clrB = clr(b); const squared = clrA.reduce((sum, value, index) => sum + Math.pow(value - clrB[index], 2), 0); return Math.sqrt(squared) / AITCHISON_DISPLAY_SCALE; } function betaDistance(a, b, metric, presenceThreshold) { if (metric === "jaccard") { return jaccard(a, b, presenceThreshold); } if (metric === "aitchison") { return clrEuclidean(a, b); } return brayCurtis(a, b); } function identityMatrix(size) { return Array.from({ length: size }, (_, rowIndex) => Array.from({ length: size }, (_, colIndex) => rowIndex === colIndex ? 1 : 0) ); } function jacobiEigenDecomposition(matrix, maxIterations = 100) { const size = matrix.length; const a = matrix.map((row) => row.slice()); const vectors = identityMatrix(size); for (let iteration = 0; iteration < maxIterations; iteration += 1) { let p = 0; let q = 1; let maxValue = Math.abs(a[p][q]); for (let rowIndex = 0; rowIndex < size; rowIndex += 1) { for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) { const candidate = Math.abs(a[rowIndex][colIndex]); if (candidate > maxValue) { maxValue = candidate; p = rowIndex; q = colIndex; } } } if (!Number.isFinite(maxValue) || maxValue < 1e-9) { break; } const app = a[p][p]; const aqq = a[q][q]; const apq = a[p][q]; const angle = 0.5 * Math.atan2(2 * apq, aqq - app); const cosine = Math.cos(angle); const sine = Math.sin(angle); for (let index = 0; index < size; index += 1) { if (index !== p && index !== q) { const aip = a[index][p]; const aiq = a[index][q]; a[index][p] = a[p][index] = cosine * aip - sine * aiq; a[index][q] = a[q][index] = sine * aip + cosine * aiq; } } a[p][p] = (cosine * cosine * app) - (2 * sine * cosine * apq) + (sine * sine * aqq); a[q][q] = (sine * sine * app) + (2 * sine * cosine * apq) + (cosine * cosine * aqq); a[p][q] = 0; a[q][p] = 0; for (let index = 0; index < size; index += 1) { const vip = vectors[index][p]; const viq = vectors[index][q]; vectors[index][p] = cosine * vip - sine * viq; vectors[index][q] = sine * vip + cosine * viq; } } return Array.from({ length: size }, (_, index) => ({ value: a[index][index], vector: vectors.map((row) => row[index]) })).sort((left, right) => right.value - left.value); } function centerCoords(coords) { const meanX = coords.reduce((sum, point) => sum + point[0], 0) / coords.length; const meanY = coords.reduce((sum, point) => sum + point[1], 0) / coords.length; return coords.map((point) => [point[0] - meanX, point[1] - meanY]); } function normalizedCoords(coords) { if (!coords.length) { return []; } const centered = centerCoords(coords); const maxAbs = Math.max(...centered.flat().map((value) => Math.abs(value))); if (!Number.isFinite(maxAbs) || maxAbs < 1e-6) { return coords.map((_, index) => { const angle = (2 * Math.PI * index) / coords.length; return [Math.cos(angle), Math.sin(angle)]; }); } return centered.map((point, index) => [ point[0] / maxAbs + COORDINATE_JITTER * Math.cos(index), point[1] / maxAbs + COORDINATE_JITTER * Math.sin(index) ]); } function distanceMatrix(samples, metric, presenceThreshold) { return samples.map((rowSample, rowIndex) => samples.map((colSample, colIndex) => rowIndex === colIndex ? 0 : betaDistance(rowSample.values, colSample.values, metric, presenceThreshold) ) ); } function classicalMDS(distances) { const size = distances.length; const squared = distances.map((row) => row.map((value) => value * value)); const rowMeans = squared.map((row) => row.reduce((sum, value) => sum + value, 0) / size); const columnMeans = Array.from({ length: size }, (_, colIndex) => squared.reduce((sum, row) => sum + row[colIndex], 0) / size ); const grandMean = rowMeans.reduce((sum, value) => sum + value, 0) / size; const centered = squared.map((row, rowIndex) => row.map((value, colIndex) => -0.5 * (value - rowMeans[rowIndex] - columnMeans[colIndex] + grandMean)) ); const eigenpairs = jacobiEigenDecomposition(centered) .filter((pair) => pair.value > 1e-8) .slice(0, 2); if (!eigenpairs.length) { return normalizedCoords(Array.from({ length: size }, () => [0, 0])); } const coords = Array.from({ length: size }, (_, rowIndex) => eigenpairs.map((pair) => pair.vector[rowIndex] * Math.sqrt(pair.value)) ).map((point) => point.length === 2 ? point : [point[0], 0]); return normalizedCoords(coords); } function rankDisparities(distances) { const pairs = []; for (let rowIndex = 0; rowIndex < distances.length; rowIndex += 1) { for (let colIndex = rowIndex + 1; colIndex < distances.length; colIndex += 1) { pairs.push({ rowIndex, colIndex, distance: distances[rowIndex][colIndex] }); } } pairs.sort((left, right) => left.distance - right.distance); const denominator = Math.max(1, pairs.length - 1); pairs.forEach((pair, index) => { pair.target = NMDS_TARGET_OFFSET + (index / denominator) * NMDS_TARGET_SPAN; }); return pairs; } function nmdsProjection(distances) { let coords = classicalMDS(distances); const pairs = rankDisparities(distances); for (let iteration = 0; iteration < 240; iteration += 1) { const gradients = coords.map(() => [0, 0]); pairs.forEach((pair) => { const pointA = coords[pair.rowIndex]; const pointB = coords[pair.colIndex]; const dx = pointA[0] - pointB[0]; const dy = pointA[1] - pointB[1]; const currentDistance = Math.sqrt((dx * dx) + (dy * dy)) + 1e-6; const pull = 0.10 * (pair.target - currentDistance) / currentDistance; gradients[pair.rowIndex][0] += pull * dx; gradients[pair.rowIndex][1] += pull * dy; gradients[pair.colIndex][0] -= pull * dx; gradients[pair.colIndex][1] -= pull * dy; }); coords = coords.map((point, index) => [ point[0] + gradients[index][0], point[1] + gradients[index][1] ]); coords = normalizedCoords(coords); } return normalizedCoords(coords); } function tsneProjection(distances) { const size = distances.length; const nonZeroDistances = distances.flat().filter((value) => value > 0); const sigma = nonZeroDistances.length ? nonZeroDistances.reduce((sum, value) => sum + value, 0) / nonZeroDistances.length : 1; const affinities = Array.from({ length: size }, () => Array(size).fill(0)); let affinityTotal = 0; for (let rowIndex = 0; rowIndex < size; rowIndex += 1) { for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) { const value = Math.exp(-(distances[rowIndex][colIndex] ** 2) / (2 * sigma * sigma + 1e-6)); affinities[rowIndex][colIndex] = value; affinities[colIndex][rowIndex] = value; affinityTotal += value * 2; } } for (let rowIndex = 0; rowIndex < size; rowIndex += 1) { for (let colIndex = 0; colIndex < size; colIndex += 1) { affinities[rowIndex][colIndex] = affinityTotal > 0 ? affinities[rowIndex][colIndex] / affinityTotal : 0; } } let coords = classicalMDS(distances).map((point, index) => [ point[0] + 0.04 * Math.cos(index * 1.7), point[1] + 0.04 * Math.sin(index * 1.7) ]); for (let iteration = 0; iteration < 220; iteration += 1) { const numerators = Array.from({ length: size }, () => Array(size).fill(0)); let qTotal = 0; for (let rowIndex = 0; rowIndex < size; rowIndex += 1) { for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) { const dx = coords[rowIndex][0] - coords[colIndex][0]; const dy = coords[rowIndex][1] - coords[colIndex][1]; const numerator = 1 / (1 + (dx * dx) + (dy * dy)); numerators[rowIndex][colIndex] = numerator; numerators[colIndex][rowIndex] = numerator; qTotal += numerator * 2; } } const gradients = coords.map(() => [0, 0]); for (let rowIndex = 0; rowIndex < size; rowIndex += 1) { for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) { const dx = coords[rowIndex][0] - coords[colIndex][0]; const dy = coords[rowIndex][1] - coords[colIndex][1]; const qValue = qTotal > 0 ? numerators[rowIndex][colIndex] / qTotal : 0; const push = 4 * (affinities[rowIndex][colIndex] - qValue) * numerators[rowIndex][colIndex]; gradients[rowIndex][0] += push * dx; gradients[rowIndex][1] += push * dy; gradients[colIndex][0] -= push * dx; gradients[colIndex][1] -= push * dy; } } const learningRate = iteration < 80 ? 0.7 : 0.28; coords = coords.map((point, index) => [ point[0] + learningRate * gradients[index][0], point[1] + learningRate * gradients[index][1] ]); coords = normalizedCoords(coords); } return normalizedCoords(coords); } function umapProjection(distances) { const size = distances.length; const nonZeroDistances = distances.flat().filter((value) => value > 0); const scale = nonZeroDistances.length ? nonZeroDistances.reduce((sum, value) => sum + value, 0) / nonZeroDistances.length : 1; const weights = Array.from({ length: size }, () => Array(size).fill(0)); for (let rowIndex = 0; rowIndex < size; rowIndex += 1) { const neighbors = distances[rowIndex] .map((distance, colIndex) => ({ distance, colIndex })) .filter((entry) => entry.colIndex !== rowIndex) .sort((left, right) => left.distance - right.distance) .slice(0, 3); neighbors.forEach((neighbor) => { const weight = Math.exp(-neighbor.distance / (scale + 1e-6)); weights[rowIndex][neighbor.colIndex] = Math.max(weights[rowIndex][neighbor.colIndex], weight); weights[neighbor.colIndex][rowIndex] = Math.max(weights[neighbor.colIndex][rowIndex], weight); }); } let coords = classicalMDS(distances); for (let iteration = 0; iteration < 220; iteration += 1) { const forces = coords.map(() => [0, 0]); for (let rowIndex = 0; rowIndex < size; rowIndex += 1) { for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) { const dx = coords[rowIndex][0] - coords[colIndex][0]; const dy = coords[rowIndex][1] - coords[colIndex][1]; const currentDistance = Math.sqrt((dx * dx) + (dy * dy)) + 1e-6; const weight = weights[rowIndex][colIndex]; if (weight > 0) { const desired = 0.42 + (0.28 * (1 - weight)); const attraction = 0.16 * weight * (desired - currentDistance) / currentDistance; forces[rowIndex][0] += attraction * dx; forces[rowIndex][1] += attraction * dy; forces[colIndex][0] -= attraction * dx; forces[colIndex][1] -= attraction * dy; } const repulsion = 0.018 / (0.18 + (currentDistance * currentDistance)); forces[rowIndex][0] += repulsion * dx; forces[rowIndex][1] += repulsion * dy; forces[colIndex][0] -= repulsion * dx; forces[colIndex][1] -= repulsion * dy; } } coords = coords.map((point, index) => [ point[0] + forces[index][0], point[1] + forces[index][1] ]); coords = normalizedCoords(coords); } return normalizedCoords(coords); } function ordinationCoords(distances, ordination) { if (ordination === "nmds") { return nmdsProjection(distances); } if (ordination === "tsne") { return tsneProjection(distances); } if (ordination === "umap") { return umapProjection(distances); } return classicalMDS(distances); } function projectToCanvas(coords, width, height, padding) { const xs = coords.map((point) => point[0]); const ys = coords.map((point) => point[1]); const xMin = Math.min(...xs); const xMax = Math.max(...xs); const yMin = Math.min(...ys); const yMax = Math.max(...ys); const xSpan = Math.max(1e-6, xMax - xMin); const ySpan = Math.max(1e-6, yMax - yMin); return coords.map((point) => ({ x: padding + ((point[0] - xMin) / xSpan) * (width - 2 * padding), y: height - padding - ((point[1] - yMin) / ySpan) * (height - 2 * padding) })); } function sampleMarker(sample, point, pointColor) { if (sample.group === "Baseline") { return `<path d="M ${point.x} ${point.y - 8} L ${point.x + 8} ${point.y} L ${point.x} ${point.y + 8} L ${point.x - 8} ${point.y} Z" fill="white" stroke="${pointColor}" stroke-width="2"></path>`; } if (sample.group === "Intervention") { return `<rect x="${point.x - 6}" y="${point.y - 6}" width="12" height="12" rx="2" ry="2" fill="white" stroke="${pointColor}" stroke-width="2"></rect>`; } return `<circle cx="${point.x}" cy="${point.y}" r="6" fill="white" stroke="${pointColor}" stroke-width="2"></circle>`; } function renderSharedBaseline() { const baseProfile = currentSourceProfile(); const sampleCount = Number(betaSamplesInput.value); baselineChart.innerHTML = ` <svg viewBox="0 0 320 124" width="100%" role="img" aria-label="Shared source community that connects the alpha and beta diversity examples"> <text x="30" y="16" font-size="10" fill="#334155" font-weight="700">Current source sample (T1-T${baseProfile.length})</text> ${stackedBarSegments(baseProfile, { x: 30, y: 24, width: 260, height: 20 })} <text x="30" y="58" font-size="8.5" fill="#475569">This single composition now feeds both panels below.</text> <line x1="160" y1="48" x2="160" y2="70" stroke="#94a3b8" stroke-width="1.4"></line> <line x1="160" y1="70" x2="82" y2="70" stroke="#94a3b8" stroke-width="1.4"></line> <line x1="160" y1="70" x2="238" y2="70" stroke="#94a3b8" stroke-width="1.4"></line> <rect x="18" y="72" width="128" height="34" rx="8" ry="8" fill="#eff6ff" stroke="#bfdbfe"></rect> <rect x="174" y="72" width="128" height="34" rx="8" ry="8" fill="#fef2f2" stroke="#fecaca"></rect> <text x="82" y="88" text-anchor="middle" font-size="8.5" fill="#1e3a8a" font-weight="700">Alpha diversity</text> <text x="82" y="99" text-anchor="middle" font-size="7.5" fill="#1f2937">One sample → richness + evenness</text> <text x="238" y="88" text-anchor="middle" font-size="8.5" fill="#991b1b" font-weight="700">Beta diversity</text> <text x="238" y="99" text-anchor="middle" font-size="7.5" fill="#1f2937">${sampleCount} control + ${sampleCount} intervention samples</text> </svg> `; linkSummary.textContent = `Right now the alpha controls define a ${baseProfile.length}-taxon source sample that becomes the anchor for every point in the beta-diversity ordination below.`; } function renderAlpha() { const baseProfile = currentSourceProfile(); const previewProfile = interventionPreviewProfile(baseProfile); const baseMetrics = metricsFor(baseProfile); const previewMetrics = metricsFor(previewProfile); const focusMetric = alphaFocusMetricInput.value; alphaRichnessDisplay.textContent = `${alphaRichnessInput.value} taxa`; alphaDominanceDisplay.textContent = `${formatNumber(alphaDominanceInput.value)} of the sample in T1`; alphaRichnessValue.textContent = String(baseMetrics.richness); alphaShannonValue.textContent = formatNumber(baseMetrics.shannon); alphaSimpsonValue.textContent = formatNumber(baseMetrics.simpson); alphaInverseValue.textContent = formatNumber(baseMetrics.inverse); alphaEvennessValue.textContent = formatNumber(baseMetrics.evenness); alphaCompositionChart.innerHTML = ` <svg viewBox="0 0 100 110" width="100%" role="img" aria-label="Current source sample and intervention-like preview for the alpha-diversity example"> <text x="4" y="12" font-size="7.5" fill="#334155">Current shared source sample</text> ${stackedBarSegments(baseProfile, { x: 4, y: 18, width: 92, height: 18 })} <line x1="50" y1="37" x2="50" y2="48" stroke="#94a3b8" stroke-width="0.7" stroke-dasharray="2 2"></line> <text x="4" y="57" font-size="7.5" fill="#334155">Intervention-like preview from beta shift</text> ${stackedBarSegments(previewProfile, { x: 4, y: 63, width: 92, height: 18 })} <text x="4" y="94" font-size="7" fill="#475569">The beta shift slider now changes this preview too, so the alpha and beta examples stay connected.</text> </svg> `; alphaLinkSummary.textContent = `${focusMetricLabels[focusMetric]} changes from ${formatNumber(metricValue(baseMetrics, focusMetric))} in the shared source sample to ${formatNumber(metricValue(previewMetrics, focusMetric))} in the intervention-like preview generated by the current beta settings.`; } function renderBeta() { const baseProfile = currentSourceProfile(); const shiftStrength = Number(betaShiftInput.value); const variability = Number(betaVariabilityInput.value); const sampleCount = Number(betaSamplesInput.value); const presenceThreshold = Number(betaPresenceInput.value); const metric = betaMetricInput.value; const ordination = betaOrdinationInput.value; const samples = buildBetaSamples(baseProfile, shiftStrength, variability, sampleCount); const distances = distanceMatrix(samples, metric, presenceThreshold); const projectedPoints = projectToCanvas(ordinationCoords(distances, ordination), 340, 235, 36); const baselinePoint = projectedPoints[0]; const controlPoints = projectedPoints.filter((_, index) => samples[index].group === "Control"); const interventionPoints = projectedPoints.filter((_, index) => samples[index].group === "Intervention"); const controlCentroid = { x: controlPoints.reduce((sum, point) => sum + point.x, 0) / controlPoints.length, y: controlPoints.reduce((sum, point) => sum + point.y, 0) / controlPoints.length }; const interventionCentroid = { x: interventionPoints.reduce((sum, point) => sum + point.x, 0) / interventionPoints.length, y: interventionPoints.reduce((sum, point) => sum + point.y, 0) / interventionPoints.length }; betaShiftDisplay.textContent = formatNumber(shiftStrength); betaVariabilityDisplay.textContent = formatNumber(variability); betaSamplesDisplay.textContent = `${sampleCount} samples`; betaPresenceDisplay.textContent = formatNumber(presenceThreshold); const trajectoryLines = projectedPoints.slice(1).map((point, index) => { const sample = samples[index + 1]; const stroke = sample.group === "Control" ? "#93c5fd" : "#fca5a5"; return `<line x1="${baselinePoint.x}" y1="${baselinePoint.y}" x2="${point.x}" y2="${point.y}" stroke="${stroke}" stroke-width="1.2" stroke-dasharray="3 3"></line>`; }).join(""); const axisX = `<line x1="36" y1="199" x2="304" y2="199" stroke="#cbd5e1" stroke-width="1"></line>`; const axisY = `<line x1="36" y1="199" x2="36" y2="30" stroke="#cbd5e1" stroke-width="1"></line>`; const centroidAnnotations = ` <line x1="${baselinePoint.x}" y1="${baselinePoint.y}" x2="${controlCentroid.x}" y2="${controlCentroid.y}" stroke="#1976d2" stroke-width="1.8"></line> <line x1="${baselinePoint.x}" y1="${baselinePoint.y}" x2="${interventionCentroid.x}" y2="${interventionCentroid.y}" stroke="#c62828" stroke-width="1.8"></line> <text x="${controlCentroid.x + 8}" y="${controlCentroid.y - 8}" font-size="8" fill="#0f172a">Control centroid</text> <text x="${interventionCentroid.x + 8}" y="${interventionCentroid.y - 8}" font-size="8" fill="#0f172a">Intervention centroid</text> `; const points = samples.map((sample, index) => { const point = projectedPoints[index]; const pointColor = sample.group === "Baseline" ? "#64748b" : sample.group === "Control" ? "#1976d2" : "#c62828"; return ` ${sampleMarker(sample, point, pointColor)} <text x="${point.x + 9}" y="${point.y - 9}" font-size="7.6" fill="#334155">${sample.name}</text> `; }).join(""); betaOrdinationChart.innerHTML = ` <svg viewBox="0 0 340 235" width="100%" role="img" aria-label="High-dimensional beta diversity ordination with source, control, and intervention samples"> <text x="12" y="16" font-size="10" fill="#334155" font-weight="700">${ordinationLabels[ordination]} view of the linked alpha-defined source sample</text> <text x="12" y="30" font-size="8" fill="#475569">Distance metric: ${metricLabels[metric]}. Dashed trajectories connect the shared source sample to each simulated cohort sample.</text> ${axisX} ${axisY} <text x="296" y="214" font-size="8" fill="#475569">${ordinationLabels[ordination]} 1</text> <text x="10" y="42" font-size="8" fill="#475569" transform="rotate(-90 10 42)">${ordinationLabels[ordination]} 2</text> ${trajectoryLines} ${centroidAnnotations} ${points} </svg> `; const nonBaselineSamples = samples.filter((sample) => sample.group !== "Baseline"); const nonBaselineDistances = distanceMatrix(nonBaselineSamples, metric, presenceThreshold); const within = []; const between = []; nonBaselineSamples.forEach((rowSample, rowIndex) => { nonBaselineSamples.forEach((colSample, colIndex) => { if (rowIndex < colIndex) { const value = nonBaselineDistances[rowIndex][colIndex]; if (rowSample.group === colSample.group) { within.push(value); } else { between.push(value); } } }); }); const meanWithin = within.reduce((sum, value) => sum + value, 0) / within.length; const meanBetween = between.reduce((sum, value) => sum + value, 0) / between.length; const thresholdNote = metric === "jaccard" ? ` Presence threshold = ${formatNumber(presenceThreshold)}.` : ""; betaSummary.textContent = `${metricLabels[metric]} + ${ordinationLabels[ordination]} using the current alpha-defined source sample: mean within-group distance = ${formatNumber(meanWithin)} and mean between-group distance = ${formatNumber(meanBetween)}.${thresholdNote} ${ordinationDescriptions[ordination]}`; } let scheduledRenderFrame = null; function renderAll() { renderSharedBaseline(); renderAlpha(); renderBeta(); } function scheduleRenderAll() { if (scheduledRenderFrame !== null) { window.cancelAnimationFrame(scheduledRenderFrame); } scheduledRenderFrame = window.requestAnimationFrame(() => { scheduledRenderFrame = null; renderAll(); }); } function determineInputEventType(input) { return input instanceof HTMLSelectElement ? "change" : "input"; } [ alphaRichnessInput, alphaDominanceInput, alphaFocusMetricInput, betaShiftInput, betaVariabilityInput, betaSamplesInput, betaPresenceInput ].forEach((input) => { input.addEventListener(determineInputEventType(input), scheduleRenderAll); }); betaMetricInput.addEventListener("change", scheduleRenderAll); betaOrdinationInput.addEventListener("change", scheduleRenderAll); renderAll(); }); </script> ::: ## What files to start from A practical minimal input set is: | File | Typical source | What it contains | |---|---|---| | `metaphlan_merged.tsv` | `merge_metaphlan_tables.py` | Taxa x samples relative abundance matrix | | `pathabundance.tsv` or joined HUMAnN table | `humann_join_tables` | Pathways or gene families x samples | | `metadata.tsv` | Your study design | Sample IDs, group labels, time points, etc. | For diversity calculations you want a matrix with: - **rows = samples** - **columns = taxa or pathways** - **values = non-negative abundances** --- ## R packages ```r install.packages(c("vegan", "dplyr", "tidyr", "tibble", "ggplot2", "Rtsne", "uwot")) ``` ```r library(vegan) library(dplyr) library(tidyr) library(tibble) library(ggplot2) library(Rtsne) library(uwot) ``` --- ## Step 1: Read a MetaPhlAn table A merged MetaPhlAn table usually has one taxon per row and one sample per column. The example below keeps only **species-level rows**. ```r metaphlan_raw <- read.delim("metaphlan_merged.tsv", check.names = FALSE, comment.char = "#") metadata <- read.delim("metadata.tsv", check.names = FALSE) species_mat <- metaphlan_raw |> rename(feature = clade_name) |> filter(grepl("\\|s__[^|]+$", feature)) |> column_to_rownames("feature") |> t() |> as.data.frame() |> rownames_to_column("sample_id") |> left_join(metadata, by = "sample_id") sample_info <- species_mat |> select(sample_id, group, timepoint) abund <- species_mat |> select(-sample_id, -group, -timepoint) |> as.matrix() mode(abund) <- "numeric" rownames(abund) <- sample_info$sample_id # MetaPhlAn is already relative abundance, but re-scaling is a safe habit. abund_rel <- decostand(abund, method = "total") ``` ::: {.callout-tip} If you prefer **family-level** diversity, change the filter to `grepl("\\|f__[^|]+$", feature)` instead of species-level rows. ::: ### HUMAnN variant For HUMAnN, use the same structure but usually remove **stratified rows** first so that each feature is a pathway or gene family rather than a pathway-with-species combination. ```r humann_raw <- read.delim("pathabundance.tsv", check.names = FALSE) pathway_mat <- humann_raw |> rename(feature = `# Pathway`) |> filter(!grepl("\\|", feature)) |> column_to_rownames("feature") |> t() |> as.matrix() mode(pathway_mat) <- "numeric" pathway_rel <- decostand(pathway_mat, method = "total") ``` --- ## Alpha diversity **Alpha diversity** asks: *how diverse is each sample on its own?* ### The two basics: richness and evenness - **Richness** = how many taxa/features are detected - **Evenness** = how evenly abundance is spread across those taxa/features In practice, beginners usually report richness and evenness alongside three very common alpha-diversity metrics: 1. **Shannon diversity** 2. **Simpson diversity** 3. **Inverse Simpson diversity** ### When to use each metric | Metric | What it emphasizes | Intuition | |---|---|---| | **Richness** | Number of detected taxa | "How many things are here?" | | **Pielou evenness** | Balance of abundances | "Are a few taxa dominating?" | | **Shannon** | Both richness and evenness | Sensitive to common taxa, but still notices rare taxa | | **Simpson** | Dominance | Drops when one or a few taxa take over | | **Inverse Simpson** | Effective number of dominant taxa | Easier to explain to beginners than raw Simpson | ### Calculate alpha diversity with `vegan` ```r alpha_div <- tibble( sample_id = rownames(abund_rel), richness = specnumber(abund_rel), shannon = diversity(abund_rel, index = "shannon"), simpson = diversity(abund_rel, index = "simpson"), inverse_simpson = diversity(abund_rel, index = "invsimpson") ) |> mutate( pielou_evenness = if_else(richness > 1, shannon / log(richness), NA_real_) ) |> left_join(sample_info, by = "sample_id") alpha_div ``` ### Quick interpretation guide - **High richness + high evenness** → many taxa and none dominates too strongly - **High richness + low evenness** → many taxa are present, but only a few are doing most of the work - **Low Shannon** → either few taxa, or strong dominance by one/few taxa - **Low Simpson / inverse Simpson** → especially strong dominance structure ### Plot alpha diversity ```r ggplot(alpha_div, aes(x = group, y = shannon, fill = group)) + geom_boxplot(width = 0.7, alpha = 0.85, outlier.shape = NA) + geom_jitter(width = 0.12, size = 2, alpha = 0.8) + labs( title = "Shannon alpha diversity by group", x = NULL, y = "Shannon diversity" ) + theme_minimal(base_size = 12) + theme(legend.position = "none") ``` ### Statistical testing for alpha diversity For two groups, a Wilcoxon rank-sum test is a common first pass. For more complex designs, use a linear model or mixed model. ```r wilcox.test(shannon ~ group, data = alpha_div) wilcox.test(inverse_simpson ~ group, data = alpha_div) ``` ::: {.callout-warning} Avoid over-interpreting tiny differences in alpha diversity. A statistically significant result can still be biologically small, especially when sequencing depth, filtering, or zero handling differs between groups. ::: --- ## Beta diversity **Beta diversity** asks: *how different are samples from each other?* The answer depends on the **dissimilarity metric** you choose. ### Three very common beta-diversity metrics | Metric | `vegan` implementation | Best for | Notes | |---|---|---|---| | **Bray-Curtis** | `vegdist(..., method = "bray")` | Relative abundance data | Usually the first choice for MetaPhlAn/HUMAnN profiles | | **Jaccard** | `vegdist(..., method = "jaccard")` after presence/absence conversion | Presence/absence questions | Ignores abundance magnitude | | **Aitchison** | CLR transform + `dist()` | Compositional data | Very useful when you want a composition-aware analysis | ### Bray-Curtis dissimilarity Bray-Curtis is popular because it reflects **abundance shifts** between samples. ```r bray_dist <- vegdist(abund_rel, method = "bray") ``` ### Jaccard dissimilarity Jaccard is useful when you only care whether a taxon/pathway is present. ```r abund_pa <- decostand(abund_rel, method = "pa") jaccard_dist <- vegdist(abund_pa, method = "jaccard") ``` ### Aitchison distance Aitchison distance is built for **compositional data**. It requires a centered log-ratio (CLR) transform, which means adding a small pseudocount first. ```r abund_clr <- decostand(abund_rel + 1e-06, method = "clr") aitchison_dist <- dist(abund_clr) ``` ::: {.callout-tip} A simple beginner workflow is: **Bray-Curtis for your main ecological analysis**, then **Aitchison as a sensitivity check** if you want a composition-aware view. ::: --- ## Ordination and visualization A distance matrix is hard to read directly, so you usually project it into 2 dimensions. ### A useful plotting helper ```r plot_ordination <- function(df, x, y, title) { ggplot(df, aes(x = .data[[x]], y = .data[[y]], color = group)) + geom_point(size = 3, alpha = 0.9) + labs(title = title, x = x, y = y, color = "Group") + theme_minimal(base_size = 12) } ``` ### PCA **PCA** works on a transformed abundance matrix, not directly on a distance matrix. For ecological data, a **Hellinger transform** is a good beginner-friendly choice. ```r abund_hellinger <- decostand(abund_rel, method = "hellinger") pca_fit <- prcomp(abund_hellinger, center = TRUE, scale. = FALSE) pca_df <- as.data.frame(pca_fit$x[, 1:2]) |> rownames_to_column("sample_id") |> left_join(sample_info, by = "sample_id") plot_ordination(pca_df, "PC1", "PC2", "PCA of Hellinger-transformed abundances") ``` ### PCoA **PCoA** is a natural companion to Bray-Curtis because it starts from a dissimilarity matrix. ```r pcoa_fit <- cmdscale(bray_dist, eig = TRUE, k = 2) pcoa_df <- as.data.frame(pcoa_fit$points) |> setNames(c("PCoA1", "PCoA2")) |> rownames_to_column("sample_id") |> left_join(sample_info, by = "sample_id") plot_ordination(pcoa_df, "PCoA1", "PCoA2", "PCoA of Bray-Curtis distances") ``` ### NMDS **NMDS** is one of the most widely used ordination methods in ecology because it focuses on preserving rank-order distances. ```r set.seed(123) nmds_fit <- metaMDS(abund_rel, distance = "bray", k = 2, trymax = 100) nmds_df <- as.data.frame(scores(nmds_fit, display = "sites")) |> rownames_to_column("sample_id") |> left_join(sample_info, by = "sample_id") plot_ordination(nmds_df, "NMDS1", "NMDS2", "NMDS on Bray-Curtis distances") ``` ### t-SNE **t-SNE** is mainly for visualization. It can reveal local neighborhood structure, but distances between distant clusters are not always easy to interpret biologically. ```r set.seed(123) tsne_fit <- Rtsne(abund_hellinger, dims = 2, perplexity = 10, check_duplicates = FALSE) tsne_df <- as.data.frame(tsne_fit$Y) |> setNames(c("tSNE1", "tSNE2")) |> mutate(sample_id = rownames(abund_hellinger)) |> left_join(sample_info, by = "sample_id") plot_ordination(tsne_df, "tSNE1", "tSNE2", "t-SNE of Hellinger-transformed abundances") ``` ### UMAP **UMAP** is another visualization-first method that often gives cleaner-looking clusters than t-SNE while preserving more global structure. ```r set.seed(123) umap_fit <- uwot::umap(abund_hellinger, n_neighbors = 15, min_dist = 0.2) umap_df <- as.data.frame(umap_fit) |> setNames(c("UMAP1", "UMAP2")) |> mutate(sample_id = rownames(abund_hellinger)) |> left_join(sample_info, by = "sample_id") plot_ordination(umap_df, "UMAP1", "UMAP2", "UMAP of Hellinger-transformed abundances") ``` ### Which ordination should you show? A sensible beginner combination is: - **PCoA + Bray-Curtis** for the main ecological result - **NMDS + Bray-Curtis** as a robust ecological cross-check - **PCA** if you want a linear view of transformed abundances - **t-SNE / UMAP** as exploratory visualizations, not your only evidence --- ## Statistical testing for beta diversity Visualization is helpful, but it is not the statistical test. ### PERMANOVA Use `adonis2()` to test whether groups differ in their community composition. ```r adonis2(bray_dist ~ group, data = sample_info, permutations = 999) ``` ### Check dispersion before over-interpreting PERMANOVA PERMANOVA can be significant either because group centroids differ or because one group is much more dispersed. ```r bray_dispersion <- betadisper(bray_dist, sample_info$group) anova(bray_dispersion) permutest(bray_dispersion) ``` ::: {.callout-important} A common reporting pattern is: **Bray-Curtis PCoA plot + PERMANOVA p-value + dispersion check**. ::: --- ## Recommended beginner workflow 1. Start with a **species-level MetaPhlAn** or **unstratified HUMAnN** table. 2. Re-scale to proportions with `decostand(..., "total")`. 3. Calculate **richness, Shannon, Simpson, inverse Simpson, and Pielou evenness**. 4. Use **Bray-Curtis** as your main beta-diversity metric. 5. Visualize beta diversity with **PCoA** and **NMDS**. 6. Use **PERMANOVA** to test group separation. 7. Re-run beta diversity with **Aitchison distance** if you want a composition-aware sensitivity analysis. --- ## Further reading - [MetaPhlAn tutorial](../tools/metaphlan.qmd) - [HUMAnN tutorial](../tools/humann.qmd) - [QIIME 2 tutorial](../tools/qiime2.qmd) for amplicon-oriented diversity workflows - `?vegan::diversity`, `?vegan::vegdist`, `?vegan::metaMDS`, `?vegan::adonis2`

Interactive ecological diversity explainer

What files to start from

R packages

Step 1: Read a MetaPhlAn table

HUMAnN variant

Alpha diversity

The two basics: richness and evenness

When to use each metric

Calculate alpha diversity with vegan

Quick interpretation guide

Plot alpha diversity

Statistical testing for alpha diversity

Beta diversity

Three very common beta-diversity metrics

Bray-Curtis dissimilarity

Jaccard dissimilarity

Aitchison distance

Ordination and visualization

A useful plotting helper

PCA

PCoA

NMDS

t-SNE

UMAP

Which ordination should you show?

Statistical testing for beta diversity

PERMANOVA

Check dispersion before over-interpreting PERMANOVA

Recommended beginner workflow

Further reading

Calculate alpha diversity with `vegan`