---
title: "Ecological Diversity"
subtitle: "Interactive alpha and beta diversity intuition from bioBakery outputs"
---
This page shows a beginner-friendly way to calculate **alpha diversity** (within-sample diversity) and **beta diversity** (between-sample dissimilarity) from the tables produced by the bioBakery ecosystem.
The same workflow works for both:
- **MetaPhlAn** taxonomic profiles if you want organism-level diversity
- **HUMAnN** gene family or pathway tables if you want functional diversity
::: {.callout-note}
For taxonomic diversity, most people start with a **species-level MetaPhlAn table**. For functional diversity, use an **unstratified HUMAnN pathway or gene-family table**.
:::
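As a quick orientation, the alpha-diversity metrics used throughout this page reduce to a few lines of arithmetic. The sketch below uses made-up proportions for a single four-taxon sample, not real MetaPhlAn output:

```javascript
// Toy illustration of the alpha-diversity formulas (invented proportions,
// not real MetaPhlAn output): one sample, four taxa.
const p = [0.5, 0.2, 0.2, 0.1]; // relative abundances summing to 1

// Shannon: -sum(p * ln p); rises with both richness and evenness.
const shannon = p.reduce((h, x) => h - (x > 0 ? x * Math.log(x) : 0), 0);
// Gini-Simpson: probability that two random reads come from different taxa.
const simpson = 1 - p.reduce((s, x) => s + x * x, 0);
// Inverse Simpson: an "effective number of taxa".
const invSimpson = 1 / p.reduce((s, x) => s + x * x, 0);
// Pielou evenness: Shannon divided by its maximum, ln(richness).
const evenness = shannon / Math.log(p.length);

console.log(shannon.toFixed(3));    // 1.221
console.log(simpson.toFixed(3));    // 0.660
console.log(invSimpson.toFixed(3)); // 2.941
console.log(evenness.toFixed(3));   // 0.880
```

Note that a perfectly even sample (`p = [0.25, 0.25, 0.25, 0.25]`) would give evenness 1 and inverse Simpson equal to the richness, 4.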
## Interactive ecological diversity explainer
The mini-lab below is **interactive and intentionally synthetic**: it is there to help you build intuition before you run the R code further down the page on your own tables.
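Before touching the sliders, it can help to compute the two simplest distance choices by hand. This is a minimal sketch on two invented relative-abundance profiles:

```javascript
// Toy illustration of two beta-diversity distances between samples a and b
// (invented relative abundances, each summing to 1).
const a = [0.6, 0.3, 0.1, 0.0];
const b = [0.2, 0.3, 0.3, 0.2];

// Bray-Curtis: sum of absolute differences over the sum of all abundances.
let num = 0;
let den = 0;
for (let i = 0; i < a.length; i += 1) {
  num += Math.abs(a[i] - b[i]);
  den += a[i] + b[i];
}
const brayCurtis = num / den; // 0 = identical, 1 = nothing shared

// Jaccard on presence/absence, using a small detection threshold.
const threshold = 0.05;
let intersection = 0;
let union = 0;
for (let i = 0; i < a.length; i += 1) {
  const inA = a[i] > threshold;
  const inB = b[i] > threshold;
  if (inA || inB) union += 1;
  if (inA && inB) intersection += 1;
}
const jaccard = 1 - intersection / union;

console.log(brayCurtis.toFixed(2)); // 0.40
console.log(jaccard.toFixed(2));    // 0.25
```

Bray-Curtis weighs abundance differences, while Jaccard only asks which taxa are detected at all, which is why the lab exposes the presence threshold as its own control.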
::: {=html}
<style>
#ecological-diversity-lab {
width: min(100%, 1100px);
margin: 1.5rem auto 2rem;
}
.eco-baseline-bridge {
border: 1px solid #d8dee4;
border-radius: 0.9rem;
padding: 1rem 1.1rem 1.15rem;
background: linear-gradient(180deg, #ffffff 0%, #f8fafc 100%);
box-shadow: 0 10px 24px rgba(15, 23, 42, 0.06);
margin-bottom: 1.25rem;
}
.eco-baseline-bridge h3 {
margin-top: 0;
margin-bottom: 0.45rem;
}
.eco-diversity-grid {
display: grid;
gap: 1.25rem;
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
}
.eco-card {
border: 1px solid #d8dee4;
border-radius: 0.9rem;
padding: 1rem 1.1rem 1.2rem;
background: #fff;
box-shadow: 0 10px 24px rgba(15, 23, 42, 0.06);
min-width: 0;
}
.eco-card h3 {
margin-top: 0;
margin-bottom: 0.5rem;
}
.eco-control-grid {
display: grid;
gap: 0.9rem;
margin: 1rem 0;
}
.eco-control-grid label,
.eco-select-label {
display: grid;
gap: 0.35rem;
font-weight: 600;
}
.eco-control-grid input[type="range"] {
width: 100%;
}
.eco-control-value {
font-size: 0.82rem;
color: #475569;
font-weight: 500;
}
.eco-select-label select {
max-width: 13rem;
}
.eco-metric-grid {
display: grid;
gap: 0.75rem;
grid-template-columns: repeat(auto-fit, minmax(110px, 1fr));
margin-top: 1rem;
}
.eco-metric-card {
border-radius: 0.8rem;
background: #f8fafc;
padding: 0.75rem 0.85rem;
border: 1px solid #e2e8f0;
}
.eco-metric-label {
display: block;
font-size: 0.8rem;
color: #475569;
}
.eco-metric-value {
display: block;
font-size: 1.15rem;
font-weight: 700;
color: #0f172a;
}
.eco-chart-frame {
border: 1px solid #e2e8f0;
border-radius: 0.8rem;
padding: 0.75rem;
background: linear-gradient(180deg, #ffffff 0%, #f8fafc 100%);
max-width: 32rem;
margin: 0 auto;
}
#alpha-composition-chart svg,
#eco-baseline-chart svg,
#beta-ordination-chart svg {
display: block;
width: 100%;
height: auto;
max-width: 100%;
margin: 0 auto;
}
.eco-beta-summary {
margin: 0.8rem 0 0;
color: #334155;
}
.eco-inline-legend {
display: flex;
flex-wrap: wrap;
gap: 0.8rem 1rem;
margin-top: 0.75rem;
color: #475569;
font-size: 0.9rem;
}
.eco-legend-item {
display: inline-flex;
align-items: center;
gap: 0.45rem;
}
.eco-legend-swatch {
width: 0.85rem;
height: 0.85rem;
display: inline-block;
border: 2px solid currentColor;
background: currentColor;
}
.eco-legend-swatch--baseline {
color: #64748b;
border-radius: 0.12rem;
transform: rotate(45deg);
}
.eco-legend-swatch--control {
color: #1976d2;
border-radius: 999px;
}
.eco-legend-swatch--intervention {
color: #c62828;
border-radius: 0.15rem;
}
.eco-caption {
font-size: 0.9rem;
color: #475569;
}
</style>
<div id="ecological-diversity-lab"></div>
<script>
document.addEventListener("DOMContentLoaded", function () {
const diversityLab = document.getElementById("ecological-diversity-lab");
diversityLab.innerHTML = `
<section class="eco-baseline-bridge" aria-labelledby="eco-baseline-heading">
<h3 id="eco-baseline-heading">Linked alpha-beta source sample</h3>
<p class="eco-caption">The alpha controls build the exact source sample used by the beta-diversity panel, and the beta shift control updates the intervention-like preview inside the alpha panel, so the two graphics stay linked instead of behaving like separate demos.</p>
<div id="eco-baseline-chart" class="eco-chart-frame" aria-live="polite"></div>
<p id="eco-link-summary" class="eco-beta-summary" aria-live="polite"></p>
</section>
<div class="eco-diversity-grid">
<section class="eco-card" aria-labelledby="alpha-diversity-heading">
<h3 id="alpha-diversity-heading">Alpha diversity explorer</h3>
<p class="eco-caption">Change the source sample richness and dominance, then choose which alpha-diversity metric to emphasize. The intervention-like preview below is driven by the same beta shift strength used in the ordination panel.</p>
<div class="eco-control-grid">
<label for="alpha-richness">
Observed taxa
<span id="alpha-richness-display" class="eco-control-value"></span>
<input id="alpha-richness" type="range" min="3" max="12" step="1" value="6">
</label>
<label for="alpha-dominance">
Dominance of the most abundant taxon
<span id="alpha-dominance-display" class="eco-control-value"></span>
<input id="alpha-dominance" type="range" min="0" max="0.85" step="0.01" value="0.35">
</label>
<label class="eco-select-label" for="alpha-focus-metric">
Alpha metric to emphasize
<select id="alpha-focus-metric">
<option value="shannon" selected>Shannon diversity</option>
<option value="simpson">Simpson diversity</option>
<option value="inverse">Inverse Simpson</option>
<option value="evenness">Pielou evenness</option>
</select>
</label>
</div>
<div id="alpha-composition-chart" class="eco-chart-frame" aria-live="polite"></div>
<div class="eco-metric-grid" aria-live="polite">
<div class="eco-metric-card">
<span class="eco-metric-label">Richness</span>
<span class="eco-metric-value" id="alpha-richness-value"></span>
</div>
<div class="eco-metric-card">
<span class="eco-metric-label">Shannon</span>
<span class="eco-metric-value" id="alpha-shannon-value"></span>
</div>
<div class="eco-metric-card">
<span class="eco-metric-label">Simpson</span>
<span class="eco-metric-value" id="alpha-simpson-value"></span>
</div>
<div class="eco-metric-card">
<span class="eco-metric-label">Inverse Simpson</span>
<span class="eco-metric-value" id="alpha-inverse-value"></span>
</div>
<div class="eco-metric-card">
<span class="eco-metric-label">Pielou evenness</span>
<span class="eco-metric-value" id="alpha-evenness-value"></span>
</div>
</div>
<p id="alpha-link-summary" class="eco-beta-summary" aria-live="polite"></p>
</section>
<section class="eco-card" aria-labelledby="beta-diversity-heading">
<h3 id="beta-diversity-heading">Beta diversity explorer</h3>
<p class="eco-caption">Use the same alpha-defined source sample, then control how far the intervention group moves, how dispersed each group is, how many samples to simulate, and how Jaccard presence/absence is thresholded.</p>
<div class="eco-control-grid">
<label for="beta-shift">
Intervention shift strength
<span id="beta-shift-display" class="eco-control-value"></span>
<input id="beta-shift" type="range" min="0" max="0.55" step="0.01" value="0.22">
</label>
<label for="beta-variability">
Within-group variability
<span id="beta-variability-display" class="eco-control-value"></span>
<input id="beta-variability" type="range" min="0.01" max="0.18" step="0.01" value="0.06">
</label>
<label for="beta-samples">
Samples per group
<span id="beta-samples-display" class="eco-control-value"></span>
<input id="beta-samples" type="range" min="2" max="5" step="1" value="3">
</label>
<label for="beta-presence-threshold">
Jaccard presence threshold
<span id="beta-presence-display" class="eco-control-value"></span>
<input id="beta-presence-threshold" type="range" min="0.02" max="0.20" step="0.01" value="0.08">
</label>
<label class="eco-select-label" for="beta-metric">
Distance metric
<select id="beta-metric">
<option value="bray">Bray-Curtis</option>
<option value="jaccard">Jaccard</option>
<option value="aitchison">Aitchison-like CLR Euclidean</option>
</select>
</label>
<label class="eco-select-label" for="beta-ordination">
Ordination and visualization
<select id="beta-ordination">
<option value="pcoa">PCoA</option>
<option value="nmds">NMDS</option>
<option value="tsne">t-SNE</option>
<option value="umap">UMAP</option>
</select>
</label>
</div>
<div id="beta-ordination-chart" class="eco-chart-frame"></div>
<div class="eco-inline-legend" aria-hidden="true">
<span class="eco-legend-item"><span class="eco-legend-swatch eco-legend-swatch--baseline"></span>Source sample</span>
<span class="eco-legend-item"><span class="eco-legend-swatch eco-legend-swatch--control"></span>Control samples</span>
<span class="eco-legend-item"><span class="eco-legend-swatch eco-legend-swatch--intervention"></span>Intervention samples</span>
</div>
<p id="beta-diversity-summary" class="eco-beta-summary" aria-live="polite"></p>
</section>
</div>
`;
const baselineChart = document.getElementById("eco-baseline-chart");
const linkSummary = document.getElementById("eco-link-summary");
const alphaRichnessInput = document.getElementById("alpha-richness");
const alphaDominanceInput = document.getElementById("alpha-dominance");
const alphaFocusMetricInput = document.getElementById("alpha-focus-metric");
const alphaRichnessDisplay = document.getElementById("alpha-richness-display");
const alphaDominanceDisplay = document.getElementById("alpha-dominance-display");
const alphaCompositionChart = document.getElementById("alpha-composition-chart");
const alphaRichnessValue = document.getElementById("alpha-richness-value");
const alphaShannonValue = document.getElementById("alpha-shannon-value");
const alphaSimpsonValue = document.getElementById("alpha-simpson-value");
const alphaInverseValue = document.getElementById("alpha-inverse-value");
const alphaEvennessValue = document.getElementById("alpha-evenness-value");
const alphaLinkSummary = document.getElementById("alpha-link-summary");
const betaShiftInput = document.getElementById("beta-shift");
const betaVariabilityInput = document.getElementById("beta-variability");
const betaSamplesInput = document.getElementById("beta-samples");
const betaPresenceInput = document.getElementById("beta-presence-threshold");
const betaShiftDisplay = document.getElementById("beta-shift-display");
const betaVariabilityDisplay = document.getElementById("beta-variability-display");
const betaSamplesDisplay = document.getElementById("beta-samples-display");
const betaPresenceDisplay = document.getElementById("beta-presence-display");
const betaMetricInput = document.getElementById("beta-metric");
const betaOrdinationInput = document.getElementById("beta-ordination");
const betaOrdinationChart = document.getElementById("beta-ordination-chart");
const betaSummary = document.getElementById("beta-diversity-summary");
const palette = [
"#1976d2", "#2e7d32", "#c62828", "#7e57c2", "#ef6c00", "#00897b",
"#5d4037", "#546e7a", "#f9a825", "#8e24aa", "#3949ab", "#6d4c41"
];
const AITCHISON_DISPLAY_SCALE = 5;
const COORDINATE_JITTER = 0.02;
const NMDS_TARGET_OFFSET = 0.25;
const NMDS_TARGET_SPAN = 1.15;
const metricLabels = {
bray: "Bray-Curtis",
jaccard: "Jaccard",
aitchison: "Aitchison-like CLR Euclidean"
};
const ordinationLabels = {
pcoa: "PCoA",
nmds: "NMDS",
tsne: "t-SNE",
umap: "UMAP"
};
const ordinationDescriptions = {
pcoa: "PCoA keeps the selected distance metric as faithful as possible in two dimensions.",
nmds: "NMDS emphasizes the rank-order of dissimilarities, so relative separation matters more than exact geometry.",
tsne: "t-SNE emphasizes local neighborhoods; compact clusters are visual hints rather than effect-size estimates.",
umap: "UMAP balances local neighborhoods with broader structure, which often keeps group trends visible while preserving nearby samples."
};
const focusMetricLabels = {
shannon: "Shannon diversity",
simpson: "Simpson diversity",
inverse: "Inverse Simpson",
evenness: "Pielou evenness"
};
const formatNumber = (value) => Number(value).toFixed(2);
// Shannon index: H = -sum(p * ln p); higher with more, and more evenly spread, taxa.
const shannon = (proportions) =>
-proportions.reduce((sum, value) => sum + (value > 0 ? value * Math.log(value) : 0), 0);
// Gini-Simpson index: 1 - sum(p^2), the chance two random reads are different taxa.
const simpson = (proportions) =>
1 - proportions.reduce((sum, value) => sum + value * value, 0);
// Inverse Simpson: 1 / sum(p^2), an "effective number of taxa".
function inverseSimpson(proportions) {
const denominator = proportions.reduce((sum, value) => sum + value * value, 0);
return denominator > 0 ? 1 / denominator : 0;
}
function getLabelColor(hexColor) {
if (!hexColor || typeof hexColor !== "string") {
return "white";
}
const hex = hexColor.replace("#", "");
if (hex.length !== 6) {
return "white";
}
const r = parseInt(hex.slice(0, 2), 16);
const g = parseInt(hex.slice(2, 4), 16);
const b = parseInt(hex.slice(4, 6), 16);
const brightness = (r * 299 + g * 587 + b * 114) / 1000;
return brightness > 150 ? "#111827" : "white";
}
function alphaComposition(richness, dominance) {
const maxShare = (1 / richness) + dominance * (1 - 1 / richness);
const otherShare = (1 - maxShare) / (richness - 1);
return Array.from({ length: richness }, (_, index) => index === 0 ? maxShare : otherShare);
}
function normalize(values) {
const total = values.reduce((sum, value) => sum + value, 0);
return values.map((value) => value / total);
}
function currentSourceProfile() {
return alphaComposition(Number(alphaRichnessInput.value), Number(alphaDominanceInput.value));
}
function stackedBarSegments(values, options = {}) {
const {
x = 0,
y = 0,
width = 100,
height = 16,
labelPrefix = "T"
} = options;
let offset = x;
return values.map((value, index) => {
const segmentWidth = value * width;
const segmentColor = palette[index % palette.length];
const rect = `<rect x="${offset}" y="${y}" width="${segmentWidth}" height="${height}" rx="4" ry="4" fill="${segmentColor}"></rect>`;
const labelColor = getLabelColor(segmentColor);
const label = segmentWidth > 13
? `<text x="${offset + segmentWidth / 2}" y="${y + height / 2 + 3}" text-anchor="middle" font-size="8" fill="${labelColor}" font-weight="700">${labelPrefix}${index + 1}</text>`
: "";
offset += segmentWidth;
return rect + label;
}).join("");
}
function metricsFor(proportions) {
const richness = proportions.length;
const alphaShannon = shannon(proportions);
return {
richness,
shannon: alphaShannon,
simpson: simpson(proportions),
inverse: inverseSimpson(proportions),
evenness: richness > 1 ? alphaShannon / Math.log(richness) : 0
};
}
function metricValue(metrics, key) {
if (key === "richness") {
return metrics.richness;
}
return metrics[key];
}
function buildInterventionSignal(length) {
const raw = Array.from({ length }, (_, index) => {
const fraction = length === 1 ? 0 : index / (length - 1);
if (fraction < 0.34) {
return -0.15 + 0.05 * fraction;
}
if (fraction > 0.66) {
return 0.12 + 0.05 * (fraction - 0.66);
}
return 0.03 * (fraction - 0.5);
});
const mean = raw.reduce((sum, value) => sum + value, 0) / raw.length;
return raw.map((value) => value - mean);
}
function patternedOffset(length, sampleIndex, amplitude, phase) {
const raw = Array.from({ length }, (_, index) =>
Math.sin((index + 1 + phase) * (sampleIndex + 1) * 0.9) +
0.45 * Math.cos((index + 1) * 1.7 + phase)
);
const mean = raw.reduce((sum, value) => sum + value, 0) / raw.length;
return raw.map((value) => (value - mean) * amplitude / length);
}
function shiftedProfile(base, signal, strength, offset) {
return normalize(base.map((value, index) =>
Math.max(0.005, value + signal[index] * strength + offset[index])
));
}
function interventionPreviewProfile(baseProfile) {
const shiftStrength = Number(betaShiftInput.value);
const variability = Number(betaVariabilityInput.value);
return shiftedProfile(
baseProfile,
buildInterventionSignal(baseProfile.length),
shiftStrength,
patternedOffset(baseProfile.length, 0, variability * 0.7, 1.1)
);
}
function buildBetaSamples(baseProfile, shiftStrength, variability, sampleCount) {
const interventionSignal = buildInterventionSignal(baseProfile.length);
const controlSignal = interventionSignal.map((value) => value * 0.14);
const samples = [{ name: "Source", group: "Baseline", values: baseProfile }];
for (let index = 0; index < sampleCount; index += 1) {
samples.push({
name: `Control ${index + 1}`,
group: "Control",
values: shiftedProfile(
baseProfile,
controlSignal,
variability * 0.45,
patternedOffset(baseProfile.length, index, 0.02 + variability * 0.42, 0.4)
)
});
}
for (let index = 0; index < sampleCount; index += 1) {
samples.push({
name: `Intervention ${index + 1}`,
group: "Intervention",
values: shiftedProfile(
baseProfile,
interventionSignal,
shiftStrength,
patternedOffset(baseProfile.length, index, 0.025 + variability * 0.5, 1.6)
)
});
}
return samples;
}
function brayCurtis(a, b) {
let numerator = 0;
let denominator = 0;
for (let i = 0; i < a.length; i += 1) {
numerator += Math.abs(a[i] - b[i]);
denominator += a[i] + b[i];
}
return denominator === 0 ? 0 : numerator / denominator;
}
function jaccard(a, b, presenceThreshold) {
let intersection = 0;
let union = 0;
for (let i = 0; i < a.length; i += 1) {
const presentA = a[i] > presenceThreshold ? 1 : 0;
const presentB = b[i] > presenceThreshold ? 1 : 0;
if (presentA || presentB) {
union += 1;
}
if (presentA && presentB) {
intersection += 1;
}
}
return union === 0 ? 0 : 1 - (intersection / union);
}
function clrEuclidean(a, b) {
const pseudo = 1e-6;
const clr = (values) => {
const logs = values.map((value) => Math.log(value + pseudo));
const meanLog = logs.reduce((sum, value) => sum + value, 0) / logs.length;
return logs.map((value) => value - meanLog);
};
const clrA = clr(a);
const clrB = clr(b);
const squared = clrA.reduce((sum, value, index) => sum + Math.pow(value - clrB[index], 2), 0);
return Math.sqrt(squared) / AITCHISON_DISPLAY_SCALE;
}
function betaDistance(a, b, metric, presenceThreshold) {
if (metric === "jaccard") {
return jaccard(a, b, presenceThreshold);
}
if (metric === "aitchison") {
return clrEuclidean(a, b);
}
return brayCurtis(a, b);
}
function identityMatrix(size) {
return Array.from({ length: size }, (_, rowIndex) =>
Array.from({ length: size }, (_, colIndex) => rowIndex === colIndex ? 1 : 0)
);
}
function jacobiEigenDecomposition(matrix, maxIterations = 100) {
const size = matrix.length;
const a = matrix.map((row) => row.slice());
const vectors = identityMatrix(size);
for (let iteration = 0; iteration < maxIterations; iteration += 1) {
let p = 0;
let q = 1;
let maxValue = Math.abs(a[p][q]);
for (let rowIndex = 0; rowIndex < size; rowIndex += 1) {
for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) {
const candidate = Math.abs(a[rowIndex][colIndex]);
if (candidate > maxValue) {
maxValue = candidate;
p = rowIndex;
q = colIndex;
}
}
}
if (!Number.isFinite(maxValue) || maxValue < 1e-9) {
break;
}
const app = a[p][p];
const aqq = a[q][q];
const apq = a[p][q];
const angle = 0.5 * Math.atan2(2 * apq, aqq - app);
const cosine = Math.cos(angle);
const sine = Math.sin(angle);
for (let index = 0; index < size; index += 1) {
if (index !== p && index !== q) {
const aip = a[index][p];
const aiq = a[index][q];
a[index][p] = a[p][index] = cosine * aip - sine * aiq;
a[index][q] = a[q][index] = sine * aip + cosine * aiq;
}
}
a[p][p] = (cosine * cosine * app) - (2 * sine * cosine * apq) + (sine * sine * aqq);
a[q][q] = (sine * sine * app) + (2 * sine * cosine * apq) + (cosine * cosine * aqq);
a[p][q] = 0;
a[q][p] = 0;
for (let index = 0; index < size; index += 1) {
const vip = vectors[index][p];
const viq = vectors[index][q];
vectors[index][p] = cosine * vip - sine * viq;
vectors[index][q] = sine * vip + cosine * viq;
}
}
return Array.from({ length: size }, (_, index) => ({
value: a[index][index],
vector: vectors.map((row) => row[index])
})).sort((left, right) => right.value - left.value);
}
function centerCoords(coords) {
const meanX = coords.reduce((sum, point) => sum + point[0], 0) / coords.length;
const meanY = coords.reduce((sum, point) => sum + point[1], 0) / coords.length;
return coords.map((point) => [point[0] - meanX, point[1] - meanY]);
}
function normalizedCoords(coords) {
if (!coords.length) {
return [];
}
const centered = centerCoords(coords);
const maxAbs = Math.max(...centered.flat().map((value) => Math.abs(value)));
if (!Number.isFinite(maxAbs) || maxAbs < 1e-6) {
return coords.map((_, index) => {
const angle = (2 * Math.PI * index) / coords.length;
return [Math.cos(angle), Math.sin(angle)];
});
}
return centered.map((point, index) => [
point[0] / maxAbs + COORDINATE_JITTER * Math.cos(index),
point[1] / maxAbs + COORDINATE_JITTER * Math.sin(index)
]);
}
function distanceMatrix(samples, metric, presenceThreshold) {
return samples.map((rowSample, rowIndex) =>
samples.map((colSample, colIndex) =>
rowIndex === colIndex ? 0 : betaDistance(rowSample.values, colSample.values, metric, presenceThreshold)
)
);
}
function classicalMDS(distances) {
const size = distances.length;
const squared = distances.map((row) => row.map((value) => value * value));
const rowMeans = squared.map((row) => row.reduce((sum, value) => sum + value, 0) / size);
const columnMeans = Array.from({ length: size }, (_, colIndex) =>
squared.reduce((sum, row) => sum + row[colIndex], 0) / size
);
const grandMean = rowMeans.reduce((sum, value) => sum + value, 0) / size;
const centered = squared.map((row, rowIndex) =>
row.map((value, colIndex) => -0.5 * (value - rowMeans[rowIndex] - columnMeans[colIndex] + grandMean))
);
const eigenpairs = jacobiEigenDecomposition(centered)
.filter((pair) => pair.value > 1e-8)
.slice(0, 2);
if (!eigenpairs.length) {
return normalizedCoords(Array.from({ length: size }, () => [0, 0]));
}
const coords = Array.from({ length: size }, (_, rowIndex) =>
eigenpairs.map((pair) => pair.vector[rowIndex] * Math.sqrt(pair.value))
).map((point) => point.length === 2 ? point : [point[0], 0]);
return normalizedCoords(coords);
}
function rankDisparities(distances) {
const pairs = [];
for (let rowIndex = 0; rowIndex < distances.length; rowIndex += 1) {
for (let colIndex = rowIndex + 1; colIndex < distances.length; colIndex += 1) {
pairs.push({ rowIndex, colIndex, distance: distances[rowIndex][colIndex] });
}
}
pairs.sort((left, right) => left.distance - right.distance);
const denominator = Math.max(1, pairs.length - 1);
pairs.forEach((pair, index) => {
pair.target = NMDS_TARGET_OFFSET + (index / denominator) * NMDS_TARGET_SPAN;
});
return pairs;
}
function nmdsProjection(distances) {
let coords = classicalMDS(distances);
const pairs = rankDisparities(distances);
for (let iteration = 0; iteration < 240; iteration += 1) {
const gradients = coords.map(() => [0, 0]);
pairs.forEach((pair) => {
const pointA = coords[pair.rowIndex];
const pointB = coords[pair.colIndex];
const dx = pointA[0] - pointB[0];
const dy = pointA[1] - pointB[1];
const currentDistance = Math.sqrt((dx * dx) + (dy * dy)) + 1e-6;
const pull = 0.10 * (pair.target - currentDistance) / currentDistance;
gradients[pair.rowIndex][0] += pull * dx;
gradients[pair.rowIndex][1] += pull * dy;
gradients[pair.colIndex][0] -= pull * dx;
gradients[pair.colIndex][1] -= pull * dy;
});
coords = coords.map((point, index) => [
point[0] + gradients[index][0],
point[1] + gradients[index][1]
]);
coords = normalizedCoords(coords);
}
return normalizedCoords(coords);
}
function tsneProjection(distances) {
const size = distances.length;
const nonZeroDistances = distances.flat().filter((value) => value > 0);
const sigma = nonZeroDistances.length
? nonZeroDistances.reduce((sum, value) => sum + value, 0) / nonZeroDistances.length
: 1;
const affinities = Array.from({ length: size }, () => Array(size).fill(0));
let affinityTotal = 0;
for (let rowIndex = 0; rowIndex < size; rowIndex += 1) {
for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) {
const value = Math.exp(-(distances[rowIndex][colIndex] ** 2) / (2 * sigma * sigma + 1e-6));
affinities[rowIndex][colIndex] = value;
affinities[colIndex][rowIndex] = value;
affinityTotal += value * 2;
}
}
for (let rowIndex = 0; rowIndex < size; rowIndex += 1) {
for (let colIndex = 0; colIndex < size; colIndex += 1) {
affinities[rowIndex][colIndex] = affinityTotal > 0 ? affinities[rowIndex][colIndex] / affinityTotal : 0;
}
}
let coords = classicalMDS(distances).map((point, index) => [
point[0] + 0.04 * Math.cos(index * 1.7),
point[1] + 0.04 * Math.sin(index * 1.7)
]);
for (let iteration = 0; iteration < 220; iteration += 1) {
const numerators = Array.from({ length: size }, () => Array(size).fill(0));
let qTotal = 0;
for (let rowIndex = 0; rowIndex < size; rowIndex += 1) {
for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) {
const dx = coords[rowIndex][0] - coords[colIndex][0];
const dy = coords[rowIndex][1] - coords[colIndex][1];
const numerator = 1 / (1 + (dx * dx) + (dy * dy));
numerators[rowIndex][colIndex] = numerator;
numerators[colIndex][rowIndex] = numerator;
qTotal += numerator * 2;
}
}
const gradients = coords.map(() => [0, 0]);
for (let rowIndex = 0; rowIndex < size; rowIndex += 1) {
for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) {
const dx = coords[rowIndex][0] - coords[colIndex][0];
const dy = coords[rowIndex][1] - coords[colIndex][1];
const qValue = qTotal > 0 ? numerators[rowIndex][colIndex] / qTotal : 0;
// Negative sign gives gradient descent: points attract when affinity exceeds q.
const push = -4 * (affinities[rowIndex][colIndex] - qValue) * numerators[rowIndex][colIndex];
gradients[rowIndex][0] += push * dx;
gradients[rowIndex][1] += push * dy;
gradients[colIndex][0] -= push * dx;
gradients[colIndex][1] -= push * dy;
}
}
const learningRate = iteration < 80 ? 0.7 : 0.28;
coords = coords.map((point, index) => [
point[0] + learningRate * gradients[index][0],
point[1] + learningRate * gradients[index][1]
]);
coords = normalizedCoords(coords);
}
return normalizedCoords(coords);
}
function umapProjection(distances) {
const size = distances.length;
const nonZeroDistances = distances.flat().filter((value) => value > 0);
const scale = nonZeroDistances.length
? nonZeroDistances.reduce((sum, value) => sum + value, 0) / nonZeroDistances.length
: 1;
const weights = Array.from({ length: size }, () => Array(size).fill(0));
for (let rowIndex = 0; rowIndex < size; rowIndex += 1) {
const neighbors = distances[rowIndex]
.map((distance, colIndex) => ({ distance, colIndex }))
.filter((entry) => entry.colIndex !== rowIndex)
.sort((left, right) => left.distance - right.distance)
.slice(0, 3);
neighbors.forEach((neighbor) => {
const weight = Math.exp(-neighbor.distance / (scale + 1e-6));
weights[rowIndex][neighbor.colIndex] = Math.max(weights[rowIndex][neighbor.colIndex], weight);
weights[neighbor.colIndex][rowIndex] = Math.max(weights[neighbor.colIndex][rowIndex], weight);
});
}
let coords = classicalMDS(distances);
for (let iteration = 0; iteration < 220; iteration += 1) {
const forces = coords.map(() => [0, 0]);
for (let rowIndex = 0; rowIndex < size; rowIndex += 1) {
for (let colIndex = rowIndex + 1; colIndex < size; colIndex += 1) {
const dx = coords[rowIndex][0] - coords[colIndex][0];
const dy = coords[rowIndex][1] - coords[colIndex][1];
const currentDistance = Math.sqrt((dx * dx) + (dy * dy)) + 1e-6;
const weight = weights[rowIndex][colIndex];
if (weight > 0) {
const desired = 0.42 + (0.28 * (1 - weight));
const attraction = 0.16 * weight * (desired - currentDistance) / currentDistance;
forces[rowIndex][0] += attraction * dx;
forces[rowIndex][1] += attraction * dy;
forces[colIndex][0] -= attraction * dx;
forces[colIndex][1] -= attraction * dy;
}
const repulsion = 0.018 / (0.18 + (currentDistance * currentDistance));
forces[rowIndex][0] += repulsion * dx;
forces[rowIndex][1] += repulsion * dy;
forces[colIndex][0] -= repulsion * dx;
forces[colIndex][1] -= repulsion * dy;
}
}
coords = coords.map((point, index) => [
point[0] + forces[index][0],
point[1] + forces[index][1]
]);
coords = normalizedCoords(coords);
}
return normalizedCoords(coords);
}
function ordinationCoords(distances, ordination) {
if (ordination === "nmds") {
return nmdsProjection(distances);
}
if (ordination === "tsne") {
return tsneProjection(distances);
}
if (ordination === "umap") {
return umapProjection(distances);
}
return classicalMDS(distances);
}
function projectToCanvas(coords, width, height, padding) {
const xs = coords.map((point) => point[0]);
const ys = coords.map((point) => point[1]);
const xMin = Math.min(...xs);
const xMax = Math.max(...xs);
const yMin = Math.min(...ys);
const yMax = Math.max(...ys);
const xSpan = Math.max(1e-6, xMax - xMin);
const ySpan = Math.max(1e-6, yMax - yMin);
return coords.map((point) => ({
x: padding + ((point[0] - xMin) / xSpan) * (width - 2 * padding),
y: height - padding - ((point[1] - yMin) / ySpan) * (height - 2 * padding)
}));
}
function sampleMarker(sample, point, pointColor) {
if (sample.group === "Baseline") {
return `<path d="M ${point.x} ${point.y - 8} L ${point.x + 8} ${point.y} L ${point.x} ${point.y + 8} L ${point.x - 8} ${point.y} Z" fill="white" stroke="${pointColor}" stroke-width="2"></path>`;
}
if (sample.group === "Intervention") {
return `<rect x="${point.x - 6}" y="${point.y - 6}" width="12" height="12" rx="2" ry="2" fill="white" stroke="${pointColor}" stroke-width="2"></rect>`;
}
return `<circle cx="${point.x}" cy="${point.y}" r="6" fill="white" stroke="${pointColor}" stroke-width="2"></circle>`;
}
function renderSharedBaseline() {
const baseProfile = currentSourceProfile();
const sampleCount = Number(betaSamplesInput.value);
baselineChart.innerHTML = `
<svg viewBox="0 0 320 124" width="100%" role="img" aria-label="Shared source community that connects the alpha and beta diversity examples">
<text x="30" y="16" font-size="10" fill="#334155" font-weight="700">Current source sample (T1-T${baseProfile.length})</text>
${stackedBarSegments(baseProfile, { x: 30, y: 24, width: 260, height: 20 })}
<text x="30" y="58" font-size="8.5" fill="#475569">This single composition feeds both panels below.</text>
<line x1="160" y1="48" x2="160" y2="70" stroke="#94a3b8" stroke-width="1.4"></line>
<line x1="160" y1="70" x2="82" y2="70" stroke="#94a3b8" stroke-width="1.4"></line>
<line x1="160" y1="70" x2="238" y2="70" stroke="#94a3b8" stroke-width="1.4"></line>
<rect x="18" y="72" width="128" height="34" rx="8" ry="8" fill="#eff6ff" stroke="#bfdbfe"></rect>
<rect x="174" y="72" width="128" height="34" rx="8" ry="8" fill="#fef2f2" stroke="#fecaca"></rect>
<text x="82" y="88" text-anchor="middle" font-size="8.5" fill="#1e3a8a" font-weight="700">Alpha diversity</text>
<text x="82" y="99" text-anchor="middle" font-size="7.5" fill="#1f2937">One sample → richness + evenness</text>
<text x="238" y="88" text-anchor="middle" font-size="8.5" fill="#991b1b" font-weight="700">Beta diversity</text>
<text x="238" y="99" text-anchor="middle" font-size="7.5" fill="#1f2937">${sampleCount} control + ${sampleCount} intervention samples</text>
</svg>
`;
linkSummary.textContent = `The alpha controls define a ${baseProfile.length}-taxon source sample that anchors every point in the beta-diversity ordination below.`;
}
function renderAlpha() {
const baseProfile = currentSourceProfile();
const previewProfile = interventionPreviewProfile(baseProfile);
const baseMetrics = metricsFor(baseProfile);
const previewMetrics = metricsFor(previewProfile);
const focusMetric = alphaFocusMetricInput.value;
alphaRichnessDisplay.textContent = `${alphaRichnessInput.value} taxa`;
alphaDominanceDisplay.textContent = `${formatNumber(alphaDominanceInput.value)} of the sample in T1`;
alphaRichnessValue.textContent = String(baseMetrics.richness);
alphaShannonValue.textContent = formatNumber(baseMetrics.shannon);
alphaSimpsonValue.textContent = formatNumber(baseMetrics.simpson);
alphaInverseValue.textContent = formatNumber(baseMetrics.inverse);
alphaEvennessValue.textContent = formatNumber(baseMetrics.evenness);
alphaCompositionChart.innerHTML = `
<svg viewBox="0 0 100 110" width="100%" role="img" aria-label="Current source sample and intervention-like preview for the alpha-diversity example">
<text x="4" y="12" font-size="7.5" fill="#334155">Current shared source sample</text>
${stackedBarSegments(baseProfile, { x: 4, y: 18, width: 92, height: 18 })}
<line x1="50" y1="37" x2="50" y2="48" stroke="#94a3b8" stroke-width="0.7" stroke-dasharray="2 2"></line>
<text x="4" y="57" font-size="7.5" fill="#334155">Intervention-like preview from beta shift</text>
${stackedBarSegments(previewProfile, { x: 4, y: 63, width: 92, height: 18 })}
<text x="4" y="94" font-size="7" fill="#475569">The beta shift slider now changes this preview too, so the alpha and beta examples stay connected.</text>
</svg>
`;
alphaLinkSummary.textContent = `${focusMetricLabels[focusMetric]} changes from ${formatNumber(metricValue(baseMetrics, focusMetric))} in the shared source sample to ${formatNumber(metricValue(previewMetrics, focusMetric))} in the intervention-like preview generated by the current beta settings.`;
}
function renderBeta() {
const baseProfile = currentSourceProfile();
const shiftStrength = Number(betaShiftInput.value);
const variability = Number(betaVariabilityInput.value);
const sampleCount = Number(betaSamplesInput.value);
const presenceThreshold = Number(betaPresenceInput.value);
const metric = betaMetricInput.value;
const ordination = betaOrdinationInput.value;
const samples = buildBetaSamples(baseProfile, shiftStrength, variability, sampleCount);
const distances = distanceMatrix(samples, metric, presenceThreshold);
const projectedPoints = projectToCanvas(ordinationCoords(distances, ordination), 340, 235, 36);
const baselinePoint = projectedPoints[0];
const controlPoints = projectedPoints.filter((_, index) => samples[index].group === "Control");
const interventionPoints = projectedPoints.filter((_, index) => samples[index].group === "Intervention");
const controlCentroid = {
x: controlPoints.reduce((sum, point) => sum + point.x, 0) / controlPoints.length,
y: controlPoints.reduce((sum, point) => sum + point.y, 0) / controlPoints.length
};
const interventionCentroid = {
x: interventionPoints.reduce((sum, point) => sum + point.x, 0) / interventionPoints.length,
y: interventionPoints.reduce((sum, point) => sum + point.y, 0) / interventionPoints.length
};
betaShiftDisplay.textContent = formatNumber(shiftStrength);
betaVariabilityDisplay.textContent = formatNumber(variability);
betaSamplesDisplay.textContent = `${sampleCount} samples`;
betaPresenceDisplay.textContent = formatNumber(presenceThreshold);
const trajectoryLines = projectedPoints.slice(1).map((point, index) => {
const sample = samples[index + 1];
const stroke = sample.group === "Control" ? "#93c5fd" : "#fca5a5";
return `<line x1="${baselinePoint.x}" y1="${baselinePoint.y}" x2="${point.x}" y2="${point.y}" stroke="${stroke}" stroke-width="1.2" stroke-dasharray="3 3"></line>`;
}).join("");
const axisX = `<line x1="36" y1="199" x2="304" y2="199" stroke="#cbd5e1" stroke-width="1"></line>`;
const axisY = `<line x1="36" y1="199" x2="36" y2="30" stroke="#cbd5e1" stroke-width="1"></line>`;
const centroidAnnotations = `
<line x1="${baselinePoint.x}" y1="${baselinePoint.y}" x2="${controlCentroid.x}" y2="${controlCentroid.y}" stroke="#1976d2" stroke-width="1.8"></line>
<line x1="${baselinePoint.x}" y1="${baselinePoint.y}" x2="${interventionCentroid.x}" y2="${interventionCentroid.y}" stroke="#c62828" stroke-width="1.8"></line>
<text x="${controlCentroid.x + 8}" y="${controlCentroid.y - 8}" font-size="8" fill="#0f172a">Control centroid</text>
<text x="${interventionCentroid.x + 8}" y="${interventionCentroid.y - 8}" font-size="8" fill="#0f172a">Intervention centroid</text>
`;
const points = samples.map((sample, index) => {
const point = projectedPoints[index];
const pointColor = sample.group === "Baseline"
? "#64748b"
: sample.group === "Control"
? "#1976d2"
: "#c62828";
return `
${sampleMarker(sample, point, pointColor)}
<text x="${point.x + 9}" y="${point.y - 9}" font-size="7.6" fill="#334155">${sample.name}</text>
`;
}).join("");
betaOrdinationChart.innerHTML = `
<svg viewBox="0 0 340 235" width="100%" role="img" aria-label="High-dimensional beta diversity ordination with source, control, and intervention samples">
<text x="12" y="16" font-size="10" fill="#334155" font-weight="700">${ordinationLabels[ordination]} view of the linked alpha-defined source sample</text>
<text x="12" y="30" font-size="8" fill="#475569">Distance metric: ${metricLabels[metric]}. Dashed trajectories connect the shared source sample to each simulated cohort sample.</text>
${axisX}
${axisY}
<text x="296" y="214" font-size="8" fill="#475569">${ordinationLabels[ordination]} 1</text>
<text x="10" y="42" font-size="8" fill="#475569" transform="rotate(-90 10 42)">${ordinationLabels[ordination]} 2</text>
${trajectoryLines}
${centroidAnnotations}
${points}
</svg>
`;
const nonBaselineSamples = samples.filter((sample) => sample.group !== "Baseline");
const nonBaselineDistances = distanceMatrix(nonBaselineSamples, metric, presenceThreshold);
const within = [];
const between = [];
nonBaselineSamples.forEach((rowSample, rowIndex) => {
nonBaselineSamples.forEach((colSample, colIndex) => {
if (rowIndex < colIndex) {
const value = nonBaselineDistances[rowIndex][colIndex];
if (rowSample.group === colSample.group) {
within.push(value);
} else {
between.push(value);
}
}
});
});
const meanWithin = within.reduce((sum, value) => sum + value, 0) / within.length;
const meanBetween = between.reduce((sum, value) => sum + value, 0) / between.length;
const thresholdNote = metric === "jaccard"
? ` Presence threshold = ${formatNumber(presenceThreshold)}.`
: "";
betaSummary.textContent = `${metricLabels[metric]} + ${ordinationLabels[ordination]} using the current alpha-defined source sample: mean within-group distance = ${formatNumber(meanWithin)} and mean between-group distance = ${formatNumber(meanBetween)}.${thresholdNote} ${ordinationDescriptions[ordination]}`;
}
let scheduledRenderFrame = null;
function renderAll() {
renderSharedBaseline();
renderAlpha();
renderBeta();
}
function scheduleRenderAll() {
if (scheduledRenderFrame !== null) {
window.cancelAnimationFrame(scheduledRenderFrame);
}
scheduledRenderFrame = window.requestAnimationFrame(() => {
scheduledRenderFrame = null;
renderAll();
});
}
function determineInputEventType(input) {
return input instanceof HTMLSelectElement ? "change" : "input";
}
[
alphaRichnessInput,
alphaDominanceInput,
alphaFocusMetricInput,
betaShiftInput,
betaVariabilityInput,
betaSamplesInput,
betaPresenceInput
].forEach((input) => {
input.addEventListener(determineInputEventType(input), scheduleRenderAll);
});
betaMetricInput.addEventListener("change", scheduleRenderAll);
betaOrdinationInput.addEventListener("change", scheduleRenderAll);
renderAll();
});
</script>
:::
## What files to start from
A practical minimal input set is:
| File | Typical source | What it contains |
|---|---|---|
| `metaphlan_merged.tsv` | `merge_metaphlan_tables.py` | Taxa × samples relative abundance matrix |
| `pathabundance.tsv` or joined HUMAnN table | `humann_join_tables` | Pathways or gene families × samples |
| `metadata.tsv` | Your study design | Sample IDs, group labels, time points, etc. |
For diversity calculations you want a matrix with:
- **rows = samples**
- **columns = taxa or pathways**
- **values = non-negative abundances**
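Before computing anything, it is worth verifying that the matrix is oriented and scaled the way `vegan` expects. A minimal base-R sanity check on a toy matrix (the sample and taxon names here are purely illustrative):

```r
# Toy abundance matrix: vegan expects samples in rows, features in columns.
abund <- matrix(
  c(12, 30, 58,
     5, 80, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample_A", "sample_B"), c("taxon_1", "taxon_2", "taxon_3"))
)

# Values must be non-negative.
stopifnot(all(abund >= 0))

# Rescale each sample (row) to proportions; rows should then sum to 1.
abund_rel <- abund / rowSums(abund)
stopifnot(all(abs(rowSums(abund_rel) - 1) < 1e-9))
```

If your samples end up in columns instead of rows, transpose with `t()` before rescaling.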
---
## R packages
```r
install.packages(c("vegan", "dplyr", "tidyr", "tibble", "ggplot2", "Rtsne", "uwot"))
```
```r
library(vegan)
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(Rtsne)
library(uwot)
```
---
## Step 1: Read a MetaPhlAn table
A merged MetaPhlAn table usually has one taxon per row and one sample per column. The example below keeps only **species-level rows**.
```r
metaphlan_raw <- read.delim("metaphlan_merged.tsv", check.names = FALSE, comment.char = "#")
metadata <- read.delim("metadata.tsv", check.names = FALSE)
species_mat <- metaphlan_raw |>
rename(feature = clade_name) |>
filter(grepl("\\|s__[^|]+$", feature)) |>
column_to_rownames("feature") |>
t() |>
as.data.frame() |>
rownames_to_column("sample_id") |>
left_join(metadata, by = "sample_id")
sample_info <- species_mat |> select(sample_id, group, timepoint)
abund <- species_mat |> select(-sample_id, -group, -timepoint) |> as.matrix()
mode(abund) <- "numeric"
rownames(abund) <- sample_info$sample_id
# MetaPhlAn is already relative abundance, but re-scaling is a safe habit.
abund_rel <- decostand(abund, method = "total")
```
::: {.callout-tip}
If you prefer **family-level** diversity, change the filter to `grepl("\\|f__[^|]+$", feature)` instead of species-level rows.
:::
### HUMAnN variant
For HUMAnN, use the same structure but usually remove **stratified rows** first so that each feature is a pathway or gene family rather than a pathway-with-species combination.
```r
humann_raw <- read.delim("pathabundance.tsv", check.names = FALSE)
pathway_mat <- humann_raw |>
rename(feature = `# Pathway`) |>
filter(!grepl("\\|", feature)) |>
column_to_rownames("feature") |>
t() |>
as.matrix()
mode(pathway_mat) <- "numeric"
pathway_rel <- decostand(pathway_mat, method = "total")
```
---
## Alpha diversity
**Alpha diversity** asks: *how diverse is each sample on its own?*
### The two basics: richness and evenness
- **Richness** = how many taxa/features are detected
- **Evenness** = how evenly abundance is spread across those taxa/features
In practice, beginners usually report richness and evenness alongside three very common alpha-diversity metrics:
1. **Shannon diversity**
2. **Simpson diversity**
3. **Inverse Simpson diversity**
### When to use each metric
| Metric | What it emphasizes | Intuition |
|---|---|---|
| **Richness** | Number of detected taxa | "How many things are here?" |
| **Pielou evenness** | Balance of abundances | "Are a few taxa dominating?" |
| **Shannon** | Both richness and evenness | Sensitive to common taxa, but still notices rare taxa |
| **Simpson** | Dominance | Drops when one or a few taxa take over |
| **Inverse Simpson** | Effective number of dominant taxa | Easier to explain to beginners than raw Simpson |
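To demystify these metrics, here is a base-R sketch that computes each one by hand for a single toy sample (the proportions are invented for illustration); `vegan` gives the same answers on real data:

```r
# One sample as a vector of relative abundances (sums to 1).
p <- c(0.50, 0.25, 0.15, 0.10)

richness    <- sum(p > 0)               # number of detected taxa -> 4
shannon     <- -sum(p * log(p))         # Shannon entropy, natural log (~1.21)
simpson     <- 1 - sum(p^2)             # Simpson diversity = 1 - dominance (0.655)
inv_simpson <- 1 / sum(p^2)             # "effective number" of dominant taxa (~2.90)
pielou      <- shannon / log(richness)  # Pielou evenness in [0, 1] (~0.87)
```

Pushing more weight onto one taxon (say `p <- c(0.90, 0.05, 0.03, 0.02)`) leaves richness unchanged but drags down Shannon, Simpson, and evenness, which is exactly the richness-versus-evenness distinction in the table above.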
### Calculate alpha diversity with `vegan`
```r
alpha_div <- tibble(
sample_id = rownames(abund_rel),
richness = specnumber(abund_rel),
shannon = diversity(abund_rel, index = "shannon"),
simpson = diversity(abund_rel, index = "simpson"),
inverse_simpson = diversity(abund_rel, index = "invsimpson")
) |>
mutate(
pielou_evenness = if_else(richness > 1, shannon / log(richness), NA_real_)
) |>
left_join(sample_info, by = "sample_id")
alpha_div
```
### Quick interpretation guide
- **High richness + high evenness** → many taxa and none dominates too strongly
- **High richness + low evenness** → many taxa are present, but only a few are doing most of the work
- **Low Shannon** → either few taxa, or strong dominance by one/few taxa
- **Low Simpson / inverse Simpson** → especially strong dominance structure
### Plot alpha diversity
```r
ggplot(alpha_div, aes(x = group, y = shannon, fill = group)) +
geom_boxplot(width = 0.7, alpha = 0.85, outlier.shape = NA) +
geom_jitter(width = 0.12, size = 2, alpha = 0.8) +
labs(
title = "Shannon alpha diversity by group",
x = NULL,
y = "Shannon diversity"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")
```
### Statistical testing for alpha diversity
For two groups, a Wilcoxon rank-sum test is a common first pass. For more complex designs, use a linear model or mixed model.
```r
wilcox.test(shannon ~ group, data = alpha_div)
wilcox.test(inverse_simpson ~ group, data = alpha_div)
```
::: {.callout-warning}
Avoid over-interpreting tiny differences in alpha diversity. A statistically significant result can still be biologically small, especially when sequencing depth, filtering, or zero handling differs between groups.
:::
---
## Beta diversity
**Beta diversity** asks: *how different are samples from each other?*
The answer depends on the **dissimilarity metric** you choose.
### Three very common beta-diversity metrics
| Metric | `vegan` implementation | Best for | Notes |
|---|---|---|---|
| **Bray-Curtis** | `vegdist(..., method = "bray")` | Relative abundance data | Usually the first choice for MetaPhlAn/HUMAnN profiles |
| **Jaccard** | `vegdist(..., method = "jaccard")` after presence/absence conversion | Presence/absence questions | Ignores abundance magnitude |
| **Aitchison** | CLR transform + `dist()` | Compositional data | Very useful when you want a composition-aware analysis |
### Bray-Curtis dissimilarity
Bray-Curtis is popular because it reflects **abundance shifts** between samples.
```r
bray_dist <- vegdist(abund_rel, method = "bray")
```
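The formula itself is simple: for two samples, Bray-Curtis is the summed absolute abundance difference divided by the total abundance. A toy base-R version with invented proportions:

```r
x <- c(0.6, 0.3, 0.1, 0.0)
y <- c(0.2, 0.3, 0.4, 0.1)

# Bray-Curtis = sum(|x_i - y_i|) / sum(x_i + y_i)
bray <- sum(abs(x - y)) / sum(x + y)
bray  # 0.4: 0 means identical composition, 1 means no shared abundance
```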
### Jaccard dissimilarity
Jaccard is useful when you only care whether a taxon/pathway is present.
```r
abund_pa <- decostand(abund_rel, method = "pa")
jaccard_dist <- vegdist(abund_pa, method = "jaccard")
```
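Under the hood, Jaccard dissimilarity on presence/absence data is one minus the shared fraction of the union of detected features. A toy base-R version:

```r
x <- c(1, 1, 0, 1, 0)  # taxon detected (1) or absent (0) in sample x
y <- c(1, 0, 1, 1, 0)  # same five taxa in sample y

shared     <- sum(x == 1 & y == 1)   # taxa present in both samples -> 2
union_size <- sum(x == 1 | y == 1)   # taxa present in either sample -> 4
jaccard    <- 1 - shared / union_size  # 0.5
```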
### Aitchison distance
Aitchison distance is built for **compositional data**. It is the Euclidean distance after a centered log-ratio (CLR) transform, and because the logarithm of zero is undefined, a small pseudocount is added first.
```r
abund_clr <- decostand(abund_rel + 1e-06, method = "clr")
aitchison_dist <- dist(abund_clr)
```
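Concretely, the CLR transform subtracts each sample's mean log abundance, and Aitchison distance is then ordinary Euclidean distance in that space. A toy base-R sketch with invented proportions (the pseudocount mirrors the code above):

```r
clr <- function(v) log(v) - mean(log(v))  # centered log-ratio transform

x <- c(0.5, 0.3, 0.2) + 1e-6
y <- c(0.2, 0.3, 0.5) + 1e-6

# Euclidean distance between CLR-transformed samples = Aitchison distance.
aitchison <- sqrt(sum((clr(x) - clr(y))^2))
```

A sample has zero Aitchison distance to itself, and multiplying a sample by a constant does not change its CLR coordinates, which is why this distance is "composition-aware".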
::: {.callout-tip}
A simple beginner workflow is: **Bray-Curtis for your main ecological analysis**, then **Aitchison as a sensitivity check** if you want a composition-aware view.
:::
---
## Ordination and visualization
A distance matrix is hard to read directly, so you usually project it into 2 dimensions.
### A useful plotting helper
```r
plot_ordination <- function(df, x, y, title) {
ggplot(df, aes(x = .data[[x]], y = .data[[y]], color = group)) +
geom_point(size = 3, alpha = 0.9) +
labs(title = title, x = x, y = y, color = "Group") +
theme_minimal(base_size = 12)
}
```
### PCA
**PCA** works on a transformed abundance matrix, not directly on a distance matrix. For ecological data, a **Hellinger transform** is a good beginner-friendly choice.
```r
abund_hellinger <- decostand(abund_rel, method = "hellinger")
pca_fit <- prcomp(abund_hellinger, center = TRUE, scale. = FALSE)
pca_df <- as.data.frame(pca_fit$x[, 1:2]) |>
rownames_to_column("sample_id") |>
left_join(sample_info, by = "sample_id")
plot_ordination(pca_df, "PC1", "PC2", "PCA of Hellinger-transformed abundances")
```
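The Hellinger transform is nothing more than the square root of each sample's proportions, which tames the influence of highly abundant features. A toy base-R version:

```r
abund <- matrix(
  c(10, 30, 60,
    50, 25, 25),
  nrow = 2, byrow = TRUE
)

# Hellinger transform: square root of row (sample) proportions.
hellinger <- sqrt(abund / rowSums(abund))

# Each transformed sample has unit squared length...
rowSums(hellinger^2)  # 1 1

# ...so Euclidean distance between transformed rows is the Hellinger distance,
# which is what PCA on the transformed matrix implicitly works with.
d <- sqrt(sum((hellinger[1, ] - hellinger[2, ])^2))
```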
### PCoA
**PCoA** is a natural companion to Bray-Curtis because it starts from a dissimilarity matrix.
```r
pcoa_fit <- cmdscale(bray_dist, eig = TRUE, k = 2)
pcoa_df <- as.data.frame(pcoa_fit$points) |>
setNames(c("PCoA1", "PCoA2")) |>
rownames_to_column("sample_id") |>
left_join(sample_info, by = "sample_id")
plot_ordination(pcoa_df, "PCoA1", "PCoA2", "PCoA of Bray-Curtis distances")
```
### NMDS
**NMDS** is one of the most widely used ordination methods in ecology because it focuses on preserving rank-order distances. After fitting, check the stress value (`nmds_fit$stress`): as a rough rule of thumb, stress below 0.2 is usable and below 0.1 is good.
```r
set.seed(123)
nmds_fit <- metaMDS(abund_rel, distance = "bray", k = 2, trymax = 100)
nmds_df <- as.data.frame(scores(nmds_fit, display = "sites")) |>
rownames_to_column("sample_id") |>
left_join(sample_info, by = "sample_id")
plot_ordination(nmds_df, "NMDS1", "NMDS2", "NMDS on Bray-Curtis distances")
```
### t-SNE
**t-SNE** is mainly for visualization. It can reveal local neighborhood structure, but distances between distant clusters are not always easy to interpret biologically. Note that `Rtsne()` requires `perplexity` to be less than roughly one third of the number of samples, so lower it for small cohorts.
```r
set.seed(123)
tsne_fit <- Rtsne(abund_hellinger, dims = 2, perplexity = 10, check_duplicates = FALSE)
tsne_df <- as.data.frame(tsne_fit$Y) |>
setNames(c("tSNE1", "tSNE2")) |>
mutate(sample_id = rownames(abund_hellinger)) |>
left_join(sample_info, by = "sample_id")
plot_ordination(tsne_df, "tSNE1", "tSNE2", "t-SNE of Hellinger-transformed abundances")
```
### UMAP
**UMAP** is another visualization-first method that often gives cleaner-looking clusters than t-SNE and is often said to preserve somewhat more global structure.
```r
set.seed(123)
umap_fit <- uwot::umap(abund_hellinger, n_neighbors = 15, min_dist = 0.2)
umap_df <- as.data.frame(umap_fit) |>
setNames(c("UMAP1", "UMAP2")) |>
mutate(sample_id = rownames(abund_hellinger)) |>
left_join(sample_info, by = "sample_id")
plot_ordination(umap_df, "UMAP1", "UMAP2", "UMAP of Hellinger-transformed abundances")
```
### Which ordination should you show?
A sensible beginner combination is:
- **PCoA + Bray-Curtis** for the main ecological result
- **NMDS + Bray-Curtis** as a robust ecological cross-check
- **PCA** if you want a linear view of transformed abundances
- **t-SNE / UMAP** as exploratory visualizations, not your only evidence
---
## Statistical testing for beta diversity
Visualization is helpful, but it is not the statistical test.
### PERMANOVA
Use `adonis2()` to test whether groups differ in their community composition.
```r
adonis2(bray_dist ~ group, data = sample_info, permutations = 999)
```
### Check dispersion before over-interpreting PERMANOVA
PERMANOVA can be significant either because group centroids differ or because one group is much more dispersed.
```r
bray_dispersion <- betadisper(bray_dist, sample_info$group)
anova(bray_dispersion)
permutest(bray_dispersion)
```
::: {.callout-important}
A common reporting pattern is: **Bray-Curtis PCoA plot + PERMANOVA p-value + dispersion check**.
:::
---
## Recommended beginner workflow
1. Start with a **species-level MetaPhlAn** or **unstratified HUMAnN** table.
2. Re-scale to proportions with `decostand(..., "total")`.
3. Calculate **richness, Shannon, Simpson, inverse Simpson, and Pielou evenness**.
4. Use **Bray-Curtis** as your main beta-diversity metric.
5. Visualize beta diversity with **PCoA** and **NMDS**.
6. Use **PERMANOVA** to test group separation.
7. Re-run beta diversity with **Aitchison distance** if you want a composition-aware sensitivity analysis.
---
## Further reading
- [MetaPhlAn tutorial](../tools/metaphlan.qmd)
- [HUMAnN tutorial](../tools/humann.qmd)
- [QIIME 2 tutorial](../tools/qiime2.qmd) for amplicon-oriented diversity workflows
- `?vegan::diversity`, `?vegan::vegdist`, `?vegan::metaMDS`, `?vegan::adonis2`