Relative Abundance

Interactive longitudinal taxonomic bar charts with taxa-preserving significance overlays

This page shows how to build a family-level relative-abundance bar chart from MetaPhlAn output, while keeping the visual logic consistent with the intestinal phylogenetic tree navigator elsewhere on this site.

The main idea is:

keep families ordered by phylogenetic relatedness
keep colors anchored to the same phylum-level palette used in the tree navigator
show statistical significance in a second visual layer instead of overloading the bar colors

Note

For most beginner-friendly microbiome plots, family level is a good compromise: less sparse than species-level data, but still biologically interpretable.

The palette to reuse

The intestinal tree page already uses these bacterial phylum colors:

Phylum	Tree color
Actinomycetota	`#7e57c2`
Bacteroidota	`#1976d2`
Verrucomicrobiota	`#00897b`
Bacillota	`#2e7d32`
Pseudomonadota	`#c62828`
Fusobacteriota	`#ef6c00`

For family-level bars, it usually looks cleaner to use lighter shades within each phylum, which matches the tree’s logic that deeper ranks become lighter.

Interactive taxonomic bar-chart explorer

The interactive explorer below now uses a longitudinal control-versus-experimental design with triplicate subjects at time -3, Pre- (time 0), and time +4. You can change taxonomic level, choose a statistical contrast, inspect the sample-code key, and compare the stacked bars against a LEfSe-like significance strip while keeping the full framework table below.

My recommended visualization idea is the taxa-preserving significance strip:

the stacked bars keep taxa-specific color as the primary encoding
a second aligned panel shows LEfSe-like signed effect size for the selected longitudinal contrast
significant taxa keep their q labels, but taxa keep the same taxon color instead of switching to a significance palette

That way you get the best part of LEfSe-style interpretation without losing the phylogenetic color logic of the bar chart itself.

R packages

install.packages(c("dplyr", "tidyr", "tibble", "forcats", "ggplot2", "vegan", "scales"))

library(dplyr)
library(tidyr)
library(tibble)
library(forcats)
library(ggplot2)
library(vegan)
library(scales)

Step 1: Read a family-level MetaPhlAn table

metaphlan_raw <- read.delim("metaphlan_merged.tsv", check.names = FALSE, comment.char = "#")
metadata <- read.delim("metadata.tsv", check.names = FALSE)

family_long <- metaphlan_raw |>
  rename(feature = clade_name) |>
  filter(grepl("\\|f__[^|]+$", feature)) |>
  mutate(
    family = sub(".*\\|f__", "", feature),
    phylum = sub(".*\\|p__([^|]+).*", "\\1", feature),
    class = sub(".*\\|c__([^|]+).*", "\\1", feature),
    order = sub(".*\\|o__([^|]+).*", "\\1", feature)
  ) |>
  select(feature, phylum, class, order, family, everything()) |>
  pivot_longer(
    cols = -c(feature, phylum, class, order, family),
    names_to = "sample_id",
    values_to = "abundance"
  ) |>
  left_join(metadata, by = "sample_id")

If you have repeated rows at the same family level, aggregate them:

family_long <- family_long |>
  group_by(sample_id, group, timepoint, phylum, class, order, family) |>
  summarise(abundance = sum(abundance), .groups = "drop")

Step 2: Choose a phylogenetic order for families

To make the legend and stacked bars feel more biological, do not sort families only by abundance. Instead, keep closely related families adjacent.

A simple gut-oriented order that matches the intestinal tree page is:

phylo_order_tbl <- tibble(
  family = c(
    "Bifidobacteriaceae",
    "Bacteroidaceae", "Prevotellaceae", "Rikenellaceae", "Akkermansiaceae",
    "Lachnospiraceae", "Ruminococcaceae", "Peptostreptococcaceae",
    "Bacillaceae", "Staphylococcaceae", "Enterococcaceae",
    "Enterobacteriaceae", "Pseudomonadaceae", "Sutterellaceae",
    "Fusobacteriaceae"
  ),
  phylo_rank = seq_len(15)
)

If your study contains a different set of families, extend this table using the same logic:

sort by phylum first
then by class/order
then by family

That gives you a legend that reads like a simplified phylogenetic tree rather than a random color list.

Step 3: Keep the top families and lump the rest into “Other”

top_families <- family_long |>
  group_by(family) |>
  summarise(mean_abundance = mean(abundance), .groups = "drop") |>
  slice_max(mean_abundance, n = 15) |>
  pull(family)

family_plot <- family_long |>
  mutate(family = if_else(family %in% top_families, family, "Other")) |>
  group_by(sample_id, group, timepoint, phylum, class, order, family) |>
  summarise(abundance = sum(abundance), .groups = "drop")

Step 4: Build a tree-matched family palette

The phylum anchor colors below come directly from the tree navigator. Within each phylum, we create a short gradient so that related families share the same color family.

phylum_anchor_colors <- c(
  Actinomycetota = "#7e57c2",
  Bacteroidota = "#1976d2",
  Verrucomicrobiota = "#00897b",
  Bacillota = "#2e7d32",
  Pseudomonadota = "#c62828",
  Fusobacteriota = "#ef6c00"
)

phylum_light_colors <- c(
  Actinomycetota = "#f5f0fb",
  Bacteroidota = "#f2f8fe",
  Verrucomicrobiota = "#f0fbfa",
  Bacillota = "#f0f8e8",
  Pseudomonadota = "#fff5f6",
  Fusobacteriota = "#fff8ee"
)

family_key <- family_plot |>
  filter(family != "Other") |>
  distinct(phylum, class, order, family) |>
  left_join(phylo_order_tbl, by = "family") |>
  arrange(phylum, class, order, phylo_rank, family)

family_palette <- family_key |>
  group_by(phylum) |>
  group_modify(~ {
    shades <- colorRampPalette(c(
      phylum_anchor_colors[.y$phylum],
      phylum_light_colors[.y$phylum]
    ))(nrow(.x))

    mutate(.x, fill_color = shades)
  }) |>
  ungroup() |>
  select(family, fill_color) |>
  deframe()

family_palette <- c(family_palette, Other = "#d9d9d9")

This produces a palette where:

Bacteroidota families stay blue
Bacillota families stay green
Pseudomonadota families stay red
and so on

That way your stacked bar chart still communicates phylogeny at a glance.

Step 5: Apply the phylogenetic order to the legend and stacks

family_levels <- c(family_key$family, "Other")

family_plot <- family_plot |>
  mutate(
    family = factor(family, levels = family_levels),
    sample_id = factor(sample_id, levels = metadata$sample_id)
  )

Step 6: Draw the stacked bar chart

ggplot(family_plot, aes(x = sample_id, y = abundance, fill = family)) +
  geom_col(width = 0.9) +
  facet_grid(. ~ group, scales = "free_x", space = "free_x") +
  scale_fill_manual(values = family_palette, drop = FALSE) +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  labs(
    title = "Family-level relative abundance",
    subtitle = "Families are ordered by phylogenetic relatedness and colored by the tree navigator palette",
    x = NULL,
    y = "Relative abundance (%)",
    fill = "Family"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
    panel.grid.major.x = element_blank(),
    legend.position = "right"
  )

Tip

If your study is longitudinal, put time on the x-axis and facet by subject or treatment group. The same color/order logic still works.

How to show statistical significance without breaking the color logic

If fill color already encodes phylogeny, avoid reusing the same fill for significance. That usually makes the plot harder to read.

A better approach is:

keep bar colors for taxonomy / phylogenetic relatedness
add a second panel for significance or effect size
optionally add *, **, or arrows in the legend labels or strip labels

Recommended option: a companion significance heatmap

Suppose you already ran a model such as MaAsLin2 or a paired longitudinal test and have a table with effect sizes and qval values.

stats_tbl <- read.delim("maaslin2/all_results.tsv", check.names = FALSE) |>
  filter(metadata == "timepoint") |>
  transmute(
    family = feature,
    effect = coef,
    qval = qval,
    sig_label = case_when(
      qval < 0.001 ~ "***",
      qval < 0.01 ~ "**",
      qval < 0.05 ~ "*",
      TRUE ~ ""
    )
  ) |>
  filter(family %in% levels(family_plot$family)) |>
  mutate(family = factor(family, levels = rev(levels(family_plot$family))))


ggplot(stats_tbl, aes(x = 1, y = family, fill = effect)) +
  geom_tile(color = "white") +
  geom_text(aes(label = sig_label), fontface = "bold", size = 4) +
  scale_fill_gradient2(low = "#2166ac", mid = "white", high = "#b2182b") +
  labs(
    title = "Companion significance panel",
    subtitle = "Tile color = model effect size, star = significance threshold",
    x = NULL,
    y = NULL,
    fill = "Effect"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

This gives you a clean split:

the stacked bar chart answers what is present, in what proportion, and how taxa are related
the heatmap answers which families differ significantly, and in which direction

Optional shortcut: add significance to legend labels

If you want something lighter-weight, append stars to the labels of significant families.

sig_families <- stats_tbl |>
  filter(qval < 0.05) |>
  pull(family)

legend_labels <- setNames(
  nm = levels(family_plot$family),
  object = ifelse(levels(family_plot$family) %in% sig_families,
                  paste0(levels(family_plot$family), " *"),
                  levels(family_plot$family))
)

Then pass labels = legend_labels into scale_fill_manual(). This is simple, but the side-panel heatmap is usually easier to interpret.

Recommended family-level plotting workflow

Start from a family-level MetaPhlAn table.
Join sample metadata.
Keep the top 10-20 families and lump the rest into Other.
Order families by phylum → class → order → family.
Use the same phylum color anchors as the intestinal tree navigator.
Use lighter within-phylum shades for family-level bars.
Show significance in a companion plot, not by replacing the phylogenetic fill colors.

Tutorial: generate the standard volcano plot

The interactive volcano plot above now uses a standard volcano layout:

compute a log2 fold change from the experimental-arm Pre- mean abundance to the time +4 mean abundance,
compute a Kruskal-Wallis p-value for the same Pre- vs time +4 comparison,
plot x = log2FC and y = -log10(p).

Because this page is a small synthetic teaching example, the browser-side code below focuses on the standard volcano geometry. A LEfSe-style follow-up comparison can be layered in later if we decide to add that analysis.

const SIGNIFICANCE_THRESHOLD = 0.05;
const MIN_DISPLAY_P_VALUE = 0.0005;
const VOLCANO_LOG2FC_PSEUDOCOUNT = 0.01;

function averageRanks(values) {
  const ranked = values
    .map((value, index) => ({ value, index }))
    .sort((left, right) => left.value - right.value);

  const ranks = new Array(values.length);
  let start = 0;
  while (start < ranked.length) {
    let end = start + 1;
    while (end < ranked.length && ranked[end].value === ranked[start].value) {
      end += 1;
    }
    const averageRank = (start + 1 + end) / 2;
    for (let rankIndex = start; rankIndex < end; rankIndex += 1) {
      ranks[ranked[rankIndex].index] = averageRank;
    }
    start = end;
  }
  return ranks;
}

function computeKruskalWallisPValue(groups) {
  const flattened = groups.flatMap((groupValues, groupIndex) =>
    groupValues.map((value) => ({ value, groupIndex }))
  );
  const ranks = averageRanks(flattened.map((entry) => entry.value));
  const totalCount = flattened.length;
  const tieCounts = flattened.reduce((counts, entry) => {
    counts[entry.value] = (counts[entry.value] || 0) + 1;
    return counts;
  }, {});
  const rankSums = groups.map(() => 0);

  flattened.forEach((entry, entryIndex) => {
    rankSums[entry.groupIndex] += ranks[entryIndex];
  });

  const statisticBase = groups.reduce((total, groupValues, groupIndex) =>
    total + ((rankSums[groupIndex] ** 2) / groupValues.length), 0
  );
  const statistic = ((12 / (totalCount * (totalCount + 1))) * statisticBase) - (3 * (totalCount + 1));
  const tieCorrectionDenominator = totalCount ** 3 - totalCount;
  const tieCorrection = tieCorrectionDenominator > 0
    ? 1 - (Object.values(tieCounts).reduce((total, count) => total + (count ** 3 - count), 0) / tieCorrectionDenominator)
    : 1;
  return Math.max(
    MIN_DISPLAY_P_VALUE,
    chiSquareSurvivalDf1(tieCorrection > Number.EPSILON ? statistic / tieCorrection : statistic)
  );
}

function computeLog2FoldChange(preValues, postValues) {
  if (preValues.length === 0 || postValues.length === 0) {
    return 0;
  }
  const preMean = mean(preValues);
  const postMean = mean(postValues);
  return Math.log2(
    (postMean + VOLCANO_LOG2FC_PSEUDOCOUNT) /
    (preMean + VOLCANO_LOG2FC_PSEUDOCOUNT)
  );
}

const volcanoPoints = level.taxa.map((taxon) => {
  const experimentalPreValues = valuesForTaxonSamples(taxon, "Experimental", "0");
  const experimentalPostValues = valuesForTaxonSamples(taxon, "Experimental", "+4");
  const pValue = computeKruskalWallisPValue([experimentalPreValues, experimentalPostValues]);
  const log2FoldChange = computeLog2FoldChange(experimentalPreValues, experimentalPostValues);

  return {
    taxon,
    x: log2FoldChange,
    y: -Math.log10(pValue)
  };
});

const circles = volcanoPoints.map((point) => `
  <circle
    cx="${xForLog2FoldChange(point.x)}"
    cy="${yForLogP(point.y)}"
    r="4.8"
    fill="${point.taxon.color}"
    stroke="${point.y >= -Math.log10(SIGNIFICANCE_THRESHOLD) ? "#0f172a" : "#94a3b8"}"
  ></circle>
`);

In the live figure:

horizontal position is the experimental-arm log2 fold change,
vertical position is the Kruskal-Wallis -log10(p) value,
the dashed horizontal guide marks the conventional p = 0.05 threshold.

--- title: "Relative Abundance" subtitle: "Interactive longitudinal taxonomic bar charts with taxa-preserving significance overlays" --- This page shows how to build a **family-level relative-abundance bar chart** from **MetaPhlAn** output, while keeping the visual logic consistent with the **intestinal phylogenetic tree navigator** elsewhere on this site. The main idea is: 1. keep families ordered by **phylogenetic relatedness** 2. keep colors anchored to the same **phylum-level palette** used in the tree navigator 3. show **statistical significance** in a second visual layer instead of overloading the bar colors ::: {.callout-note} For most beginner-friendly microbiome plots, **family level** is a good compromise: less sparse than species-level data, but still biologically interpretable. ::: --- ## The palette to reuse The intestinal tree page already uses these bacterial phylum colors: | Phylum | Tree color | |---|---| | Actinomycetota | `#7e57c2` | | Bacteroidota | `#1976d2` | | Verrucomicrobiota | `#00897b` | | Bacillota | `#2e7d32` | | Pseudomonadota | `#c62828` | | Fusobacteriota | `#ef6c00` | For family-level bars, it usually looks cleaner to use **lighter shades within each phylum**, which matches the tree's logic that deeper ranks become lighter. ## Interactive taxonomic bar-chart explorer The interactive explorer below now uses a **longitudinal control-versus-experimental design** with triplicate subjects at **time -3**, **Pre- (time 0)**, and **time +4**. You can change taxonomic level, choose a statistical contrast, inspect the sample-code key, and compare the stacked bars against a LEfSe-like significance strip while keeping the full framework table below. My recommended visualization idea is the **taxa-preserving significance strip**: - the stacked bars keep taxa-specific color as the primary encoding - a second aligned panel shows **LEfSe-like signed effect size** for the selected longitudinal contrast - significant taxa keep their `q` labels, but taxa **keep the same taxon color** instead of switching to a significance palette That way you get the best part of LEfSe-style interpretation without losing the phylogenetic color logic of the bar chart itself. ::: {=html} <style> #relative-abundance-lab { width: min(100%, 1100px); margin: 1.5rem auto 2rem; } .ra-lab-grid { display: grid; gap: 1.25rem; grid-template-columns: minmax(0, 1.5fr) minmax(280px, 0.95fr); } @media (max-width: 900px) { .ra-lab-grid { grid-template-columns: 1fr; } } .ra-card { border: 1px solid #d8dee4; border-radius: 0.9rem; padding: 1rem 1.1rem 1.2rem; background: #fff; box-shadow: 0 10px 24px rgba(15, 23, 42, 0.06); min-width: 0; } .ra-card h3 { margin-top: 0; margin-bottom: 0.5rem; } .ra-card--full-row { grid-column: 1 / -1; } .ra-controls { display: flex; flex-wrap: wrap; align-items: end; gap: 1rem; margin: 1rem 0 1.1rem; } .ra-controls label { display: grid; gap: 0.35rem; font-weight: 600; } .ra-controls select { min-width: 11rem; } .ra-chart-frame { border: 1px solid #e2e8f0; border-radius: 0.8rem; padding: 0.75rem; background: linear-gradient(180deg, #ffffff 0%, #f8fafc 100%); max-width: 34rem; margin: 0 auto; } .ra-chart-frame--wide { max-width: none; } #ra-bar-chart svg, #ra-significance-chart svg { display: block; width: 100%; height: auto; max-width: 100%; margin: 0 auto; } .ra-legend { display: grid; gap: 0.45rem; margin-top: 0.85rem; } .ra-legend-item { display: flex; align-items: center; gap: 0.5rem; font-size: 0.92rem; } .ra-legend-swatch { width: 0.95rem; height: 0.95rem; border-radius: 0.2rem; border: 1px solid rgba(15, 23, 42, 0.12); flex: none; } .ra-caption { color: #475569; font-size: 0.92rem; } .ra-sample-key { margin-top: 0.85rem; color: #334155; font-size: 0.9rem; } .ra-sample-key code { font-size: 0.86rem; } .ra-summary { margin-top: 0.9rem; color: #334155; } .ra-framework-overview { display: grid; gap: 0.6rem; margin-bottom: 0.85rem; } .ra-framework-step { border: 1px solid #d8dee4; border-radius: 0.8rem; background: #f8fafc; padding: 0.7rem 0.8rem; } .ra-framework-step strong { display: block; color: #0f172a; margin-bottom: 0.15rem; } .ra-framework-step span { display: block; color: #475569; font-size: 0.86rem; } .ra-framework-step em { color: #0f766e; font-style: normal; font-weight: 600; } .ra-framework-scroll { overflow-x: auto; } .ra-framework-table { width: 100%; border-collapse: collapse; min-width: 620px; font-size: 0.84rem; } .ra-framework-table th, .ra-framework-table td { padding: 0.55rem 0.45rem; border-bottom: 1px solid #e2e8f0; vertical-align: top; } .ra-framework-table th { color: #334155; font-weight: 700; text-align: left; } .ra-framework-table tbody tr:last-child td { border-bottom: none; } .ra-taxon-cell { display: flex; align-items: center; gap: 0.5rem; color: #0f172a; font-weight: 600; } .ra-framework-cell { display: grid; gap: 0.15rem; } .ra-framework-state { font-weight: 600; color: #475569; } .ra-framework-state--significant { color: #9f1239; } .ra-framework-metric, .ra-framework-enriched { color: #475569; } .ra-call-chip { display: inline-flex; align-items: center; border-radius: 999px; padding: 0.25rem 0.55rem; font-size: 0.78rem; font-weight: 700; border: 1px solid #d8dee4; background: #f8fafc; color: #334155; } .ra-call-chip--intervention { border-color: #bbf7d0; background: #f0fdf4; color: #166534; } .ra-call-chip--watch { border-color: #fde68a; background: #fffbeb; color: #92400e; } .ra-call-chip--caution { border-color: #fecaca; background: #fff1f2; color: #9f1239; } .ra-call-chip--quiet { border-color: #cbd5e1; background: #f8fafc; color: #475569; } .ra-replicate-checks { display: grid; gap: 0.55rem; margin-top: 1rem; } .ra-results-meta { display: grid; gap: 1rem; margin-top: 1rem; } @media (min-width: 900px) { .ra-results-meta { grid-template-columns: minmax(0, 11fr) minmax(0, 9fr); align-items: start; } .ra-results-meta .ra-summary, .ra-results-meta .ra-replicate-checks { margin-top: 0; } } .ra-replicate-chip { border: 1px solid #d8dee4; border-radius: 0.8rem; background: #f8fafc; padding: 0.7rem 0.8rem; } .ra-replicate-chip strong { display: block; color: #0f172a; } .ra-replicate-chip span { color: #475569; font-size: 0.9rem; } .ra-replicate-chip--significant { border-color: #fecaca; background: #fff1f2; } .ra-replicate-chip--significant strong { color: #991b1b; } .ra-replicate-chip--significant span { color: #9f1239; } </style> <div id="relative-abundance-lab"></div> <script> document.addEventListener("DOMContentLoaded", function () { const abundanceLab = document.getElementById("relative-abundance-lab"); abundanceLab.innerHTML = ` <div class="ra-controls"> <label for="ra-level-select"> Taxonomic level <select id="ra-level-select"> <option value="phylum">Phylum</option> <option value="family" selected>Family</option> <option value="genus">Genus</option> <option value="species">Species</option> </select> </label> <label for="ra-contrast-select"> Statistical contrast <select id="ra-contrast-select"> <option value="baseline_group">Control vs experimental at Pre-</option> <option value="post_group" selected>Control vs experimental at time +4</option> <option value="control_time">Control Pre- vs time +4</option> <option value="intervention_time">Experimental Pre- vs time +4</option> </select> </label> </div> <div class="ra-lab-grid"> <section class="ra-card"> <h3>Stacked relative-abundance bars</h3> <p class="ra-caption">The same longitudinal cohort is shown at multiple taxonomic levels. Both groups begin similarly at time -3 and Pre- (time 0), then only the experimental arm shifts strongly by time +4.</p> <div id="ra-bar-chart" class="ra-chart-frame" aria-live="polite"></div> <p class="ra-sample-key"><code>C3[-3]</code> means control-group subject-3 at time -3, and <code>E1[+4]</code> means experimental-group subject-1 at time +4. All group-by-time cells are triplicate.</p> <div id="ra-legend" class="ra-legend" aria-live="polite"></div> </section> <section class="ra-card"> <h3>Taxa Preserving Significance Strip</h3> <p class="ra-caption">This companion panel behaves like a LEfSe-style effect-size plot for the selected longitudinal contrast, while keeping the same taxon colors used in the stacked bars.</p> <div id="ra-significance-strip" class="ra-chart-frame" aria-live="polite"></div> </section> <section class="ra-card ra-card--full-row"> <h3>Pre-intervention vs post-intervention volcano plot</h3> <p class="ra-caption">This synthetic volcano view uses a standard volcano layout with log2 fold change on the x-axis and -log10 p-value on the y-axis while keeping the same taxon colors as the stacked bars.</p> <div id="ra-volcano-plot" class="ra-chart-frame ra-chart-frame--wide" aria-live="polite"></div> </section> <section class="ra-card ra-card--full-row"> <h3>Statistical results across the full framework</h3> <p class="ra-caption">The full statistical readout sits in a single row below the overview cards so you can compare every taxon against the same longitudinal decision framework without stretching the top layout.</p> <div id="ra-significance-chart" class="ra-chart-frame ra-chart-frame--wide" aria-live="polite"></div> <div class="ra-results-meta"> <p id="ra-summary" class="ra-summary"></p> <div id="ra-replicate-checks" class="ra-replicate-checks" aria-live="polite"></div> </div> </section> </div> `; const levelSelect = document.getElementById("ra-level-select"); const contrastSelect = document.getElementById("ra-contrast-select"); const barChart = document.getElementById("ra-bar-chart"); const legend = document.getElementById("ra-legend"); const significanceStrip = document.getElementById("ra-significance-strip"); const volcanoPlot = document.getElementById("ra-volcano-plot"); const significanceChart = document.getElementById("ra-significance-chart"); const summary = document.getElementById("ra-summary"); const replicateChecks = document.getElementById("ra-replicate-checks"); const SIGNIFICANCE_THRESHOLD = 0.05; const MIN_DISPLAY_P_VALUE = 0.0005; const MAX_DISPLAY_P_VALUE = 0.999; const VOLCANO_LOG2FC_PSEUDOCOUNT = 0.01; // Prevents log2FC from blowing up if a mean abundance reaches zero. const VOLCANO_TICK_STEP_FINE = 0.5; const VOLCANO_TICK_STEP_MEDIUM = 1; const VOLCANO_TICK_STEP_COARSE = 2; const VOLCANO_TICK_LIMIT_FINE = 1.5; const VOLCANO_TICK_LIMIT_MEDIUM = 3; const VOLCANO_TICK_DECIMAL_PLACES = 2; const VOLCANO_TICK_ROUNDING_FACTOR = 10 ** VOLCANO_TICK_DECIMAL_PLACES; const subjectIds = [1, 2, 3]; const timepoints = [ { code: "-3", label: "time -3", bandFill: "#dbeafe" }, { code: "0", label: "Pre-", bandFill: "#ede9fe" }, { code: "+4", label: "time +4", bandFill: "#fee2e2" } ]; const groupDefinitions = [ { label: "Control", displayLabel: "control-group", samplePrefix: "C", stroke: "#1976d2" }, { label: "Experimental", displayLabel: "experimental-group", samplePrefix: "E", stroke: "#c62828" } ]; const cohortSamples = groupDefinitions.flatMap((groupDefinition) => timepoints.flatMap((timepoint) => subjectIds.map((subjectId) => ({ sample: `${groupDefinition.samplePrefix}${subjectId}[${timepoint.code}]`, group: groupDefinition.label, groupLabel: groupDefinition.displayLabel, timepoint: timepoint.label, timeCode: timepoint.code, bandFill: timepoint.bandFill, bandStroke: groupDefinition.stroke, description: `${groupDefinition.displayLabel} subject-${subjectId} at ${timepoint.label === "Pre-" ? "Pre- (time 0)" : `time ${timepoint.code}`}` })) ) ); const sampleBlocks = groupDefinitions.flatMap((groupDefinition, groupIndex) => timepoints.map((timepoint, timeIndex) => { const startIndex = groupIndex * timepoints.length * subjectIds.length + timeIndex * subjectIds.length; return { label: timepoint.label, startIndex, endIndex: startIndex + subjectIds.length - 1 }; }) ); const groupBlocks = groupDefinitions.map((groupDefinition, groupIndex) => { const startIndex = groupIndex * timepoints.length * subjectIds.length; return { label: groupDefinition.displayLabel, startIndex, endIndex: startIndex + timepoints.length * subjectIds.length - 1 }; }); const contrasts = { baseline_group: { label: "Control vs experimental at Pre-", leftLabel: "Control Pre-", rightLabel: "Experimental Pre-", description: "Both groups should remain statistically similar at Pre- (time 0) before the intervention takes hold." }, post_group: { label: "Control vs experimental at time +4", leftLabel: "Control +4", rightLabel: "Experimental +4", description: "This is the main between-group contrast after the intervention window." }, control_time: { label: "Control Pre- vs time +4", leftLabel: "Control Pre-", rightLabel: "Control +4", description: "This asks whether the control-group arm drifted over time without the intervention." }, intervention_time: { label: "Experimental Pre- vs time +4", leftLabel: "Experimental Pre-", rightLabel: "Experimental +4", description: "This asks which taxa changed longitudinally inside the experimental arm itself." } }; const cmp = (effect, q, enriched) => ({ effect, q, enriched }); const MIDPOINT_WEIGHT = 0.5; const EARLIER_REFERENCE_WEIGHT = 0.75; const LATER_REFERENCE_WEIGHT = 0.25; const interpolateBetween = (left, right, leftWeight = MIDPOINT_WEIGHT) => Number(((left * leftWeight) + (right * (1 - leftWeight))).toFixed(1)); // Expand two synthetic reference profiles per group/time window into triplicate samples // so the chart can show time -3, Pre- (time 0), and time +4 without rewriting all taxa. // Time -3 and Pre- reuse the same pre-intervention references, but are biased toward the // earlier or later reference profile so they read as distinct triplicate measurements. function expandCompressedLongitudinalScores(scores) { const [ controlPreReferenceA, controlPreReferenceB, controlPlus4ReferenceA, controlPlus4ReferenceB, experimentalPreReferenceA, experimentalPreReferenceB, experimentalPlus4ReferenceA, experimentalPlus4ReferenceB ] = scores; return [ controlPreReferenceA, interpolateBetween(controlPreReferenceA, controlPreReferenceB), controlPreReferenceB, interpolateBetween(controlPreReferenceA, controlPreReferenceB, EARLIER_REFERENCE_WEIGHT), interpolateBetween(controlPreReferenceA, controlPreReferenceB), interpolateBetween(controlPreReferenceA, controlPreReferenceB, LATER_REFERENCE_WEIGHT), controlPlus4ReferenceA, interpolateBetween(controlPlus4ReferenceA, controlPlus4ReferenceB), controlPlus4ReferenceB, experimentalPreReferenceA, interpolateBetween(experimentalPreReferenceA, experimentalPreReferenceB), experimentalPreReferenceB, interpolateBetween(experimentalPreReferenceA, experimentalPreReferenceB, EARLIER_REFERENCE_WEIGHT), interpolateBetween(experimentalPreReferenceA, experimentalPreReferenceB), interpolateBetween(experimentalPreReferenceA, experimentalPreReferenceB, LATER_REFERENCE_WEIGHT), experimentalPlus4ReferenceA, interpolateBetween(experimentalPlus4ReferenceA, experimentalPlus4ReferenceB), experimentalPlus4ReferenceB ]; } const levels = { phylum: { taxa: [ { name: "Bacteroidota", color: "#1976d2", scores: [38, 37, 37, 36, 37, 36, 26, 25], comparisons: { baseline_group: cmp(-0.3, 0.640, "Control baseline"), post_group: cmp(-2.8, 0.011, "Control post"), control_time: cmp(-0.4, 0.510, "Control baseline"), intervention_time: cmp(-2.9, 0.009, "Intervention baseline") } }, { name: "Bacillota", color: "#2e7d32", scores: [33, 34, 33, 34, 33, 34, 40, 41], comparisons: { baseline_group: cmp(0.2, 0.710, "Intervention baseline"), post_group: cmp(1.9, 0.024, "Intervention post"), control_time: cmp(0.2, 0.680, "Control post"), intervention_time: cmp(2.2, 0.016, "Intervention post") } }, { name: "Actinomycetota", color: "#7e57c2", scores: [11, 10, 10, 10, 10, 10, 17, 17], comparisons: { baseline_group: cmp(-0.1, 0.860, "Control baseline"), post_group: cmp(2.9, 0.006, "Intervention post"), control_time: cmp(-0.1, 0.840, "Control baseline"), intervention_time: cmp(3.0, 0.004, "Intervention post") } }, { name: "Pseudomonadota", color: "#c62828", scores: [9, 9, 10, 9, 9, 9, 7, 7], comparisons: { baseline_group: cmp(0.0, 0.980, "Intervention baseline"), post_group: cmp(-1.0, 0.180, "Control post"), control_time: cmp(0.3, 0.620, "Control post"), intervention_time: cmp(-1.1, 0.150, "Intervention baseline") } }, { name: "Verrucomicrobiota", color: "#00897b", scores: [6, 6, 6, 7, 6, 7, 9, 9], comparisons: { baseline_group: cmp(0.1, 0.820, "Intervention baseline"), post_group: cmp(1.8, 0.031, "Intervention post"), control_time: cmp(0.3, 0.590, "Control post"), intervention_time: cmp(2.1, 0.018, "Intervention post") } }, { name: "Fusobacteriota", color: "#ef6c00", scores: [3, 4, 4, 4, 5, 4, 4, 4], comparisons: { baseline_group: cmp(0.2, 0.670, "Intervention baseline"), post_group: cmp(0.0, 0.990, "Intervention post"), control_time: cmp(0.2, 0.730, "Control post"), intervention_time: cmp(-0.2, 0.700, "Intervention baseline") } } ], replicateChecks: [ { label: "Control-group time -3 triplicates", q: 0.870 }, { label: "Control-group Pre- triplicates", q: 0.830 }, { label: "Control-group time +4 triplicates", q: 0.770 }, { label: "Experimental-group time -3 triplicates", q: 0.850 }, { label: "Experimental-group Pre- triplicates", q: 0.810 }, { label: "Experimental-group time +4 triplicates", q: 0.680 } ] }, family: { taxa: [ { name: "Bacteroidaceae", color: "#1976d2", scores: [25, 24, 24, 23, 24, 23, 14, 13], comparisons: { baseline_group: cmp(-0.4, 0.620, "Control baseline"), post_group: cmp(-3.1, 0.006, "Control post"), control_time: cmp(-0.3, 0.580, "Control baseline"), intervention_time: cmp(-3.0, 0.005, "Intervention baseline") } }, { name: "Prevotellaceae", color: "#64b5f6", scores: [8, 8, 8, 9, 8, 9, 15, 16], comparisons: { baseline_group: cmp(0.2, 0.740, "Intervention baseline"), post_group: cmp(2.8, 0.010, "Intervention post"), control_time: cmp(0.3, 0.600, "Control post"), intervention_time: cmp(2.7, 0.009, "Intervention post") } }, { name: "Lachnospiraceae", color: "#2e7d32", scores: [20, 21, 20, 21, 20, 20, 23, 24], comparisons: { baseline_group: cmp(-0.2, 0.700, "Control baseline"), post_group: cmp(1.2, 0.082, "Intervention post"), control_time: cmp(0.1, 0.860, "Control post"), intervention_time: cmp(1.4, 0.070, "Intervention post") } }, { name: "Ruminococcaceae", color: "#81c784", scores: [12, 12, 12, 13, 12, 12, 17, 17], comparisons: { baseline_group: cmp(0.0, 0.930, "Intervention baseline"), post_group: cmp(2.1, 0.019, "Intervention post"), control_time: cmp(0.2, 0.720, "Control post"), intervention_time: cmp(2.2, 0.015, "Intervention post") } }, { name: "Bifidobacteriaceae", color: "#7e57c2", scores: [6, 6, 6, 7, 6, 6, 12, 12], comparisons: { baseline_group: cmp(0.0, 0.980, "Intervention baseline"), post_group: cmp(3.0, 0.004, "Intervention post"), control_time: cmp(0.2, 0.690, "Control post"), intervention_time: cmp(3.1, 0.003, "Intervention post") } }, { name: "Akkermansiaceae", color: "#00897b", scores: [4, 4, 4, 5, 4, 4, 8, 8], comparisons: { baseline_group: cmp(0.0, 0.990, "Intervention baseline"), post_group: cmp(2.2, 0.016, "Intervention post"), control_time: cmp(0.3, 0.620, "Control post"), intervention_time: cmp(2.3, 0.014, "Intervention post") } }, { name: "Other", color: "#d9d9d9", scores: [25, 25, 26, 22, 26, 26, 11, 10], comparisons: { baseline_group: cmp(0.1, 0.840, "Intervention baseline"), post_group: cmp(-0.5, 0.440, "Control post"), control_time: cmp(-0.4, 0.500, "Control baseline"), intervention_time: cmp(-0.6, 0.390, "Intervention baseline") } } ], replicateChecks: [ { label: "Control-group time -3 triplicates", q: 0.840 }, { label: "Control-group Pre- triplicates", q: 0.790 }, { label: "Control-group time +4 triplicates", q: 0.720 }, { label: "Experimental-group time -3 triplicates", q: 0.860 }, { label: "Experimental-group Pre- triplicates", q: 0.840 }, { label: "Experimental-group time +4 triplicates", q: 0.610 } ] }, genus: { taxa: [ { name: "Bacteroides", color: "#1976d2", scores: [23, 22, 22, 21, 22, 21, 12, 11], comparisons: { baseline_group: cmp(-0.3, 0.650, "Control baseline"), post_group: cmp(-3.0, 0.005, "Control post"), control_time: cmp(-0.2, 0.670, "Control baseline"), intervention_time: cmp(-2.9, 0.006, "Intervention baseline") } }, { name: "Prevotella", color: "#64b5f6", scores: [6, 6, 6, 7, 6, 7, 13, 14], comparisons: { baseline_group: cmp(0.2, 0.760, "Intervention baseline"), post_group: cmp(2.9, 0.008, "Intervention post"), control_time: cmp(0.3, 0.590, "Control post"), intervention_time: cmp(2.8, 0.008, "Intervention post") } }, { name: "Faecalibacterium", color: "#2e7d32", scores: [15, 15, 15, 16, 15, 15, 18, 19], comparisons: { baseline_group: cmp(0.0, 0.940, "Intervention baseline"), post_group: cmp(1.4, 0.066, "Intervention post"), control_time: cmp(0.2, 0.740, "Control post"), intervention_time: cmp(1.5, 0.059, "Intervention post") } }, { name: "Roseburia", color: "#81c784", scores: [9, 9, 9, 10, 9, 9, 13, 14], comparisons: { baseline_group: cmp(0.0, 0.930, "Intervention baseline"), post_group: cmp(2.1, 0.022, "Intervention post"), control_time: cmp(0.2, 0.700, "Control post"), intervention_time: cmp(2.3, 0.018, "Intervention post") } }, { name: "Bifidobacterium", color: "#7e57c2", scores: [5, 5, 5, 6, 5, 5, 11, 12], comparisons: { baseline_group: cmp(0.0, 0.970, "Intervention baseline"), post_group: cmp(3.0, 0.004, "Intervention post"), control_time: cmp(0.2, 0.690, "Control post"), intervention_time: cmp(3.0, 0.004, "Intervention post") } }, { name: "Akkermansia", color: "#00897b", scores: [4, 4, 4, 4, 4, 4, 8, 8], comparisons: { baseline_group: cmp(0.0, 0.990, "Intervention baseline"), post_group: cmp(2.0, 0.027, "Intervention post"), control_time: cmp(0.0, 0.990, "Control baseline"), intervention_time: cmp(2.2, 0.020, "Intervention post") } }, { name: "Other", color: "#d9d9d9", scores: [38, 39, 39, 36, 39, 39, 25, 22], comparisons: { baseline_group: cmp(0.1, 0.840, "Intervention baseline"), post_group: cmp(-0.4, 0.490, "Control post"), control_time: cmp(-0.4, 0.510, "Control baseline"), intervention_time: cmp(-0.5, 0.430, "Intervention baseline") } } ], replicateChecks: [ { label: "Control-group time -3 triplicates", q: 0.850 }, { label: "Control-group Pre- triplicates", q: 0.810 }, { label: "Control-group time +4 triplicates", q: 0.700 }, { label: "Experimental-group time -3 triplicates", q: 0.840 }, { label: "Experimental-group Pre- triplicates", q: 0.820 }, { label: "Experimental-group time +4 triplicates", q: 0.600 } ] }, species: { taxa: [ { name: "Bacteroides vulgatus", color: "#1976d2", scores: [11, 11, 10, 10, 10, 10, 5, 5], comparisons: { baseline_group: cmp(0.0, 0.990, "Control baseline"), post_group: cmp(-2.8, 0.011, "Control post"), control_time: cmp(-0.2, 0.730, "Control baseline"), intervention_time: cmp(-2.6, 0.013, "Intervention baseline") } }, { name: "Bacteroides uniformis", color: "#42a5f5", scores: [8, 8, 8, 8, 8, 8, 4, 4], comparisons: { baseline_group: cmp(0.0, 0.990, "Control baseline"), post_group: cmp(-2.2, 0.025, "Control post"), control_time: cmp(0.0, 0.990, "Control baseline"), intervention_time: cmp(-2.1, 0.027, "Intervention baseline") } }, { name: "Prevotella copri", color: "#64b5f6", scores: [4, 4, 4, 4, 4, 4, 10, 11], comparisons: { baseline_group: cmp(0.0, 0.990, "Intervention baseline"), post_group: cmp(3.2, 0.003, "Intervention post"), control_time: cmp(0.0, 0.990, "Control baseline"), intervention_time: cmp(3.1, 0.004, "Intervention post") } }, { name: "Faecalibacterium prausnitzii", color: "#2e7d32", scores: [11, 11, 11, 12, 11, 11, 14, 15], comparisons: { baseline_group: cmp(0.0, 0.920, "Intervention baseline"), post_group: cmp(1.7, 0.037, "Intervention post"), control_time: cmp(0.3, 0.610, "Control post"), intervention_time: cmp(1.8, 0.033, "Intervention post") } }, { name: "Bifidobacterium adolescentis", color: "#7e57c2", scores: [3, 3, 3, 3, 3, 3, 8, 9], comparisons: { baseline_group: cmp(0.0, 0.990, "Intervention baseline"), post_group: cmp(3.1, 0.004, "Intervention post"), control_time: cmp(0.0, 0.990, "Control baseline"), intervention_time: cmp(3.2, 0.003, "Intervention post") } }, { name: "Akkermansia muciniphila", color: "#00897b", scores: [3, 4, 3, 4, 3, 3, 7, 7], comparisons: { baseline_group: cmp(-0.2, 0.710, "Control baseline"), post_group: cmp(2.4, 0.019, "Intervention post"), control_time: cmp(0.2, 0.690, "Control post"), intervention_time: cmp(2.5, 0.017, "Intervention post") } }, { name: "Other", color: "#d9d9d9", scores: [60, 59, 61, 59, 61, 61, 52, 49], comparisons: { baseline_group: cmp(0.0, 0.930, "Intervention baseline"), post_group: cmp(-0.1, 0.790, "Control post"), control_time: cmp(0.0, 0.910, "Control baseline"), intervention_time: cmp(-0.2, 0.710, "Intervention baseline") } } ], replicateChecks: [ { label: "Control-group time -3 triplicates", q: 0.820 }, { label: "Control-group Pre- triplicates", q: 0.760 }, { label: "Control-group time +4 triplicates", q: 0.680 }, { label: "Experimental-group time -3 triplicates", q: 0.810 }, { label: "Experimental-group Pre- triplicates", q: 0.780 }, { label: "Experimental-group time +4 triplicates", q: 0.570 } ] } }; Object.values(levels).forEach((level) => { level.taxa.forEach((taxon) => { taxon.scores = expandCompressedLongitudinalScores(taxon.scores); }); }); function xForSample(index, chartLeft, barWidth, gap) { return chartLeft + index * (barWidth + gap); } function computeReplicateCheckStatus(check) { return check.q < SIGNIFICANCE_THRESHOLD ? { chipModifierClass: "ra-replicate-chip--significant", label: "significant" } : { chipModifierClass: "", label: "non-significant" }; } function formatEffect(effect) { return `${effect > 0 ? "+" : ""}${effect.toFixed(1)}`; } function normalizeComparisonLabel(label) { const comparisonLabelMap = { "Control baseline": "Control Pre-", "Intervention baseline": "Experimental Pre-", "Control post": "Control +4", "Intervention post": "Experimental +4" }; return comparisonLabelMap[label] || label; } function formatEnrichedSource(label) { return `${normalizeComparisonLabel(label)} enriched`; } function clamp(value, min, max) { return Math.min(Math.max(value, min), max); } // The page only stores synthetic q-values for each contrast, so derive an approximate // bounded display-only p-value from them to support the requested volcano-plot geometry. // Multiplying by 0.5 simply creates a synthetic display value smaller than q for the y-axis, // while the upper bound prevents a collapsed zero-height cloud and the lower bound prevents // runaway -log10 spikes from tiny synthetic values. function deriveSyntheticPValue(q) { return clamp(q * 0.5, MIN_DISPLAY_P_VALUE, MAX_DISPLAY_P_VALUE); } function formatVolcanoTickLabel(value) { if (value === 0) { return "0"; } return value.toFixed(2).replace(/(\.\d)0$/, "$1"); } function formatSignedLdaScore(value) { return `${value >= 0 ? "+" : ""}${value.toFixed(1)}`; } function escapeSvgText(value) { return String(value) .replace(/&/g, "&") .replace(/</g, "<") .replace(/>/g, ">") .replace(/"/g, """) .replace(/'/g, "'"); } function placeVolcanoLabels(labels, minY, maxY, separation) { if (labels.length === 0) { return []; } const placed = labels .slice() .sort((left, right) => left.preferredY - right.preferredY) .map((label) => ({ ...label })); placed[0].resolvedY = clamp(placed[0].preferredY, minY, maxY); for (let index = 1; index < placed.length; index += 1) { const minimumY = placed[index - 1].resolvedY + separation; placed[index].resolvedY = clamp(placed[index].preferredY, minimumY, maxY); } for (let index = placed.length - 2; index >= 0; index -= 1) { const maximumY = placed[index + 1].resolvedY - separation; placed[index].resolvedY = clamp(placed[index].resolvedY, minY, maximumY); } return placed; } function mean(values) { return values.reduce((total, value) => total + value, 0) / values.length; } function computeLog2FoldChange(preValues, postValues) { // The interactive cohort always has triplicate Pre- and time +4 values, but return a // neutral fold change if a future teaching dataset omits one side of the comparison. if (preValues.length === 0 || postValues.length === 0) { return 0; } const preMean = mean(preValues); const postMean = mean(postValues); return Math.log2( (postMean + VOLCANO_LOG2FC_PSEUDOCOUNT) / (preMean + VOLCANO_LOG2FC_PSEUDOCOUNT) ); } function computeVolcanoXAxisLimit(values) { const maxAbsoluteValue = Math.max(...values.map((value) => Math.abs(value))); // Round the symmetric x-axis limit to simple half-step labels while keeping a minimum span of 1. return Math.max(1, Math.ceil(maxAbsoluteValue * 2) / 2); } function buildVolcanoXTicks(limit) { const tickStep = limit <= VOLCANO_TICK_LIMIT_FINE ? VOLCANO_TICK_STEP_FINE : limit <= VOLCANO_TICK_LIMIT_MEDIUM ? VOLCANO_TICK_STEP_MEDIUM : VOLCANO_TICK_STEP_COARSE; const tickCount = Math.round((limit * 2) / tickStep); return Array.from({ length: tickCount + 1 }, (_, index) => Math.round((-limit + (index * tickStep)) * VOLCANO_TICK_ROUNDING_FACTOR) / VOLCANO_TICK_ROUNDING_FACTOR ); } function averageRanks(values) { const ranked = values .map((value, index) => ({ value, index })) .sort((left, right) => left.value - right.value); const ranks = new Array(values.length); let start = 0; while (start < ranked.length) { let end = start + 1; while (end < ranked.length && ranked[end].value === ranked[start].value) { end += 1; } const averageRank = (start + 1 + end) / 2; for (let rankIndex = start; rankIndex < end; rankIndex += 1) { ranks[ranked[rankIndex].index] = averageRank; } start = end; } return ranks; } function sampleIndicesFor(groupLabel, timeCode) { return cohortSamples.flatMap((sampleMeta, sampleIndex) => sampleMeta.group === groupLabel && sampleMeta.timeCode === timeCode ? [sampleIndex] : [] ); } function valuesForTaxonSamples(taxon, groupLabel, timeCode) { return sampleIndicesFor(groupLabel, timeCode).map((sampleIndex) => taxon.scores[sampleIndex]); } function errorFunction(value) { const sign = value < 0 ? -1 : 1; const absoluteValue = Math.abs(value); const approximationFactor = 1 / (1 + 0.3275911 * absoluteValue); const polynomial = (((((1.061405429 * approximationFactor) - 1.453152027) * approximationFactor) + 1.421413741) * approximationFactor - 0.284496736) * approximationFactor + 0.254829592; return sign * (1 - polynomial * approximationFactor * Math.exp(-(absoluteValue ** 2))); } function chiSquareSurvivalDf1(statistic) { return clamp(1 - errorFunction(Math.sqrt(Math.max(statistic, 0) / 2)), MIN_DISPLAY_P_VALUE, MAX_DISPLAY_P_VALUE); } function computeKruskalWallisPValue(groups) { const flattened = groups.flatMap((groupValues, groupIndex) => groupValues.map((value) => ({ value, groupIndex })) ); const ranks = averageRanks(flattened.map((entry) => entry.value)); const totalCount = flattened.length; const tieCounts = flattened.reduce((counts, entry) => { counts[entry.value] = (counts[entry.value] || 0) + 1; return counts; }, {}); const tieCorrectionDenominator = totalCount ** 3 - totalCount; const tieCorrection = tieCorrectionDenominator > 0 ? 1 - (Object.values(tieCounts).reduce((total, count) => total + (count ** 3 - count), 0) / tieCorrectionDenominator) : 1; const rankSums = groups.map(() => 0); flattened.forEach((entry, entryIndex) => { rankSums[entry.groupIndex] += ranks[entryIndex]; }); const statisticBase = groups.reduce((total, groupValues, groupIndex) => total + ((rankSums[groupIndex] ** 2) / groupValues.length), 0 ); const statistic = ((12 / (totalCount * (totalCount + 1))) * statisticBase) - (3 * (totalCount + 1)); return chiSquareSurvivalDf1(tieCorrection > Number.EPSILON ? statistic / tieCorrection : statistic); } function shouldLabelVolcanoPoint(point) { return point.pValue < SIGNIFICANCE_THRESHOLD && point.taxon.name !== "Other"; } function classifyTaxon(taxon) { const preMatch = taxon.comparisons.baseline_group.q >= SIGNIFICANCE_THRESHOLD; const controlStable = taxon.comparisons.control_time.q >= SIGNIFICANCE_THRESHOLD; const experimentalResponse = taxon.comparisons.intervention_time.q < SIGNIFICANCE_THRESHOLD; const postSeparation = taxon.comparisons.post_group.q < SIGNIFICANCE_THRESHOLD; if (preMatch && controlStable && experimentalResponse && postSeparation) { return { label: "Intervention-linked", detail: `${formatEnrichedSource(taxon.comparisons.intervention_time.enriched)} by time +4`, chipClass: "ra-call-chip ra-call-chip--intervention" }; } if (!preMatch) { return { label: "Pre-existing difference", detail: "Groups already differ at Pre-", chipClass: "ra-call-chip ra-call-chip--caution" }; } if (!controlStable && experimentalResponse) { return { label: "Temporal confounding", detail: "Time drift complicates the intervention readout", chipClass: "ra-call-chip ra-call-chip--caution" }; } if (experimentalResponse || postSeparation) { return { label: "Partial longitudinal signal", detail: "Only part of the full framework is significant", chipClass: "ra-call-chip ra-call-chip--watch" }; } return { label: "No integrated signal", detail: "No contrast passes the full framework", chipClass: "ra-call-chip ra-call-chip--quiet" }; } function renderFrameworkCell(comparison) { const significant = comparison.q < SIGNIFICANCE_THRESHOLD; return ` <div class="ra-framework-cell"> <span class="ra-framework-state ${significant ? "ra-framework-state--significant" : ""}">${significant ? "significant" : "non-significant"}</span> <span class="ra-framework-metric">${formatEffect(comparison.effect)} · q=${comparison.q.toFixed(3)}</span> <span class="ra-framework-enriched">${normalizeComparisonLabel(comparison.enriched)}</span> </div> `; } function renderLevel(levelName, contrastName) { const level = levels[levelName]; const contrastKey = contrasts[contrastName] ? contrastName : "post_group"; const contrast = contrasts[contrastKey]; const maxEffect = 3.6; const stripBaseHeight = 48; const stripRowHeight = 28; const svgWidth = 560; const svgHeight = 268; const chartLeft = 42; const chartTop = 22; const chartBottom = 188; const barWidth = 20; const gap = 6; const chartWidth = cohortSamples.length * (barWidth + gap) - gap; const bars = cohortSamples.map((sampleMeta, sampleIndex) => { let currentY = chartBottom; const segments = level.taxa.map((taxon) => { const value = taxon.scores[sampleIndex]; const height = (value / 100) * (chartBottom - chartTop); currentY -= height; return `<rect x="${xForSample(sampleIndex, chartLeft, barWidth, gap)}" y="${currentY}" width="${barWidth}" height="${height}" fill="${taxon.color}" rx="3" ry="3"></rect>`; }).join(""); return ` <g> <title>${sampleMeta.description}</title> <rect x="${xForSample(sampleIndex, chartLeft, barWidth, gap)}" y="${chartBottom + 8}" width="${barWidth}" height="18" fill="${sampleMeta.bandFill}" stroke="${sampleMeta.bandStroke}" stroke-width="1" rx="4" ry="4"></rect> ${segments} <text x="${xForSample(sampleIndex, chartLeft, barWidth, gap) + barWidth / 2}" y="${chartBottom + 20}" text-anchor="middle" font-size="7.2" fill="#0f172a">${sampleMeta.sample}</text> </g> `; }).join(""); const blockLabels = sampleBlocks.map((block, blockIndex) => { const xStart = xForSample(block.startIndex, chartLeft, barWidth, gap); const xEnd = xForSample(block.endIndex, chartLeft, barWidth, gap) + barWidth; const xCenter = (xStart + xEnd) / 2; const separator = blockIndex < sampleBlocks.length - 1 ? `<line x1="${xEnd + gap / 2}" y1="${chartTop}" x2="${xEnd + gap / 2}" y2="${chartBottom + 38}" stroke="#cbd5e1" stroke-dasharray="3 3"></line>` : ""; return ` ${separator} <text x="${xCenter}" y="${chartBottom + 38}" text-anchor="middle" font-size="8.4" fill="#475569">${block.label}</text> `; }).join(""); const groupLabels = groupBlocks.map((block) => { const xStart = xForSample(block.startIndex, chartLeft, barWidth, gap); const xEnd = xForSample(block.endIndex, chartLeft, barWidth, gap) + barWidth; return ` <line x1="${xStart}" y1="${chartBottom + 50}" x2="${xEnd}" y2="${chartBottom + 50}" stroke="#94a3b8" stroke-width="1.4"></line> <text x="${(xStart + xEnd) / 2}" y="${chartBottom + 64}" text-anchor="middle" font-size="9.5" fill="#334155">${block.label}</text> `; }).join(""); const yTicks = [0, 25, 50, 75, 100].map((tick) => { const y = chartBottom - ((tick / 100) * (chartBottom - chartTop)); return ` <line x1="${chartLeft - 4}" y1="${y}" x2="${chartLeft + chartWidth}" y2="${y}" stroke="#e2e8f0" stroke-width="1"></line> <text x="${chartLeft - 8}" y="${y + 3}" text-anchor="end" font-size="8" fill="#475569">${tick}</text> `; }).join(""); barChart.innerHTML = ` <svg viewBox="0 0 ${svgWidth} ${svgHeight}" width="100%" role="img" aria-label="Interactive stacked relative abundance bar chart"> <text x="0" y="12" font-size="9" fill="#334155">${levelName.charAt(0).toUpperCase() + levelName.slice(1)}-level relative abundance across longitudinal control-group and experimental-group cohorts</text> ${yTicks} <line x1="${chartLeft}" y1="${chartTop}" x2="${chartLeft}" y2="${chartBottom}" stroke="#94a3b8" stroke-width="1"></line> <line x1="${chartLeft}" y1="${chartBottom}" x2="${chartLeft + chartWidth}" y2="${chartBottom}" stroke="#94a3b8" stroke-width="1"></line> <text x="12" y="18" transform="rotate(-90 12 18)" font-size="8" fill="#334155">Relative abundance (%)</text> ${bars} ${blockLabels} ${groupLabels} </svg> `; legend.innerHTML = level.taxa.map((taxon) => { const selectedComparison = taxon.comparisons[contrastKey] || taxon.comparisons.post_group; const significant = selectedComparison.q < SIGNIFICANCE_THRESHOLD ? " • significant in selected contrast" : ""; return ` <div class="ra-legend-item"> <span class="ra-legend-swatch" style="background:${taxon.color}"></span> <span>${taxon.name}${significant}</span> </div> `; }).join(""); const stripHeight = stripBaseHeight + level.taxa.length * stripRowHeight; const zeroX = 150; const stripScale = 110 / maxEffect; const significanceRows = level.taxa.map((taxon, index) => { const comparison = taxon.comparisons[contrastKey] || taxon.comparisons.post_group; const y = 34 + index * stripRowHeight; const signedEffect = comparison.effect; const effectWidth = Math.abs(signedEffect) * stripScale; const x = signedEffect >= 0 ? zeroX : zeroX - effectWidth; const pointX = signedEffect >= 0 ? zeroX + effectWidth : zeroX - effectWidth; const qLabel = comparison.q < SIGNIFICANCE_THRESHOLD ? `q=${comparison.q.toFixed(3)}` : "ns"; const point = comparison.q < SIGNIFICANCE_THRESHOLD ? `<circle cx="${pointX}" cy="${y}" r="4.5" fill="#0f172a"></circle>` : `<circle cx="${pointX}" cy="${y}" r="4.5" fill="white" stroke="#64748b" stroke-width="1.2"></circle>`; return ` <text x="0" y="${y + 3}" font-size="9" fill="#0f172a">${taxon.name}</text> <rect x="${x}" y="${y - 8}" width="${effectWidth}" height="16" fill="${taxon.color}" opacity="${comparison.q < SIGNIFICANCE_THRESHOLD ? 0.95 : 0.45}" rx="4" ry="4"></rect> ${point} <text x="${signedEffect >= 0 ? 270 : 34}" y="${y + 3}" font-size="8" fill="#475569">${qLabel}</text> `; }).join(""); significanceStrip.innerHTML = ` <svg viewBox="0 0 320 ${stripHeight}" width="100%" role="img" aria-label="Taxa preserving significance strip"> <text x="0" y="12" font-size="9" fill="#334155">${contrast.label}</text> <line x1="${zeroX}" y1="20" x2="${zeroX}" y2="${stripHeight - 8}" stroke="#94a3b8" stroke-width="1"></line> <text x="${zeroX - 82}" y="24" font-size="8" fill="#475569">${contrast.leftLabel}</text> <text x="${zeroX + 8}" y="24" font-size="8" fill="#475569">${contrast.rightLabel}</text> ${significanceRows} </svg> `; const volcanoComparisonKey = "intervention_time"; const volcanoSvgWidth = 720; const volcanoSvgHeight = 320; const volcanoLeft = 72; const volcanoRight = 30; const volcanoTop = 32; const volcanoBottom = 258; const volcanoWidth = volcanoSvgWidth - volcanoLeft - volcanoRight; const volcanoHeight = volcanoBottom - volcanoTop; const volcanoLabelSeparation = 14; const volcanoLabelOffsetUp = 10; const volcanoLabelOffsetDown = -18; const volcanoLabelBottomMargin = 12; const volcanoPoints = level.taxa.map((taxon) => { const comparison = taxon.comparisons[volcanoComparisonKey] || taxon.comparisons.post_group; const experimentalPreValues = valuesForTaxonSamples(taxon, "Experimental", "0"); const experimentalPostValues = valuesForTaxonSamples(taxon, "Experimental", "+4"); const pValue = computeKruskalWallisPValue([experimentalPreValues, experimentalPostValues]); const log2FoldChange = computeLog2FoldChange(experimentalPreValues, experimentalPostValues); const negativeLogP = -Math.log10(pValue); return { taxon, comparison, pValue, log2FoldChange, negativeLogP }; }); // Keep at least ~-log10(0.025) of headroom so the plot has readable vertical spread even // for levels where none of the synthetic p-values are especially small. const maxNegativeLogP = Math.max(1.6, ...volcanoPoints.map((point) => point.negativeLogP)); const volcanoXLimit = computeVolcanoXAxisLimit(volcanoPoints.map((point) => point.log2FoldChange)); const xForLog2FoldChange = (value) => volcanoLeft + ((value + volcanoXLimit) / (volcanoXLimit * 2)) * volcanoWidth; const yForLogP = (value) => volcanoBottom - (value / maxNegativeLogP) * volcanoHeight; const volcanoXTicks = buildVolcanoXTicks(volcanoXLimit).map((tick) => { const x = xForLog2FoldChange(tick); return ` <line x1="${x}" y1="${volcanoBottom}" x2="${x}" y2="${volcanoBottom + 6}" stroke="#94a3b8" stroke-width="1"></line> <text x="${x}" y="${volcanoBottom + 18}" text-anchor="middle" font-size="8" fill="#475569">${formatVolcanoTickLabel(tick)}</text> `; }).join(""); const volcanoYTicks = Array.from({ length: Math.ceil(maxNegativeLogP) + 1 }, (_, index) => index).map((tick) => { const y = yForLogP(tick); return ` <line x1="${volcanoLeft}" y1="${y}" x2="${volcanoLeft + volcanoWidth}" y2="${y}" stroke="#e2e8f0" stroke-width="1"></line> <text x="${volcanoLeft - 10}" y="${y + 3}" text-anchor="end" font-size="8" fill="#475569">${tick}</text> `; }).join(""); const significanceThresholdY = yForLogP(-Math.log10(0.05)); const initialVolcanoLabels = volcanoPoints .filter((point) => shouldLabelVolcanoPoint(point)) .map((point, index) => { const x = xForLog2FoldChange(point.log2FoldChange); const side = point.log2FoldChange >= 0 ? "positive" : "negative"; // Alternate the initial label offset above/below nearby points before enforcing // a minimum separation on each side of the plot to reduce visible overlap. const preferredY = yForLogP(point.negativeLogP) - (index % 2 === 0 ? volcanoLabelOffsetUp : volcanoLabelOffsetDown); return { point, x, side, preferredY }; }); const volcanoLabelBounds = { minY: volcanoTop, maxY: volcanoBottom - volcanoLabelBottomMargin }; const resolvedVolcanoLabels = ["negative", "positive"].flatMap((side) => placeVolcanoLabels( initialVolcanoLabels.filter((label) => label.side === side), volcanoLabelBounds.minY, volcanoLabelBounds.maxY, volcanoLabelSeparation ) ); const volcanoLabels = resolvedVolcanoLabels.map((label) => { const labelText = `${label.point.taxon.name} · log2FC ${formatVolcanoTickLabel(label.point.log2FoldChange)}`; return `<text x="${label.x + (label.point.log2FoldChange >= 0 ? 8 : -8)}" y="${label.resolvedY}" text-anchor="${label.point.log2FoldChange >= 0 ? "start" : "end"}" font-size="8" fill="#334155">${escapeSvgText(labelText)}</text>`; }).join(""); const volcanoPointMarks = volcanoPoints.map((point) => { const x = xForLog2FoldChange(point.log2FoldChange); const y = yForLogP(point.negativeLogP); const significant = point.pValue < SIGNIFICANCE_THRESHOLD; const pointRadius = 4.8; return ` <g> <title>${escapeSvgText(point.taxon.name)}: log2 fold change ${point.log2FoldChange.toFixed(2)}, p=${point.pValue.toFixed(4)}, q=${point.comparison.q.toFixed(4)}</title> <circle cx="${x}" cy="${y}" r="${pointRadius.toFixed(1)}" fill="${point.taxon.color}" fill-opacity="${significant ? 0.95 : 0.7}" stroke="${significant ? "#0f172a" : "#94a3b8"}" stroke-width="1" ></circle> </g> `; }).join(""); volcanoPlot.innerHTML = ` <svg viewBox="0 0 ${volcanoSvgWidth} ${volcanoSvgHeight}" width="100%" role="img" aria-label="Pre-intervention versus post-intervention volcano plot with log2 fold change on the x-axis and minus log10 p-value on the y-axis"> <text x="0" y="12" font-size="9" fill="#334155">${levelName.charAt(0).toUpperCase() + levelName.slice(1)}-level experimental pre-intervention vs post-intervention volcano plot</text> ${volcanoYTicks} <line x1="${volcanoLeft}" y1="${volcanoTop}" x2="${volcanoLeft}" y2="${volcanoBottom}" stroke="#94a3b8" stroke-width="1"></line> <line x1="${volcanoLeft}" y1="${volcanoBottom}" x2="${volcanoLeft + volcanoWidth}" y2="${volcanoBottom}" stroke="#94a3b8" stroke-width="1"></line> <line x1="${volcanoLeft}" y1="${significanceThresholdY}" x2="${volcanoLeft + volcanoWidth}" y2="${significanceThresholdY}" stroke="#c2410c" stroke-dasharray="4 4"></line> <line x1="${xForLog2FoldChange(0)}" y1="${volcanoTop}" x2="${xForLog2FoldChange(0)}" y2="${volcanoBottom}" stroke="#cbd5e1" stroke-dasharray="3 3"></line> <text x="${volcanoLeft - 52}" y="${volcanoTop + 20}" transform="rotate(-90 ${volcanoLeft - 52} ${volcanoTop + 20})" font-size="8" fill="#334155">-log10 p value</text> <text x="${volcanoLeft + volcanoWidth / 2}" y="${volcanoBottom + 34}" text-anchor="middle" font-size="8" fill="#334155">log2 fold change</text> <text x="${xForLog2FoldChange(-volcanoXLimit)}" y="${volcanoBottom + 48}" text-anchor="start" font-size="8" fill="#475569">Pre-intervention enriched</text> <text x="${xForLog2FoldChange(volcanoXLimit)}" y="${volcanoBottom + 48}" text-anchor="end" font-size="8" fill="#475569">Post-intervention enriched</text> <text x="${volcanoLeft + volcanoWidth - 4}" y="${significanceThresholdY - 6}" text-anchor="end" font-size="8" fill="#c2410c">p = ${SIGNIFICANCE_THRESHOLD.toFixed(2)}</text> ${volcanoXTicks} ${volcanoPointMarks} ${volcanoLabels} </svg> `; significanceChart.innerHTML = ` <div class="ra-framework-scroll"> <table class="ra-framework-table"> <thead> <tr> <th scope="col">Taxon</th> <th scope="col">Pre- group match</th> <th scope="col">Control stability</th> <th scope="col">Experimental response</th> <th scope="col">Time +4 separation</th> <th scope="col">Final call</th> </tr> </thead> <tbody> ${level.taxa.map((taxon) => { const frameworkCall = classifyTaxon(taxon); return ` <tr> <td> <div class="ra-taxon-cell"> <span class="ra-legend-swatch" style="background:${taxon.color}"></span> <span>${taxon.name}</span> </div> </td> <td>${renderFrameworkCell(taxon.comparisons.baseline_group)}</td> <td>${renderFrameworkCell(taxon.comparisons.control_time)}</td> <td>${renderFrameworkCell(taxon.comparisons.intervention_time)}</td> <td>${renderFrameworkCell(taxon.comparisons.post_group)}</td> <td> <span class="${frameworkCall.chipClass}">${frameworkCall.label}</span> <div class="ra-framework-enriched">${frameworkCall.detail}</div> </td> </tr> `; }).join("")} </tbody> </table> </div> `; const significantTaxa = level.taxa .filter((taxon) => classifyTaxon(taxon).label === "Intervention-linked" && taxon.name !== "Other"); const significantReplicateChecks = level.replicateChecks.filter((check) => check.q < SIGNIFICANCE_THRESHOLD); const replicateSummaryText = significantReplicateChecks.length ? "One or more same-group / same-time-point replicate checks are significant in this synthetic example, so interpret the contrast-specific signal more cautiously." : "All same-group / same-time-point replicate checks remain non-significant."; replicateChecks.innerHTML = level.replicateChecks.map((check) => { const status = computeReplicateCheckStatus(check); const chipClass = `ra-replicate-chip ${status.chipModifierClass}`.trim(); return ` <div class="${chipClass}"> <strong>${check.label}</strong> <span>Same-group / same-time-point pairwise check: q=${check.q.toFixed(3)} · ${status.label}</span> </div> `; }).join(""); if (significantTaxa.length === 0) { summary.textContent = `This final framework combines Pre- group matching, control stability, experimental response, and time +4 separation. At the ${levelName} level, no taxa satisfy all four criteria at once. ${replicateSummaryText}`; return; } const strongestEntry = significantTaxa // Rank intervention-linked taxa by the two signal-bearing contrasts in the final // framework: within-experimental response and between-group separation at time +4. .map((taxon) => ({ taxon, combinedEffect: Math.abs(taxon.comparisons.post_group.effect) + Math.abs(taxon.comparisons.intervention_time.effect) })) .sort((a, b) => b.combinedEffect - a.combinedEffect)[0]; const strongest = strongestEntry.taxon; summary.textContent = `This final framework combines Pre- group matching, control stability, experimental response, and time +4 separation. At the ${levelName} level, the clearest intervention-linked taxon is ${strongest.name} (${formatEnrichedSource(strongest.comparisons.intervention_time.enriched)} within the experimental arm, intervention effect ${formatEffect(strongest.comparisons.intervention_time.effect)}, time +4 separation ${formatEffect(strongest.comparisons.post_group.effect)}; q=${strongest.comparisons.intervention_time.q.toFixed(3)} and q=${strongest.comparisons.post_group.q.toFixed(3)}). ${replicateSummaryText}`; } levelSelect.addEventListener("change", function () { renderLevel(levelSelect.value, contrastSelect.value); }); contrastSelect.addEventListener("change", function () { renderLevel(levelSelect.value, contrastSelect.value); }); renderLevel(levelSelect.value, contrastSelect.value); }); </script> ::: --- ## R packages ```r install.packages(c("dplyr", "tidyr", "tibble", "forcats", "ggplot2", "vegan", "scales")) ``` ```r library(dplyr) library(tidyr) library(tibble) library(forcats) library(ggplot2) library(vegan) library(scales) ``` --- ## Step 1: Read a family-level MetaPhlAn table ```r metaphlan_raw <- read.delim("metaphlan_merged.tsv", check.names = FALSE, comment.char = "#") metadata <- read.delim("metadata.tsv", check.names = FALSE) family_long <- metaphlan_raw |> rename(feature = clade_name) |> filter(grepl("\\|f__[^|]+$", feature)) |> mutate( family = sub(".*\\|f__", "", feature), phylum = sub(".*\\|p__([^|]+).*", "\\1", feature), class = sub(".*\\|c__([^|]+).*", "\\1", feature), order = sub(".*\\|o__([^|]+).*", "\\1", feature) ) |> select(feature, phylum, class, order, family, everything()) |> pivot_longer( cols = -c(feature, phylum, class, order, family), names_to = "sample_id", values_to = "abundance" ) |> left_join(metadata, by = "sample_id") ``` If you have repeated rows at the same family level, aggregate them: ```r family_long <- family_long |> group_by(sample_id, group, timepoint, phylum, class, order, family) |> summarise(abundance = sum(abundance), .groups = "drop") ``` --- ## Step 2: Choose a phylogenetic order for families To make the legend and stacked bars feel more biological, do **not** sort families only by abundance. Instead, keep closely related families adjacent. A simple gut-oriented order that matches the intestinal tree page is: ```r phylo_order_tbl <- tibble( family = c( "Bifidobacteriaceae", "Bacteroidaceae", "Prevotellaceae", "Rikenellaceae", "Akkermansiaceae", "Lachnospiraceae", "Ruminococcaceae", "Peptostreptococcaceae", "Bacillaceae", "Staphylococcaceae", "Enterococcaceae", "Enterobacteriaceae", "Pseudomonadaceae", "Sutterellaceae", "Fusobacteriaceae" ), phylo_rank = seq_len(15) ) ``` If your study contains a different set of families, extend this table using the same logic: - sort by **phylum** first - then by **class/order** - then by **family** That gives you a legend that reads like a simplified phylogenetic tree rather than a random color list. --- ## Step 3: Keep the top families and lump the rest into "Other" ```r top_families <- family_long |> group_by(family) |> summarise(mean_abundance = mean(abundance), .groups = "drop") |> slice_max(mean_abundance, n = 15) |> pull(family) family_plot <- family_long |> mutate(family = if_else(family %in% top_families, family, "Other")) |> group_by(sample_id, group, timepoint, phylum, class, order, family) |> summarise(abundance = sum(abundance), .groups = "drop") ``` --- ## Step 4: Build a tree-matched family palette The phylum anchor colors below come directly from the tree navigator. Within each phylum, we create a short gradient so that related families share the same color family. ```r phylum_anchor_colors <- c( Actinomycetota = "#7e57c2", Bacteroidota = "#1976d2", Verrucomicrobiota = "#00897b", Bacillota = "#2e7d32", Pseudomonadota = "#c62828", Fusobacteriota = "#ef6c00" ) phylum_light_colors <- c( Actinomycetota = "#f5f0fb", Bacteroidota = "#f2f8fe", Verrucomicrobiota = "#f0fbfa", Bacillota = "#f0f8e8", Pseudomonadota = "#fff5f6", Fusobacteriota = "#fff8ee" ) family_key <- family_plot |> filter(family != "Other") |> distinct(phylum, class, order, family) |> left_join(phylo_order_tbl, by = "family") |> arrange(phylum, class, order, phylo_rank, family) family_palette <- family_key |> group_by(phylum) |> group_modify(~ { shades <- colorRampPalette(c( phylum_anchor_colors[.y$phylum], phylum_light_colors[.y$phylum] ))(nrow(.x)) mutate(.x, fill_color = shades) }) |> ungroup() |> select(family, fill_color) |> deframe() family_palette <- c(family_palette, Other = "#d9d9d9") ``` This produces a palette where: - **Bacteroidota families** stay blue - **Bacillota families** stay green - **Pseudomonadota families** stay red - and so on That way your stacked bar chart still communicates phylogeny at a glance. --- ## Step 5: Apply the phylogenetic order to the legend and stacks ```r family_levels <- c(family_key$family, "Other") family_plot <- family_plot |> mutate( family = factor(family, levels = family_levels), sample_id = factor(sample_id, levels = metadata$sample_id) ) ``` --- ## Step 6: Draw the stacked bar chart ```r ggplot(family_plot, aes(x = sample_id, y = abundance, fill = family)) + geom_col(width = 0.9) + facet_grid(. ~ group, scales = "free_x", space = "free_x") + scale_fill_manual(values = family_palette, drop = FALSE) + scale_y_continuous(labels = label_percent(scale = 1)) + labs( title = "Family-level relative abundance", subtitle = "Families are ordered by phylogenetic relatedness and colored by the tree navigator palette", x = NULL, y = "Relative abundance (%)", fill = "Family" ) + theme_minimal(base_size = 12) + theme( axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1), panel.grid.major.x = element_blank(), legend.position = "right" ) ``` ::: {.callout-tip} If your study is longitudinal, put **time on the x-axis** and facet by subject or treatment group. The same color/order logic still works. ::: --- ## How to show statistical significance without breaking the color logic If fill color already encodes **phylogeny**, avoid reusing the same fill for **significance**. That usually makes the plot harder to read. A better approach is: - keep bar colors for **taxonomy / phylogenetic relatedness** - add a **second panel** for significance or effect size - optionally add `*`, `**`, or arrows in the legend labels or strip labels ### Recommended option: a companion significance heatmap Suppose you already ran a model such as **MaAsLin2** or a paired longitudinal test and have a table with effect sizes and `qval` values. ```r stats_tbl <- read.delim("maaslin2/all_results.tsv", check.names = FALSE) |> filter(metadata == "timepoint") |> transmute( family = feature, effect = coef, qval = qval, sig_label = case_when( qval < 0.001 ~ "***", qval < 0.01 ~ "**", qval < 0.05 ~ "*", TRUE ~ "" ) ) |> filter(family %in% levels(family_plot$family)) |> mutate(family = factor(family, levels = rev(levels(family_plot$family)))) ggplot(stats_tbl, aes(x = 1, y = family, fill = effect)) + geom_tile(color = "white") + geom_text(aes(label = sig_label), fontface = "bold", size = 4) + scale_fill_gradient2(low = "#2166ac", mid = "white", high = "#b2182b") + labs( title = "Companion significance panel", subtitle = "Tile color = model effect size, star = significance threshold", x = NULL, y = NULL, fill = "Effect" ) + theme_minimal(base_size = 12) + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) ``` This gives you a clean split: - the **stacked bar chart** answers *what is present, in what proportion, and how taxa are related* - the **heatmap** answers *which families differ significantly, and in which direction* ### Optional shortcut: add significance to legend labels If you want something lighter-weight, append stars to the labels of significant families. ```r sig_families <- stats_tbl |> filter(qval < 0.05) |> pull(family) legend_labels <- setNames( nm = levels(family_plot$family), object = ifelse(levels(family_plot$family) %in% sig_families, paste0(levels(family_plot$family), " *"), levels(family_plot$family)) ) ``` Then pass `labels = legend_labels` into `scale_fill_manual()`. This is simple, but the side-panel heatmap is usually easier to interpret. --- ## Recommended family-level plotting workflow 1. Start from a **family-level MetaPhlAn table**. 2. Join sample metadata. 3. Keep the **top 10-20 families** and lump the rest into **Other**. 4. Order families by **phylum → class → order → family**. 5. Use the **same phylum color anchors** as the intestinal tree navigator. 6. Use lighter within-phylum shades for family-level bars. 7. Show significance in a **companion plot**, not by replacing the phylogenetic fill colors. --- ## Related pages - [MetaPhlAn tutorial](../tools/metaphlan.qmd) - [Ecological Diversity tutorial](diversity.qmd) - [Intestinal microbiome tree](../microbiome/intestinal.qmd) - [MaAsLin2 tutorial](../tools/maaslin2.qmd) --- ## Tutorial: generate the standard volcano plot The interactive volcano plot above now uses a **standard volcano layout**: 1. compute a **log2 fold change** from the experimental-arm **Pre-** mean abundance to the **time +4** mean abundance, 2. compute a **Kruskal-Wallis p-value** for the same Pre- vs time +4 comparison, 3. plot `x = log2FC` and `y = -log10(p)`. Because this page is a small synthetic teaching example, the browser-side code below focuses on the standard volcano geometry. A LEfSe-style follow-up comparison can be layered in later if we decide to add that analysis. ```js const SIGNIFICANCE_THRESHOLD = 0.05; const MIN_DISPLAY_P_VALUE = 0.0005; const VOLCANO_LOG2FC_PSEUDOCOUNT = 0.01; function averageRanks(values) { const ranked = values .map((value, index) => ({ value, index })) .sort((left, right) => left.value - right.value); const ranks = new Array(values.length); let start = 0; while (start < ranked.length) { let end = start + 1; while (end < ranked.length && ranked[end].value === ranked[start].value) { end += 1; } const averageRank = (start + 1 + end) / 2; for (let rankIndex = start; rankIndex < end; rankIndex += 1) { ranks[ranked[rankIndex].index] = averageRank; } start = end; } return ranks; } function computeKruskalWallisPValue(groups) { const flattened = groups.flatMap((groupValues, groupIndex) => groupValues.map((value) => ({ value, groupIndex })) ); const ranks = averageRanks(flattened.map((entry) => entry.value)); const totalCount = flattened.length; const tieCounts = flattened.reduce((counts, entry) => { counts[entry.value] = (counts[entry.value] || 0) + 1; return counts; }, {}); const rankSums = groups.map(() => 0); flattened.forEach((entry, entryIndex) => { rankSums[entry.groupIndex] += ranks[entryIndex]; }); const statisticBase = groups.reduce((total, groupValues, groupIndex) => total + ((rankSums[groupIndex] ** 2) / groupValues.length), 0 ); const statistic = ((12 / (totalCount * (totalCount + 1))) * statisticBase) - (3 * (totalCount + 1)); const tieCorrectionDenominator = totalCount ** 3 - totalCount; const tieCorrection = tieCorrectionDenominator > 0 ? 1 - (Object.values(tieCounts).reduce((total, count) => total + (count ** 3 - count), 0) / tieCorrectionDenominator) : 1; return Math.max( MIN_DISPLAY_P_VALUE, chiSquareSurvivalDf1(tieCorrection > Number.EPSILON ? statistic / tieCorrection : statistic) ); } function computeLog2FoldChange(preValues, postValues) { if (preValues.length === 0 || postValues.length === 0) { return 0; } const preMean = mean(preValues); const postMean = mean(postValues); return Math.log2( (postMean + VOLCANO_LOG2FC_PSEUDOCOUNT) / (preMean + VOLCANO_LOG2FC_PSEUDOCOUNT) ); } const volcanoPoints = level.taxa.map((taxon) => { const experimentalPreValues = valuesForTaxonSamples(taxon, "Experimental", "0"); const experimentalPostValues = valuesForTaxonSamples(taxon, "Experimental", "+4"); const pValue = computeKruskalWallisPValue([experimentalPreValues, experimentalPostValues]); const log2FoldChange = computeLog2FoldChange(experimentalPreValues, experimentalPostValues); return { taxon, x: log2FoldChange, y: -Math.log10(pValue) }; }); const circles = volcanoPoints.map((point) => ` <circle cx="${xForLog2FoldChange(point.x)}" cy="${yForLogP(point.y)}" r="4.8" fill="${point.taxon.color}" stroke="${point.y >= -Math.log10(SIGNIFICANCE_THRESHOLD) ? "#0f172a" : "#94a3b8"}" ></circle> `); ``` In the live figure: - **horizontal position** is the experimental-arm log2 fold change, - **vertical position** is the Kruskal-Wallis `-log10(p)` value, - the dashed horizontal guide marks the conventional `p = 0.05` threshold.