Microbiome for Dummies

A beginner-friendly guide to microbiome bioinformatics tools

Welcome

This website is a beginner-friendly reference guide for the most commonly used microbiome bioinformatics tools, with a focus on the Biobakery suite developed by the Huttenhower and Segata labs.

Whether you’re new to microbiome research or just need a quick reference, this site walks you through what each tool does, how to install it, and how to run it.

What is the Microbiome?

The microbiome is the collection of all microorganisms (bacteria, fungi, viruses, archaea) living in a particular environment. In human health research, the gut microbiome has attracted enormous attention due to its role in immunity, metabolism, and disease.

Studying the microbiome relies on sequencing DNA (or RNA) from environmental samples and then using computational tools to answer questions like:

Who is there? (taxonomic profiling)
What are they doing? (functional profiling)
Are community shifts associated with disease? (statistical analysis)

The Biobakery Suite

The Biobakery suite is a collection of tools for end-to-end microbiome analysis maintained by the Huttenhower Lab (Harvard) and the Segata Lab (University of Trento). These tools cover the major analysis steps from raw sequence data to biological interpretation.

Functional & Taxonomic Profiling

HUMAnN — Functional profiling of metagenomes
MetaPhlAn — Taxonomic profiling from shotgun metagenomics
PICRUSt2 — Predict metagenome function from 16S data
PhyloPhlAn — Phylogenetic placement and genome characterization
StrainPhlAn — Strain-level metagenomic profiling

Advanced Analysis

ShortBRED — Identify and quantify protein families in metagenomes
WAAFLE — Detect horizontal gene transfer events
MACARRoN — Metabolome prioritization
metawibele — Microbial protein function characterization
baqlava — Viral profiling from metagenomes
MaAsLin2 — Multivariable association discovery
anpan — Microbial pan-genome statistical models
CCREPE — Compositional data correlation

A Typical Microbiome Workflow

Which path you take depends heavily on whether you start with 16S/amplicon reads or shotgun metagenomes. The two diagrams below show each workflow in turn.

16S / amplicon workflow

flowchart TD
    A([Raw 16S / ITS reads]) --> B[QIIME 2\nimport, denoise, taxonomy]
    B --> C[ASV table + taxonomy]
    B --> D[Diversity metrics + ordination]
    C --> E[PICRUSt2\npredict gene families + pathways]
    C --> F[taxUMAP / CCREPE\ncommunity structure and co-occurrence]
    C --> G[MaAsLin2\ntaxon-metadata associations]
    E --> H[Predicted KO / EC / pathway tables]
    H --> I[MaAsLin2\npredicted function associations]
    H --> J[Paired metabolomics table]
    J --> K[Microbe-function-metabolite hypotheses]

Best for: community composition, diversity, and low-cost functional hypotheses when you only have amplicon data.
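One way to read the diagram above is as a directed graph in which each arrow is a hand-off from one step to the next. The following is a minimal sketch of that idea in Python, not part of any tool on this site: the edge list is copied from the diagram, the two MaAsLin2 nodes get hypothetical suffixes to tell them apart, and `downstream` is a helper name made up for this example.

```python
from collections import deque

# Edges copied from the 16S / amplicon diagram above.
# The "(taxa)" / "(predicted function)" suffixes are added here only to
# distinguish the two MaAsLin2 nodes.
EDGES_16S = {
    "Raw 16S / ITS reads": ["QIIME 2"],
    "QIIME 2": ["ASV table + taxonomy", "Diversity metrics + ordination"],
    "ASV table + taxonomy": ["PICRUSt2", "taxUMAP / CCREPE", "MaAsLin2 (taxa)"],
    "PICRUSt2": ["Predicted KO / EC / pathway tables"],
    "Predicted KO / EC / pathway tables": [
        "MaAsLin2 (predicted function)",
        "Paired metabolomics table",
    ],
    "Paired metabolomics table": ["Microbe-function-metabolite hypotheses"],
}


def downstream(graph, start):
    """Return every step reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Everything that depends on the PICRUSt2 predictions.
for step in downstream(EDGES_16S, "PICRUSt2"):
    print(step)
```

Calling `downstream(EDGES_16S, "QIIME 2")` instead lists every later step in the amplicon workflow, which is a quick way to see what has to be re-run if the denoising settings change.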

Shotgun metagenome workflow

flowchart TD
    A([Raw shotgun reads]) --> B[KneadData\nquality control + host removal]
    B --> C[MetaPhlAn\nspecies profile]
    B --> D[HUMAnN\ngene families + pathways]
    B --> E[ShortBRED\ntarget protein families]
    B --> F[baqlava\nviral profile]
    B --> G[Assembly / MAGs\ne.g., MEGAHIT · MetaBAT2]
    C --> H[StrainPhlAn\nstrain tracking]
    C --> I[taxUMAP\nordination]
    C --> J[MaAsLin2\ntaxon-metadata associations]
    D --> K[MaAsLin2\npathway + functional associations]
    D --> L[MACARRoN + metabolomics\nmicrobe-metabolite prioritization]
    E --> M[AMR / enzyme family abundance]
    G --> N[metawibele / WAAFLE / PhyloPhlAn\nnovel proteins, HGT, phylogeny]

Best for: direct measurement of taxonomy, pathways, protein families, viruses, strain variation, and metabolomics-linked functional readouts.
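The first two profiling hand-offs in the diagram above (cleaned reads into MetaPhlAn, then into HUMAnN) can be sketched as command lines built in Python. This is a hedged illustration, not a tested pipeline: the file names are hypothetical, the helper functions are invented for this example, and the flags shown (`--input_type`, `--nproc`, `-o` for MetaPhlAn; `--input`, `--output`, `--threads`, `--taxonomic-profile` for HUMAnN) are taken from the tools' documented command-line interfaces, which change between releases. Check `metaphlan --help` and `humann --help` for your installed version. The sketch only prints the commands and does not run them.

```python
import shlex


def metaphlan_cmd(reads, profile_out, threads=4):
    """Build a MetaPhlAn command for one cleaned FASTQ file."""
    return [
        "metaphlan", reads,
        "--input_type", "fastq",
        "--nproc", str(threads),
        "-o", profile_out,
    ]


def humann_cmd(reads, out_dir, taxonomic_profile=None, threads=4):
    """Build a HUMAnN command; optionally reuse an existing MetaPhlAn profile."""
    cmd = ["humann", "--input", reads, "--output", out_dir, "--threads", str(threads)]
    if taxonomic_profile:
        cmd += ["--taxonomic-profile", taxonomic_profile]
    return cmd


# Hypothetical file names; assumes reads were already cleaned (e.g. with KneadData).
reads = "sample1_clean.fastq.gz"
steps = [
    metaphlan_cmd(reads, "sample1_profile.tsv"),
    humann_cmd(reads, "sample1_humann", taxonomic_profile="sample1_profile.tsv"),
]

for cmd in steps:
    print(shlex.join(cmd))  # dry run: show the command instead of executing it
```

To execute a step rather than print it, the same list can be passed to `subprocess.run(cmd, check=True)`. Building commands as lists avoids shell-quoting problems with unusual file names.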

Tool Summary Table

The table below gives an at-a-glance comparison of every tool covered on this site: what it does, why you would reach for it, what data it accepts, where it typically sits in the workflow, and whether it is most relevant to a 16S/amplicon or shotgun sequencing (SGS) workflow. Tools tagged Both / downstream apply to either sequencing mode because they are commonly used after either type of profiling.
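The workflow tag amounts to a simple filter rule: a tool is listed for a sequencing mode if it is tagged with that mode or tagged Both / downstream. A minimal Python sketch of the rule follows, using the tags from the table below; the dictionary and function names are made up for this example.

```python
# Workflow tag for each tool, as listed in the summary table.
TOOL_WORKFLOW = {
    "HUMAnN": "shotgun",
    "MetaPhlAn": "shotgun",
    "PICRUSt2": "16s",
    "PhyloPhlAn": "shotgun",
    "StrainPhlAn": "shotgun",
    "ShortBRED": "shotgun",
    "WAAFLE": "shotgun",
    "MACARRoN": "both",
    "metawibele": "shotgun",
    "baqlava": "shotgun",
    "MaAsLin2": "both",
    "anpan": "shotgun",
    "CCREPE": "both",
    "taxUMAP": "both",
    "PolyPanner": "shotgun",
    "QIIME 2": "16s",
}


def tools_for(selected):
    """Return tools relevant to `selected` ("all", "16s", or "shotgun")."""
    return [
        tool
        for tool, workflow in TOOL_WORKFLOW.items()
        if selected == "all" or workflow in (selected, "both")
    ]


print(tools_for("16s"))
```

With only amplicon data, the list reduces to QIIME 2 and PICRUSt2 plus the four downstream tools.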

For shotgun studies, most analyses listed here assume that reads have already been cleaned with a preprocessing tool such as KneadData, leaving quality-filtered, host-depleted reads ready for profiling, assembly, or strain analysis.

Tool	Best fit workflow	Category	Input data	Workflow role & hand-off	Purpose & Why use it	Prominent Use Case
HUMAnN	Shotgun	Functional profiling	Shotgun metagenomics / metatranscriptomics (.fastq / .fastq.gz)	Starts once cleaned reads are ready, often after or alongside MetaPhlAn; hands off gene family and pathway tables to association testing, metabolomics integration, or biological interpretation.	Profiles microbial gene family and metabolic pathway abundances. Gold-standard for measuring what the community is doing at the metabolic level; stratified output links functions back to specific species.	Franzosa EA et al. Nature Methods 2018
MetaPhlAn	Shotgun	Taxonomic profiling	Shotgun metagenomics (.fastq / .fastq.gz)	Usually the first analytical step after KneadData-style read cleaning; hands off species profiles to HUMAnN, StrainPhlAn, taxUMAP, MaAsLin2, or other downstream follow-up.	Determines which species are present and at what relative abundance. Fast and reference-based, relying on clade-specific marker genes; produces profiles usable by HUMAnN, StrainPhlAn, and many downstream tools.	Blanco-Míguez A et al. Nature Biotechnology 2023
PICRUSt2	16S / amplicon	Functional prediction (16S)	16S rRNA amplicon ASVs (.biom / .tsv)	Starts after denoising and taxonomy assignment in a QIIME 2-style workflow; hands off predicted EC, KO, and pathway tables to MaAsLin2 or pathway-focused interpretation.	Predicts functional gene and pathway content from 16S data. Extracts functional information from 16S surveys without shotgun sequencing; best option when only amplicon data are available.	Douglas GM et al. Nature Biotechnology 2020
PhyloPhlAn	Shotgun	Phylogenetics	Whole genomes / MAGs (.fasta / .fna)	Starts once genomes or MAGs have been assembled; hands off placements and reference trees to taxonomic interpretation, comparative genomics, or genome-centered follow-up.	Places new genomes on a reference tree and assigns taxonomy. Resolves the evolutionary position of novel or draft genomes using universal marker genes; backbone of MetaPhlAn’s SGB taxonomy.	Asnicar F et al. Nature Communications 2020
StrainPhlAn	Shotgun	Strain-level tracking	Shotgun metagenomics (.fastq / .fastq.gz)	Starts after MetaPhlAn and marker extraction identify a species with sufficient coverage; hands off strain trees to transmission, persistence, or within-host evolution analyses.	Tracks specific microbial strains across samples via phylogenetic trees. Answers whether the same strain is shared between individuals (e.g. mother-infant transmission) or persists across time points.	Truong DT et al. Genome Research 2017
ShortBRED	Shotgun	Targeted gene profiling	Shotgun metagenomics (.fastq / .fastq.gz)	Starts with cleaned reads plus a predefined marker set for the protein family of interest; hands off targeted abundance tables to focused statistical or mechanistic follow-up.	Builds compact marker sequences for target proteins then quantifies them in metagenomes. Efficiently screens large cohorts for any protein of interest (e.g. antimicrobial resistance genes) without full-database alignment.	Lloyd-Price J et al. Nature 2019
WAAFLE	Shotgun	Horizontal gene transfer	Metagenomic assemblies — contigs (.fasta / .fna)	Starts after contigs are assembled from shotgun data; hands off candidate HGT events for genome-context inspection and mobile-element follow-up.	Detects lateral gene transfer events in assembled metagenomes. Identifies genes that appear to have moved between phylogenetically distant lineages, revealing mobile genetic elements in communities.	Hsu TY et al. Nature Microbiology 2025
MACARRoN	Both / downstream	Metabolomics prioritization	Untargeted LC-MS metabolomics (.csv / .tsv)	Starts once paired microbiome and metabolomics tables have been assembled; hands off ranked metabolite candidates to manual review, validation, and targeted experiments.	Ranks metabolites by biological relevance and microbiome association. Cuts through thousands of unannotated metabolite features to surface the ones most worth experimental follow-up once you have paired microbiome and metabolomics data.	Bhosle A et al. Molecular Systems Biology 2024
metawibele	Shotgun	Protein characterization	Metagenomic assemblies (.fasta / .fna)	Starts after assembly or MAG recovery; hands off prioritized novel protein families to annotation curation, experimental follow-up, or comparative genomics.	Annotates and prioritizes novel microbial protein families. Multi-database annotation pipeline that highlights unannotated or poorly characterized proteins likely to have biological significance.	Zhang Y et al. Nature 2022
baqlava	Shotgun	Viral profiling	Shotgun metagenomics (.fastq / .fastq.gz)	Starts with cleaned shotgun reads; hands off viral abundance profiles to comparison with bacterial taxa, host phenotypes, and multi-omics readouts.	Identifies and quantifies viruses (especially bacteriophages) alongside bacteria. Adds the viral dimension to standard metagenomics; designed to complement MetaPhlAn for a complete community profile.	Jensen JSL et al. bioRxiv 2026
MaAsLin2	Both / downstream	Statistical association	Any multi-omics table + metadata (.tsv / .csv)	Starts once feature tables and metadata are ready; hands off effect sizes, q-values, and model summaries to figures, interpretation, and reporting.	Finds microbial features significantly associated with host/environmental variables. Handles compositionality, sparsity, and confounders in a single multivariable model; a widely used tool for differential abundance analysis.	Mallick H et al. PLOS Computational Biology 2021
anpan	Shotgun	Pan-genome statistics	Gene presence/absence or SNP tables (per species) + metadata (.tsv / .csv)	Starts after species-resolved gene or SNP tables have been derived from shotgun data; hands off within-species association hits to mechanistic follow-up and validation.	Tests associations between within-species genetic variation and phenotypes. Goes beyond community-level analyses to ask whether specific microbial genes within a species drive a phenotype, with built-in phylogenetic correction.	Ghazi AR et al. bioRxiv 2025
CCREPE	Both / downstream	Co-occurrence / correlation	Any compositional abundance table (.tsv / .csv)	Starts with a normalized compositional abundance table; hands off corrected correlation results to co-occurrence networks and ecological interpretation.	Computes statistically corrected correlations between microbial features. Corrects for compositional bias in microbiome correlations through renormalization and permutation; useful for building more reliable co-occurrence networks.	HMP Consortium Nature 2012
taxUMAP	Both / downstream	Visualisation	Any species/ASV relative abundance table + taxonomy (.tsv / .csv)	Starts with taxonomically annotated ASV or species tables from either workflow; hands off publication-ready embeddings and exploratory figures for interpretation and reporting.	Produces taxonomy-aware UMAP embeddings of microbiome community composition. Captures biologically meaningful community structure that standard UMAP misses by aggregating abundances up the taxonomic tree before computing distances.	Schluter J et al. Cell Host & Microbe 2023
PolyPanner	Shotgun	Intra-species evolution	Longitudinal metagenomic reads + co-assembly (.fastq / .fasta)	Starts after a focal species has been defined in a longitudinal shotgun cohort and co-assembly is available; hands off variant-frequency trajectories to within-host evolution interpretation.	Detects dynamic polymorphic variants whose allele frequencies change across a time-series of metagenomes. Specifically designed for longitudinal cohorts: leverages co-assembly to improve accuracy and tests for frequency change, enabling detection of de novo selective sweeps that cross-sectional tools miss.	Yaffe E et al. Nature 2025
QIIME 2	16S / amplicon	Amplicon analysis platform	16S / ITS amplicon reads (.fastq / .fastq.gz)	Usually the first major analytical step for raw amplicon reads; hands off ASV tables, taxonomy, diversity metrics, and ordinations to PICRUSt2, taxUMAP, MaAsLin2, or direct reporting.	End-to-end amplicon microbiome analysis: denoising, taxonomy, diversity, ordination, and differential abundance. Gold-standard reproducible platform with full data provenance, an extensive plugin ecosystem, and the largest user community in amplicon-based microbiome research.	Bolyen E et al. Nature Biotechnology 2019
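Several rows above mention compositionality. The toy example below, in plain Python, illustrates why it matters: converting counts to relative abundances can create correlations that are absent from the underlying counts. The numbers are made up for illustration, and the code does not implement CCREPE or MaAsLin2 (both are R packages); it only shows the problem those tools are designed to address.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up absolute counts for three taxa across five samples.
taxon_a = [10, 12, 9, 11, 10]
taxon_b = [20, 19, 21, 20, 22]
taxon_c = [5, 50, 10, 100, 20]  # blooms in some samples

# Convert A and B to relative abundances (fraction of each sample's total).
totals = [a + b + c for a, b, c in zip(taxon_a, taxon_b, taxon_c)]
rel_a = [a / t for a, t in zip(taxon_a, totals)]
rel_b = [b / t for b, t in zip(taxon_b, totals)]

r_abs = pearson(taxon_a, taxon_b)
r_rel = pearson(rel_a, rel_b)
print(f"absolute counts:     r = {r_abs:.2f}")
print(f"relative abundances: r = {r_rel:.2f}")
```

In this toy data, taxa A and B are negatively correlated in absolute counts but strongly positively correlated as relative abundances, because both shrink whenever taxon C blooms. That shift comes from the normalization, not from any interaction between A and B.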

How to Use This Site

Each tool page provides:

What it does — Plain-language description of the tool’s purpose
When to use it — Where it fits in a typical microbiome workflow
Installation — How to install the tool (conda, pip, Docker)
Basic usage — Example command-line invocation
Output — Description of key output files
Tips & Gotchas — Common pitfalls for beginners
Further reading — Links to documentation and key papers

Getting Help

🌐 Biobakery Wiki: https://github.com/biobakery/biobakery/wiki
💬 Biobakery Forum: https://forum.biobakery.org/
📖 Huttenhower Lab: https://huttenhower.sph.harvard.edu/
📖 Segata Lab: https://segatalab.cibio.unitn.it/