MaAsLin2

Multivariable Association Discovery in Population-scale Meta-omics Studies

What is MaAsLin2?

MaAsLin2 (Microbiome Multivariable Association with Linear Models) is a statistical tool for identifying associations between meta-omics features (e.g., microbial taxa, metabolites, gene families) and sample metadata. It is designed around the challenges specific to microbiome data: compositionality, sparsity (many zeros), overdispersion, and confounding variables.

MaAsLin2 answers the question: "Which microbial features are significantly associated with my variable of interest, after accounting for potential confounders?"


When to Use MaAsLin2

Use MaAsLin2 when you want to:

  • Find microbiome features associated with a disease, treatment, or other phenotype
  • Account for covariates (age, sex, BMI, batch effects) in association testing
  • Analyze meta-omics feature tables from metagenomics, metatranscriptomics, metabolomics, or proteomics
  • Run multivariable (not just univariate) association tests

Installation

From GitHub

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("biobakery/maaslin2")

Command-line version

conda install -c bioconda maaslin2

Basic Usage

In R

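library(Maaslin2)

# Load data; row.names = 1 uses the first column as the row IDs.
# MaAsLin2 accepts samples as either rows or columns in the feature table.
features <- read.table("species_abundance.tsv",
                        sep = "\t",
                        header = TRUE,
                        row.names = 1,
                        check.names = FALSE)  # keep header IDs (e.g. "S-01") unchanged

metadata <- read.table("metadata.tsv",
                        sep = "\t",
                        header = TRUE,
                        row.names = 1)

# Run MaAsLin2
fit_data <- Maaslin2(
  input_data     = features,
  input_metadata = metadata,
  output         = "maaslin2_output/",
  fixed_effects  = c("diagnosis"),  # add covariates here, e.g. c("diagnosis", "age")
  random_effects = c("subject")     # groups repeated samples from the same subject
  # If diagnosis has more than two levels, also pass a reference level,
  # e.g. reference = c("diagnosis,nonIBD")
)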
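MaAsLin2 pairs samples across the two tables by their IDs. Mismatched sample names are therefore a common reason samples silently drop out. The base-R pre-flight check below is a sketch that assumes samples are rows in both tables. The toy data frames are made up for illustration.

```r
# Toy tables with samples as rows (made-up values for illustration only)
features <- data.frame(
  Bacteroides = c(0.40, 0.10, 0.25),
  Prevotella  = c(0.00, 0.30, 0.05),
  row.names   = c("S1", "S2", "S3")
)
metadata <- data.frame(
  diagnosis = c("CD", "nonIBD", "UC"),
  subject   = c("P1", "P1", "P2"),
  row.names = c("S1", "S2", "S4"),
  stringsAsFactors = FALSE
)

# Samples present in only one table, and samples present in both
only_in_features <- setdiff(rownames(features), rownames(metadata))
only_in_metadata <- setdiff(rownames(metadata), rownames(features))
shared <- intersect(rownames(features), rownames(metadata))

message("Shared samples: ", length(shared))
if (length(only_in_features) > 0) {
  message("Missing from metadata: ", paste(only_in_features, collapse = ", "))
}
if (length(only_in_metadata) > 0) {
  message("Missing from features: ", paste(only_in_metadata, collapse = ", "))
}

# Keep only the shared samples, in the same order, before running the analysis
features <- features[shared, , drop = FALSE]
metadata <- metadata[shared, , drop = FALSE]
```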

Command-line

Maaslin2.R \
  --fixed_effects="diagnosis" \
  --random_effects="subject" \
  species_abundance.tsv \
  metadata.tsv \
  maaslin2_output/

Key Parameters

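Parameter          Description
fixed_effects      Variables to test for association (covariates are listed here too)
random_effects     Grouping variables for repeated measures, e.g. subject
reference          Reference level for categorical variables with more than two levels, as "variable,level"
normalization      Normalization method: "TSS" (default), "CLR", "CSS", "TMM", "NONE"
transform          Data transformation: "LOG" (default), "LOGIT", "AST", "NONE"
analysis_method    Statistical model: "LM" (default), "CPLM", "NEGBIN", "ZINB"
min_abundance      Minimum abundance for a feature to count as present in a sample (default 0)
min_prevalence     Minimum fraction of samples in which a feature must be present (default 0.1)
max_significance   q-value threshold for reporting significant results (default 0.25)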
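The sketch below makes min_abundance and min_prevalence concrete with a base-R abundance/prevalence filter. It assumes samples are rows. It keeps a feature when the feature exceeds min_abundance in more than a min_prevalence fraction of samples. MaAsLin2's exact comparison operators may differ, so treat the helper filter_features() as hypothetical.

```r
# Hypothetical helper illustrating min_abundance / min_prevalence filtering.
# Assumes samples are rows and features are columns.
filter_features <- function(features, min_abundance = 0, min_prevalence = 0.1) {
  # Fraction of samples in which each feature exceeds the abundance threshold
  prevalence <- colMeans(features > min_abundance)
  features[, prevalence > min_prevalence, drop = FALSE]
}

# Made-up relative abundances: 5 samples x 3 features
toy <- data.frame(
  common = c(0.50, 0.40, 0.60, 0.30, 0.45),
  rare   = c(0.00, 0.00, 0.00, 0.00, 0.02),
  medium = c(0.10, 0.00, 0.20, 0.00, 0.15)
)

kept <- filter_features(toy, min_abundance = 0.01, min_prevalence = 0.25)
colnames(kept)  # [1] "common" "medium"
```

"rare" is dropped because it passes the abundance threshold in only 1 of 5 samples (20%).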

Output Files

File                      Contents
all_results.tsv           Full results for all features and associations
significant_results.tsv   Associations passing the q-value threshold (max_significance, 0.25 by default)
figures/                  Plots of significant associations (a heatmap plus scatter or box plots per variable)

Understanding results

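# Read significant results
# (the object returned by Maaslin2() also holds the full table in fit_data$results)
results <- read.table("maaslin2_output/significant_results.tsv",
                       sep = "\t",
                       header = TRUE)

# Key columns:
# feature     - the microbiome feature (taxon, gene family, etc.)
# metadata    - the metadata variable
# value       - the level of a categorical variable (the variable name if continuous)
# coef        - model coefficient on the normalized/transformed scale; with the
#               default LOG transform, a difference in log2 abundance versus the
#               reference level (or per standard deviation of a continuous
#               variable, since standardize = TRUE by default)
# stderr      - standard error of coef
# pval        - raw p-value
# qval        - FDR-corrected p-value (Benjamini-Hochberg by default)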
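The sketch below shows common follow-up steps on a results table. It keeps rows under a q-value cutoff, sorts them by q-value, and labels each association's direction from the sign of coef. The toy data frame only mimics the column names listed above, and its values are made up.

```r
# Made-up rows using the same column names as MaAsLin2's results files
results <- data.frame(
  feature  = c("Bacteroides", "Prevotella", "Akkermansia", "Roseburia"),
  metadata = c("diagnosis", "diagnosis", "age", "diagnosis"),
  value    = c("CD", "UC", "age", "CD"),
  coef     = c(1.8, -0.9, 0.2, -1.4),
  qval     = c(0.01, 0.20, 0.60, 0.04),
  stringsAsFactors = FALSE
)

# Keep associations below the cutoff, strongest evidence first
sig <- results[results$qval < 0.25, ]
sig <- sig[order(sig$qval), ]

# Positive coef: feature is higher in this level (vs. the reference)
# or increases with a continuous variable
sig$direction <- ifelse(sig$coef > 0, "higher", "lower")
sig[, c("feature", "value", "coef", "qval", "direction")]
```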

Statistical Models

MaAsLin2 supports multiple statistical models suited to different data types:

Model               Code      Best for
Linear model        "LM"      Normalized, log-transformed data (default)
Compound Poisson    "CPLM"    Non-negative data with many zeros
Negative binomial   "NEGBIN"  Count data
Zero-inflated NB    "ZINB"    Sparse count data with excess zeros
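Conceptually, the default LM method fits one regression per feature. The transformed abundance is the response and all fixed effects are predictors. The coefficient and p-value for each term are then collected and FDR-corrected. The base-R sketch below illustrates that loop with lm() on simulated data. It omits random effects, which MaAsLin2 handles with mixed-effects models, and it skips MaAsLin2's own normalization and transform steps. fit_per_feature() is a hypothetical name.

```r
set.seed(1)
n <- 40
metadata <- data.frame(
  diagnosis = factor(rep(c("control", "case"), each = n / 2),
                     levels = c("control", "case")),
  age = round(runif(n, 20, 70))
)
# Simulated, already log-scale abundances for three features
features <- data.frame(
  f1 = 1.5 * (metadata$diagnosis == "case") + rnorm(n),  # diagnosis effect
  f2 = 0.05 * metadata$age + rnorm(n),                   # age effect
  f3 = rnorm(n)                                          # no effect
)

# Hypothetical helper: one linear model per feature, all fixed effects together
fit_per_feature <- function(features, metadata, fixed_effects) {
  rows <- lapply(colnames(features), function(f) {
    df  <- cbind(y = features[[f]], metadata[, fixed_effects, drop = FALSE])
    fit <- lm(reformulate(fixed_effects, response = "y"), data = df)
    tab <- coef(summary(fit))[-1, , drop = FALSE]  # drop the intercept row
    data.frame(feature = f, term = rownames(tab),
               coef = tab[, "Estimate"], stderr = tab[, "Std. Error"],
               pval = tab[, "Pr(>|t|)"], row.names = NULL,
               stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, rows)
  # Benjamini-Hochberg correction across all feature x term tests
  res$qval <- p.adjust(res$pval, method = "BH")
  res[order(res$qval), ]
}

res <- fit_per_feature(features, metadata, c("diagnosis", "age"))
print(res)
```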

Tips & Gotchas

Tip

Default settings work well for most microbiome data. The defaults (TSS normalization + LOG transform + LM model) are appropriate for most 16S or shotgun metagenomic relative abundance data.
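The sketch below shows what TSS normalization and a LOG transform do to a small count table, in base R. It assumes samples are rows. The zero handling (replace zeros with half of the smallest non-zero value of each feature, then take log2) reflects our understanding of MaAsLin2's LOG transform and should be checked against the package source. tss() and log_transform() are hypothetical helper names.

```r
# Hypothetical helpers sketching TSS normalization and a log2 transform.
# Input: numeric matrix, samples as rows, features as columns.
tss <- function(counts) {
  # Divide each sample (row) by its total so every row sums to 1
  sweep(counts, 1, rowSums(counts), "/")
}

log_transform <- function(x) {
  # Per feature: replace zeros with half the smallest non-zero value, then log2
  apply(x, 2, function(col) {
    col[col == 0] <- min(col[col > 0]) / 2
    log2(col)
  })
}

counts <- matrix(c(10,  0, 30,
                    5,  5, 10,
                    0, 20, 20),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("S1", "S2", "S3"), c("A", "B", "C")))

rel <- tss(counts)
rowSums(rel)        # each sample sums to 1
log_transform(rel)  # column A becomes -2, -2, -3
```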

Warning

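Compositional data. Microbial relative abundance data are compositional: each sample sums to a constant (1, or 100%), so an increase in one feature forces others to decrease. CLR normalization (normalization = "CLR") addresses this more directly than a simple log transform. Because CLR values are already log ratios and can be negative, combine it with transform = "NONE".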
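The centered log-ratio (CLR) divides each value by the geometric mean of its sample before taking the log. The resulting values are log ratios that sum to zero within each sample. The sketch below is a minimal base-R version. It assumes samples are rows and adds a pseudocount of 1 to handle zeros. The pseudocount is our assumption, not necessarily what MaAsLin2 uses, and clr() is a hypothetical helper.

```r
# Hypothetical helper: centered log-ratio per sample (rows = samples).
clr <- function(x, pseudocount = 1) {
  logx <- log(x + pseudocount)
  # Subtracting each row's mean log equals dividing by the geometric mean
  sweep(logx, 1, rowMeans(logx), "-")
}

counts <- matrix(c(10, 0, 30,
                    5, 5, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), c("A", "B", "C")))

z <- clr(counts)
rowSums(z)  # ~0 for every sample (up to floating-point error)
any(z < 0)  # TRUE: CLR values can be negative, so a further log is undefined
```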

Tip

Multiple covariates. Include relevant covariates (age, sex, BMI, batch) in fixed_effects alongside the variable of interest. Omitting a covariate that is related to both the variable of interest and the features can produce spurious associations.
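Here is a small deterministic illustration of why covariates matter. A made-up feature depends only on age, but older samples are also the cases. A model with diagnosis alone reports a large diagnosis effect. Adding age as a second fixed effect shrinks that effect toward zero. Plain lm() stands in for MaAsLin2's per-feature model.

```r
# Made-up data: the feature tracks age, and diagnosis is confounded with age
age <- 20:79
diagnosis <- factor(ifelse(age > 50, "case", "control"),
                    levels = c("control", "case"))
feature <- 0.1 * age + 0.5 * sin(seq_along(age))  # no direct diagnosis effect

unadjusted <- coef(lm(feature ~ diagnosis))["diagnosiscase"]
adjusted   <- coef(lm(feature ~ diagnosis + age))["diagnosiscase"]

# unadjusted is large (about 3); adjusted is close to 0
c(unadjusted = unname(unadjusted), adjusted = unname(adjusted))
```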

Warning

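Interpret q-values, not p-values. With hundreds of features tested, many small p-values arise by chance alone, so rely on the FDR-corrected qval column. The default significance threshold is q < 0.25, which is permissive; it can be changed with max_significance.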
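The example below shows why raw p-values mislead when many features are tested. It uses 200 hypothetical features with no real association and spreads their p-values evenly over (0, 1), as null p-values are on average. About 5% fall below 0.05 by chance, yet none survive Benjamini-Hochberg correction, computed here with base R's p.adjust().

```r
# 200 hypothetical null tests with evenly spread p-values
pvals <- (1:200) / 201

sum(pvals < 0.05)                # 10 "hits" by chance alone
qvals <- p.adjust(pvals, method = "BH")
sum(qvals < 0.25)                # 0 after FDR correction
```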


Further Reading