CCREPE
Compositionality Corrected by REnormalization and PErmutation
What is CCREPE?
CCREPE (Compositionality Corrected by REnormalization and PErmutation) is an R package for measuring and assessing the statistical significance of correlations in compositional microbiome data. Standard correlation methods (Pearson, Spearman) produce biased results with compositional data because the values in each sample sum to a constant. CCREPE corrects for this bias by comparing each observed similarity score against a null distribution built from permuted and renormalized data.
CCREPE answers the question: "Which pairs of microbial features are truly correlated with each other, accounting for the compositional nature of microbiome data?"
- GitHub
- Bioconductor
- Paper: Faust et al. 2012, PLOS Computational Biology
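The compositional bias described above can be reproduced with a small simulation. This is a minimal sketch in base R, assuming three hypothetical taxa (A, B, C) whose absolute abundances are generated independently; the distribution parameters are arbitrary choices for illustration, not values from the CCREPE paper.

```r
set.seed(42)
n_samples <- 200

# Hypothetical taxa with independent "absolute" abundances.
# Taxon A is abundant and highly variable; B and C are smaller.
absolute <- cbind(
  A = rlnorm(n_samples, meanlog = 5, sdlog = 1),
  B = rlnorm(n_samples, meanlog = 3, sdlog = 0.5),
  C = rlnorm(n_samples, meanlog = 3, sdlog = 0.5)
)

# Convert to relative abundances: every row now sums to 1
relative <- absolute / rowSums(absolute)

# Spearman correlations before and after normalization
cor_abs <- cor(absolute, method = "spearman")
cor_rel <- cor(relative, method = "spearman")

print(round(cor_abs, 2))  # off-diagonal values close to 0
print(round(cor_rel, 2))  # strong correlations induced by normalization alone
```

Although the taxa were simulated independently, the relative abundances of A and B become strongly negatively correlated, and B and C become positively correlated, purely because each row is forced to sum to 1.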
When to Use CCREPE
Use CCREPE when you want to:
- Identify positively or negatively co-occurring microbial taxa
- Build microbial co-occurrence networks
- Correlate microbial features with each other or with metabolites
- Correct for compositional bias in standard correlation analyses
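Other tools for microbiome correlation analysis include SpiecEasi and FlashWeave. CCREPE's approach is based on renormalization and permutation, which is especially suited to relative abundance data. The package also provides the NC-score (N-dimensional checkerboard score, nc.score), a similarity measure intended for compositions derived from relative abundance measurements.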
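To make the renormalization and permutation idea concrete, here is a simplified sketch in base R for a single pair of features. It is an illustration under stated assumptions, not the ccrepe implementation: the function name perm_renorm_test, the choice to shuffle every column independently, and the pooled-variance z-statistic are all assumptions made for this example.

```r
# Illustrative only: a simplified permutation/renormalization test for one
# pair of features. perm_renorm_test is a hypothetical helper, not part of ccrepe.
perm_renorm_test <- function(data, i, j, iterations = 200) {
  data <- as.matrix(data)
  n <- nrow(data)
  boot_scores <- numeric(iterations)
  perm_scores <- numeric(iterations)

  for (b in seq_len(iterations)) {
    # Bootstrap: resample samples (rows) with replacement to estimate
    # the sampling distribution of the observed score
    boot <- data[sample(n, replace = TRUE), , drop = FALSE]
    boot_scores[b] <- cor(boot[, i], boot[, j], method = "spearman")

    # Permutation: shuffle each feature across samples to break real
    # associations, then renormalize rows to restore the sum-to-1 constraint
    perm <- apply(data, 2, sample)
    perm <- perm / rowSums(perm)
    perm_scores[b] <- cor(perm[, i], perm[, j], method = "spearman")
  }

  # Assumed test statistic: difference in means over a pooled standard deviation
  z <- (mean(boot_scores) - mean(perm_scores)) /
    sqrt(0.5 * (var(boot_scores) + var(perm_scores)))
  list(z.stat = z, p.value = 2 * pnorm(-abs(z)))
}

# Toy relative abundance data (simulated, 60 samples x 4 hypothetical taxa)
set.seed(1)
counts <- matrix(rlnorm(60 * 4), nrow = 60,
                 dimnames = list(NULL, paste0("taxon", 1:4)))
rel <- counts / rowSums(counts)

res <- perm_renorm_test(rel, "taxon1", "taxon2")
print(res)
```

Because the null distribution is built from data that has been renormalized after shuffling, it carries the same compositional constraint as the observed data, so correlation that arises only from normalization is present in both distributions and does not count as evidence of association.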
Installation
R / Bioconductor (recommended)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ccrepe")
From GitHub
devtools::install_github("biobakery/ccrepe")
Basic Usage
library(ccrepe)
# Load species abundance table
# Rows = samples, columns = species
# ccrepe expects relative abundances, so each row should sum to a constant
abundance <- read.table("species_abundance.tsv",
                        sep = "\t",
                        header = TRUE,
                        row.names = 1)
# Run CCREPE with cor as the similarity function
# (ccrepe uses Spearman correlation by default when sim.score = cor)
ccrepe_results <- ccrepe(
  x = abundance,
  sim.score = cor,    # similarity function
  iterations = 1000,  # number of bootstrap and permutation iterations
  min.subj = 10       # minimum samples for inclusion
)
Cross-dataset correlation
# Correlate species abundance with metabolite abundance
# Both tables must describe the same samples (rows)
metabolites <- read.table("metabolites.tsv",
                          sep = "\t",
                          header = TRUE,
                          row.names = 1)
ccrepe_cross <- ccrepe(
  x = abundance,
  y = metabolites,  # second dataset
  sim.score = cor,
  iterations = 1000
)
Output
CCREPE returns a list of matrices:
| Output | Description |
|---|---|
| sim.score | Matrix of similarity scores (correlation coefficients when sim.score = cor) |
| p.values | Raw p-values for each correlation |
| q.values | FDR-adjusted p-values (Benjamini-Hochberg-Yekutieli) |
| z.stat | Z-statistics |
Accessing results
# Get significant correlations (q < 0.05)
# For a single dataset the matrices are symmetric, so keep only the
# upper triangle to list each pair once
q <- ccrepe_results$q.values
sig_pairs <- which(q < 0.05 & upper.tri(q), arr.ind = TRUE)
# Build a table of significant correlations
library(tibble)
sig_table <- tibble(
  feature1 = rownames(q)[sig_pairs[, 1]],
  feature2 = colnames(q)[sig_pairs[, 2]],
  correlation = ccrepe_results$sim.score[sig_pairs],
  q_value = q[sig_pairs]
)
Custom Similarity Functions
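CCREPE accepts a custom similarity function, provided it takes arguments named x and y, returns a single number when given two vectors, and returns a symmetric matrix of column-by-column scores when given a single matrix or data frame:
# Use a custom similarity metric (e.g., mutual information)
library(infotheo)
mi_score <- function(x, y = NULL) {
  # One matrix or data frame: pairwise mutual information between columns
  if (is.null(y)) return(mutinformation(discretize(x)))
  # Two vectors: a single mutual information value
  mutinformation(discretize(x), discretize(y))
}
ccrepe_mi <- ccrepe(
  x = abundance,
  sim.score = mi_score,
  iterations = 1000
)
Tips & Gotchas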
Compositionality: Do not apply standard Pearson or Spearman correlations directly to relative abundance data. The compositional constraint (all features in a sample sum to 1) can create spurious correlations, typically negative ones. CCREPE corrects for this.
More iterations = more accurate p-values: The default of 1000 iterations is a minimum. Use 10,000 or more for publication-quality results, at the cost of longer computation time.
Filter rare features: Remove taxa present in fewer than 10% of samples before running CCREPE to reduce the multiple testing burden and improve statistical power.
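The prevalence filter can be sketched in base R as follows. The helper name filter_by_prevalence and the toy table are hypothetical, and renormalizing the rows after filtering is an assumption made here so that the table still sums to 1 per sample; whether to renormalize depends on your analysis.

```r
# Hypothetical helper: keep features detected in at least `min_prevalence`
# of samples. Rows = samples, columns = features.
filter_by_prevalence <- function(abundance, min_prevalence = 0.10,
                                 renormalize = TRUE) {
  abundance <- as.matrix(abundance)
  # Fraction of samples in which each feature is non-zero
  prevalence <- colMeans(abundance > 0)
  kept <- abundance[, prevalence >= min_prevalence, drop = FALSE]
  if (renormalize) {
    # Rescale rows to sum to 1 again; a row with no remaining
    # abundance would become NaN and should be removed beforehand
    kept <- kept / rowSums(kept)
  }
  kept
}

# Toy table: 20 samples, one taxon detected in a single sample (5%)
rare   <- c(rep(0, 19), 0.05)
taxonA <- rep(c(0.6, 0.5), 10)
toy <- cbind(taxonA = taxonA, taxonB = 1 - taxonA - rare, rare = rare)

filtered <- filter_by_prevalence(toy, min_prevalence = 0.10)
print(colnames(filtered))
```

The filtered table can then be passed to ccrepe in place of the full table.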
Visualize results as a network: Use the igraph or ggraph R packages to visualize significant CCREPE correlations as a co-occurrence network.
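As a rough sketch of the network step, the code below turns a result list into an edge list and, if igraph is installed, into an undirected graph. The mock list only imitates the sim.score and q.values elements of a ccrepe result; its taxa names and numbers are made up for illustration. The edge attribute is called correlation rather than weight because some igraph layouts treat weight as a positive edge weight.

```r
# Mock result imitating two elements of a ccrepe output list.
# Values are invented for illustration only.
taxa <- c("taxonA", "taxonB", "taxonC")
mock <- list(
  sim.score = matrix(c(NA, 0.8, -0.1,
                       0.8, NA, -0.7,
                       -0.1, -0.7, NA),
                     nrow = 3, dimnames = list(taxa, taxa)),
  q.values = matrix(c(NA, 0.01, 0.60,
                      0.01, NA, 0.03,
                      0.60, 0.03, NA),
                    nrow = 3, dimnames = list(taxa, taxa))
)

# Significant pairs, each listed once (upper triangle only)
q <- mock$q.values
idx <- which(q < 0.05 & upper.tri(q), arr.ind = TRUE)
edges <- data.frame(
  from = rownames(q)[idx[, "row"]],
  to = colnames(q)[idx[, "col"]],
  correlation = mock$sim.score[idx],
  stringsAsFactors = FALSE
)
print(edges)

# Optional: build and plot the network when igraph is available
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE,
                                     vertices = data.frame(name = taxa))
  # Color edges by the sign of the correlation
  edge_colors <- ifelse(edges$correlation > 0, "steelblue", "firebrick")
  g <- igraph::set_edge_attr(g, "color", value = edge_colors)
  plot(g)
}
```

With a real analysis, replace mock with the list returned by ccrepe and adjust the q-value threshold as needed.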
Further Reading
- CCREPE Bioconductor vignette
- Faust et al. 2012, PLOS Computational Biology
- Faust & Raes 2012, Nature Reviews Microbiology (review of microbial co-occurrence networks)