PICRUSt2
Phylogenetic Investigation of Communities by Reconstruction of Unobserved States
What is PICRUSt2?
PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States 2) predicts the functional composition of a metagenome using 16S rRNA amplicon data. It places your 16S sequences on a reference phylogenetic tree and then predicts gene family and pathway abundances based on characterized genomes.
PICRUSt2 answers the question "What metabolic functions might this community be performing?" using 16S amplicon data alone.
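To make the prediction step concrete, here is a minimal sketch of the arithmetic behind a community-level estimate: each ASV's read count is divided by its predicted 16S copy number, multiplied by its predicted copy number for a gene family, and the products are summed per sample. The ASV names, counts, and copy numbers are invented for illustration, and the function name is hypothetical; this is not PICRUSt2's own code.

```python
# Illustrative only: toy numbers and a hypothetical helper, not PICRUSt2 internals.

def predict_function_abundance(asv_counts, marker_copies, gene_copies):
    """Estimate one gene family's abundance in one sample.

    asv_counts:    {asv_id: read count in the sample}
    marker_copies: {asv_id: predicted 16S copies per genome}
    gene_copies:   {asv_id: predicted copies of the gene family per genome}
    """
    total = 0.0
    for asv, count in asv_counts.items():
        # Dividing reads by 16S copy number approximates the number of genomes.
        genomes = count / marker_copies[asv]
        total += genomes * gene_copies.get(asv, 0.0)
    return total


sample = {"ASV1": 100, "ASV2": 40}    # hypothetical read counts
copies_16s = {"ASV1": 4, "ASV2": 2}   # hypothetical 16S copy numbers
copies_ec = {"ASV1": 1, "ASV2": 3}    # hypothetical gene family copy numbers

abundance = predict_function_abundance(sample, copies_16s, copies_ec)
print(abundance)  # 100/4*1 + 40/2*3 = 85.0
```

The copy-number normalization matters because an organism with many 16S copies would otherwise look more abundant than it is.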
- GitHub
- Documentation
- Paper: Douglas et al. 2020, Nature Biotechnology
When to Use PICRUSt2
Use PICRUSt2 when you have 16S rRNA amplicon data (not shotgun metagenomics) and want functional predictions:
- You don't have shotgun metagenomic data
- You want a quick functional annotation from existing 16S surveys
- Budget or compute constraints prevent shotgun sequencing
PICRUSt2 predictions are inferred, not measured. For accurate functional profiling, use HUMAnN with actual shotgun metagenomics data.
Installation
Via conda (recommended)
conda create -n picrust2 -c bioconda -c conda-forge picrust2
conda activate picrust2
Via pip
pip install picrust2
Basic Usage
PICRUSt2 requires two inputs:
1. A FASTA file of representative 16S sequences
2. An ASV/OTU abundance table (BIOM format)
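The sequence IDs in the FASTA file need to match the feature IDs in the abundance table, so a quick consistency check before a long run can save time. The sketch below is one way to do that with the standard library only. It assumes the table has been exported to tab-delimited text whose header and comment lines start with "#" (as in a `biom convert --to-tsv` export); the helper names and the toy files are hypothetical.

```python
import tempfile
from pathlib import Path


def fasta_ids(path):
    """Return the set of sequence IDs (first word of each '>' header)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                ids.add(line[1:].split()[0])
    return ids


def tsv_feature_ids(path):
    """Return feature IDs from the first column of a tab-delimited table.

    Assumption: header and comment lines start with '#'.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                ids.add(line.split("\t")[0].strip())
    return ids


# Toy inputs written to a temporary directory for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "seqs.fna"
    table = Path(tmp) / "feature-table.tsv"
    fasta.write_text(">ASV1\nACGT\n>ASV2\nGGCC\n")
    table.write_text("#OTU ID\tS1\tS2\nASV1\t10\t0\nASV3\t5\t7\n")
    missing_seqs = tsv_feature_ids(table) - fasta_ids(fasta)
    unused_seqs = fasta_ids(fasta) - tsv_feature_ids(table)

print(sorted(missing_seqs))  # table features with no sequence
print(sorted(unused_seqs))   # sequences absent from the table
```

With both sets empty, the two inputs describe the same features.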
picrust2_pipeline.py \
-s seqs.fna \
-i feature-table.biom \
-o picrust2_out_pipeline \
-p 4
Step-by-step workflow
# Step 1: Place sequences on reference tree
place_seqs.py \
-s seqs.fna \
-o out.tre \
-p 4
# Step 2: Hidden-state prediction of 16S and EC copy numbers per ASV
# -n also calculates NSTI values and adds them to the 16S output
hsp.py -i 16S -t out.tre -o 16S_predicted.tsv.gz -p 4 -n
hsp.py -i EC -t out.tre -o EC_predicted.tsv.gz -p 4
# Step 3: Compute metagenome abundances
# --strat also writes pred_metagenome_contrib.tsv.gz, which Step 4 reads
metagenome_pipeline.py \
-i feature-table.biom \
-m 16S_predicted.tsv.gz \
-f EC_predicted.tsv.gz \
-o EC_metagenome_out \
--strat
# Step 4: Predict pathway abundances
pathway_pipeline.py \
-i EC_metagenome_out/pred_metagenome_contrib.tsv.gz \
-o pathways_out \
-p 4
Output Files
| File | Contents |
|---|---|
| EC_predicted.tsv.gz | Predicted EC (enzyme) copy numbers per ASV |
| EC_metagenome_out/pred_metagenome_unstrat.tsv.gz | Community-level EC abundances |
| pathways_out/path_abun_unstrat.tsv.gz | MetaCyc pathway abundances |
| pathways_out/path_abun_strat.tsv.gz | Pathway abundances stratified by contributing taxon |
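These outputs are gzipped tab-delimited text, so they can be read without extra dependencies. The sketch below loads an unstratified table into a nested dictionary. It assumes the first column holds the feature ID and every remaining column is a sample; the helper name, pathway IDs, and values are made up for the demonstration, so check the header of your own files first.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_abundance_table(path):
    """Load a gzipped TSV into {feature_id: {sample: value}}.

    Assumption: column 1 is the feature ID, all other columns are samples.
    """
    table = {}
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]
        for row in reader:
            table[row[0]] = {s: float(v) for s, v in zip(samples, row[1:])}
    return table


# Write a tiny stand-in file so the example runs without PICRUSt2 output.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "path_abun_unstrat.tsv.gz"
    with gzip.open(demo, "wt") as out:
        out.write("pathway\tS1\tS2\n")
        out.write("PWY-0001\t12.5\t0.0\n")  # made-up pathway ID and values
        out.write("PWY-0002\t3.0\t8.25\n")
    pathways = read_abundance_table(demo)

# Most abundant pathway in sample S1.
top_s1 = max(pathways, key=lambda p: pathways[p]["S1"])
print(top_s1)  # PWY-0001
```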
Tips & Gotchas
NSTI scores: PICRUSt2 reports the Nearest Sequenced Taxon Index (NSTI) for each ASV. High NSTI values (>2) indicate that your sequences are phylogenetically distant from the reference genomes, which makes predictions less reliable. Filter out high-NSTI ASVs before interpreting results; picrust2_pipeline.py exposes this cutoff as --max_nsti (default 2).
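As a rough sketch of such a filter, the snippet below splits ASVs into kept and dropped lists at a chosen cutoff. It assumes a tab-delimited 16S prediction table that includes an NSTI column (hsp.py adds one when run with -n) and that the columns are named sequence, 16S_rRNA_Count, and metadata_NSTI; verify those names against your own file. The helper name and rows are invented for illustration.

```python
import csv
import io

# Stand-in for a decompressed 16S prediction table. The column names are
# assumptions to verify; the rows are invented.
marker_tsv = (
    "sequence\t16S_rRNA_Count\tmetadata_NSTI\n"
    "ASV1\t4\t0.03\n"
    "ASV2\t2\t2.70\n"
    "ASV3\t1\t0.45\n"
)


def split_by_nsti(handle, cutoff=2.0, nsti_column="metadata_NSTI"):
    """Return (kept, dropped) lists of ASV IDs for a given NSTI cutoff."""
    kept, dropped = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        # ASVs strictly above the cutoff are treated as unreliable.
        target = dropped if float(row[nsti_column]) > cutoff else kept
        target.append(row["sequence"])
    return kept, dropped


kept, dropped = split_by_nsti(io.StringIO(marker_tsv))
print(kept)     # ['ASV1', 'ASV3']
print(dropped)  # ['ASV2']
```

Reporting how many ASVs (and what share of reads) fall above the cutoff is often as informative as the filtered result itself.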
Use DADA2 or Deblur output as input to PICRUSt2: exact sequence variants work better than OTUs clustered at 97% similarity.
Add descriptions to output files using add_descriptions.py:
add_descriptions.py -i pathways_out/path_abun_unstrat.tsv.gz -m METACYC -o path_abun_unstrat_descrip.tsv.gz