MetaPhlAn
Metagenomic Phylogenetic Analysis
What is MetaPhlAn?
MetaPhlAn (Metagenomic Phylogenetic Analysis) is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. It uses a database of clade-specific marker genes to rapidly and accurately determine the taxonomic composition of a metagenome.
MetaPhlAn answers the question: “Who is in this community and at what abundance?”
Current version: MetaPhlAn 4
When to Use MetaPhlAn
Use MetaPhlAn when you have shotgun metagenomic reads and you want to:
- Determine which species/strains are present
- Get relative abundance estimates at multiple taxonomic levels
- Detect novel or uncharacterized species
- Feed the taxonomic profiles downstream into HUMAnN, or the marker alignments (SAM output) into StrainPhlAn
MetaPhlAn works only with shotgun metagenomic data. For 16S rRNA amplicon data, use QIIME 2 or DADA2 for taxonomic profiling instead.
Installation
Via conda (recommended)
```bash
conda create -n metaphlan -c conda-forge -c bioconda metaphlan
conda activate metaphlan
```
Via pip
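```bash
pip install metaphlan
```
pip does not install Bowtie2, which MetaPhlAn uses to map reads against the marker genes; install it separately and make sure it is on your PATH.
Database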
Download the marker gene database:
```bash
metaphlan --install --index mpa_vOct22_CHOCOPhlAnSGB_202212
```
Basic Usage
```bash
metaphlan sample.fastq.gz \
    --input_type fastq \
    --nproc 8 \
    -o sample_profile.txt
```
Paired-end reads
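```bash
metaphlan sample_R1.fastq.gz,sample_R2.fastq.gz \
    --input_type fastq \
    --bowtie2out sample.bowtie2.bz2 \
    --nproc 8 \
    -o sample_profile.txt
```
MetaPhlAn does not use the pairing information: reads from both files are pooled and mapped independently. When several comma-separated FASTQ or FASTA files are given, --bowtie2out must also be specified.
Key options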
| Option | Description |
|---|---|
| --input_type | Input type: fastq, fasta, bowtie2out, sam |
| --nproc | Number of CPU threads |
| -o | Output profile file |
| --bowtie2out | Save Bowtie2 output for reuse |
| --tax_lev | Taxonomic level to profile (a=all, k=kingdom, p=phylum, …) |
| --add_viruses | Include viral organisms |
| --unknown_estimation | Estimate fraction of unknown organisms |
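For more than a handful of samples, the paired-end command can be wrapped in a small shell function and called in a loop. This is a minimal sketch, not part of MetaPhlAn: the helper name profile_pair, the output layout, and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming convention are assumptions; the MetaPhlAn options it uses are the ones listed in the table.

```bash
# profile_pair R1 R2 OUTDIR [NPROC]: profile one paired-end sample.
# Hypothetical helper, not a MetaPhlAn command.
# ASSUMPTION: reads are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
profile_pair() {
    local r1=$1 r2=$2 outdir=$3 nproc=${4:-8}
    local sample
    sample=$(basename "$r1" _R1.fastq.gz)   # "reads/s1_R1.fastq.gz" -> "s1"
    mkdir -p "$outdir"

    # Both mates are passed comma-separated (pooled, pairing ignored);
    # --bowtie2out keeps the mapping so the sample can be re-profiled later.
    metaphlan "${r1},${r2}" \
        --input_type fastq \
        --bowtie2out "${outdir}/${sample}.bowtie2.bz2" \
        --nproc "$nproc" \
        -o "${outdir}/${sample}_profile.txt"
}

# Example (hypothetical paths):
# for r1 in reads/*_R1.fastq.gz; do
#     profile_pair "$r1" "${r1%_R1.fastq.gz}_R2.fastq.gz" profiles 8
# done
```

Keeping one --bowtie2out file per sample means any sample can later be re-profiled from its saved mapping instead of re-running Bowtie2.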
Output Files
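The main output is a tab-separated profile file. It starts with a few comment lines beginning with # (such as the database version and the command that was run), followed by one row per clade with the columns: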
| Column | Description |
|---|---|
| #clade_name | Full taxonomic lineage string |
| NCBI_tax_id | NCBI taxonomy identifier |
| relative_abundance | Relative abundance (0–100%) |
| additional_species | Species grouped under this clade |
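Since the profile is plain tab-separated text, standard Unix tools are enough to query it. The sketch below lists the most abundant species in a single profile: it skips lines starting with #, keeps rows whose lineage ends at the s__ level, and sorts on the relative_abundance column. The helper name top_species is hypothetical, and the column positions are assumed to match the table above.

```bash
# top_species PROFILE [N]: print the N most abundant species (default 10)
# as "species<TAB>relative_abundance". Hypothetical helper, not part of MetaPhlAn.
# ASSUMPTION: column 1 = clade_name, column 3 = relative_abundance.
top_species() {
    local profile=$1 n=${2:-10}
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                           # header and metadata lines
        $1 ~ /\|s__/ && $1 !~ /\|t__/ {         # species rows only, no strain/SGB rows
            name = $1
            sub(/^.*\|/, "", name)              # keep the last rank, e.g. s__Escherichia_coli
            print name, $3
        }' "$profile" |
        sort -t "$(printf '\t')" -k2,2gr |      # highest abundance first
        awk -v n="$n" 'NR <= n'                 # first N rows without closing the pipe early
}
```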
Merging multiple samples
```bash
merge_metaphlan_tables.py \
    sample1_profile.txt \
    sample2_profile.txt \
    sample3_profile.txt \
    -o merged_profiles.txt
```
Taxonomic Levels
MetaPhlAn profiles at all standard taxonomic levels:
| Prefix | Level |
|---|---|
| k__ | Kingdom |
| p__ | Phylum |
| c__ | Class |
| o__ | Order |
| f__ | Family |
| g__ | Genus |
| s__ | Species |
| t__ | Strain/SGB |
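Because each clade_name ends with the prefix of its own rank, the prefix table can be applied mechanically. As a sketch (count_by_rank is a hypothetical helper), the function below counts how many clades a single profile reports at each level; lines starting with # and rows without a rank prefix (such as an unclassified row, if present) are ignored.

```bash
# count_by_rank PROFILE: print "<Level><TAB><number of clades>" for each rank
# present, in kingdom-to-strain order. Hypothetical helper, not part of MetaPhlAn.
count_by_rank() {
    awk -F'\t' -v OFS='\t' '
        BEGIN {
            split("k p c o f g s t", order, " ")
            level["k"] = "Kingdom"; level["p"] = "Phylum"; level["c"] = "Class"
            level["o"] = "Order";   level["f"] = "Family"; level["g"] = "Genus"
            level["s"] = "Species"; level["t"] = "Strain/SGB"
        }
        /^#/ { next }                                  # header and metadata lines
        {
            n = split($1, parts, "|")                  # lineage fields
            prefix = substr(parts[n], 1, 1)            # "s__Foo" -> "s"
            # count only fields of the form "<letter>__..." with a known prefix
            if (substr(parts[n], 2, 2) == "__" && prefix in level) count[prefix]++
        }
        END {
            for (i = 1; i <= 8; i++)
                if (order[i] in count) print level[order[i]], count[order[i]]
        }' "$1"
}
```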
Tips & Gotchas
Reuse Bowtie2 alignments: save the --bowtie2out file and rerun MetaPhlAn on it with --input_type bowtie2out when you need to re-profile (for example, with different options) without re-aligning the reads.
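As an illustration of that reuse, the sketch below reruns MetaPhlAn on one saved mapping file once per requested --tax_lev value, so no run repeats the Bowtie2 step. The helper name reprofile_levels and the file names are hypothetical.

```bash
# reprofile_levels MAPFILE PREFIX LEVEL...: write one profile per taxonomic
# level to PREFIX_<level>.txt. Hypothetical helper, not part of MetaPhlAn.
reprofile_levels() {
    local map_out=$1 prefix=$2 lev
    shift 2
    for lev in "$@"; do
        # --input_type bowtie2out reuses the saved hits instead of mapping reads again
        metaphlan "$map_out" \
            --input_type bowtie2out \
            --tax_lev "$lev" \
            -o "${prefix}_${lev}.txt"
    done
}

# Example: genus- and species-level profiles from one saved mapping
# reprofile_levels sample.bowtie2.bz2 sample g s
```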
Human/host reads should be removed before running MetaPhlAn. KneadData (also part of bioBakery) can be used for host decontamination.
MetaPhlAn 4 introduced SGBs (species-level genome bins), which allow profiling of novel, uncharacterized species that have no reference genome.
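To estimate how much of a sample is made up of such uncharacterized species, one approach is to sum the abundances of species rows named after unknown SGBs. This sketch assumes MetaPhlAn 4 labels them in the form s__GGB<number>_SGB<number> (an assumption about the database naming; check it against your own output), and usgb_abundance is a hypothetical helper.

```bash
# usgb_abundance PROFILE: total relative abundance (percent) of species rows
# whose name matches the assumed uSGB pattern. Hypothetical helper.
# ASSUMPTION: uncharacterized species are named s__GGB<digits>_SGB<digits>.
usgb_abundance() {
    awk -F'\t' '
        /^#/ { next }                                     # header and metadata lines
        $1 ~ /\|s__GGB[0-9]+_SGB[0-9]+$/ { total += $3 }  # "$" skips the t__ child rows
        END { printf "%.2f\n", total }' "$1"
}
```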