PhyloPhlAn

High-resolution Microbial Phylogenetics

What is PhyloPhlAn?

PhyloPhlAn is a tool for high-resolution microbial phylogeny reconstruction, genome characterization, and taxonomic assignment of microbial genomes. It uses a large database of universal single-copy marker proteins to build comprehensive and accurate phylogenetic trees from whole genomes or metagenome-assembled genomes (MAGs).

PhyloPhlAn answers the question: “Where does this genome fit on the tree of life?”

Current version: PhyloPhlAn 3


When to Use PhyloPhlAn

Use PhyloPhlAn when you want to:

  • Place new genomes or MAGs on a reference phylogenetic tree
  • Characterize the taxonomy of novel organisms
  • Build high-quality, large-scale phylogenetic trees
  • Assign taxonomy to unannotated genomes

Installation

Via pip

pip install phylophlan

External dependencies

PhyloPhlAn requires:

  • mash: initial genome sketching and clustering
  • muscle or mafft: multiple sequence alignment
  • trimal: alignment trimming
  • raxml or iqtree: phylogenetic tree inference

Install all at once via conda:

conda install -c conda-forge -c bioconda mash muscle mafft trimal raxml iqtree
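Before a long run it can help to confirm that the external tools are actually on the PATH. The following is a minimal sketch, not part of PhyloPhlAn itself: check_tools is a hypothetical helper name, and the executable names are taken from the dependency list above, so they may differ on your system (for example raxmlHPC or iqtree2).

```shell
# Report which of the given executables are missing from PATH.
# Prints one "missing: <name>" line per absent tool and returns 1 if any is absent.
check_tools() {
  local rc=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Tool names are assumptions based on the list above; adjust them to the
# binary names your installation provides.
check_tools mash muscle mafft trimal raxml iqtree \
  || echo "install the missing tools before running PhyloPhlAn"
```

Only one aligner and one tree builder are needed, so a missing muscle is harmless if mafft is present.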

Basic Usage

Database setup

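# List the pre-built databases that can be passed to -d
phylophlan --database_list

# The pre-built "phylophlan" database needs no separate download step:
# phylophlan fetches it into --databases_folder on first use.
# phylophlan_setup_database builds a custom marker database instead.
# Hypothetical example: my_markers/ holds amino-acid marker FASTA files (.faa).
# Check phylophlan_setup_database --help for the options in your version.
phylophlan_setup_database \
  -i my_markers/ \
  -e .faa \
  -d my_markers \
  -t a \
  -o databases/my_markers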

Phylogenetic tree construction

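# supermatrix_aa.cfg must exist before this call; the default configuration
# files can be generated with phylophlan_write_default_configs.sh
phylophlan \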
  -i genome_folder/ \
  -d phylophlan \
  --databases_folder databases/ \
  -o output_tree/ \
  --diversity medium \
  --nproc 8 \
  -f supermatrix_aa.cfg
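When the same call is repeated for several genome sets, it can be convenient to assemble the command line from variables and print it for inspection before running it. This is a minimal dry-run sketch: build_phylophlan_cmd is a hypothetical helper, and the database name, database folder, and configuration file are simply copied from the example above.

```shell
# Assemble a phylophlan command line and print it without executing it.
# Arguments: input folder, output folder, diversity (low|medium|high), CPU count.
build_phylophlan_cmd() {
  local input_dir="$1" output_dir="$2" diversity="$3" nproc="$4"
  # Reject anything other than the three documented diversity values.
  case "$diversity" in
    low|medium|high) ;;
    *) echo "diversity must be low, medium, or high" >&2; return 1 ;;
  esac
  printf 'phylophlan -i %s -d phylophlan --databases_folder databases/ -o %s --diversity %s --nproc %s -f supermatrix_aa.cfg\n' \
    "$input_dir" "$output_dir" "$diversity" "$nproc"
}

# Print the command for the example above; nothing is run.
build_phylophlan_cmd genome_folder/ output_tree/ medium 8
```

The printed line can be reviewed and then pasted into the terminal. Paths containing spaces would need quoting, which this sketch does not handle.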

MAG assignment to SGBs

phylophlan_metagenomic \
  -i mag_folder/ \
  -d SGB.Jan19 \
  --database_folder databases/ \
  -o output_sgbs \
  --nproc 8

Key Modes

Mode                Command                    Use case
Genome phylogeny    phylophlan                 Build trees from genomes
MAG classification  phylophlan_metagenomic     Assign MAGs to SGBs
Database creation   phylophlan_setup_database  Build custom databases

Output Files

File           Contents
*.tre          Newick-format phylogenetic tree
*.xml          PhyloXML-format tree
*_refined.tre  Tree after outlier removal
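A quick sanity check on a Newick output is to count its leaves and compare the number with the number of input genomes. The sketch below relies on a property of the Newick format rather than on anything PhyloPhlAn-specific: as long as taxon labels contain no commas, the number of leaves equals the number of commas plus one. count_newick_leaves is a hypothetical helper name, and the tree here is a toy example.

```shell
# Count the leaves of a Newick tree stored in a file.
# Assumes no commas inside taxon labels, so leaves = commas + 1.
count_newick_leaves() {
  local commas
  commas=$(tr -cd ',' < "$1" | wc -c)
  echo $((commas + 1))
}

# Toy tree with four leaves, written to a temporary file for the demonstration.
tmp_tree=$(mktemp)
echo '((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.07);' > "$tmp_tree"
count_newick_leaves "$tmp_tree"
rm -f "$tmp_tree"
```

If the count is lower than the number of input genomes, some inputs were probably dropped along the way, for example because too few markers were found in them.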

Tips & Gotchas

Tip

Choose the right diversity setting: use --diversity low for species- and strain-level phylogenies, medium for genus- and family-level phylogenies, and high for phylogenies that span higher taxonomic ranks, up to the tree of life.
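In scripted pipelines the choice can be encoded as a small lookup. The sketch below follows the option's description in the PhyloPhlAn documentation (low for species and strain level, medium for genus and family level, high for anything broader). diversity_for_rank is a hypothetical helper, and placing order, class, and phylum under high is an interpretation of "higher-ranked levels", not an official table.

```shell
# Map the broadest taxonomic rank covered by a dataset to a --diversity value.
diversity_for_rank() {
  case "$1" in
    strain|species)            echo low ;;
    genus|family)              echo medium ;;
    order|class|phylum|domain) echo high ;;
    *) echo "unknown rank: $1" >&2; return 1 ;;
  esac
}

# Example: a dataset spanning several genera within one family.
diversity_for_rank family
```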

Warning

Input genome quality matters: highly fragmented or low-quality genomes (e.g., MAGs with <50% completeness) may produce poorly placed branches. Filter by completeness using CheckM first.
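The filtering step can be sketched as follows, assuming the quality report is a tab-separated table whose header row contains columns named "Bin Id" and "Completeness". That layout is an assumption about your report file, and filter_by_completeness is a hypothetical helper; the columns are located by name, so their position does not matter.

```shell
# Print the IDs of bins whose completeness is at least the given threshold.
# Arguments: tab-separated report with a header row, minimum completeness (percent).
filter_by_completeness() {
  local table="$1" min="$2"
  awk -F '\t' -v min="$min" '
    NR == 1 {
      # Locate the two required columns by their header names.
      for (i = 1; i <= NF; i++) {
        if ($i == "Bin Id") id_col = i
        if ($i == "Completeness") comp_col = i
      }
      if (!id_col || !comp_col) {
        print "required columns not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    $comp_col + 0 >= min + 0 { print $id_col }
  ' "$table"
}

# Toy report with made-up values, for demonstration only.
report=$(mktemp)
printf 'Bin Id\tCompleteness\tContamination\nbinA\t95.2\t1.0\nbinB\t42.0\t0.5\n' > "$report"
filter_by_completeness "$report" 50
rm -f "$report"
```

The resulting list of IDs can then be used to copy or symlink only the passing genomes into the input folder.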

Tip

PhyloPhlAn integrates with MetaPhlAn: MetaPhlAn’s SGB (species-level genome bin) taxonomy is based on PhyloPhlAn trees.


Further Reading