StrainPhlAn
Strain-level Metagenomic Profiling
What is StrainPhlAn?
StrainPhlAn is a tool for strain-level profiling and tracking of specific microbial species across metagenomic samples. It reconstructs strain-level sequences directly from shotgun metagenomic data and uses phylogenetic analysis to track strains across individuals, time points, or conditions.
StrainPhlAn answers the question: "Are the same strains present across these samples?"
StrainPhlAn is distributed as part of the MetaPhlAn package.
- GitHub
- Wiki
- Paper: Truong et al. 2017, Genome Research
When to Use StrainPhlAn
Use StrainPhlAn when you want to:
- Track whether the same microbial strain is shared between individuals (e.g., mother-infant transmission)
- Monitor strain dynamics over time in longitudinal studies
- Investigate within-species genetic variation
- Identify the source of a specific strain
Installation
StrainPhlAn is bundled with MetaPhlAn:
conda create -n metaphlan -c biobakery metaphlan
conda activate metaphlan
Workflow
StrainPhlAn runs in two main steps after MetaPhlAn profiling:
Step 1: Sample-to-marker extraction
Extract consensus marker sequences from each sample's MetaPhlAn alignment:
# Run MetaPhlAn saving the SAM alignment
metaphlan sample.fastq.gz \
--input_type fastq \
-s sample.sam.bz2 \
-o sample_profile.txt \
--nproc 8
# Extract sample markers
sample2markers.py \
-i sample.sam.bz2 \
-r marker_db/ \
-o sample_markers/ \
-n 8
Step 2: StrainPhlAn tree construction
# Get species-specific markers
extract_markers.py \
-c s__Eggerthella_lenta \
-o marker_dir/
# Build the phylogenetic tree
strainphlan \
-s sample_markers/*.pkl \
-m marker_dir/s__Eggerthella_lenta.fna \
-o strainphlan_output/ \
-n 8 \
-c s__Eggerthella_lenta
Output Files
| File | Contents |
|---|---|
| *.tre | Phylogenetic tree of strain relationships |
| *.aln | Aligned marker sequences |
| *.info | Run statistics and sample information |
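
To see which samples actually made it into the tree, it can help to list the leaf names in the *.tre file. The sketch below assumes the tree is plain Newick with unquoted labels; `list_tree_leaves` is a hypothetical helper name, not part of StrainPhlAn.

```shell
# List the leaf (sample or reference) names in a Newick tree file, one per line.
# In Newick, a leaf label directly follows "(" or ","; labels that follow ")"
# belong to internal nodes and are ignored here.
# Assumption: labels are unquoted and contain none of ( ) , : ;
list_tree_leaves() {
    tr -d ' \n\t' < "$1" | grep -oE '[(,][^(),:;]+' | tr -d '(,'
}
```

For example, `list_tree_leaves path/to/output.tre` (path hypothetical) prints one name per line, which can then be compared against the list of input samples to spot any that were excluded.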
Tips & Gotchas
Minimum coverage: StrainPhlAn requires sufficient coverage of the target species in each sample. Samples with very low abundance of the target species will be excluded. The default minimum coverage is 20%.
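One way to anticipate which samples are likely to be dropped is to look up the target species in each MetaPhlAn profile before running StrainPhlAn. The sketch below assumes the profile is tab-separated with the clade name in column 1 and the relative abundance in column 3 (check the header of your own files, since the layout differs between MetaPhlAn versions). `species_abundance` is a hypothetical helper name, and relative abundance is only a rough proxy for marker coverage.

```shell
# Print the relative abundance (%) of one species from a MetaPhlAn profile.
# Assumption: tab-separated columns, clade name in column 1 and
# relative abundance in column 3. Prints 0 if the species is absent.
species_abundance() {
    awk -F '\t' -v sp="$2" '
        /^#/ { next }                      # skip header and comment lines
        {
            n = split($1, ranks, "|")      # clade names are |-separated lineages
            if (ranks[n] == sp) { print $3; found = 1; exit }
        }
        END { if (!found) print "0" }
    ' "$1"
}
```

Usage: `species_abundance sample_profile.txt s__Eggerthella_lenta`. Matching on the last lineage element avoids picking up higher taxonomic ranks or strain-level rows that merely contain the species name.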
Save SAM files from MetaPhlAn by always using the -s flag. If the SAM files are deleted, you cannot re-extract strain markers without re-running MetaPhlAn.
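For multi-sample studies, one way to make sure every sample keeps its SAM file is to wrap the Step 1 MetaPhlAn call in a loop. The sketch below reuses only the flags shown in Step 1. The directory layout, the function names `run` and `profile_all_samples`, and the `DRY_RUN` switch are all assumptions made for illustration.

```shell
# Execute a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Run MetaPhlAn on every *.fastq.gz in a directory, always saving the SAM file.
# $1: input directory (assumed layout), $2: output directory (created if missing).
profile_all_samples() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for fq in "$in_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue                  # no FASTQ files: skip the unexpanded glob
        name=$(basename "$fq" .fastq.gz)
        run metaphlan "$fq" \
            --input_type fastq \
            -s "$out_dir/$name.sam.bz2" \
            -o "$out_dir/${name}_profile.txt" \
            --nproc 8
    done
}
```

Running `DRY_RUN=1; profile_all_samples fastq/ metaphlan_out/` first prints the commands without executing them, which makes it easy to confirm that each sample gets its own SAM path before committing compute time.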
Add reference genomes to the tree using the --references (-r) option. This lets you place metagenomic strains relative to known isolates:
strainphlan \
-s sample_markers/*.pkl \
-m marker_dir/s__Lactobacillus_reuteri.fna \
-r reference_genomes/*.fna \
-o strainphlan_output/ \
-c s__Lactobacillus_reuteri