baqlava
Bacteriophage and vIral Analysis of Large-scale metAgenomic data
What is baqlava?
baqlava (Bacteriophage and vIral Analysis of Large-scale metAgenomic data) is a computational pipeline for identifying and profiling viruses, particularly bacteriophages, in shotgun metagenomic data. It uses a database of viral marker genes to detect and quantify viral populations in complex microbial communities.
baqlava answers the question: “Which viruses (especially bacteriophages) are present in this metagenome, and at what abundance?”
- 📄 GitHub
- 📖 Documentation
When to Use baqlava
Use baqlava when you want to:
- Profile the virome (viral community) from metagenomic data
- Detect bacteriophages alongside bacteria in the same dataset
- Study phage-bacteria interaction dynamics
- Integrate viral profiling with HUMAnN/MetaPhlAn results
baqlava is complementary to MetaPhlAn. While MetaPhlAn focuses on bacteria, archaea, and microbial eukaryotes, baqlava targets viruses and phages.
Installation
Via conda
conda create -n baqlava -c biobakery baqlava
conda activate baqlava
Via pip
pip install baqlava
Database setup
# Download viral marker database
baqlava_databases --download all /path/to/databases/
Basic Usage
baqlava \
--input sample.fastq.gz \
--output output_directory/ \
--threads 8
Key options
| Option | Description |
|---|---|
| `--input` | Input FASTQ file(s) |
| `--output` | Output directory |
| `--threads` | Number of CPU threads |
| `--bypass-host-removal` | Skip host read removal (if already done) |
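For projects with many samples, the basic invocation can be wrapped in a loop. The following is a minimal sketch, not part of baqlava itself. It uses only the `--input`, `--output` and `--threads` options listed above. The function name `run_baqlava_batch`, the `DRY_RUN` switch and the one-directory-per-sample layout are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: run baqlava once per FASTQ file in a directory.
# Only --input, --output and --threads (documented above) are used.
# DRY_RUN=1 prints the commands instead of executing them (hypothetical switch).

run_baqlava_batch() {
    local in_dir="$1" out_root="$2" threads="${3:-8}"
    local fq sample
    for fq in "$in_dir"/*.fastq.gz; do
        # Skip the literal glob pattern when no file matches.
        [ -e "$fq" ] || continue
        sample="$(basename "$fq" .fastq.gz)"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "baqlava --input $fq --output $out_root/$sample/ --threads $threads"
        else
            # Creating the directory up front is harmless if baqlava also creates it.
            mkdir -p "$out_root/$sample"
            baqlava --input "$fq" --output "$out_root/$sample/" --threads "$threads"
        fi
    done
}

# Example (hypothetical paths):
# DRY_RUN=1 run_baqlava_batch reads/ baqlava_out/ 8
```

Running with `DRY_RUN=1` first is a cheap way to check the sample names and output paths before committing compute time.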
Output Files
| File | Contents |
|---|---|
| `*_viral_abundance.tsv` | Viral species/family abundances in reads per kilobase (RPK) |
| `*_viral_profile.tsv` | Presence/absence profile of viral taxa |
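A quick way to inspect the abundance table is to list its most abundant taxa. The sketch below assumes a layout that this page does not specify: a tab-separated file with one header line, the taxon name in column 1 and the abundance in column 2. Check the real file and adjust the column numbers if it differs. The function name `top_viral_taxa` is hypothetical.

```bash
# Sketch: list the N most abundant taxa from a *_viral_abundance.tsv file.
# ASSUMPTION: tab-separated, one header line, taxon name in column 1,
# abundance (RPK) in column 2. This layout is not confirmed by the docs.

top_viral_taxa() {
    local tsv="$1" n="${2:-10}"
    local tab
    tab="$(printf '\t')"
    # Drop the header, keep rows with a positive abundance, sort descending.
    # sort -n handles plain decimals but not scientific notation.
    tail -n +2 "$tsv" \
        | awk -F'\t' '$2 + 0 > 0 { print $1 "\t" $2 }' \
        | LC_ALL=C sort -t "$tab" -k2,2nr \
        | head -n "$n"
}

# Example (hypothetical file name):
# top_viral_taxa baqlava_output/sample_viral_abundance.tsv 5
```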
Integration with MetaPhlAn
baqlava was designed to complement MetaPhlAn. MetaPhlAn run with the --add_viruses flag profiles only a subset of viruses, whereas baqlava provides more comprehensive viral profiling:
# MetaPhlAn with basic viral support
metaphlan sample.fastq.gz \
--add_viruses \
--input_type fastq \
-o sample_profile.txt
# baqlava for comprehensive viral profiling
baqlava \
--input sample.fastq.gz \
--output baqlava_output/
Tips & Gotchas
- Host removal first: Remove human (or other host) reads before running baqlava, for example with KneadData.
- Virome enrichment vs. total metagenomics: baqlava works on standard shotgun metagenomes, but viral sequences are often at low abundance in them. Virome-enriched protocols (e.g., virus-like particle (VLP) extraction) give better sensitivity.
- Combine with MetaPhlAn results: Join baqlava viral abundances with MetaPhlAn bacterial abundances for a comprehensive community profile.
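One way to combine the two result sets is to stack them into a single long-format table with a column recording which tool produced each row. This is a sketch under assumptions, not a documented baqlava feature. The baqlava layout (one header line, taxon in column 1, abundance in column 2) is assumed. The MetaPhlAn layout (`#` comment lines, clade name in column 1, relative abundance in column 3) follows the standard MetaPhlAn profile format. The function name `combine_profiles` is hypothetical.

```bash
# Sketch: stack one sample's MetaPhlAn and baqlava tables into a long-format
# TSV with columns: source, taxon, value.
# ASSUMPTIONS: the baqlava file is tab-separated with one header line, taxon in
# column 1 and abundance in column 2; the MetaPhlAn profile uses '#' comment
# lines, clade_name in column 1 and relative_abundance in column 3.
# The two value columns use different units, so rows are labelled, not summed.

combine_profiles() {
    local metaphlan_tsv="$1" baqlava_tsv="$2"
    printf 'source\ttaxon\tvalue\n'
    # MetaPhlAn: skip comment lines, keep clade name and relative abundance.
    awk -F'\t' -v OFS='\t' '!/^#/ && NF >= 3 { print "metaphlan", $1, $3 }' "$metaphlan_tsv"
    # baqlava: skip the single header line, keep taxon and abundance.
    awk -F'\t' -v OFS='\t' 'NR > 1 && NF >= 2 { print "baqlava", $1, $2 }' "$baqlava_tsv"
}

# Example (hypothetical file names):
# combine_profiles sample_profile.txt baqlava_output/sample_viral_abundance.tsv > community.tsv
```

Keeping the source column makes the unit difference explicit: MetaPhlAn reports relative abundances, while the baqlava table above is described as RPK, so the two should not be added together without normalization.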