metawibele

Workflow to Identify novel Bioactive Elements in the microbiome

What is metawibele?

metawibele (Workflow to Identify novel Bioactive Elements in the microbiome) is a pipeline for characterizing and prioritizing novel protein families from metagenomes. It takes metagenomic assemblies, predicts open reading frames (ORFs), clusters the predicted proteins into protein families, annotates those families against multiple functional databases, and ranks them to identify candidate bioactive proteins.

metawibele answers the question: “What novel or understudied microbial proteins are present in this metagenome, and which ones are worth investigating further?”


When to Use metawibele

Use metawibele when you want to:

  • Characterize microbial proteins from metagenomic assemblies
  • Identify novel proteins with unknown functions
  • Prioritize proteins for experimental follow-up
  • Annotate protein families against multiple databases simultaneously

Installation

From source

git clone https://github.com/biobakery/metawibele.git
cd metawibele
pip install .

Workflow Overview

metawibele has three main stages:

Metagenomic assemblies (contigs)
        │
        ▼
  1. Preprocessing
     (ORF prediction, protein clustering)
        │
        ▼
  2. Characterization
     (multi-database annotation)
        │
        ▼
  3. Prioritization
     (scoring and ranking)

Running the full pipeline

metawibele \
  --input-sequence contigs.fasta \
  --input-count counts.tsv \
  --output-folder metawibele_output/ \
  --threads 8
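Because a full run can take a long time, it may be worth sanity-checking the inputs first. The POSIX sh function below is a hypothetical helper, not part of metawibele: check_inputs counts the records in the FASTA file and reports how many sequence IDs have no row in the count table. It assumes that each FASTA ID is the first whitespace-delimited token after ">" and that the count table is tab-delimited, with a header row and sequence IDs in the first column; confirm the expected layout against the metawibele documentation.

```shell
# Hypothetical pre-flight check; NOT part of metawibele.
# Assumes FASTA IDs are the first whitespace-delimited token after ">", and
# the count table is tab-delimited with a header row and IDs in column 1.
check_inputs() {
  fasta=$1
  counts=$2

  # Number of records in the FASTA file
  n_seqs=$(grep -c '^>' "$fasta")
  echo "sequences in FASTA: $n_seqs"

  # Number of FASTA IDs that have no row in the count table
  missing=$(awk -F'\t' '
    FILENAME == ARGV[1] { if (FNR > 1) seen[$1] = 1; next }
    /^>/ { split($0, f, /[ \t]/); if (!(substr(f[1], 2) in seen)) n++ }
    END { print n + 0 }' "$counts" "$fasta")
  echo "IDs missing from count table: $missing"

  # Succeed only if every sequence has a row of counts
  [ "$missing" -eq 0 ]
}
```

For example, check_inputs contigs.fasta counts.tsv returns a non-zero status if any sequence lacks counts, so it can gate the metawibele command in a script.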

Annotation Databases

metawibele integrates annotations from multiple sources:

Source      Information provided
UniRef90    Protein family membership
Pfam        Protein domain content
KEGG        Metabolic pathway annotations
eggNOG      Orthologous group annotations
PSORTb      Protein subcellular localization
SignalP     Signal peptide prediction
TMHMM       Transmembrane domain prediction
MaAsLin2    Association with sample metadata (a statistical tool, not a database)
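Conceptually, integrating several sources amounts to joining per-family annotation tables on a shared protein family ID. As a hypothetical illustration (the header-less, two-column layout and the file names are assumptions, not metawibele's actual intermediate files), the function below merges two such tab-delimited tables with the standard join utility, keeping families that appear in only one source and filling the gap with NA.

```shell
# Hypothetical illustration; NOT metawibele's implementation.
# Each input: tab-delimited, no header, "<family_id> TAB <annotation>".
merge_annotations() {
  tab=$(printf '\t')
  a=$(mktemp) || return 1
  b=$(mktemp) || return 1
  # join needs both inputs sorted on the key; LC_ALL=C keeps sort and join in agreement
  LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$a"
  LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$b"
  # -a 1 -a 2 keeps families present in only one source; -e NA fills the gaps.
  # -o 0,1.2,2.2 prints the family ID, then the annotation from each file.
  LC_ALL=C join -t "$tab" -a 1 -a 2 -e NA -o 0,1.2,2.2 "$a" "$b"
  rm -f "$a" "$b"
}
```

For example, merge_annotations pfam.tsv kegg.tsv > merged.tsv produces one row per family with a Pfam column and a KEGG column; chaining further calls adds more sources.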

Output Files

File                      Contents
*_proteinfamilies.tsv     Protein family abundance table
*_characterization.tsv    Multi-database annotations
*_prioritization.tsv      Ranked prioritization scores
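To skim the strongest candidates, you can sort the prioritization table by its score column. The column names in *_prioritization.tsv are not listed here, so the hypothetical helper below (not part of metawibele) takes the column name as an argument, locates it in the header row, and prints the header plus the top N rows in descending numeric order. It assumes a tab-delimited file with a single header row; inspect the header first, for example with head -n 1.

```shell
# Hypothetical helper; NOT part of metawibele.
# Prints the header plus the top N rows of a tab-delimited table, sorted in
# descending numeric order by the column named $2 in the header row.
top_ranked() {
  file=$1
  column=$2
  n=${3:-10}
  # 1-based index of the first header field that exactly matches $column
  col=$(head -n 1 "$file" | tr '\t' '\n' | grep -n -x -F -- "$column" | cut -d: -f1 | head -n 1)
  if [ -z "$col" ]; then
    echo "column '$column' not found in $file" >&2
    return 1
  fi
  head -n 1 "$file"
  # POSIX numeric sort (n); GNU sort also offers g for values like 1e-05
  tail -n +2 "$file" | LC_ALL=C sort -t "$(printf '\t')" -k"$col,$col"nr | head -n "$n"
}
```

Passing the column name rather than a fixed index keeps the helper usable even if the column order differs between versions.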

Prioritization Score

metawibele computes a composite prioritization score based on:

  • Prevalence: How common is this protein family across samples?
  • Abundance: How abundant is it?
  • Annotation novelty: Is it unannotated or poorly characterized?
  • Differential abundance: Is it associated with a phenotype of interest?
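The formula metawibele uses to combine these criteria is not given here. As a hypothetical sketch only, the function below combines four per-family values, each assumed to be pre-scaled to the range (0, 1], with a weighted harmonic mean, sum(w) / sum(w_i / x_i). Unlike an arithmetic mean, the harmonic mean stays low when any single criterion is weak, so a family must do reasonably well on all four to rank highly. The input columns, the equal weights, and the choice of mean are all assumptions.

```shell
# Hypothetical scoring sketch; NOT metawibele's actual formula.
# Input: tab-delimited with a header row and the columns
#   family  prevalence  abundance  novelty  association
# where every criterion is assumed to be pre-scaled to (0, 1].
# Output: family TAB weighted harmonic mean of the four criteria.
composite_score() {
  # Equal weights are an illustrative assumption
  LC_ALL=C awk -F'\t' -v w1=1 -v w2=1 -v w3=1 -v w4=1 '
    NR == 1 { next }   # skip the header row
    {
      # A zero criterion drives the harmonic mean to 0; avoid dividing by it
      if ($2 <= 0 || $3 <= 0 || $4 <= 0 || $5 <= 0) score = 0
      else score = (w1 + w2 + w3 + w4) / (w1 / $2 + w2 / $3 + w3 / $4 + w4 / $5)
      printf "%s\t%.4f\n", $1, score
    }' "$1"
}
```

With equal weights, a family scoring 1 on three criteria and 0.25 on the fourth gets 4/7 ≈ 0.5714, whereas an arithmetic mean would give 0.8125.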

Tips & Gotchas

Warning

Compute requirements: the characterization step runs BLAST searches against several large databases, which can require substantial disk space (more than 100 GB) and long compute times.
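Since the characterization step can need more than 100 GB, checking free space on the target file system before launching can keep a run from failing partway through. The function below is a POSIX sh sketch, not part of metawibele; it reads the available space from df -P -k, and its 100 GB default simply mirrors the warning above.

```shell
# Hypothetical pre-flight disk check; NOT part of metawibele.
# Usage: check_disk DIR [REQUIRED_GB]   (default 100, as in the warning)
check_disk() {
  dir=${1:-.}
  need_gb=${2:-100}
  # df -P -k: POSIX output format in 1024-byte blocks; on the second line,
  # field 4 is the available space.
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || { echo "cannot read free space for $dir" >&2; return 2; }
  # Binary gigabytes (1024^3 bytes), rounded down
  avail_gb=$((avail_kb / 1024 / 1024))
  echo "available: ${avail_gb} GB (need ${need_gb} GB) in $dir"
  [ "$avail_gb" -ge "$need_gb" ]
}
```

Point it at an existing directory on the file system that will hold the output, for example check_disk . before running metawibele from the current directory.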

Tip

Start with the demo dataset to understand the expected input/output formats before running on your own data.

Tip

metawibele uses MaAsLin2 for differential abundance testing during the prioritization step. Make sure your metadata file is formatted correctly, and in particular that its sample IDs exactly match the sample IDs in your count table.
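A frequent formatting problem is a mismatch between the sample IDs in the metadata and those in the abundance table. The hypothetical check below (not part of metawibele or MaAsLin2) prints every sample ID found in only one of the two files. It assumes the metadata is tab-delimited with a header row, one sample per row, and sample IDs in the first column, and that the count table's header row lists sample IDs in columns 2 onward; both layouts are assumptions to confirm against the metawibele documentation.

```shell
# Hypothetical check; NOT part of metawibele or MaAsLin2.
# Usage: compare_samples METADATA_TSV COUNTS_TSV
# Prints sample IDs present in only one file; returns non-zero if any exist.
compare_samples() {
  awk -F'\t' '
    # First file (metadata): collect sample IDs from column 1, skipping the header
    FILENAME == ARGV[1] { if (FNR > 1) meta[$1] = 1; next }
    # Second file (counts): read only the header row, columns 2..NF
    FNR == 1 { for (i = 2; i <= NF; i++) cnt[$i] = 1; exit }
    END {
      bad = 0
      for (s in meta) if (!(s in cnt)) { print "only in metadata: " s; bad = 1 }
      for (s in cnt) if (!(s in meta)) { print "only in counts: " s; bad = 1 }
      exit bad
    }' "$1" "$2"
}
```

For example, compare_samples metadata.tsv counts.tsv prints nothing and succeeds when the two files describe the same samples.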


Further Reading