1) Want to know how much of your metagenome is eukaryotic? No references? No problem. We developed SingleM microbial fraction (SMF) and ran it on ~250k metagenomes: https://www.biorxiv.org/content/10.1101/2024.05.16.594470v1

If you know which eukaryotes are present, you can filter their reads out by mapping to their genomes. Often, though, you don't know what's in your sample, or the eukaryote doesn't have a reference genome.

#metagenomics #bioinformatics #genomics #microbiomes #microbialecology

in reply to Raphael Eisenhofer

2) Knowing how much of your metagenome is microbial is important, because success in genome-resolved metagenomics is measured by the % of reads mapping to your genome catalogue. If we didn't know the microbial fraction, samples 2 and 3 would be indistinguishable.

SMF works through a simple equation which only considers microbes:

Number of microbial base pairs = taxon abundances ✕ mean genome sizes.

To estimate abundances, we used SingleM, which estimates the coverage of almost all microbes, even novel taxa.
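A minimal sketch of that equation in Python, to make the arithmetic concrete. This is not SingleM's code: the function name, the taxon names, and all numbers are hypothetical, and dividing by the total metagenome size to get a fraction is my reading of "microbial fraction", not something spelled out in the thread.

```python
def microbial_fraction(coverages, mean_genome_sizes, total_bp):
    """Estimate the microbial fraction of a metagenome.

    coverages: taxon -> estimated coverage (abundance) of that taxon
    mean_genome_sizes: taxon -> mean genome size in base pairs
    total_bp: total base pairs sequenced in the metagenome (assumed denominator)
    """
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    # microbial base pairs = sum over taxa of (abundance x mean genome size)
    microbial_bp = sum(
        coverage * mean_genome_sizes[taxon]
        for taxon, coverage in coverages.items()
    )
    return microbial_bp / total_bp


# Hypothetical values, for illustration only.
coverages = {"taxon_a": 10.0, "taxon_b": 2.5}
mean_genome_sizes = {"taxon_a": 4_000_000, "taxon_b": 2_000_000}
total_bp = 100_000_000

fraction = microbial_fraction(coverages, mean_genome_sizes, total_bp)
print(f"Estimated microbial fraction: {fraction:.0%}")
```

Only microbial taxa enter the sum, so whatever is left over (host, other eukaryotes, anything else) is simply the remainder of the metagenome.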

3) SMF performs well for simple (A) and complex CAMI2 (B) metagenomes (horizontal bars represent the true microbial fraction). It also performs well for divergent genomes not in GTDB (C).

4) Using publicly available human and hyena faecal data, we show that SMF can help determine how well metagenomes are represented by MAGs or reference genome catalogues. HRGM = Human Reference Gut Microbiome catalogue

We coin the term Domain-Adjusted Mapping Rate (DAMR): the genome mapping rate divided by the SMF microbial fraction. E.g. if the mapping rate is 40% and the microbial fraction is 50%, then DAMR = 40/50 = 80%. This can be used to estimate how well a sample is represented by MAGs/genomes.
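The DAMR calculation is simple enough to sketch in a few lines of Python. The function name and the input check are my own additions for illustration; the numbers are the ones from the example above.

```python
def domain_adjusted_mapping_rate(mapping_rate, microbial_fraction):
    """DAMR = genome mapping rate / microbial fraction (both as proportions, 0-1)."""
    if not 0 < microbial_fraction <= 1:
        raise ValueError("microbial_fraction must be in (0, 1]")
    return mapping_rate / microbial_fraction


# Example from the post: 40% of reads map, 50% of the metagenome is microbial.
damr = domain_adjusted_mapping_rate(0.40, 0.50)
print(f"DAMR: {damr:.0%}")
```

Estimates are noisy, so a real sample could come out slightly above 100%; this sketch does not clip the result.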

5) To demonstrate how scalable SMF is, Ben Woodcroft ran it on all SRA metagenomes (~250,000). You can use Sandpiper https://sandpiper.qut.edu.au/ to browse the taxonomy and microbial fraction of these metagenomes!

We then assessed how well SMF performs compared to NCBI's STAT tool (which is based on k-mers drawn from reference genomes). SMF excels on sample types that are underrepresented in databases: non-human animals (D), marine (E), and soil (F).

We also noticed that SMF performed substantially better on human faecal samples sourced from underrepresented populations (Africa/South America; panel C).

6) We then show that microbial fractions themselves can correlate with environmental factors, e.g. latitude for soil metagenomes (A), and demonstrate that microbial fractions can vary considerably both within and between sample types (B).

To sum up, SMF can be used to rapidly estimate microbial fractions and genome sizes from metagenomes. This can be used to appraise the representativeness of de novo MAGs and genome catalogues, to assess bioinformatic/lab methods, and to identify poor-quality samples.

7) There’s a lot more I didn’t mention, so feel free to check out the preprint. Huge thanks to Ben Woodcroft and Antton Alberdi, the @NCBI, @CMR_QUT and to the people who make their metagenomes and metadata publicly available!
