1) Want to know how much of your metagenome is eukaryotic? No references? No problem. We developed SingleM microbial fraction (SMF) and ran it on 250k metagenomes https://www.biorxiv.org/content/10.1101/2024.05.16.594470v1.
If you know which eukaryotes are present, you can filter out their reads by mapping to their genomes. Often, though, you don’t know what’s in your sample, or the eukaryote doesn’t have a reference genome.
#metagenomics #bioinformatics #genomics #microbiomes #microbialecology
Large-scale estimation of bacterial and archaeal DNA prevalence in metagenomes reveals biome-specific patterns
Metagenomes often contain many reads derived from eukaryotes. However, there is usually no reliable method for estimating the prevalence of non-microbial reads in a metagenome, forcing many analysis techniques to make the often-faulty assumption that…
bioRxiv
Raphael Eisenhofer
2) Knowing how much of your metagenome is microbial is important, as success in genome-resolved metagenomics is measured by the % of reads mapping to your genome catalogue. If we didn't know the microbial fraction, samples 2 & 3 would be indistinguishable.
SMF works through a simple equation which only considers microbes:
number of microbial base pairs = taxon abundances ✕ mean genome sizes (summed over all taxa).
To estimate abundances we used SingleM, which estimates the coverage of almost all microbes, even novel taxa.
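To make the equation above concrete, here is a minimal Python sketch. It is not SingleM's code: the taxon names, coverages and genome sizes are made up, and dividing by the total number of base pairs in the metagenome to get a fraction is an assumption about the final step, which the post doesn't spell out.

```python
def microbial_base_pairs(taxa):
    """Sum (coverage x mean genome size) over all taxa.

    `taxa` maps a taxon name to a (coverage, mean_genome_size_bp) tuple.
    """
    return sum(coverage * genome_size for coverage, genome_size in taxa.values())


def microbial_fraction(taxa, total_metagenome_bp):
    """Estimated share of the metagenome that is bacterial/archaeal.

    Assumption: fraction = microbial base pairs / total sequenced base pairs.
    """
    if total_metagenome_bp <= 0:
        raise ValueError("total_metagenome_bp must be positive")
    return microbial_base_pairs(taxa) / total_metagenome_bp


# Hypothetical values, for illustration only.
example_taxa = {
    "taxon_a": (10.0, 5_000_000),  # 10x coverage, 5 Mbp mean genome size
    "taxon_b": (2.5, 2_000_000),   # 2.5x coverage, 2 Mbp mean genome size
}

if __name__ == "__main__":
    print(microbial_base_pairs(example_taxa))             # 55000000.0
    print(microbial_fraction(example_taxa, 110_000_000))  # 0.5
```

With these toy numbers, 55 Mbp of a 110 Mbp metagenome is estimated to be microbial, i.e. a microbial fraction of 50%.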
4) Using publicly available human and hyena faecal data, we show that SMF can help determine how well metagenomes are represented by MAGs or reference genome catalogues. HRGM = Human Reference Gut Microbiome catalogue.
We coin the term Domain-Adjusted Mapping Rate (DAMR), which is the genome mapping rate / SMF microbial fraction. E.g. if the mapping rate = 40% and the microbial fraction = 50%, then 40/50 = 80% DAMR. This can be used to estimate how well a sample is represented by MAGs/genomes.
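The DAMR arithmetic can be sketched in a few lines of Python. The function name and the percent-based inputs are illustrative choices, not part of SingleM.

```python
def damr(mapping_rate_pct, microbial_fraction_pct):
    """Domain-Adjusted Mapping Rate, in percent.

    DAMR = genome mapping rate / microbial fraction, both given as percentages.
    """
    if not 0 < microbial_fraction_pct <= 100:
        raise ValueError("microbial_fraction_pct must be in (0, 100]")
    return 100.0 * mapping_rate_pct / microbial_fraction_pct


if __name__ == "__main__":
    # The example from the post: 40% mapping rate, 50% microbial fraction.
    print(damr(40, 50))   # 80.0
    # Same 40% mapping rate, but the sample is entirely microbial.
    print(damr(40, 100))  # 40.0
```

The second call shows why the adjustment matters: two samples with the same raw mapping rate can be represented very differently by the genome catalogue once the microbial fraction is taken into account.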
5) To demonstrate how scalable SMF is, Ben Woodcroft ran it on all SRA metagenomes (~250,000). You can use Sandpiper https://sandpiper.qut.edu.au/ to browse the taxonomy and microbial fraction of these metagenomes!
We then assessed how well SMF performs compared to NCBI's STAT tool (which is based on k-mers drawn from reference genomes). SMF excels on sample types that are underrepresented in databases: non-human animals (panel D), marine (panel E), and soil (panel F).
6) We then show that microbial fractions themselves can be correlated with factors such as latitude for soil metagenomes (panel A), and demonstrate that microbial fractions can vary considerably both within and between sample types (panel B).
To sum up, SMF can be used to rapidly estimate microbial fractions and genome sizes from metagenomes. These estimates can be used to appraise the representativeness of de novo MAGs and genome catalogues, to assess bioinformatic/lab methods, and to identify poor-quality samples.