

A preprint from the Salzberg team is questioning a 2020 Nature paper from Rob Knight 😮

"the raw read counts were vastly over-estimated for nearly every bacterial species, often by a factor of 1000 or more."

"Our conclusion after re-analysis is that the near-perfect association between microbes and cancer types reported in the study is, simply put, a fiction."

Major data analysis errors invalidate cancer microbiome findings

https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1

#microbiome #genomics #research #science

in reply to Feargal Ryan

The authors of the original paper have also published a rebuttal on their GitHub. THE DRAMA!

https://github.com/gregpoore/tcga_rebuttal

in reply to Feargal Ryan

Oh man. There's a lot to chew on in this, but just read the section "Normalization of the reads erroneously created a distinct signature of each cancer" in the preprint. Many of the most important features in the classifier had 0 reads in all samples. That is reallyyy not good. The GitHub counter-rebuttal seems to be saying "But there's still a signal!" while ignoring the huge flaws in the original...
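To see how a normalization step alone can turn features with zero reads into something a classifier can use, here is a minimal Python sketch. It assumes a voom-style log2 counts-per-million transform, log2((count + 0.5) / (library_size + 1) × 10^6), purely as an illustrative stand-in; it is not the study's actual pipeline. The toy count matrix and the function names `log_cpm` and `all_zero_features` are hypothetical.

```python
import numpy as np


def log_cpm(raw_counts):
    """Voom-style log2 counts-per-million (assumed transform, for illustration only).

    raw_counts: array of shape (n_samples, n_features) holding raw read counts.
    Adds 0.5 to every count and 1 to every library size before taking log2.
    """
    counts = np.asarray(raw_counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)  # total reads per sample
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def all_zero_features(raw_counts):
    """Boolean mask of features that have zero raw reads in every sample."""
    return (np.asarray(raw_counts) == 0).all(axis=0)


# Hypothetical toy data: 4 samples x 3 features; feature 2 has no reads anywhere.
raw = np.array([
    [100, 50, 0],
    [10, 5, 0],
    [200, 80, 0],
    [20, 9, 0],
])

normalized = log_cpm(raw)
mask = all_zero_features(raw)

# After normalization the all-zero feature takes a different value in every
# sample, driven only by library size, so a model can "learn" from it.
print(mask)
print(normalized[:, 2])

# A simple input-validation guard: drop all-zero features before normalizing.
filtered = raw[:, ~mask]
print(filtered.shape)  # (4, 2)
```

If library size happens to differ systematically between sample groups, the normalized values of such an empty feature will differ between groups too, which is the kind of artifact a raw-count sanity check catches before any model is trained.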
in reply to Alex Crits-Christoph

@alexcc I’ll be interested to see how the peer review plays out. But boy oh boy, if what this preprint is saying is accurate, these are some rookie mistakes to make. Like first-year student with zero supervision type stuff...
in reply to Feargal Ryan

@alexcc “The models included species that had never been reported in humans, and that were associated only with extreme environments, ocean-dwelling species, plants, or other non-human environments.” I’ve called this out when reviewing manuscripts on multiple occasions! It’s so basic.
in reply to Feargal Ryan

@alexcc great example of why carefully preprocessing and validating the inputs for these models is so important.
