The Washington Post peeked inside the training data set used for LLMs from Facebook and Google and found Russian propaganda sites, white supremacist sites, extremist Christian sites, anti-trans sites, etc. https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/
See the websites that make AI bots like ChatGPT sound so smart
An analysis of a chatbot data set by The Washington Post reveals the proprietary, personal, and often offensive websites that go into an AI’s training data.
Kevin Schaul, Szu Yu Chen, Nitasha Tiku (The Washington Post)
Alex Chaffee
in reply to kottke.org
@damemagazine @emilymbender
you mean that a #GPT #LLM data set is not “the entire Internet” ‽‽‽ and humans selected and curated the sources to reflect their own personal human idiosyncratic subjective concept of an objective neutral view from nowhere‽‽‽ and credulous media reports are perpetuating false hype‽‽‽‽‽‽
🫢/s
https://dair-community.social/@emilymbender/110225819791678314