So @internetarchive scanning books for their digital library is copyright infringement:
http://blog.archive.org/2023/03/25/the-fight-continues/

But OpenAI slurping all of that to train a model that then can generate text and put actual authors out of business (already happening with copywriters), is not.

Figures, there are no $billions of VC / corporate money behind Internet Archive, why would anyone want to support a public service, right? 🤦‍♀️

IA ≠ AI, know the difference!

#InternetArchive
This entry was edited (1 year ago)
in reply to Bob Mottram :debian:

you are the best kind of correct: technically correct.

But we have been told for decades that unlicensed remixes are copyright infringement just as much as straight-up distributing unlicensed copies.

I think all of this is bullcrap, but at least here I can point to a very clear example of hypocrisy.
in reply to Michał "rysiek" Woźniak · 🇺🇦

Remixes are usually allowed provided that they don't include long sections of the original work, and I think something similar will apply with the language models. But it is all yet to be decided by future test cases.
in reply to Bob Mottram :debian:

This is an interesting one. Data is just data, so I don't have much of an issue with people using data for this. BUT I do have a HUGE issue with them privatizing the results; this is not #4opens
in reply to vagabond

@Hamishcampbell I have a serious problem with the hypocrisy of treating some forms of expression as "data", and some forms of expression as So Very Special You Can't Even Link To It.

And the whole AI thing is putting it in stark relief.

@bob @internetarchive
