Meta used pirated books to train its AI models, and there are emails to prove it

Alfonso Maruccia

Facepalm: A group of authors has sued Meta, alleging that the company used unauthorized copies of their books to train its generative AI models. While Meta has denied any wrongdoing, newly unsealed messages suggest that executives and engineers were well aware of their actions – and that they were violating copyright law.

The lawsuit filed by Sarah Silverman, Richard Kadrey, and other writers and rights holders against Meta may be entering its most critical phase. The authors have obtained internal company emails in which Meta employees openly discussed "torrenting" well-known archives of pirated content to train more powerful AI models.

Meta previously acknowledged using certain controversial datasets, arguing that such practices should be considered fair use. The company also admitted to downloading a massive dataset known as "LibGen," which contains millions of pirated books. However, the newly unsealed emails reveal deeper concerns within Meta about acquiring and distributing this data through the BitTorrent network.

According to the emails, Meta downloaded and shared at least 81.7 terabytes of data across multiple contentious datasets, including 35.7 terabytes from Z-Library and LibGen archives. The plaintiffs allege that Meta engaged in an "astonishing" torrenting scheme, distributing pirated books at an unprecedented scale.

In an April 2023 message, Meta researcher Nikolay Bashlykov wrote, "torrenting from a corporate laptop doesn't feel right." The message ended with a smiling emoji, but a few months later, his tone shifted significantly.

In September 2023, Bashlykov stated that he was consulting Meta's legal team because using torrents – and thereby "seeding" terabytes of pirated data – was clearly "not OK" from a legal standpoint.

Meta was apparently aware that its engineers were engaging in illegal torrenting to train AI models, and Mark Zuckerberg himself was reportedly aware of LibGen. To conceal this activity, the company attempted to mask its torrenting and seeding by using servers outside of Facebook's main network. In another internal message, Meta employee Frank Zhang referred to this approach as "stealth mode."

Like other major tech firms, Meta is pouring massive amounts of money into AI development and generative AI services. The company, which aims to populate its aging social networks with AI-generated personas and bots, recently filed a motion to dismiss the lawsuit led by Silverman and other authors. However, the newly revealed emails detailing Meta's involvement in torrenting and distributing pirated books could significantly complicate its legal defense.

If before through the planned A.I regulations META was being scummy behind people's backs, imagine now with the Trump administration allowing all these goons do whatever they want.

There's no way any protection regulations would be put in place now.
 
If before through the planned A.I regulations META was being scummy behind people's backs, imagine now with the Trump administration allowing all these goons do whatever they want.

There's No way now any protection regulations would be in place.

Good grief, Mean Orange Man just got back in office, and this is all his fault already? Whew!
 
Copyright lacks legal depth. Every sane judge, if faced with the choice to cancel AI or cancel copyright, will choose to cancel copyright. The collective decisions of all judges will trigger a reform, resulting in a new law that cancels copyright.
 
Another oversized organization lying, cheating, stealing, and lobbying their way upwards. Having actual core values is expensive, but advertising false values as marketing decorum is not.

Looks like Facebook is in rough shape if they can't even afford their own training data. Will the legal repercussions be more costly than paying for their content up front like law-abiding citizens united? Probably not.

They've been hacked multiple times, they've stolen data, sold said stolen data, sold private information to public institutions, etc... And yet somehow they're considered a success.

Parasites.
 
Do you have to post your stupidity on every article?
Of course he does, because he's a troll. There's probably nobody to troll on Orange Twitter, since it has less than a million active users and it's basically all right-wingers on there. So he has to post his nonsense on sites like this one to make himself feel better.
 
I am not even so disgusted by theft.
I am about the possibility that AI can clearly make jobs obsolete.
It means, they are not only stealing, they are making sure
a lot more people will not have the ability to find a decent job
while Meta and its owner add billions to their account.
It is like an extra cruel thing to do.
It is literally the rich stealing from the poor.

I doubted that these companies would play fair and won't touch content that is not theirs, and said it many times before.
Well, here are some receipts. Also, here is a prediction, they will go further and unlike downloading a library archive, they will steal data from the people, they will siphon it from each other's platforms. There is nothing that can stop them from doing it.
 
The big tech playbook is break the law to get ahead and then fight it with lawyers afterwards. They have openly stated the illegal edge makes more than enough money to pay the lawyers and by the time it's settled in court many years later the tech in question is obsolete and doesn't matter. (and they are using the next illegal edge)
They need to keep stealing though for their thing to stay current. Perhaps, that is where some of them can be punished because they will keep doing it, till someone has enough money and time to bleed one of these parasites.
 
Comment repeated ad nauseam elsewhere: they were 100% leeching and doing minimal seeding, so they even P. off the pirates

Think Z library has huge amount of technical books, subject to a number of FBI takedowns - oh well more advertising for Z Library- check it has a limit for 10 a day , except for donation periods of 1000/day for a week. Bet they gave minimum donation for multiple accounts

These companies all scrape, and ignore "do not scrape."

Makes you wonder how they remove lots of private servers info ( internal documents ) that should not be open to public, but poor security.
Didn't google clean this up to stop cache searches of company, govt private servers - could be wrong
 
Meanwhile, u get jail... huge fines and sued to hell if you pirate 1 movie/song :D :D
Fair is fair :) Rich=IZ OKAY!
 
Anyone thinking anyone working in AI is doing anything ethical or good is straight up stupid.

META Sold your data to China in 2017 and stupid people still think tiktok is a security threat due to china getting your data. Read that again to understand how stupid those people are.

America is cooked, stupid has won.

It's about money. Meta has investors from congress, it will get away with anything it wants to and there is nothing you can do about it.

Welcome to capitalism. Reap what you sow.
 
One of the scummiest social media companies on the planet is still scummy, nothing really changes.

Meta\Facebook has never had a moral compass and has said and done whatever is necessary to make more money. Users be damned, privacy be damned, copyright be damned, advertisers be damned... literally zero fks given about anyone or anything but the almighty dollar and themselves.

 