Piracy vs. Copyright in AI
A critical legal distinction drawn in the Anthropic court case. It separates the act of illegally obtaining training data (piracy) from the question of whether using that data to train a model infringes copyright or qualifies as fair use.
Created: 7/13/2025, 5:56:24 PM
Last Updated: 7/22/2025, 4:45:34 AM
Research Retrieved: 7/13/2025, 6:09:30 PM
Summary
The topic "Piracy vs. Copyright in AI" encompasses the complex legal and technical challenges surrounding intellectual property rights in artificial intelligence development. A landmark ruling, notably involving Anthropic, has affirmed the principle of "Fair Use" for training AI models on legally acquired data, emphasizing a crucial distinction between AI input and output. This distinction, technically supported by processes like "Positional Encoding," is considered vital for the United States, including major AI players like OpenAI, to maintain competitiveness against nations like China in the global AI race. The discourse also addresses the nuanced difference between piracy and copyright within AI, challenging flawed historical narratives often promoted by traditional media outlets like Fox News, which are increasingly being countered by alternative media. Further legal developments include rulings like Thaler v. Perlmutter, which denied copyright to purely AI-generated art, and proposed legislation such as the Generative AI Copyright Disclosure Act of 2024, aiming for transparency in AI training datasets.
Referenced in 1 Document
Research Data
Extracted Attributes
- Economic Impact: Crucial for US competitiveness against China in AI
- Media Discourse: Challenges 'flawed historical narratives' from Fox News with alternative media
- Key Legal Principle: Fair use for AI training
- Proposed Legislation: Generative AI Copyright Disclosure Act of 2024 (requires disclosure of training datasets)
- Copyrightability of AI Output: Only human-authored parts can be copyrighted; purely AI-generated content is generally not copyrightable
- Legal Standard for Infringement: Access and substantial similarity
- Technical Concept Supporting Distinction: Positional encoding (see the sketch below)
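Positional encoding is listed above as the technical concept behind the input/output distinction. As a rough illustration of what that mechanism actually is, the sketch below implements the standard sinusoidal positional encoding from the Transformer architecture; the function name and parameters are illustrative assumptions, and nothing here is drawn from the court filings or from Anthropic's actual systems.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding ("Attention Is All You Need").

    Each token position is mapped to a d_model-dimensional vector so a model
    can tell *where* a token appears, independent of *what* the token is.
    """
    positions = np.arange(seq_len)[:, np.newaxis]    # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]         # shape (1, d_model)
    # Frequencies shrink geometrically across the embedding dimensions.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                 # shape (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions: cosine
    return encoding

# Example: encode positions for a 10-token sequence in a 16-dimensional space.
pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16)
```

In a transformer these vectors are added to token embeddings before training, so what the model ultimately stores are learned, transformed representations of its training text rather than verbatim copies, which is presumably why positional encoding is cited above as technical support for separating a model's inputs from its outputs.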
Timeline
- 2023-08: The Thaler v. Perlmutter ruling upheld the U.S. Copyright Office's refusal to register copyright for an art image generated autonomously by AI, affirming that only human beings qualify as authors under U.S. copyright law. (Source: Web Search)
- 2024-04-09: The Generative AI Copyright Disclosure Act of 2024 was introduced in the U.S. Congress, proposing to require companies developing generative AI models to disclose the datasets used to train their systems. (Source: Web Search)
- Undated (Recent): A landmark ruling for Anthropic affirmed "Fair Use" for training AI on legally acquired data by distinguishing input from output. (Source: Related Documents)
- Undated (Recent): U.S. District Judge Vince Chhabria ruled in favor of Meta in a copyright infringement lawsuit brought by authors, finding Meta's use of pirated novels for LLaMA training to be fair use due to a lack of evidence of market impact. (Source: Web Search)
Wikipedia
George Hotz
George Francis Hotz (born October 2, 1989), alias geohot, is an American security hacker, entrepreneur, and software engineer. He is known for developing iOS jailbreaks, reverse engineering the PlayStation 3, and for the subsequent lawsuit brought against him by Sony. From September 2015 onwards, he has been working on his vehicle automation machine learning company comma.ai. Since November 2022, Hotz has been working on tinygrad, a deep learning framework.
Web Search Results
- In a first-of-its-kind decision, an AI company wins a copyright ... - NPR
"The court ruled that AI companies that 'feed copyright-protected works into their models without getting permission from the copyright holders or paying for them are generally violating the law," said Boies Schiller Flexner LLP, attorneys for the plaintiffs, in a statement. "Yet despite the undisputed record of Meta's historically unprecedented pirating of copyrighted works, the court ruled in Meta's favor. We respectfully disagree with that conclusion." [...] ### Christie's AI art auction inspires protests – and more art On Wednesday, U.S. District Judge Vince Chhabria ruled in favor of Meta in one of those cases. A copyright infringement lawsuit was brought by 13 authors including Richard Kadrey and Silverman. They sued Meta for allegedly using pirated copies of their novels to train LLaMA. Meta claimed fair use and won because the authors failed to present evidence that Meta's use of their books impacted the market for their original work. [...] "We believe it's clear that we acquired books for one purpose only — building large language models — and the court clearly held that use was fair," the company stated. A member of the plaintiffs' legal team declined to speak publicly about the decision. The Authors' Guild, a major professional writers' advocacy group, did share a statement: "We disagree with the decision that using pirated or scanned books for training large language models is fair use," the statement said.
- Is AI-generated Content Copyrighted? - TechTarget
On the other hand, if a user inputs a prompt into an AI tool, gets a response and then modifies the result in creative ways, that can potentially result in content afforded copyright protection. However, only human-authored parts of the work can be copyrighted. [...] "Although there is no established case law for generative AI just yet, I believe without clear-cut proof of access, such infringement claims will fail unless the copying in question is deemed identical to the original," Goldman said. However, Goldman also said he believes copyright owners and plaintiffs could assert unauthorized use, especially if this use is not considered de minimis -- too small to be considered meaningful -- and the resulting work is substantially similar to the original. [...] A copyright infringement inquiry begins with the long-established test of access and substantial similarity, according to William Scott Goldman, managing attorney and founder at Goldman Law Group. Overall, this means the case would have to prove that the AI or a human read the content and that it's similar enough to convince a jury it was copied.
- Federal Court Sides with Plaintiff in the First Major AI Copyright ...
U.S. Copyright Office, _Copyright and Artificial Intelligence, Part 2: Copyrightability_ at 2 (Jan. 2025): “For a work created using AI, like those created without it, a determination of copyrightability requires fact-specific consideration of the work and the circumstances of its creation. Where AI merely assists an author in the creative process, its use does not change the copyrightability of the output. At the other extreme, if content is entirely generated by AI, it cannot be protected [...] Along with the AI training issue, courts are grappling with complicated copyright issues involving AI’s _output_. One issue is whether an AI-generated work that contains elements derived from a preexisting work is an infringement or fair use. The _Concord Music_ case is an example. In addition to complaints about Claude’s AI training model, the plaintiff music publishers are seeking injunctive and monetary relief because the AI allegedly generates output containing their copyrighted song [...] Another—more policy-driven—issue is whether AI-generated works deserve copyright protection. In _Thaler v. Perlmutter_, 687 F. Supp. 3d 140, 149-50 (D.D.C. 2023), the trial court upheld the Copyright Office’s refusal to register the copyright in an art image generated autonomously by the plaintiff-artist’s own AI system. The decision is currently on appeal. In a similar type of case, _Allen v. Perlmutter_, No. 1:24-cv-02665 (D. Colo.), the plaintiff-artist contends that copyright law should [...]
- AI, Copyright, and the Law: The Ongoing Battle Over Intellectual ...
These cases raise fundamental questions: Is the use of copyrighted materials for AI training covered under the fair use doctrine, or does it constitute infringement? Defendants argue that training AI models is analogous to human learning and that their actions fall under fair use because they do not directly copy and distribute the works in their original form. However, plaintiffs contend that AI-generated outputs often closely resemble existing copyrighted works, undermining the fair use [...] One of the most pressing issues is whether AI-generated works can be protected under copyright law. In August 2023, a district court in Washington, D.C. reaffirmed the U.S. Copyright Office’s stance that AI-generated content, in and of itself, cannot receive copyright protection. Even though Thaler’s AI-generated work met all other requirements to receive copyright protection, “the court agreed with the Copyright Office that only human beings qualify as authors under U.S. copyright law, meaning that [...] In response to these legal battles, lawmakers have begun crafting legislation to address AI and copyright concerns. One such proposal is the Generative AI Copyright Disclosure Act of 2024, introduced in the U.S. Congress on April 9, 2024. This bill would require companies developing generative AI models to disclose the datasets used to train their systems, increasing transparency and potentially giving copyright owners more control over their works.
- AI-Generated Content and Copyright Law: What We Know - Built In
If it is ultimately determined that AI companies have infringed on certain creators’ copyrighted work, it could mean a lot more lawsuits in the coming years — and a potentially expensive penalty for the companies at fault. “One thing you have to know about copyright law is, for infringement of one thing only — it could be a text, an image, a song — you can ask the court for $150,000,” Gervais said. “So imagine the people who are scraping millions and millions of works.”
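To put the statutory-damages figure Gervais cites in perspective, here is a back-of-the-envelope sketch of the arithmetic. The $150,000 cap per willfully infringed work comes from 17 U.S.C. § 504(c); the count of scraped works is a purely hypothetical assumption for illustration, not a figure from any lawsuit.

```python
# Illustrative statutory-damages exposure; numbers other than the per-work cap
# are hypothetical assumptions, not facts from any case.
MAX_STATUTORY_DAMAGES_PER_WORK = 150_000   # USD cap per willfully infringed work (17 U.S.C. § 504(c))
hypothetical_works_scraped = 1_000_000     # assumed scale, for illustration only

max_exposure = MAX_STATUTORY_DAMAGES_PER_WORK * hypothetical_works_scraped
print(f"Theoretical maximum exposure: ${max_exposure:,}")  # $150,000,000,000
```

Even at a notional one million works, the theoretical ceiling reaches $150 billion, which is the scale Gervais is gesturing at when he says to "imagine the people who are scraping millions and millions of works."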