Fair Use for AI Training: What Three 2025 District Court Rulings Tell Us

Last reviewed: 2026-07-04

By mid-2026, U.S. law on fair use for AI model training is no longer a blank page, but it is not a rulebook either. Three district courts have reached merits-stage fair use determinations in AI training cases: Thomson Reuters v. Ross Intelligence in the District of Delaware, decided in February 2025; Bartz v. Anthropic in the Northern District of California, decided in June 2025; and Kadrey v. Meta, also in the Northern District of California, decided in June 2025.[1][2] Two rulings favored AI defendants on the records before those courts. One did not. None binds a circuit court, and none can fairly carry the weight now being placed on it in market commentary.

The useful question is not whether “the courts” have held that training AI systems on copyrighted works is fair use. They have not, in any nationally settled sense. The question is narrower: what can be inferred from these three district-level rulings before the Third Circuit, the Ninth Circuit, or both decide how much of this reasoning survives appellate review?

Case	Court and Date	Procedural Posture	Fair Use Outcome	Important Limitation
Thomson Reuters v. Ross Intelligence	D. Del., February 2025	Summary judgment-stage fair use ruling	Not fair use on the record presented	The court distinguished Ross’s non-generative legal research tool from generative AI systems.
Bartz v. Anthropic	N.D. Cal., June 2025	Summary judgment-stage fair use ruling	Fair use for training on lawfully acquired works	The court treated pirated-source copies separately from lawful acquisition.
Kadrey v. Meta	N.D. Cal., June 2025	Summary judgment-stage fair use ruling	Fair use on the record presented	The court signaled that a better-developed market-dilution record could matter.

What Each Court Actually Held

Thomson Reuters is the outlier in outcome and the easiest ruling to overread. The District of Delaware rejected Ross Intelligence’s fair use defense in litigation over Westlaw headnotes and related legal research content, but the opinion did not decide whether training a ChatGPT-style generative model on books, news, images, or legal materials is fair use.[1][2] The court’s analysis concerned a non-generative legal research product and a record in which the alleged use competed more directly with the plaintiff’s legal research offering.

That distinction matters. A court can reject fair use for a tool built to answer legal research queries from copied Westlaw material without deciding the harder question of whether a large language model’s ingestion of text for statistical training is transformative. Thomson Reuters therefore remains important, especially for legal content vendors and legal AI products, but it is not clean authority for the proposition that AI training is generally infringing.

Bartz went the other way on a different record. Judge William Alsup held that Anthropic’s training use of lawfully acquired books was fair use, treating the training process as highly transformative and rejecting a market-harm theory that would effectively make every uncompensated training use actionable merely because a license could have been demanded.[1] The ruling, however, separated training on lawfully acquired materials from the use of pirated-source datasets. That separation is doing real work. It prevents Bartz from becoming a blanket endorsement of shadow-library training pipelines.

Kadrey, decided in the same district and the same month, also found fair use for Meta on the summary judgment record, but its tone on market harm was materially less defendant-protective.[1] Judge Vince Chhabria concluded that the plaintiffs had not developed the right record, while emphasizing that a market-dilution theory may be viable where training enables AI systems to generate outputs that substitute for, or depress demand for, human-authored works.[1] That is not a win-loss footnote. It is the part of Kadrey most likely to reappear in appellate briefing.

Factor One: Transformative Use Is Doing Different Work in Each Opinion

Factor one is where the cleanest pro-training arguments usually begin, but the three opinions do not treat transformativeness as a single doctrinal switch. In Thomson Reuters, the court was not persuaded that Ross’s use transformed the copied legal content in a way that overcame the commercial and competitive character of the product.[1][2] The copied material was tied to a legal research function close enough to Thomson Reuters’ market to make the defendant’s purpose look less like broad computational analysis and more like product substitution.

Bartz is the strongest district-court statement for the defense-side view. The court accepted that using books to train a model was not the same purpose as reading or selling those books to consumers.[1] On that record, the model was not being offered as a library of the plaintiffs’ books, and the training use was treated as a new technical use of the works rather than a repackaged copy.

Kadrey reached a similar bottom-line result on factor one, but it did not do so with the same comfort about the broader economic consequences.[1] The opinion leaves room for a case in which the technical transformation involved in model training is still relevant but does not end the analysis, especially if plaintiffs can connect training to a cognizable downstream market effect.

That distinction lines up with the U.S. Copyright Office’s Part 3 report, released in pre-publication form in May 2025, which rejected the simple analogy that AI training is equivalent to human learning.[3] The report did not say training can never be transformative. It rejected the stronger claim that training is inherently transformative merely because a model extracts patterns rather than stores works for ordinary consumption.[3] That narrower move matters because it leaves courts with a record-specific inquiry rather than a categorical answer.

Factor Two: Copyrighted Expression Matters, but It Rarely Carries the Case

Factor two receives less practical weight in these rulings than factor one or factor four. The works at issue, including books and legal editorial content, contain expressive material that can support plaintiffs on the nature-of-the-work factor. But none of the three decisions turns primarily on factor two.[1][2]

That is not unusual in fair use litigation. Where a court sees the defendant’s purpose as meaningfully transformative, factor two tends to recede. Where a court sees competitive substitution or market capture, factor two rarely has to do the heavy lifting. In these AI training cases, the serious contest is not whether the inputs contain protected expression. It is what legal significance follows from copying that expression into a training process.

Factor Three: Copying Everything Is Not the Same Issue as Acquiring It Lawfully

Factor three forces a distinction that gets blurred in public debate. Large-scale training often requires copying entire works, but the opinions ask two related questions: whether copying the whole work was reasonably tied to the asserted training purpose, and whether the copies entered the training pipeline through lawful or unlawful acquisition.

Bartz is especially important on this point because it accepted full-work copying for training when the works were lawfully acquired, while treating pirated-source materials as a separate problem.[1] That is a narrower holding than “copying entire books for AI training is fair use.” It is closer to: on this record, full-copy ingestion of lawfully acquired books for model training did not defeat fair use.

The distinction is not merely formal. A defendant that buys books, scans them, and retains copies for training presents a different record from a defendant that relies on shadow-library datasets. Both records may involve full-work copying. They do not present the same equities, evidentiary posture, or statutory-damages exposure.

The later Bartz settlement underscores the risk signal without creating precedent. Anthropic agreed to a $1.5 billion settlement, described as approximately $3,000 per work, in connection with claims involving pirated books.[4] A settlement does not establish liability, and it does not answer the fair use question for other defendants. But it does show why dataset provenance is likely to become a practical dividing line in AI copyright litigation, even before appellate doctrine catches up.
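
The two reported settlement figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses nothing beyond the reported total and the reported approximate per-work amount, and the function name `implied_work_count` is a hypothetical helper, not anything drawn from the settlement papers. The result is an implied order of magnitude, not a count taken from the court record.

```python
# Back-of-the-envelope check using only the two figures reported for the settlement.
# Both inputs are approximate as reported; the output is an implied estimate.
SETTLEMENT_TOTAL_USD = 1_500_000_000  # reported settlement total
REPORTED_PER_WORK_USD = 3_000         # reported approximate per-work amount


def implied_work_count(total_usd: int, per_work_usd: int) -> int:
    """Return the number of works implied by a total and a per-work amount."""
    if per_work_usd <= 0:
        raise ValueError("per-work amount must be positive")
    return round(total_usd / per_work_usd)


if __name__ == "__main__":
    works = implied_work_count(SETTLEMENT_TOTAL_USD, REPORTED_PER_WORK_USD)
    print(f"Implied works covered: about {works:,}")
```

On those two inputs the implied figure is on the order of half a million works, which helps explain why per-work exposure, rather than the fair use merits alone, drives the risk calculus for pirated-source datasets.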

Factor Four Is the Center of the Map

The most important unresolved split is factor four: the effect of the use on the potential market for or value of the copyrighted work. This is where the three rulings become least compatible as market guidance.

In Thomson Reuters, the market story was comparatively concrete. Ross’s tool operated in the legal research space, and the court credited market-substitution concerns in a setting where the plaintiff already sold legal research products.[1][2] That posture made factor four easier for the plaintiff than in cases involving general-purpose generative models trained on books.

Bartz is the more aggressive defense ruling. Judge Alsup rejected the idea that copyright owners suffer cognizable market harm simply because they could have licensed their books for training.[1] If that were enough, many fair uses would collapse into a right to charge for any socially or commercially valuable use. The opinion is at its strongest when it resists circularity: a plaintiff cannot define the relevant market only as the market to license the very use being challenged, then claim harm because the defendant did not pay.

Kadrey complicates that defense-side comfort. Judge Chhabria found the plaintiffs’ market-harm evidence insufficient, but he expressly treated market dilution as a serious theory.[1] The point is not just that an AI model might output infringing text. The sharper theory is that widespread training on copyrighted books may enable systems that generate competing works at scale, reducing the economic value of human-authored works even when no output is substantially similar to a particular plaintiff’s book.[1]

That theory is difficult to prove, but Kadrey suggests it is not doctrinally frivolous. The Copyright Office’s Part 3 report also gave meaningful attention to market dilution and rejected the view that training should be insulated merely because humans also learn from prior works.[3] For plaintiffs, that is an invitation to build a better record. For defendants, it is a warning that winning on an underdeveloped factor-four record is not the same as defeating the theory.

The Licensing Market Is Evidence, Not a Shortcut

The growth of voluntary licensing makes factor four harder to keep abstract. Reported deals involving companies such as Disney, OpenAI, HarperCollins, Microsoft, and news publishers are being used to show that an AI training licensing market is no longer hypothetical.[5] That evidence can help copyright owners argue that uncompensated training impairs a real or developing market.

But licensing evidence does not automatically decide fair use. A court still has to ask whether the asserted market is one copyright law should recognize for factor-four purposes, whether the defendant’s use substitutes for that market, and whether recognizing the market would make fair use circular. Bartz leaned hard against circularity. Kadrey left more room for a market-dilution record. Thomson Reuters had a more direct competitive market. Those are three different evidentiary structures, not three applications of a single settled rule.

The Same District, the Same Month, and No Stable Rule

The Bartz-Kadrey tension is more revealing than the simple fact that both defendants won on training. Both cases came from the Northern District of California in June 2025, under the same Ninth Circuit umbrella.[1] Both involved generative AI training claims brought by authors. Both found fair use for training on the summary judgment records before them, although Bartz limited that finding to lawfully acquired works. Yet they differ materially in how much oxygen they give to market dilution.

That is exactly the kind of intra-district divergence that makes appellate review useful. If the Ninth Circuit takes up Bartz, Kadrey, or both, it will have to decide whether training’s technical transformation dominates the analysis, whether market dilution is legally cognizable, and what evidentiary showing plaintiffs need before the theory can survive summary judgment.

Thomson Reuters may reach appellate review first in a different posture. The case is on interlocutory appeal to the Third Circuit, and practitioner updates in 2026 identify that appeal as a major pending development for AI-related fair use doctrine.[2] Even there, however, the Third Circuit could decide the case narrowly because the underlying product was not a generative AI model. A narrow affirmance would matter for legal research tools and copied editorial content, but it would not settle book-based LLM training.

Pirated Datasets Remain Their Own Problem

One of the least helpful phrases in this area is “AI training data.” It sounds like a single category. The cases are already showing that source provenance may matter as much as model function.

A lawfully purchased or licensed corpus, a scraped website, a digitized internal archive, and a shadow-library dataset may all become training data. They do not carry the same legal risk. Bartz’s fair use analysis for lawfully acquired training materials should not be casually extended to pirated copies, particularly after the settlement signal in that litigation.[1][4]

This is also where fair use doctrine and litigation economics begin to diverge. A defendant may believe it has a strong transformative-use argument and still face severe exposure if the record includes unauthorized copies from known pirate sources. The unresolved question is whether the lawfulness of acquisition becomes a doctrinal dividing line, an equitable pressure point, or simply a damages accelerant when the source is unlawful. The district court opinions do not yet give a stable answer.

What the Next Twelve Months May Clarify

The appellate and trial calendar matters because district-court fair use wins can be fragile. Thomson Reuters is already positioned for Third Circuit interlocutory review.[2] Bartz and Kadrey are natural candidates for Ninth Circuit attention because they expose different approaches to market harm within the same district and the same general category of generative AI training litigation.[1]

Other pending cases may add pressure before the appellate courts speak. The OpenAI multidistrict litigation is expected to reach summary judgment in August 2026, and Andersen v. Stability AI is set for trial in September 2026.[6] Those proceedings may produce new records on training data, outputs, licensing markets, and harm theories. They may also reveal whether the current split is mostly about doctrine or mostly about evidentiary development.

For now, the emerging patterns are usable but not settled. Courts are willing to treat at least some AI training as transformative. Courts are not uniformly willing to treat lost training-license revenue as cognizable factor-four harm. Courts are beginning to separate lawful acquisition from pirated-source datasets. None of those statements is yet a national rule.

The watchlist is correspondingly narrow: whether appellate courts accept model training as transformative and on what terms; whether they recognize licensing, substitution, or dilution markets for factor four; and whether lawful acquisition becomes the practical line between defensible training and high-exposure copying. Until those questions move beyond district courts, the safest description of the law is not that AI training is fair use, or that it is not. It is that the first three merits-stage rulings have created arguments, not closure.

References

[1] Fair Use and AI Training: Two Recent Decisions Highlight the Complexity of This Issue, Skadden, July 2025.
[2] AI litigation update covering Thomson Reuters v. Ross Intelligence and Third Circuit posture, Norton Rose Fulbright, June 2026.
[3] Copyright and Artificial Intelligence, Part 3: Generative AI Training, U.S. Copyright Office, May 2025.
[4] Coverage of the Bartz v. Anthropic settlement involving pirated books.
[5] Transparency Coalition analysis of voluntary AI content licensing market developments.
[6] The Open Questions in U.S. Generative AI Copyright Litigation, Cleary Gottlieb, January 2026.
