
Comparing Legal AI Hallucination Databases: Coverage, Gaps, and Best Use Cases

Legal professionals need reliable incident data to assess AI risk, but the three main public sources tracking legal AI hallucinations serve different purposes and apply different inclusion criteria. This comparison explains what each source contains, where it falls short, and which to use for specific professional needs.


Profile summary

Primary use cases: Tracking court-recognized legal AI hallucinations, policy development, training
Pricing tier: Free
Target audience: Law firms, in-house legal, court administrators, compliance teams
Last reviewed: 2026-07-04

Full profile

The search for a database of legal AI hallucination incidents usually starts with a practical procurement or policy question: how often does this actually go wrong? The irritating answer is that the public sources do not reconcile because they are not counting the same thing. One tracks court-recognized hallucinations. Another tracks reported AI harms across sectors. A third tries to estimate a public filing-level rate from docket data. Treating those as competing totals creates a cleaner chart and a worse risk assessment.

The first distinction is not technical. It is evidentiary. A hallucinated citation that a lawyer catches before filing is not the same data point as a hallucinated citation that appears in a motion, is challenged by an opponent, is discussed in a written order, and is then entered into a public database. Most public incident data begins late in that chain.

| Source | What it counts | Best use | Main caution |
| --- | --- | --- | --- |
| Charlotin AI Hallucination Cases Database | Court decisions where a court explicitly finds or implies an AI hallucination; 1,696 decisions globally, including 1,187 U.S. decisions, as of July 3, 2026 [1] | Court-recognized legal hallucination patterns by party type, jurisdiction, outcome, practice area, AI tool, and error type | It is a database of public judicial recognition, not a count of all hallucinations |
| AI Incident Database (AIID) | Reported AI harms across domains, including legal-related incidents such as the Sullivan & Cromwell bankruptcy filing and KPMG hallucinated citations [2] | Broader AI-harm context and cross-sector pattern spotting | It is not built around legal workflow fields such as practice area, sanction amount, or party-type breakdown |
| Clio estimate | Approximately 955 U.S. hallucination cases in about 40 million civil filings since January 2023, yielding roughly 0.002% [3] | A rare denominator-based public estimate | The denominator is all civil filings, not AI-assisted filings |

[Figure: Three sieve icons showing different inclusion criteria for legal AI hallucination data sources]

The Counting Rule Matters More Than the Count

Charlotin’s database is the most useful starting point when the question is specifically legal and specifically about hallucinations that courts have noticed. Its inclusion rule is strict: the court must explicitly find or imply that a hallucination occurred. That gives the database a cleaner evidentiary basis than a news-only tracker, but it also means the database begins after several filters have already operated: the error reached a court filing or legal submission, someone detected it, the issue became visible enough for a court to address, and the result appeared in a public decision.

As of July 3, 2026, the database listed 1,696 court decisions globally, including 1,187 in the United States, and it is updated with daily additions [1]. That snapshot date is not a housekeeping detail. In this topic, a count without a date is already partly misleading.

The reason Charlotin is operationally valuable is not only its scale. It is the field structure. A risk officer can filter by party type, AI tool, jurisdiction, outcome category, and practice area. The party-type split alone changes the conversation: the database identifies 991 pro se matters and 663 lawyer matters, as of the same July 3, 2026 snapshot [1]. That does not prove lawyers are responsible for only that share of hallucinated legal work. It shows what has surfaced in court-recognized decisions under the database’s criteria.

The practice-area tags are also useful because they help move policy discussions away from abstract fear. Charlotin lists, among other categories, 419 contract matters, 229 administrative matters, 179 civil rights matters, and 153 employment matters [1]. Those numbers do not mean contract lawyers are uniquely careless or that other areas are safe. They mean these are the visible legal contexts in which courts have recognized hallucination issues often enough to be captured in the database.

The error-type fields are where the database becomes especially useful for training and review checklists. The database identifies 1,423 fabricated citations, 460 false quotes, and 698 misrepresented holdings [1]. Those categories map onto different review tasks. A citation check confirms whether an authority exists. A quotation check compares language against the source. A holding check asks whether the cited authority actually supports the proposition. Collapsing all three into “fake cases” is too crude for anyone who has to design a verification process.
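The arithmetic behind those error-type counts can be sketched in a few lines of Python. This is a minimal illustration, not anything published by the database: it assumes the three counts are drawn from the same 1,696-decision global snapshot, and the `REVIEW_TASK` mapping and function name are hypothetical labels introduced here.

```python
# Counts quoted above from the July 3, 2026 snapshot [1].
# ASSUMPTION: all three error-type counts share the 1,696-decision global total.
TOTAL_DECISIONS = 1696
ERROR_TYPE_COUNTS = {
    "fabricated citation": 1423,
    "false quote": 460,
    "misrepresented holding": 698,
}

# Hypothetical mapping from error type to the review task described in the text.
REVIEW_TASK = {
    "fabricated citation": "confirm the authority exists",
    "false quote": "compare quoted language against the source",
    "misrepresented holding": "confirm the authority supports the proposition",
}


def tag_shares(counts, total):
    """Return each tag's share of all decisions as a percentage (1 decimal)."""
    return {tag: round(100 * n / total, 1) for tag, n in counts.items()}


shares = tag_shares(ERROR_TYPE_COUNTS, TOTAL_DECISIONS)
tag_total = sum(ERROR_TYPE_COUNTS.values())

# If the tags add up to more than the number of decisions, at least some
# decisions must carry more than one error-type tag.
tags_overlap = tag_total > TOTAL_DECISIONS

for tag, pct in shares.items():
    print(f"{tag}: {pct}% of decisions -> {REVIEW_TASK[tag]}")
print(f"{tag_total} tags across {TOTAL_DECISIONS} decisions; overlap: {tags_overlap}")
```

Under that assumption the tags sum to more than the number of decisions, so the categories are not mutually exclusive. A review checklist built on them should run all three checks on every authority rather than stop at the first failure.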

[Figure: Three magnifying lenses reveal different annotations on the same legal document]

What Charlotin Can Tell You, and What It Cannot

For court-facing legal work, Charlotin can answer questions that a general AI incident database cannot answer well. Which party types appear in public decisions? Which practice areas are showing up? Are courts describing fabricated cases, distorted quotations, or misrepresented holdings? Which outcomes are attached to the court’s recognition of the problem? Those are the questions that become policy language, intake checklists, training examples, and vendor due diligence prompts.

It cannot answer the question people most want answered: what is the true incidence rate of hallucinations in AI-assisted legal work? A court-recognized event is several steps downstream from the underlying error. The database will miss hallucinations that are caught by the lawyer before filing, corrected by a paralegal, resolved after a phone call from opposing counsel, handled through confidential discipline, sealed, or never detected at all. It may also understate lawyer-related events where a court refers conduct elsewhere and the later professional process does not produce a public decision.

This is not a defect in the database. It is the price of a strong inclusion rule. Public judicial recognition is a reliable floor for documented failure modes, not a ceiling on actual failures.

The timing pattern is still worth watching. Bloomberg Law, analyzing the Charlotin data, reported that 90% of U.S. hallucination decisions were written in 2025 [4]. That does not prove a 2025 explosion in underlying hallucinations by itself. It may reflect more AI use, more court awareness, more opposing-party detection, more written orders, or some combination. For legal operations purposes, though, a concentration of public decisions in a recent year matters because courts, clients, and insurers respond to visible cases.

The AI Incident Database (AIID) serves a different purpose. It collects reported AI harms across domains, so legal hallucination incidents sit beside other AI failures rather than inside a legal-only taxonomy. That broader frame can be useful when the question is, “What kinds of harms are being reported around AI systems?” It is less useful when the question is, “What sanctions have courts imposed on lawyers in employment cases involving fabricated citations?”

The database includes legal-related incidents such as Incident 1558, involving a Sullivan & Cromwell bankruptcy filing, and Incident 1563, involving KPMG hallucinated citations [2]. Those entries are valuable as examples of how hallucination risk appears outside the familiar solo-lawyer fake-case story. Legal risk does not stop at pleadings drafted by counsel of record; it can run through expert work, advisory materials, bankruptcy filings, client-facing work product, and professional services workflows.

But AIID lacks the legal-specific dimensions that make Charlotin so useful for law-firm or court policy. It does not provide the same kind of practice-area breakdown, party-type filter, sanction tracking, or court-outcome structure. That is not a reason to ignore it. It is a reason not to use it as though it were a legal hallucination case database.

The Clio Estimate Is Valuable Because It Has a Denominator

Most hallucination trackers count visible incidents without showing the universe from which those incidents came. Clio’s estimate is notable because it tries to put public hallucination cases against a filing denominator: about 955 U.S. hallucination cases in roughly 40 million civil filings since January 2023, producing an estimate of about 0.002% [3]. In a field full of numerator-only anecdotes, that is a useful move.

It is also easy to misuse. The denominator is all civil filings, not AI-assisted civil filings. If only a fraction of filings involved AI assistance, the publicly visible rate among AI-assisted filings is higher than 0.002% by an unknown factor, and it cannot be inferred from that figure. The estimate is better read as a public docket-level lower-bound signal than as a safety rate for legal AI use.

That distinction matters in procurement. A vendor cannot point to an all-filings denominator and claim that its own tool has a hallucination incidence of 0.002%. A law firm cannot use the same number to conclude that AI-assisted drafting is nearly risk-free. The estimate tells us something narrower: publicly visible hallucination cases are rare relative to the entire civil docket universe captured in the method, while the incidence rate inside AI-assisted work remains unknown.
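A short Python sketch makes the denominator problem concrete. The two Clio figures are the approximations quoted above [3]. The AI-assisted shares in the loop are invented for illustration only, since no source cited here reports the real share, and the sketch assumes every one of the 955 cases came from an AI-assisted filing.

```python
CLIO_CASES = 955            # approximate U.S. hallucination cases [3]
CIVIL_FILINGS = 40_000_000  # approximate civil filings since January 2023 [3]


def rate_pct(cases, filings):
    """Cases as a percentage of filings."""
    return 100 * cases / filings


# Rate against the all-civil-filings denominator (the figure Clio reports).
all_filings_rate = rate_pct(CLIO_CASES, CIVIL_FILINGS)
print(f"all civil filings: {all_filings_rate:.4f}%")

# HYPOTHETICAL shares of filings that were AI-assisted. The true share is
# unknown; these values only show how sensitive the rate is to the denominator.
for share in (0.01, 0.05, 0.25):
    ai_assisted_filings = CIVIL_FILINGS * share
    print(f"if {share:.0%} of filings were AI-assisted: "
          f"{rate_pct(CLIO_CASES, ai_assisted_filings):.4f}%")
```

The same 955 cases produce very different rates depending on the assumed share, which is why the all-filings figure cannot stand in for a tool-level or workflow-level hallucination rate.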

[Figure: Iceberg showing visible court-recognized hallucinations above a larger hidden mass of undetected or confidential incidents]

Public Incidents Are Downstream of Professional Duties

Court-recognized hallucination databases are most useful when paired with verification duties, not when treated as a complete risk map. The National Center for State Courts’ practitioner guide frames the operational rule bluntly: “never trust, always verify,” and the guide has been endorsed by the Conference of Chief Justices [5]. That principle fits the data problem. The absence of a public incident in a database does not mean the underlying workflow is safe; it may mean the error never became public.

The consequences are also becoming more concrete in the public record. DISCO’s 2026 trend analysis describes courts as moving beyond warnings to real consequences, including an Oregon per-infraction formula and rising sanction amounts [6]. The exact sanction environment will vary by court and conduct, but the direction matters for risk controls: hallucinated authorities are no longer just embarrassing examples for CLE slides.

Thomson Reuters makes a related point from the lawyering side, describing generative AI hallucinations as “still pervasive” in legal filings while emphasizing better lawyering as the cure [7]. That is a useful corrective to both panic and complacency. The risk is not solved by refusing all AI use, and it is not solved by buying a tool with confident interface language. It is managed through source verification, role clarity, documented review, and consequences that lawyers and staff actually understand.

Which Source to Use for Which Question

A legal team comparing incident sources should start by naming the decision it is trying to support. The right source changes depending on whether the task is training, procurement, court administration, risk reporting, or broader AI governance.

  • For court-recognized legal hallucination patterns, start with Charlotin because its inclusion rule and filters are closest to legal workflow questions.
  • For cross-sector AI harm context, consult AIID because it can surface legal incidents that do not fit neatly into court-decision tracking.
  • For a rough public denominator estimate, use Clio’s 0.002% figure, but keep the all-civil-filings denominator attached to every mention.
  • For sanctions and court reaction trends, use analyses such as Bloomberg Law and DISCO as context, not as comprehensive incident databases.
  • For policy language, pair incident sources with professional responsibility guidance such as the NCSC practitioner guide.

The most common mistake is to combine these sources into a single incident count. Charlotin’s 1,696 decisions, AIID’s legal-related entries, and Clio’s estimated 955 U.S. cases are not three attempts to measure the same population. They are three windows placed at different points in the failure pipeline.

A Cleaner Way to Read the Pipeline

| Stage | What happens | What public databases usually capture |
| --- | --- | --- |
| Generation | An AI system produces a false citation, quotation, or legal proposition | Usually not captured unless the output later becomes public |
| Internal review | A lawyer, staff member, or reviewer catches or misses the problem | Usually not captured |
| Filing or use | The material enters a court filing, legal memorandum, expert submission, or client-facing work | Sometimes captured if later reported or challenged |
| Detection | A court, opponent, client, or reviewer identifies the hallucination | Sometimes captured in reported incidents |
| Public recognition | A court discusses the hallucination in a written decision or public order | Strongly aligned with Charlotin’s inclusion rule |
| Consequence | The court imposes sanctions, warns counsel, refers the matter, or takes another action | Captured only when the consequence is public and tied to the incident |

This pipeline is why public databases are strong evidence of documented failure modes and weak evidence of total incidence. They tell lawyers what has gone wrong in ways that courts, reporters, or incident submitters could see. They do not tell lawyers how many hallucinations were prevented by competent review or buried by poor detection.
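The filtering effect can be illustrated with a toy funnel in Python. Only the 1,696 figure comes from a source [1]. Every pass-through rate below is invented, and the three-stage simplification and the function name are assumptions made for this sketch, not measurements or a published model.

```python
from math import prod

PUBLIC_DECISIONS = 1696  # Charlotin global count, July 3, 2026 snapshot [1]


def implied_upstream_errors(public_count, stage_rates):
    """Divide the public count by the product of per-stage pass-through rates."""
    return public_count / prod(stage_rates.values())


# HYPOTHETICAL pass-through rates for a simplified three-stage funnel.
# Neither scenario is an estimate of reality.
high_visibility = {
    "missed by internal review": 0.5,
    "detected after filing": 0.5,
    "discussed in a public decision": 0.4,
}
low_visibility = {
    "missed by internal review": 0.1,
    "detected after filing": 0.2,
    "discussed in a public decision": 0.1,
}

for name, rates in (("high visibility", high_visibility),
                    ("low visibility", low_visibility)):
    upstream = implied_upstream_errors(PUBLIC_DECISIONS, rates)
    print(f"{name}: roughly {round(upstream):,} generated errors implied")
```

Both invented scenarios are consistent with the same public count while implying very different upstream totals. That is the sense in which a count of public decisions constrains documented failure modes far better than it constrains total incidence.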

The Practical Bottom Line

No current public source can quantify total hallucination risk from legal AI use. Charlotin is the best legal-specific database for court-recognized hallucination decisions, but its strength is also its boundary. AIID is useful for broader AI-harm awareness, but it lacks the legal fields needed for many professional risk questions. Clio provides a rare denominator-based estimate, but the denominator is all civil filings rather than AI-assisted filings.

For legal risk work, the safest source-selection rule is simple: use Charlotin for documented court patterns, AIID for broader harm context, Clio for a cautious public-rate reference, and professional guidance for verification duties. Do not ask any one database to do all four jobs.

References

  1. Damien Charlotin, AI Hallucination Cases Database
  2. AI Incident Database, AI Incident Database
  3. Clio, AI Hallucinations in Law: How to Spot Them and Stop Them
  4. Bloomberg Law, AI-Faked Cases Become Core Issue Irritating Overworked Judges
  5. NCSC, A Legal Practitioner's Guide to AI & Hallucinations
  6. DISCO, AI Hallucinations and Legal Decisions: Trend Watch
  7. Thomson Reuters Institute, GenAI hallucinations are still pervasive in legal filings but better lawyering is the cure
