eDiscovery document review is one of the most resource-intensive tasks in litigation. A mid-size commercial dispute can produce hundreds of thousands of documents; a major antitrust matter can run into the tens of millions. For years, the answer was armies of contract reviewers working in shifts. AI changed the math — but it didn't eliminate the judgment calls.
What follows is a stage-by-stage breakdown of how AI is currently inserted into the eDiscovery review workflow, what it actually does at each stage, where the documented failure modes sit, and what human-in-the-loop verification looks like in practice. The goal is operational clarity, not vendor advocacy.
The Underlying Task
Document review in eDiscovery has three core objectives: identify documents that are responsive to the discovery requests, flag documents that are protected by privilege or work-product doctrine, and produce the responsive non-privileged set to opposing counsel in the required format. Everything else — deduplication, threading, metadata extraction — is infrastructure supporting those three goals.
The legal stakes are asymmetric. Producing a privileged document can waive protection. Missing a responsive document can result in sanctions or adverse inference instructions. Both errors are costly, which is why the workflow has historically been over-engineered toward recall rather than precision.
The Five-Stage AI-Assisted Review Workflow
Current practice — as documented in published court decisions, EDRM guidance, and published methodology from major review vendors — generally follows five stages. AI touches each of them differently.
Stage 1: Collection and Processing
Before any AI model sees a document, data must be collected from custodians, processed into a reviewable format, and deduplicated. AI is present here in limited ways: near-duplicate detection algorithms cluster similar documents to reduce redundant review, and optical character recognition (OCR) pipelines use machine learning to extract text from scanned images with higher accuracy than older rule-based engines.
The failure mode at this stage is upstream: garbage in, garbage out. If a custodian's email archive is incomplete, or if OCR misreads handwritten notes, no downstream AI model can compensate. Practitioners routinely audit processing logs for error rates before allowing review to begin.
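The near-duplicate clustering described above can be illustrated with word shingles and Jaccard similarity. This is a minimal sketch, not a description of how any particular platform works: the function names, the three-word shingle size, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles for a document (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_pairs(docs, threshold=0.8):
    """Compare every pair of documents and return those at or above the threshold."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs = []
    for left, right in combinations(sets, 2):
        sim = jaccard(sets[left], sets[right])
        if sim >= threshold:
            pairs.append((left, right, round(sim, 3)))
    return pairs


# Hypothetical three-document corpus.
docs = {
    "a": "please review the attached draft agreement before the friday call",
    "b": "please review the attached draft agreement before the friday call thanks",
    "c": "lunch menu for the cafeteria this week",
}
print(near_duplicate_pairs(docs))
```

The all-pairs comparison is quadratic in the number of documents, so it only suits small samples; at corpus scale, approximate techniques such as MinHash with locality-sensitive hashing are the usual substitute.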
Stage 2: Technology-Assisted Review (TAR) and Active Learning
This is where AI does its heaviest lifting. Technology-assisted review — sometimes called predictive coding — uses a trained classification model to score documents for likely responsiveness. The two dominant paradigms are:
- TAR 1.0 (Simple Active Learning): A seed set of documents is coded by senior reviewers, the model trains on that seed set, and a validation protocol determines when the model is sufficiently accurate to extrapolate across the full corpus. The model's output is a relevance score per document; documents below a cutoff threshold are withheld from review.
- TAR 2.0 (Continuous Active Learning / CAL): The model retrains continuously as reviewers code documents, prioritizing the highest-scoring documents for human review. There is no fixed seed set; the model improves throughout the review cycle. CAL has largely displaced TAR 1.0 in large-volume matters because it handles concept drift better and doesn't require a separate up-front training phase. It still needs a stopping rule and end-of-review validation, such as the elusion testing described in Stage 5.
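The continuous active learning loop can be sketched in a few lines. The scorer here is a toy per-token log-odds model standing in for whatever classifier a platform actually uses, and the stopping rule (stop when a batch contains no responsive documents) is deliberately simplistic and should not be read as a defensible stopping criterion. All names and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def train(coded):
    """Fit smoothed per-token log-odds weights from (text, is_responsive) pairs."""
    pos, neg = Counter(), Counter()
    for text, label in coded:
        (pos if label else neg).update(set(text.lower().split()))
    n_pos = sum(1 for _, label in coded if label)
    n_neg = len(coded) - n_pos
    return {
        tok: math.log((pos[tok] + 1) / (n_pos + 2)) - math.log((neg[tok] + 1) / (n_neg + 2))
        for tok in set(pos) | set(neg)
    }


def score(weights, text):
    """Sum the weights of the distinct tokens in a document."""
    return sum(weights.get(tok, 0.0) for tok in set(text.lower().split()))


def cal_review(corpus, seed_ids, reviewer, batch_size=2):
    """Retrain after every batch and always send the top-scoring uncoded docs to review."""
    coded = {doc_id: reviewer(doc_id) for doc_id in seed_ids}
    order = list(seed_ids)
    while len(coded) < len(corpus):
        weights = train([(corpus[i], coded[i]) for i in coded])
        uncoded = [i for i in corpus if i not in coded]
        uncoded.sort(key=lambda i: score(weights, corpus[i]), reverse=True)
        batch = uncoded[:batch_size]
        for doc_id in batch:
            coded[doc_id] = reviewer(doc_id)  # human coding decision
        order.extend(batch)
        if not any(coded[doc_id] for doc_id in batch):
            break  # toy stopping rule: a whole batch came back non-responsive
    return coded, order


corpus = {
    "d1": "safety test failed on product x",
    "d2": "product x safety protocol update",
    "d3": "safety test results for product x attached",
    "d4": "lunch order for friday",
    "d5": "friday parking lot closed",
    "d6": "quarterly sales numbers attached",
    "d7": "holiday party friday lunch",
}
responsive = {"d1", "d2", "d3"}  # stands in for the human reviewer's judgment
coded, order = cal_review(corpus, ["d1", "d4"], lambda doc_id: doc_id in responsive)
print(order)
```

In this toy run the responsive documents are reviewed first and one document is never reviewed at all, which is exactly why the unreviewed remainder still has to be sampled before production.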
Stage 3: Privilege and Confidentiality Screening
Privilege review is where AI assistance gets more complicated. Most platforms offer some combination of attorney-name matching (flagging documents that mention attorneys on a known counsel list), communication-pattern analysis (identifying email threads involving in-house counsel), and, increasingly, generative AI summarization to help reviewers triage potentially privileged documents faster.
The attorney-name matching approach is well-established but brittle. It misses privilege claims for communications where the attorney isn't named in the header — common in forwarded chains, embedded attachments, and documents referencing legal advice without naming the attorney. Generative AI summarization can surface the substance of a document faster, but it introduces a different risk: the model may mischaracterize the document's content, leading to an incorrect privilege determination.
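The brittleness of header-only matching is easy to see in a small sketch. The counsel list, addresses, and cue phrases below are hypothetical, and the broader screen is only a triage heuristic that will over-flag; it is not a privilege determination.

```python
import re

# Hypothetical counsel list; a real one would come from the case team.
COUNSEL_ADDRESSES = {"j.rivera@lawfirm.example"}
COUNSEL_NAMES = {"rivera"}
ADVICE_CUES = re.compile(
    r"\b(legal advice|privileged|attorney[- ]client|counsel)\b", re.IGNORECASE
)


def header_match(msg):
    """Flag only when a counsel address appears in From, To, or Cc."""
    participants = {msg["from"], *msg.get("to", []), *msg.get("cc", [])}
    return any(p.lower() in COUNSEL_ADDRESSES for p in participants)


def body_screen(msg):
    """Broader triage: also scan the body for counsel names or advice cues."""
    body = msg.get("body", "")
    named = any(
        re.search(rf"\b{re.escape(name)}\b", body, re.IGNORECASE)
        for name in COUNSEL_NAMES
    )
    return header_match(msg) or named or bool(ADVICE_CUES.search(body))


forwarded = {
    "from": "cfo@client.example",
    "to": ["ops@client.example"],
    "body": "FYI, passing along what Rivera said about the indemnity clause.",
}
print(header_match(forwarded), body_screen(forwarded))
```

The forwarded message has no attorney in its header, so the header check passes over it while the body screen flags it for attorney review.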
Stage 4: Generative AI for Review Acceleration
Since 2023, a new layer has been added to many review workflows: generative AI tools that summarize documents, extract specific facts, and answer natural-language queries against the document corpus. This is distinct from TAR — it's not scoring documents for responsiveness, it's helping reviewers understand documents faster.
Common use cases in this layer include: generating one-paragraph summaries of lengthy contracts or board minutes, extracting dates and named parties from transactional documents, and running issue-specific queries ("which documents discuss the safety testing protocol for Product X?") against a filtered subset.
The hallucination risk here is concrete and specific. A generative model summarizing a document may confidently state a fact that doesn't appear in the source, or omit a fact that does. In a responsiveness determination, that can mean a document gets coded incorrectly based on an inaccurate summary, and if the reviewer never opens the full document, nothing catches the error. This failure mode is distinct from TAR errors, which are statistical and can be measured through validation; generative AI errors are document-specific and may not surface in aggregate quality metrics.
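One cheap guard against fabricated specifics is to check that every number and capitalized term in a summary actually occurs in the source. The sketch below is a hypothetical helper, not a feature of any named tool, and it is deliberately crude: it uses substring matching, it cannot detect omissions, and it cannot tell whether a term that does appear is used accurately. It can only route suspicious summaries to full-document review.

```python
import re

# Numbers (including dates like 12/03/2021) and capitalized words.
TERM_PATTERN = re.compile(r"\b(?:\d[\d,./-]*\d|\d|[A-Z][a-zA-Z]+)\b")


def unsupported_terms(summary, source):
    """Return numbers and capitalized terms in the summary that never appear in the source."""
    source_lower = source.lower()
    seen, missing = set(), []
    for term in TERM_PATTERN.findall(summary):
        key = term.lower()
        if key in seen:
            continue
        seen.add(key)
        if key not in source_lower:  # crude substring check
            missing.append(term)
    return missing


source = (
    "The board met on 12 March 2021 and approved the Acme supply contract. "
    "Testing of Product X was postponed."
)
summary = "The board met on 14 March 2021 and approved the Acme contract with Globex."
print(unsupported_terms(summary, source))
```

An empty result does not mean the summary is faithful; it only means this particular check found nothing to flag.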
Stage 5: Quality Control and Production
Before production, the reviewed set goes through quality control. In AI-assisted workflows, QC typically includes statistical sampling of the withheld set (documents the model scored as non-responsive), elusion testing to estimate how many responsive documents were incorrectly excluded, and attorney review of documents in borderline score ranges.
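The arithmetic behind an elusion test is short enough to sketch. The example assumes a simple random sample from the withheld set and uses a Wilson score interval for the elusion rate; the function names and all the numbers are illustrative, and real protocols may specify a different interval method or a stratified sample.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def elusion_summary(sample_size, responsive_in_sample, withheld_total, responsive_found):
    """Estimate elusion rate, missed responsive documents, and implied recall."""
    rate = responsive_in_sample / sample_size
    low, high = wilson_interval(responsive_in_sample, sample_size)
    missed = rate * withheld_total
    total = responsive_found + missed
    return {
        "elusion_rate": rate,
        "elusion_ci": (low, high),
        "estimated_missed": missed,
        "estimated_recall": responsive_found / total if total else 1.0,
    }


# Illustrative numbers: 1,500 withheld documents sampled, 15 found responsive,
# 400,000 documents withheld in total, 36,000 responsive documents already found.
print(elusion_summary(1500, 15, 400_000, 36_000))
```

Reporting the interval alongside the point estimate matters because the recall figure inherits the same uncertainty: the upper end of the elusion interval implies a noticeably lower recall than the point estimate does.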
The production format — load files, metadata fields, image vs. native format — is governed by the ESI protocol agreed between the parties. AI doesn't typically touch this stage, but errors here (wrong redaction format, missing metadata) can trigger sanctions independent of any AI-related issues.
Where AI Is Currently Inserted: A Stage Map
| Stage | AI Technique | Primary Benefit | Known Failure Mode |
|---|---|---|---|
| Collection & Processing | Near-duplicate clustering, ML-based OCR | Reduces redundant review volume | OCR errors on handwriting; incomplete custodian data goes undetected |
| TAR / Active Learning | Classification models (SVM, neural) | Scales responsiveness scoring across millions of documents | Concept drift; seed set bias; threshold-setting errors |
| Privilege Screening | Attorney-name matching, pattern analysis, gen AI summarization | Speeds triage of potentially privileged documents | Misses unnamed-attorney privilege; gen AI mischaracterizes document substance |
| Gen AI Review Acceleration | LLM summarization, RAG-based querying | Reduces per-document review time for complex documents | Hallucinated facts in summaries; missed responsive content |
| Quality Control | Statistical sampling, elusion testing | Validates TAR accuracy before production | Wide confidence intervals when the sample is small or responsive documents are rare in the withheld set; doesn't catch gen AI summary errors |
Documented Limitations
The limitations below are drawn from published court decisions, EDRM working group reports, and documented practitioner experience — not vendor marketing materials.
TAR Models Are Only as Good as Their Training Data
A TAR model trained on a seed set coded by one attorney may not generalize well if a second attorney with a different interpretation of "responsiveness" codes the validation set. Inconsistent human coding propagates through the model. In matters where the scope of discovery is disputed, this is a real risk: the model learns from the attorney's interpretation of a contested legal standard.
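Coding inconsistency between reviewers can be measured before it reaches the model. One common statistic is Cohen's kappa, computed on a set of documents that two reviewers have coded independently. The sketch below is a plain implementation with hypothetical coding data; what counts as an acceptable kappa is a judgment call for the matter, not something this code decides.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance agreement."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty document set")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    labels = set(coder_a) | set(coder_b)
    expected = sum((coder_a.count(lab) / n) * (coder_b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical overlap set: R = responsive, N = not responsive.
attorney_1 = ["R", "R", "R", "R", "N", "N", "N", "N", "N", "N"]
attorney_2 = ["R", "R", "N", "N", "N", "N", "N", "N", "N", "R"]
print(round(cohens_kappa(attorney_1, attorney_2), 3))
```

Here the two attorneys agree on 7 of 10 documents, but after correcting for chance the agreement is much weaker than the raw 70% suggests, which is the kind of signal that should trigger a coding-guidance discussion before further training.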
Generative AI Summaries Are Not Document Substitutes
Several documented cases of eDiscovery workflow failures involve reviewers relying on AI-generated summaries without reading the underlying document. The summary looked complete. The document contained a single sentence that changed the responsiveness analysis. The sentence didn't make it into the summary.
This isn't a hypothetical. It's a predictable consequence of how summarization models work: they compress, and compression involves selection. What gets selected reflects training priorities, not legal judgment.
Multilingual and Non-Standard Document Formats
TAR and generative AI tools trained primarily on English-language business documents perform measurably worse on non-English content, technical documents with dense domain vocabulary, and handwritten or scanned materials. In international matters — common in cross-border M&A disputes, cartel investigations, and FCPA matters — this is a material limitation, not an edge case.
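A performance gap of this kind can be checked on a human-coded sample by computing recall separately for each language. The sketch assumes each sampled document carries a language tag, a human responsiveness call, and the model's call; the field layout and the sample data are hypothetical.

```python
from collections import defaultdict


def recall_by_language(sample):
    """sample: iterable of (language, human_responsive, model_responsive) tuples."""
    found = defaultdict(int)
    missed = defaultdict(int)
    for language, human_responsive, model_responsive in sample:
        if not human_responsive:
            continue  # recall only considers documents a human coded responsive
        if model_responsive:
            found[language] += 1
        else:
            missed[language] += 1
    languages = set(found) | set(missed)
    return {lang: found[lang] / (found[lang] + missed[lang]) for lang in languages}


# Hypothetical sample: the model finds 3 of 4 English but only 1 of 4 German documents.
sample = (
    [("en", True, True)] * 3 + [("en", True, False)]
    + [("de", True, True)] + [("de", True, False)] * 3
    + [("en", False, False), ("de", False, True)]
)
print(recall_by_language(sample))
```

With samples this small the per-language figures are only directional; a real test would size each language's sample so that the gap is distinguishable from sampling noise.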
Data Security and Confidentiality Boundaries
Uploading client documents to cloud-based AI review platforms raises confidentiality questions under Model Rule 1.6 and its state equivalents. Most enterprise eDiscovery platforms have addressed this through contractual data processing agreements and zero-data-retention commitments. But the attorney's obligation to verify those commitments — and to understand where documents are processed — sits with the supervising attorney, not the vendor.
Human-in-the-Loop Verification: What Practitioners Currently Apply
The following verification steps reflect documented practice from published TAR protocols, court-approved ESI agreements, and EDRM methodology guidance. They are not a compliance checklist — specific matters will require different approaches depending on volume, scope, and applicable court rules.
- Seed set review by senior attorney: In TAR 1.0 workflows, the seed set must be coded by an attorney with sufficient understanding of the legal issues to make accurate responsiveness determinations. Paralegal or contract reviewer coding of the seed set is a known source of model bias.
- Validation protocol documentation: The validation methodology — sample size, confidence interval, recall and precision targets — should be documented before review begins, not reverse-engineered from results. Courts have rejected TAR protocols where the validation approach was designed after the fact.
- Elusion testing on the withheld set: A statistically valid sample of documents below the responsiveness cutoff should be human-reviewed to estimate the elusion rate, meaning the share of withheld documents that are in fact responsive. Combined with the size of the withheld set, that rate yields an estimate of how many responsive documents the model excluded. The acceptable elusion rate depends on the matter; there is no universal standard.
- Attorney review of AI-generated summaries before coding: Where generative AI summaries are used to accelerate review, the workflow should require the reviewer to access the full document for any document where the summary indicates potential responsiveness or privilege. Summary-only coding should be limited to clearly non-responsive documents.
- Privilege log attorney verification: Every document on the privilege log must be reviewed by an attorney before the log is produced. AI-assisted privilege identification is a triage tool, not a final determination. The attorney who signs off on the privilege log is personally responsible for its accuracy.
- Opposing counsel protocol disclosure: In many federal districts and under some state court rules, the use of TAR must be disclosed to opposing counsel. The disclosure should include the model type, validation approach, and recall/precision estimates. Failure to disclose has resulted in sanctions in documented cases.
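Documenting the validation methodology up front usually means fixing the sample size before any results are seen. A minimal sketch of the standard sample-size formula for estimating a proportion, with an optional finite population correction, is below. The function name and defaults are assumptions, and the conservative default of an expected rate of 0.5 yields the largest sample for a given margin.

```python
import math


def validation_sample_size(margin, z=1.96, expected_rate=0.5, population=None):
    """Documents to sample so a proportion is estimated within +/- margin.

    Uses n0 = z^2 * p * (1 - p) / margin^2, then applies the finite
    population correction when a population size is given.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    n = (z ** 2) * expected_rate * (1 - expected_rate) / (margin ** 2)
    if population is not None:
        n = n / (1 + (n - 1) / population)
    # Subtract a tiny epsilon so float round-off cannot push an exact
    # integer result up to the next whole number.
    return math.ceil(n - 1e-9)


print(validation_sample_size(0.02))                      # large or unknown population
print(validation_sample_size(0.02, population=100_000))  # withheld set of 100,000
```

Recording the chosen margin, confidence level, and expected rate alongside the resulting sample size gives the protocol a fixed reference point that later results can be checked against.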
Where the Workflow Is Evolving in 2026
Two developments are changing how practitioners approach AI-assisted review in the current period.
First, RAG-based (retrieval-augmented generation) review interfaces are becoming standard in enterprise eDiscovery platforms. Instead of reviewing documents one at a time, attorneys can query the document corpus in natural language and receive sourced answers with document citations. This is genuinely faster for issue-spotting. The verification challenge is that the cited documents must still be reviewed — the model's answer is a navigation tool, not a finding.
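Part of that verification can be mechanical: before an attorney reads the cited documents, a script can confirm that each citation points at a real document and that any quoted passage actually appears in it. The sketch below assumes the interface returns citations as a document ID plus a quoted passage; that structure and all names are hypothetical, and a passing check says nothing about whether the answer characterizes the document correctly.

```python
def normalize(text):
    """Lowercase and collapse whitespace so line breaks don't defeat matching."""
    return " ".join(text.lower().split())


def check_citations(citations, corpus):
    """Return (doc_id, problem) for each citation that fails a basic sanity check."""
    problems = []
    for citation in citations:
        doc_id = citation["doc_id"]
        document = corpus.get(doc_id)
        if document is None:
            problems.append((doc_id, "cited document not in corpus"))
        elif normalize(citation["quote"]) not in normalize(document):
            problems.append((doc_id, "quote not found in cited document"))
    return problems


corpus = {
    "DOC-001": "Safety testing for Product X\nwas postponed to the third quarter.",
    "DOC-002": "The supplier confirmed delivery of replacement parts.",
}
citations = [
    {"doc_id": "DOC-001", "quote": "Product X was postponed"},
    {"doc_id": "DOC-002", "quote": "the supplier refused delivery"},
    {"doc_id": "DOC-404", "quote": "anything"},
]
print(check_citations(citations, corpus))
```

Citations that pass still go to attorney review; the check only removes the cases where the model cited something that is not there.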
Second, some platforms are beginning to offer automated privilege log generation — using generative AI to draft the privilege description for each withheld document. This is a significant time-saver on large matters where privilege logs run to thousands of entries. It's also an area where hallucination risk is particularly consequential: an inaccurate privilege description can undermine the privilege claim if the log is challenged.
Practical Takeaways for Supervising Attorneys
- Document the TAR protocol before review begins, including validation methodology and recall targets. Courts have held that retroactive protocol documentation is insufficient.
- Treat generative AI summaries as navigation aids, not review substitutes. Any document that could be responsive requires attorney eyes on the full text.
- Verify your platform's data processing agreement before uploading client documents. Confirm zero-data-retention commitments are contractually binding, not just stated in marketing materials.
- Check local rules and standing orders on TAR disclosure. Several federal districts have standing orders requiring disclosure; others address it through meet-and-confer obligations.
- For multilingual matters, test the AI tool's performance on a sample of the non-English documents before relying on it for the full corpus. Performance gaps are predictable and should be anticipated in the review plan.
- Every privilege log entry must be attorney-verified before production, regardless of whether it was AI-drafted. The supervising attorney's professional responsibility exposure does not diminish because a machine wrote the first draft.