Multi-Phase Document Editing with Pre-Screening and Caching
Document editing workflows often run expensive verification on every section, regardless of whether the section contains verifiable claims. This pattern combines multi-phase editing with intelligent pre-screening and citation caching to reduce costs by approximately 45 percent.
The problem
A naive document editing pipeline runs all phases on all content:
- Parse the document
- Enhance every section
- Fact-check every section
- Polish every section
This wastes resources. Many sections don’t contain citations. Many sections don’t need fact-checking. Running expensive Opus calls on introductory paragraphs or structural transitions burns budget without improving quality.
The solution
The pattern uses three optimizations:
- Phase-specific routing skips entire phases based on document characteristics
- Pre-screening with Haiku categorizes sections before expensive operations
- Citation batch caching validates all citations once before any section processing
```mermaid
graph TD
    A[Document Input] --> B[Parse & Analyze]
    B --> C{has_citations?}
    C -->|Yes| D[Pre-validate Citations<br/>batch + cache]
    C -->|No| H[Polish & Assemble]
    D --> E[Screen Sections<br/>Haiku]
    E --> F[Fact-Check Sections<br/>only flagged<br/>parallel fan-out]
    F --> H
    H --> I[Final Document]
```
Pre-screening with Haiku
The pre-screening node uses Haiku to categorize sections before expensive fact-checking:
```python
from typing import Any

from pydantic import BaseModel, Field


class ScreeningResult(BaseModel):
    """Aggregate screening result for all sections."""

    sections_to_check: list[str] = Field(
        description="Section IDs that need expensive fact-checking"
    )
    sections_to_skip: list[str] = Field(
        description="Section IDs that can skip fact-checking"
    )


async def screen_sections_for_fact_check(state: dict) -> dict[str, Any]:
    """Pre-screen sections to determine which need fact-checking.

    Uses the lightweight Haiku model to categorize sections, reducing
    expensive fact-checking by approximately 50 percent.
    """
    sections = state.get("parsed_sections", [])

    # Build preview strings (150 chars each) for efficient screening
    sections_summary_parts = []
    for section in sections:
        preview = section.get("content", "")[:150]
        section_id = section.get("id", "unknown")
        sections_summary_parts.append(f"[{section_id}]: {preview}...")

    # Haiku is roughly 60 times cheaper than Opus
    result: ScreeningResult = await get_structured_output(
        output_schema=ScreeningResult,
        user_prompt=SCREENING_USER.format(
            sections="\n\n".join(sections_summary_parts)
        ),
        system_prompt=SCREENING_SYSTEM,
        tier=ModelTier.HAIKU,
    )

    return {
        "screened_sections": result.sections_to_check,
        "screening_skipped": result.sections_to_skip,
    }
```

The screening prompt instructs Haiku to flag sections that:
- Contain specific citations or references
- Make quantitative claims
- Reference studies or research findings
- Attribute statements to specific sources
Sections without these characteristics skip fact-checking entirely.
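The prompt constants referenced earlier (`SCREENING_SYSTEM`, `SCREENING_USER`) are not shown in the pattern. A minimal sketch of what they might contain, with entirely hypothetical wording:

```python
# Hypothetical prompt constants for the Haiku screening call.
# The exact wording is an assumption, not part of the original pattern.
SCREENING_SYSTEM = (
    "You categorize document sections for fact-checking. Flag a section "
    "only if it contains specific citations, quantitative claims, "
    "references to studies, or statements attributed to named sources."
)

SCREENING_USER = (
    "Categorize each section below. Return the IDs of sections that need "
    "fact-checking and the IDs of sections that can safely skip it.\n\n"
    "Sections:\n{sections}"
)
```

The single `{sections}` placeholder matches the `SCREENING_USER.format(sections=...)` call in the screening node above.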
Citation batch caching
Instead of validating citations on demand (which causes redundant API calls when multiple sections reference the same paper), the pattern validates all unique citations once before any section processing:
```python
import asyncio
from typing import Any

_citation_validation_cache: dict[str, dict] = {}


async def pre_validate_citations(state: dict) -> dict[str, Any]:
    """Pre-validate ALL unique citations once, cache results."""
    all_citations = state.get("all_citations", [])
    unique_citations = list(set(all_citations))

    validated: dict[str, dict] = {}
    citations_to_validate: list[str] = []

    # Check cache first
    for citation_key in unique_citations:
        if citation_key in _citation_validation_cache:
            validated[citation_key] = _citation_validation_cache[citation_key]
        else:
            citations_to_validate.append(citation_key)

    # Validate uncached citations in parallel
    if citations_to_validate:
        validation_tasks = [
            validate_single_citation(key) for key in citations_to_validate
        ]
        results = await asyncio.gather(*validation_tasks)
        for citation_key, result in zip(citations_to_validate, results):
            _citation_validation_cache[citation_key] = result
            validated[citation_key] = result

    return {"citation_cache": validated}
```

The cache persists within the process, surviving phase transitions. For multi-document processing, call clear_cache() between unrelated documents.
Conditional phase routing
The graph uses conditional edges to skip phases based on document content:
```python
def route_after_structure(state: dict) -> str:
    """Route based on document content after the structure phase."""
    if state.get("has_citations", False):
        return "screen_for_enhancement"
    return "screen_for_polish"  # Skip enhancement + verification


def route_after_verification(state: dict) -> str:
    """Route based on verification results."""
    if state.get("pending_edits", []):
        return "apply_verified_edits"
    return "screen_for_polish"
```

Documents without citations skip enhancement and verification entirely. This is not just about cost: it also prevents the model from hallucinating citations that were not in the original.
Cost analysis
| Approach | Verification Cost | Screening Cost | Net Savings |
|---|---|---|---|
| No screening | 100 percent | 0 percent | 0 percent |
| Keyword filtering | approximately 70 percent | approximately 0 percent | approximately 30 percent |
| Haiku pre-screening | approximately 50 percent | approximately 5 percent | approximately 45 percent |
The Haiku screening call costs approximately 5 percent of a single Opus verification call but eliminates approximately 50 percent of verification work.
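The net-savings figure in the table follows from a back-of-envelope model using the two numbers above:

```python
# Back-of-envelope model behind the ~45 percent savings figure.
baseline = 100.0  # cost of verifying every section with Opus

remaining_verification = 0.50 * baseline  # screening eliminates ~50% of checks
screening_overhead = 0.05 * baseline      # Haiku screening adds ~5% overhead

total = remaining_verification + screening_overhead  # 55.0
savings = (baseline - total) / baseline              # 0.45, i.e. ~45 percent
```

The model ignores second-order effects (false negatives that force re-checks, cache hits across documents), so the real figure varies with citation density.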
Graph construction
The complete graph wires together all phases with conditional routing:
```python
def create_document_editing_graph() -> StateGraph:
    builder = StateGraph(DocumentEditingState)

    # Phase 1: Structure
    builder.add_node("parse_document", parse_document_node)
    builder.add_node("analyze_structure", analyze_structure_node)

    # Phase 2: Enhancement (conditional)
    builder.add_node("screen_for_enhancement", screen_for_enhancement_node)
    builder.add_node("pre_validate_citations", pre_validate_citations)
    builder.add_node("enhance_section", enhance_section_node)

    # Phase 3: Verification (conditional)
    builder.add_node("screen_for_fact_check", screen_sections_for_fact_check)
    builder.add_node("fact_check_section", fact_check_section_node)
    builder.add_node("apply_verified_edits", apply_verified_edits_node)

    # Phase 4: Polish
    builder.add_node("polish_section", polish_section_node)
    builder.add_node("final_assembly", final_assembly_node)

    # Conditional routing
    builder.add_conditional_edges(
        "analyze_structure",
        route_after_structure,
        ["screen_for_enhancement", "screen_for_polish"],
    )

    # Parallel fan-out for section processing
    builder.add_conditional_edges(
        "screen_for_fact_check",
        route_to_parallel_sections,
        ["fact_check_section", "screen_for_polish"],
    )

    return builder.compile()
```

When to use this pattern
Use when:
- Documents vary in citation density
- Verification costs dominate your LLM budget
- You can tolerate a 5 percent screening overhead for 50 percent verification savings
- Documents are processed in batches (cache amortizes across documents)
Don’t use when:
- All documents require full verification
- Screening accuracy is critical (the pattern trades precision for cost)
- Documents are one-offs (cache provides no benefit)
Trade-offs
Benefits:
- Approximately 45 percent cost reduction on verification-heavy workflows
- No latency penalty from parallel execution
- Cache reduces redundant API calls across documents
- Phase skipping prevents hallucinated citations
Costs:
- Pre-screening adds complexity
- Cache needs lifecycle management for long-running processes
- Screening may occasionally skip sections that needed checking
- Four-phase architecture is harder to debug than linear pipelines