Multi-Phase Document Editing with Pre-Screening and Caching

Document editing workflows often run expensive verification on every section, regardless of whether the section contains verifiable claims. This pattern combines multi-phase editing with intelligent pre-screening and citation caching to reduce costs by approximately 45 percent.

The problem

A naive document editing pipeline runs all phases on all content:

  1. Parse the document
  2. Enhance every section
  3. Fact-check every section
  4. Polish every section

This wastes resources. Many sections don’t contain citations. Many sections don’t need fact-checking. Running expensive Opus calls on introductory paragraphs or structural transitions burns budget without improving quality.

The solution

The pattern uses three optimizations:

  1. Phase-specific routing skips entire phases based on document characteristics
  2. Pre-screening with Haiku categorizes sections before expensive operations
  3. Citation batch caching validates all citations once before any section processing

graph TD
    A[Document Input] --> B[Parse & Analyze]
    B --> C{has_citations?}
    C -->|Yes| D[Pre-validate Citations<br/>batch + cache]
    C -->|No| H[Polish & Assemble]
    D --> E[Screen Sections<br/>Haiku]
    E --> F[Fact-Check Sections<br/>only flagged<br/>parallel fan-out]
    F --> H
    H --> I[Final Document]
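
The snippets below read and write a handful of shared state keys. A minimal sketch of what the DocumentEditingState used in the graph construction might look like, assuming a LangGraph-style TypedDict state: the field names mirror keys that appear in the code later in this section, while the reducer on pending_edits is an assumption added to make the parallel fan-out safe.

from operator import add
from typing import Annotated, Any, TypedDict


class DocumentEditingState(TypedDict, total=False):
    """Shared state flowing between phases (illustrative sketch)."""
    parsed_sections: list[dict[str, Any]]   # filled by parse_document
    has_citations: bool                     # set during structure analysis
    all_citations: list[str]                # citation keys extracted from the document
    citation_cache: dict[str, dict]         # output of pre_validate_citations
    screened_sections: list[str]            # section IDs flagged for fact-checking
    screening_skipped: list[str]            # section IDs that skip fact-checking
    # Parallel fact-check branches append edits concurrently, so this key
    # needs an accumulating reducer rather than last-write-wins assignment.
    pending_edits: Annotated[list[dict[str, Any]], add]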

Pre-screening with Haiku

The pre-screening node uses Haiku to categorize sections before expensive fact-checking:

from typing import Any

from pydantic import BaseModel, Field

# get_structured_output, ModelTier, SCREENING_SYSTEM, and SCREENING_USER are
# project-level helpers assumed to be defined elsewhere in the codebase.


class ScreeningResult(BaseModel):
    """Aggregate screening result for all sections."""
    sections_to_check: list[str] = Field(
        description="Section IDs that need expensive fact-checking"
    )
    sections_to_skip: list[str] = Field(
        description="Section IDs that can skip fact-checking"
    )
 
 
async def screen_sections_for_fact_check(state: dict) -> dict[str, Any]:
    """Pre-screen sections to determine which need fact-checking.
 
    Uses lightweight Haiku model to categorize sections, reducing
    expensive fact-checking by approximately 50 percent.
    """
    sections = state.get("parsed_sections", [])
 
    # Build preview strings (150 chars each) for efficient screening
    sections_summary_parts = []
    for section in sections:
        preview = section.get("content", "")[:150]
        section_id = section.get("id", "unknown")
        sections_summary_parts.append(f"[{section_id}]: {preview}...")
 
    # Haiku is 60 times cheaper than Opus
    result: ScreeningResult = await get_structured_output(
        output_schema=ScreeningResult,
        user_prompt=SCREENING_USER.format(
            sections="\n\n".join(sections_summary_parts)
        ),
        system_prompt=SCREENING_SYSTEM,
        tier=ModelTier.HAIKU,
    )
 
    return {
        "screened_sections": result.sections_to_check,
        "screening_skipped": result.sections_to_skip,
    }

The screening prompt instructs Haiku to flag sections that:

  • Contain specific citations or references
  • Make quantitative claims
  • Reference studies or research findings
  • Attribute statements to specific sources

Sections without these characteristics skip fact-checking entirely.
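
The original prompt text isn't reproduced here; the following is a plausible sketch of what the SCREENING_SYSTEM and SCREENING_USER constants used above might contain. The wording is illustrative, not the actual prompts.

SCREENING_SYSTEM = """You are a triage assistant. You will receive short
previews of document sections. Flag a section for fact-checking if it
contains specific citations or references, quantitative claims, mentions of
studies or research findings, or statements attributed to specific sources.
Purely introductory, transitional, or structural sections should be skipped.
Return only section IDs."""

SCREENING_USER = """Classify each of the following section previews:

{sections}

List the section IDs that need fact-checking and the IDs that can skip it."""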

Citation batch caching

Instead of validating citations on demand (which causes redundant API calls when multiple sections reference the same paper), the pattern validates all unique citations once before any section processing:

import asyncio
from typing import Any

# Module-level cache shared by every document processed in this process.
_citation_validation_cache: dict[str, dict] = {}
 
 
async def pre_validate_citations(state: dict) -> dict[str, Any]:
    """Pre-validate ALL unique citations once, cache results."""
    all_citations = state.get("all_citations", [])
    unique_citations = list(set(all_citations))
 
    validated: dict[str, dict] = {}
    citations_to_validate: list[str] = []
 
    # Check cache first
    for citation_key in unique_citations:
        if citation_key in _citation_validation_cache:
            validated[citation_key] = _citation_validation_cache[citation_key]
        else:
            citations_to_validate.append(citation_key)
 
    # Validate uncached citations in parallel
    if citations_to_validate:
        validation_tasks = [
            validate_single_citation(key) for key in citations_to_validate
        ]
        results = await asyncio.gather(*validation_tasks)
 
        for citation_key, result in zip(citations_to_validate, results):
            _citation_validation_cache[citation_key] = result
            validated[citation_key] = result
 
    return {"citation_cache": validated}

The cache persists within the process, surviving phase transitions. For multi-document processing, call clear_cache() between unrelated documents.
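
clear_cache() can be a thin wrapper over the module-level dict; a minimal sketch:

def clear_cache() -> None:
    """Reset the citation cache between unrelated documents."""
    _citation_validation_cache.clear()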

Conditional phase routing

The graph uses conditional edges to skip phases based on document content:

def route_after_structure(state: dict) -> str:
    """Route based on document content after structure phase."""
    if state.get("has_citations", False):
        return "screen_for_enhancement"
    return "screen_for_polish"  # Skip enhancement + verification
 
 
def route_after_verification(state: dict) -> str:
    """Route based on verification results."""
    if state.get("pending_edits", []):
        return "apply_verified_edits"
    return "screen_for_polish"

Documents without citations skip enhancement and verification entirely. This is not just about cost—it also prevents the model from hallucinating citations that were not in the original.
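
Wiring these routers follows the same add_conditional_edges pattern shown in the graph construction below. For route_after_verification it might look like this; the source node name is an assumption chosen to match the construction snippet:

builder.add_conditional_edges(
    "fact_check_section",
    route_after_verification,
    ["apply_verified_edits", "screen_for_polish"],
)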

Cost analysis

Approach               Verification Cost    Screening Cost    Net Savings
No screening           100%                 0%                0%
Keyword filtering      ~70%                 ~0%               ~30%
Haiku pre-screening    ~50%                 ~5%               ~45%

The Haiku screening call costs approximately 5 percent of a single Opus verification call but eliminates approximately 50 percent of verification work.
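
A back-of-the-envelope check of the table's numbers. The 100-section document and unit costs here are assumptions for illustration, not measurements:

# Cost model in units of one Opus verification call per section.
sections = 100                          # assumed document size
baseline = sections * 1.0               # no screening: verify every section
verification = 0.50 * baseline          # Haiku flags roughly half the sections
screening = 0.05 * baseline             # screening overhead from the table
net_savings = 1 - (verification + screening) / baseline
print(f"net savings ≈ {net_savings:.0%}")  # -> net savings ≈ 45%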

Graph construction

The complete graph wires together all phases with conditional routing:

def create_document_editing_graph():
    """Build the four-phase editing graph (linear edges between phases omitted)."""
    builder = StateGraph(DocumentEditingState)
 
    # Phase 1: Structure
    builder.add_node("parse_document", parse_document_node)
    builder.add_node("analyze_structure", analyze_structure_node)
 
    # Phase 2: Enhancement (conditional)
    builder.add_node("screen_for_enhancement", screen_for_enhancement_node)
    builder.add_node("pre_validate_citations", pre_validate_citations)
    builder.add_node("enhance_section", enhance_section_node)
 
    # Phase 3: Verification (conditional)
    builder.add_node("screen_for_fact_check", screen_sections_for_fact_check)
    builder.add_node("fact_check_section", fact_check_section_node)
    builder.add_node("apply_verified_edits", apply_verified_edits_node)
 
    # Phase 4: Polish
    builder.add_node("screen_for_polish", screen_for_polish_node)  # routing target used below
    builder.add_node("polish_section", polish_section_node)
    builder.add_node("final_assembly", final_assembly_node)
 
    # Conditional routing
    builder.add_conditional_edges(
        "analyze_structure",
        route_after_structure,
        ["screen_for_enhancement", "screen_for_polish"],
    )
 
    # Parallel fan-out for section processing
    builder.add_conditional_edges(
        "screen_for_fact_check",
        route_to_parallel_sections,
        ["fact_check_section", "screen_for_polish"],
    )
 
    return builder.compile()
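
route_to_parallel_sections, referenced above but not shown, fans flagged sections out to parallel fact_check_section runs. A minimal sketch, assuming LangGraph's Send API for map-style fan-out; the per-section payload shape is an assumption:

from langgraph.types import Send


def route_to_parallel_sections(state: dict) -> list[Send] | str:
    """Fan out one fact_check_section run per flagged section."""
    flagged = state.get("screened_sections", [])
    if not flagged:
        return "screen_for_polish"  # nothing was flagged; skip verification
    sections_by_id = {s["id"]: s for s in state.get("parsed_sections", [])}
    return [
        Send("fact_check_section", {"section": sections_by_id[section_id]})
        for section_id in flagged
        if section_id in sections_by_id
    ]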

When to use this pattern

Use when:

  • Documents vary in citation density
  • Verification costs dominate your LLM budget
  • You can tolerate a 5 percent screening overhead for 50 percent verification savings
  • Documents are processed in batches (cache amortizes across documents)

Don’t use when:

  • All documents require full verification
  • Screening accuracy is critical (the pattern accepts occasional missed checks in exchange for lower cost)
  • Documents are one-offs (cache provides no benefit)

Trade-offs

Benefits:

  • Approximately 45 percent cost reduction on verification-heavy workflows
  • No added latency, since section processing fans out in parallel
  • Cache reduces redundant API calls across documents
  • Phase skipping prevents hallucinated citations

Costs:

  • Pre-screening adds complexity
  • Cache needs lifecycle management for long-running processes
  • Screening may occasionally skip sections that needed checking
  • Four-phase architecture is harder to debug than linear pipelines