Multi-Phase Document Editing with Pre-Screening and Caching

Document editing workflows often run expensive verification on every section, regardless of whether the section contains verifiable claims. This pattern combines multi-phase editing with intelligent pre-screening and citation caching to reduce costs by approximately 45 percent.

The problem

A naive document editing pipeline runs all phases on all content:

  1. Parse the document
  2. Enhance every section
  3. Fact-check every section
  4. Polish every section

This wastes resources. Many sections don’t contain citations. Many sections don’t need fact-checking. Running expensive Opus calls on introductory paragraphs or structural transitions burns budget without improving quality.

The solution

The pattern uses three optimizations:

  1. Phase-specific routing skips entire phases based on document characteristics
  2. Pre-screening with Haiku categorizes sections before expensive operations
  3. Citation batch caching validates all citations once before any section processing

graph TD
    A[Document Input] --> B[Parse & Analyze]
    B --> C{has_citations?}
    C -->|Yes| D[Pre-validate Citations<br/>batch + cache]
    C -->|No| H[Polish & Assemble]
    D --> E[Screen Sections<br/>Haiku]
    E --> F[Fact-Check Sections<br/>only flagged<br/>parallel fan-out]
    F --> H
    H --> I[Final Document]
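
The snippets below read and write a handful of shared state keys. A minimal sketch of what the DocumentEditingState used in the graph construction might look like, assuming a LangGraph-style TypedDict state: the field names mirror keys that appear in the code later in this section, while the reducer on pending_edits is an assumption added to make the parallel fan-out safe.

from operator import add
from typing import Annotated, Any, TypedDict


class DocumentEditingState(TypedDict, total=False):
    """Shared state flowing between phases (illustrative sketch)."""
    parsed_sections: list[dict[str, Any]]   # filled by parse_document
    has_citations: bool                     # set during structure analysis
    all_citations: list[str]                # citation keys extracted from the document
    citation_cache: dict[str, dict]         # output of pre_validate_citations
    screened_sections: list[str]            # section IDs flagged for fact-checking
    screening_skipped: list[str]            # section IDs that skip fact-checking
    # Parallel fact-check branches append edits concurrently, so this key
    # needs an accumulating reducer rather than last-write-wins assignment.
    pending_edits: Annotated[list[dict[str, Any]], add]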

Pre-screening with Haiku

The pre-screening node uses Haiku to categorize sections before expensive fact-checking:

from typing import Any

from pydantic import BaseModel, Field

# get_structured_output, ModelTier, SCREENING_SYSTEM, and SCREENING_USER are
# project-level helpers assumed to be defined elsewhere in the codebase.


class ScreeningResult(BaseModel):
    """Aggregate screening result for all sections."""
    sections_to_check: list[str] = Field(
        description="Section IDs that need expensive fact-checking"
    )
    sections_to_skip: list[str] = Field(
        description="Section IDs that can skip fact-checking"
    )
 
 
async def screen_sections_for_fact_check(state: dict) -> dict[str, Any]:
    """Pre-screen sections to determine which need fact-checking.
 
    Uses lightweight Haiku model to categorize sections, reducing
    expensive fact-checking by approximately 50 percent.
    """
    sections = state.get("parsed_sections", [])
 
    # Build preview strings (150 chars each) for efficient screening
    sections_summary_parts = []
    for section in sections:
        preview = section.get("content", "")[:150]
        section_id = section.get("id", "unknown")
        sections_summary_parts.append(f"[{section_id}]: {preview}...")
 
    # Haiku is 60 times cheaper than Opus
    result: ScreeningResult = await get_structured_output(
        output_schema=ScreeningResult,
        user_prompt=SCREENING_USER.format(
            sections="\n\n".join(sections_summary_parts)
        ),
        system_prompt=SCREENING_SYSTEM,
        tier=ModelTier.HAIKU,
    )
 
    return {
        "screened_sections": result.sections_to_check,
        "screening_skipped": result.sections_to_skip,
    }

The screening prompt instructs Haiku to flag sections that:

  • Contain specific citations or references
  • Make quantitative claims
  • Reference studies or research findings
  • Attribute statements to specific sources

Sections without these characteristics skip fact-checking entirely.
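
The original prompt text isn't reproduced here; the following is a plausible sketch of what the SCREENING_SYSTEM and SCREENING_USER constants used above might contain. The wording is illustrative, not the actual prompts.

SCREENING_SYSTEM = """You are a triage assistant. You will receive short
previews of document sections. Flag a section for fact-checking if it
contains specific citations or references, quantitative claims, mentions of
studies or research findings, or statements attributed to specific sources.
Purely introductory, transitional, or structural sections should be skipped.
Return only section IDs."""

SCREENING_USER = """Classify each of the following section previews:

{sections}

List the section IDs that need fact-checking and the IDs that can skip it."""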

Citation batch caching

Instead of validating citations on demand (which causes redundant API calls when multiple sections reference the same paper), the pattern validates all unique citations once before any section processing:

import asyncio
from typing import Any

# Module-level cache shared by every document processed in this process.
_citation_validation_cache: dict[str, dict] = {}
 
 
async def pre_validate_citations(state: dict) -> dict[str, Any]:
    """Pre-validate ALL unique citations once, cache results."""
    all_citations = state.get("all_citations", [])
    unique_citations = list(set(all_citations))
 
    validated: dict[str, dict] = {}
    citations_to_validate: list[str] = []
 
    # Check cache first
    for citation_key in unique_citations:
        if citation_key in _citation_validation_cache:
            validated[citation_key] = _citation_validation_cache[citation_key]
        else:
            citations_to_validate.append(citation_key)
 
    # Validate uncached citations in parallel
    if citations_to_validate:
        validation_tasks = [
            validate_single_citation(key) for key in citations_to_validate
        ]
        results = await asyncio.gather(*validation_tasks)
 
        for citation_key, result in zip(citations_to_validate, results):
            _citation_validation_cache[citation_key] = result
            validated[citation_key] = result
 
    return {"citation_cache": validated}

The cache persists within the process, surviving phase transitions. For multi-document processing, call clear_cache() between unrelated documents.
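
clear_cache() can be a thin wrapper over the module-level dict; a minimal sketch:

def clear_cache() -> None:
    """Reset the citation cache between unrelated documents."""
    _citation_validation_cache.clear()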

Conditional phase routing

The graph uses conditional edges to skip phases based on document content:

def route_after_structure(state: dict) -> str:
    """Route based on document content after structure phase."""
    if state.get("has_citations", False):
        return "screen_for_enhancement"
    return "screen_for_polish"  # Skip enhancement + verification
 
 
def route_after_verification(state: dict) -> str:
    """Route based on verification results."""
    if state.get("pending_edits", []):
        return "apply_verified_edits"
    return "screen_for_polish"

Documents without citations skip enhancement and verification entirely. This is not just about cost—it also prevents the model from hallucinating citations that were not in the original.
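
Wiring these routers follows the same add_conditional_edges pattern shown in the graph construction below. For route_after_verification it might look like this; the source node name is an assumption chosen to match the construction snippet:

builder.add_conditional_edges(
    "fact_check_section",
    route_after_verification,
    ["apply_verified_edits", "screen_for_polish"],
)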

Cost analysis

Approach               Verification Cost    Screening Cost    Net Savings
No screening           100%                 0%                0%
Keyword filtering      ~70%                 ~0%               ~30%
Haiku pre-screening    ~50%                 ~5%               ~45%

The Haiku screening call costs approximately 5 percent of a single Opus verification call but eliminates approximately 50 percent of verification work.
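
A back-of-the-envelope check of the table's numbers. The 100-section document and unit costs here are assumptions for illustration, not measurements:

# Cost model in units of one Opus verification call per section.
sections = 100                          # assumed document size
baseline = sections * 1.0               # no screening: verify every section
verification = 0.50 * baseline          # Haiku flags roughly half the sections
screening = 0.05 * baseline             # screening overhead from the table
net_savings = 1 - (verification + screening) / baseline
print(f"net savings ≈ {net_savings:.0%}")  # -> net savings ≈ 45%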

Graph construction

The complete graph wires together all phases with conditional routing:

def create_document_editing_graph():
    """Build the four-phase editing graph (linear edges between phases omitted)."""
    builder = StateGraph(DocumentEditingState)
 
    # Phase 1: Structure
    builder.add_node("parse_document", parse_document_node)
    builder.add_node("analyze_structure", analyze_structure_node)
 
    # Phase 2: Enhancement (conditional)
    builder.add_node("screen_for_enhancement", screen_for_enhancement_node)
    builder.add_node("pre_validate_citations", pre_validate_citations)
    builder.add_node("enhance_section", enhance_section_node)
 
    # Phase 3: Verification (conditional)
    builder.add_node("screen_for_fact_check", screen_sections_for_fact_check)
    builder.add_node("fact_check_section", fact_check_section_node)
    builder.add_node("apply_verified_edits", apply_verified_edits_node)
 
    # Phase 4: Polish
    builder.add_node("screen_for_polish", screen_for_polish_node)  # routing target used below
    builder.add_node("polish_section", polish_section_node)
    builder.add_node("final_assembly", final_assembly_node)
 
    # Conditional routing
    builder.add_conditional_edges(
        "analyze_structure",
        route_after_structure,
        ["screen_for_enhancement", "screen_for_polish"],
    )
 
    # Parallel fan-out for section processing
    builder.add_conditional_edges(
        "screen_for_fact_check",
        route_to_parallel_sections,
        ["fact_check_section", "screen_for_polish"],
    )
 
    return builder.compile()
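
route_to_parallel_sections, referenced above but not shown, fans flagged sections out to parallel fact_check_section runs. A minimal sketch, assuming LangGraph's Send API for map-style fan-out; the per-section payload shape is an assumption:

from langgraph.types import Send


def route_to_parallel_sections(state: dict) -> list[Send] | str:
    """Fan out one fact_check_section run per flagged section."""
    flagged = state.get("screened_sections", [])
    if not flagged:
        return "screen_for_polish"  # nothing was flagged; skip verification
    sections_by_id = {s["id"]: s for s in state.get("parsed_sections", [])}
    return [
        Send("fact_check_section", {"section": sections_by_id[section_id]})
        for section_id in flagged
        if section_id in sections_by_id
    ]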

When to use this pattern

Use when:

  • Documents vary in citation density
  • Verification costs dominate your LLM budget
  • You can tolerate a 5 percent screening overhead for 50 percent verification savings
  • Documents are processed in batches (cache amortizes across documents)

Don’t use when:

  • All documents require full verification
  • Screening accuracy is critical (the pattern accepts occasional missed checks in exchange for lower cost)
  • Documents are one-offs (cache provides no benefit)

Trade-offs

Benefits:

  • Approximately 45 percent cost reduction on verification-heavy workflows
  • No added latency, since section processing fans out in parallel
  • Cache reduces redundant API calls across documents
  • Phase skipping prevents hallucinated citations

Costs:

  • Pre-screening adds complexity
  • Cache needs lifecycle management for long-running processes
  • Screening may occasionally skip sections that needed checking
  • Four-phase architecture is harder to debug than linear pipelines