Fact-check Workflow Extraction for LangGraph

When verification logic grows complex—multiple phases, parallel workers, sophisticated result aggregation—embedding it in a larger editing workflow creates problems. You can’t run verification independently, can’t skip it without code changes, and can’t test it in isolation.

This pattern extracts verification into a standalone workflow with parallel section workers, automatic result aggregation, and early termination for documents without citations.

The problem

Embedded verification creates tight coupling:

editing/
├── nodes/
│   ├── structure.py          (500 lines)
│   ├── enhance.py            (400 lines)
│   ├── verify.py             (800+ lines)  ← Growing complexity
│   │   ├── screen_sections()
│   │   ├── fact_check_worker()
│   │   ├── reference_check_worker()
│   │   └── aggregate_results()
│   └── polish.py             (200 lines)
└── graph/
    └── construction.py        (Complex routing embedded)

Issues:

  • Can’t run verification independently
  • Can’t skip verification without code changes
  • Can’t test verification in isolation
  • Quality settings mixed with editing settings
  • Routing logic for parallel workers clutters the main graph

The solution

Extract verification into a standalone workflow:

enhance/
├── editing/                   # Clean editing workflow
├── fact_check/                # Standalone verification
│   ├── state.py               # Independent state schema
│   ├── quality_presets.py     # Verification-specific presets
│   ├── nodes/                 # Specialized nodes
│   └── graph/                 # Self-contained routing
└── __init__.py                # Three-phase orchestration

The extracted workflow uses three key patterns:

  1. Accumulator reducers for parallel result aggregation
  2. Send pattern for section-level parallel workers
  3. Citation gating for early termination

Accumulator reducers

Parallel workers need to aggregate results. LangGraph’s Annotated[list, add] pattern handles this automatically:

from operator import add
from typing import Annotated, TypedDict
 
class FactCheckState(TypedDict, total=False):
    # Parallel workers accumulate results
    fact_check_results: Annotated[list[dict], add]
    reference_check_results: Annotated[list[dict], add]
    pending_edits: Annotated[list[dict], add]
    errors: Annotated[list[dict], add]

When worker one returns {"fact_check_results": [result1]} and worker two returns {"fact_check_results": [result2]}, LangGraph produces {"fact_check_results": [result1, result2]}.

Critical: workers must return lists. Return {"results": [item]}, not {"results": item}.
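
To see the merge in isolation, here is what the add reducer effectively does with two single-element worker results (the claim and verdict values are illustrative):

from operator import add

# Two workers each return a single-element list; the reducer concatenates them.
merged = add(
    [{"claim": "A", "verdict": "supported"}],
    [{"claim": "B", "verdict": "unsupported"}],
)
assert len(merged) == 2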

Parallel section workers

The Send pattern dispatches parallel workers for each section:

from langgraph.types import Send
 
def route_to_fact_check_sections(state: dict) -> list[Send] | str:
    """Route to parallel workers or skip to next phase."""
    screened_sections = state.get("screened_sections", [])
 
    if not screened_sections:
        return "pre_validate_citations"  # Skip if nothing to check
 
    # Dispatch parallel workers
    sends = []
    for section_id in screened_sections:
        worker_state = {
            "section_id": section_id,
            "section_content": get_section_content(section_id),
            "confidence_threshold": state["quality_settings"].get(
                "verify_confidence_threshold", 0.75
            ),
        }
        sends.append(Send("fact_check_section", worker_state))
 
    return sends

Each Send creates an independent worker with isolated state. Workers run in parallel; their results aggregate via the add reducer.
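
A minimal worker sketch that consumes the Send payload above and returns a list-typed update; verify_section is a hypothetical helper standing in for the actual verification call:

def fact_check_worker(state: dict) -> dict:
    """Verify one section and return list-typed updates for the add reducers."""
    verdicts = verify_section(  # hypothetical helper wrapping the LLM/tool calls
        state["section_content"],
        confidence_threshold=state.get("confidence_threshold", 0.75),
    )
    return {
        "fact_check_results": [
            {"section_id": state["section_id"], "verdicts": verdicts}
        ]
    }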

Citation gating

Skip expensive verification for documents without citations:

def route_citations_or_finalize(state: dict) -> str:
    """Gate: Skip workflow if no citations detected."""
    if state.get("has_citations", False):
        return "screen_sections"
    return "finalize"  # Early exit

Wire the gate in graph construction:

builder.add_conditional_edges(
    "detect_citations",
    route_citations_or_finalize,
    {
        "screen_sections": "screen_sections",
        "finalize": "finalize",
    },
)

Documents without citations skip directly to finalization, saving all verification compute.
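
The gate only needs an upstream node to set has_citations. A minimal sketch, assuming the text lives under a document key and using a crude regex heuristic (the real detection logic may be more involved):

import re

def detect_citations_node(state: dict) -> dict:
    """Flag whether the document contains citations worth verifying."""
    document = state.get("document", "")
    has_citations = bool(
        re.search(r"\[\d+\]", document)                     # numeric citations like [12]
        or re.search(r"(?im)^#*\s*references\b", document)  # a References section
    )
    return {"has_citations": has_citations}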

Graph construction

The complete graph wires parallel dispatch with convergence:

from langgraph.graph import END, START, StateGraph

def build_fact_check_graph():
    """Build and compile the standalone fact-check graph."""
    builder = StateGraph(FactCheckState)
 
    # Nodes
    builder.add_node("detect_citations", detect_citations_node)
    builder.add_node("screen_sections", screen_sections_node)
    builder.add_node("fact_check_section", fact_check_worker)
    builder.add_node("assemble_fact_checks", assemble_node)
    builder.add_node("finalize", finalize_node)
 
    # Sequential start
    builder.add_edge(START, "detect_citations")
 
    # Citation gate
    builder.add_conditional_edges(
        "detect_citations",
        route_citations_or_finalize,
        {"screen_sections": "screen_sections", "finalize": "finalize"},
    )
 
    # Parallel dispatch
    builder.add_conditional_edges(
        "screen_sections",
        route_to_fact_check_sections,
        ["fact_check_section", "finalize"],
    )
 
    # Convergence
    builder.add_edge("fact_check_section", "assemble_fact_checks")
    builder.add_edge("assemble_fact_checks", "finalize")
    builder.add_edge("finalize", END)
 
    return builder.compile()
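
Once compiled, the graph can run on its own, which is the point of the extraction. A sketch of a standalone invocation, assuming document and quality_settings are the expected input keys:

async def run_fact_check(document: str) -> dict:
    """Run the standalone verification graph on a finished document."""
    graph = build_fact_check_graph()
    return await graph.ainvoke(
        {
            "document": document,
            "quality_settings": {"verify_confidence_threshold": 0.75},
        }
    )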

Three-phase orchestration

Integrate the standalone workflow as a toggleable phase:

async def enhance_report(
    report: str,
    topic: str,
    quality: str = "standard",
    run_editing: bool = True,
    run_fact_check: bool = True,  # Toggle
) -> dict:
    """Three-phase enhancement: supervision -> editing -> fact_check."""
 
    current_report = report
 
    # Phase 1: Supervision
    if quality != "test":
        result = await supervision_enhance(...)
        current_report = result["final_report"]
 
    # Phase 2: Editing
    if run_editing:
        result = await editing(...)
        current_report = result["final_report"]
 
    # Phase 3: Fact-check (standalone, toggleable)
    if run_fact_check:
        result = await fact_check(
            document=current_report,
            topic=topic,
            quality=quality,
        )
        current_report = result["final_report"]
 
    return {"final_report": current_report}

Quality tier configuration

Each quality tier controls verification depth:

QUALITY_PRESETS = {
    "quick": {
        "confidence_threshold": 0.5,
        "max_tool_calls": 5,
        "perplexity_enabled": False,
    },
    "standard": {
        "confidence_threshold": 0.75,
        "max_tool_calls": 15,
        "perplexity_enabled": True,
    },
    "comprehensive": {
        "confidence_threshold": 0.85,
        "max_tool_calls": 25,
        "perplexity_enabled": True,
    },
}
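
A caller might resolve a tier name to its settings with a simple lookup that falls back to a default; this resolve helper is illustrative, not part of the package:

def resolve_quality_settings(quality: str) -> dict:
    """Map a tier name to its preset, defaulting to 'standard'."""
    return QUALITY_PRESETS.get(quality, QUALITY_PRESETS["standard"])

settings = resolve_quality_settings("quick")
assert settings["perplexity_enabled"] is False  # cheap tier skips the extra search backend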

When to use this pattern

Use when:

  • Verification functionality has grown to 500+ lines
  • Multiple parallel workers needed for section-level processing
  • Need to run verification independently of editing
  • Quality tiers should control verification depth separately
  • Phase should be toggleable without modifying core workflow

Don’t use when:

  • Verification is simple (single pass, no parallelism)
  • Tight coupling with editing is intentional
  • The overhead of a separate workflow isn’t justified (verification logic under roughly 200 lines)

Trade-offs

Benefits:

  • Independent execution for testing and standalone use
  • Phase toggling via run_fact_check=False
  • Isolated testing without editing dependencies
  • Verification-specific quality presets
  • Parallel efficiency with section-level workers
  • Clean aggregation via add reducers

Costs:

  • Additional coordination between phases
  • Potential re-parsing if the editing phase doesn’t expose its document model
  • Workflow overhead for simple use cases
  • Configuration in multiple packages