Fact-check Workflow Extraction for LangGraph

When verification logic grows complex—multiple phases, parallel workers, sophisticated result aggregation—embedding it in a larger editing workflow creates problems. You can’t run verification independently, can’t skip it without code changes, and can’t test it in isolation.

This pattern extracts verification into a standalone workflow with parallel section workers, automatic result aggregation, and early termination for documents without citations.

The problem

Embedded verification creates tight coupling:

editing/
├── nodes/
│   ├── structure.py          (500 lines)
│   ├── enhance.py            (400 lines)
│   ├── verify.py             (800+ lines)  ← Growing complexity
│   │   ├── screen_sections()
│   │   ├── fact_check_worker()
│   │   ├── reference_check_worker()
│   │   └── aggregate_results()
│   └── polish.py             (200 lines)
└── graph/
    └── construction.py        (Complex routing embedded)

Issues:

  • Can’t run verification independently
  • Can’t skip verification without code changes
  • Can’t test verification in isolation
  • Quality settings mixed with editing settings
  • Routing logic for parallel workers clutters the main graph

The solution

Extract verification into a standalone workflow:

enhance/
├── editing/                   # Clean editing workflow
├── fact_check/                # Standalone verification
│   ├── state.py               # Independent state schema
│   ├── quality_presets.py     # Verification-specific presets
│   ├── nodes/                 # Specialized nodes
│   └── graph/                 # Self-contained routing
└── __init__.py                # Three-phase orchestration

The extracted workflow uses three key patterns:

  1. Accumulator reducers for parallel result aggregation
  2. Send pattern for section-level parallel workers
  3. Citation gating for early termination

Accumulator reducers

Parallel workers need to aggregate results. LangGraph’s Annotated[list, add] pattern handles this automatically:

from operator import add
from typing import Annotated, TypedDict
 
class FactCheckState(TypedDict, total=False):
    # Parallel workers accumulate results
    fact_check_results: Annotated[list[dict], add]
    reference_check_results: Annotated[list[dict], add]
    pending_edits: Annotated[list[dict], add]
    errors: Annotated[list[dict], add]

When worker one returns {"fact_check_results": [result1]} and worker two returns {"fact_check_results": [result2]}, LangGraph produces {"fact_check_results": [result1, result2]}.

Critical: workers must return lists. Return {"results": [item]}, not {"results": item}.
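
To see the merge in isolation, here is what the add reducer effectively does with two single-element worker results (the claim and verdict values are illustrative):

from operator import add

# Two workers each return a single-element list; the reducer concatenates them.
merged = add(
    [{"claim": "A", "verdict": "supported"}],
    [{"claim": "B", "verdict": "unsupported"}],
)
assert len(merged) == 2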

Parallel section workers

The Send pattern dispatches parallel workers for each section:

from langgraph.types import Send
 
def route_to_fact_check_sections(state: dict) -> list[Send] | str:
    """Route to parallel workers or skip to next phase."""
    screened_sections = state.get("screened_sections", [])
 
    if not screened_sections:
        return "pre_validate_citations"  # Skip if nothing to check
 
    # Dispatch parallel workers
    sends = []
    for section_id in screened_sections:
        worker_state = {
            "section_id": section_id,
            "section_content": get_section_content(section_id),
            "confidence_threshold": state["quality_settings"].get(
                "verify_confidence_threshold", 0.75
            ),
        }
        sends.append(Send("fact_check_section", worker_state))
 
    return sends

Each Send creates an independent worker with isolated state. Workers run in parallel; their results aggregate via the add reducer.
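
A minimal worker sketch that consumes the Send payload above and returns a list-typed update; verify_section is a hypothetical helper standing in for the actual verification call:

def fact_check_worker(state: dict) -> dict:
    """Verify one section and return list-typed updates for the add reducers."""
    verdicts = verify_section(  # hypothetical helper wrapping the LLM/tool calls
        state["section_content"],
        confidence_threshold=state.get("confidence_threshold", 0.75),
    )
    return {
        "fact_check_results": [
            {"section_id": state["section_id"], "verdicts": verdicts}
        ]
    }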

Citation gating

Skip expensive verification for documents without citations:

def route_citations_or_finalize(state: dict) -> str:
    """Gate: Skip workflow if no citations detected."""
    if state.get("has_citations", False):
        return "screen_sections"
    return "finalize"  # Early exit

Wire the gate in graph construction:

builder.add_conditional_edges(
    "detect_citations",
    route_citations_or_finalize,
    {
        "screen_sections": "screen_sections",
        "finalize": "finalize",
    },
)

Documents without citations skip directly to finalization, saving all verification compute.
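
The gate only needs an upstream node to set has_citations. A minimal sketch, assuming the text lives under a document key and using a crude regex heuristic (the real detection logic may be more involved):

import re

def detect_citations_node(state: dict) -> dict:
    """Flag whether the document contains citations worth verifying."""
    document = state.get("document", "")
    has_citations = bool(
        re.search(r"\[\d+\]", document)                     # numeric citations like [12]
        or re.search(r"(?im)^#*\s*references\b", document)  # a References section
    )
    return {"has_citations": has_citations}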

Graph construction

The complete graph wires parallel dispatch with convergence:

from langgraph.graph import END, START, StateGraph

def build_fact_check_graph():
    """Build and compile the standalone fact-check graph."""
    builder = StateGraph(FactCheckState)
 
    # Nodes
    builder.add_node("detect_citations", detect_citations_node)
    builder.add_node("screen_sections", screen_sections_node)
    builder.add_node("fact_check_section", fact_check_worker)
    builder.add_node("assemble_fact_checks", assemble_node)
    builder.add_node("finalize", finalize_node)
 
    # Sequential start
    builder.add_edge(START, "detect_citations")
 
    # Citation gate
    builder.add_conditional_edges(
        "detect_citations",
        route_citations_or_finalize,
        {"screen_sections": "screen_sections", "finalize": "finalize"},
    )
 
    # Parallel dispatch
    builder.add_conditional_edges(
        "screen_sections",
        route_to_fact_check_sections,
        ["fact_check_section", "finalize"],
    )
 
    # Convergence
    builder.add_edge("fact_check_section", "assemble_fact_checks")
    builder.add_edge("assemble_fact_checks", "finalize")
    builder.add_edge("finalize", END)
 
    return builder.compile()
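
Once compiled, the graph can run on its own, which is the point of the extraction. A sketch of a standalone invocation, assuming document and quality_settings are the expected input keys:

async def run_fact_check(document: str) -> dict:
    """Run the standalone verification graph on a finished document."""
    graph = build_fact_check_graph()
    return await graph.ainvoke(
        {
            "document": document,
            "quality_settings": {"verify_confidence_threshold": 0.75},
        }
    )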

Three-phase orchestration

Integrate the standalone workflow as a toggleable phase:

async def enhance_report(
    report: str,
    topic: str,
    quality: str = "standard",
    run_editing: bool = True,
    run_fact_check: bool = True,  # Toggle
) -> dict:
    """Three-phase enhancement: supervision -> editing -> fact_check."""
 
    current_report = report
 
    # Phase 1: Supervision
    if quality != "test":
        result = await supervision_enhance(...)
        current_report = result["final_report"]
 
    # Phase 2: Editing
    if run_editing:
        result = await editing(...)
        current_report = result["final_report"]
 
    # Phase 3: Fact-check (standalone, toggleable)
    if run_fact_check:
        result = await fact_check(
            document=current_report,
            topic=topic,
            quality=quality,
        )
        current_report = result["final_report"]
 
    return {"final_report": current_report}

Quality tier configuration

Each quality tier controls verification depth:

QUALITY_PRESETS = {
    "quick": {
        "confidence_threshold": 0.5,
        "max_tool_calls": 5,
        "perplexity_enabled": False,
    },
    "standard": {
        "confidence_threshold": 0.75,
        "max_tool_calls": 15,
        "perplexity_enabled": True,
    },
    "comprehensive": {
        "confidence_threshold": 0.85,
        "max_tool_calls": 25,
        "perplexity_enabled": True,
    },
}
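
A caller might resolve a tier name to its settings with a simple lookup that falls back to a default; this resolve helper is illustrative, not part of the package:

def resolve_quality_settings(quality: str) -> dict:
    """Map a tier name to its preset, defaulting to 'standard'."""
    return QUALITY_PRESETS.get(quality, QUALITY_PRESETS["standard"])

settings = resolve_quality_settings("quick")
assert settings["perplexity_enabled"] is False  # cheap tier skips the extra search backend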

When to use this pattern

Use when:

  • Verification functionality has grown to 500+ lines
  • Multiple parallel workers needed for section-level processing
  • Need to run verification independently of editing
  • Quality tiers should control verification depth separately
  • Phase should be toggleable without modifying core workflow

Don’t use when:

  • Verification is simple (single pass, no parallelism)
  • Tight coupling with editing is intentional
  • The overhead of a separate workflow isn’t justified (verification logic under roughly 200 lines)

Trade-offs

Benefits:

  • Independent execution for testing and standalone use
  • Phase toggling via run_fact_check=False
  • Isolated testing without editing dependencies
  • Verification-specific quality presets
  • Parallel efficiency with section-level workers
  • Clean aggregation via add reducers

Costs:

  • Additional coordination between phases
  • Potential re-parsing if the editing phase doesn’t expose its document model
  • Workflow overhead for simple use cases
  • Configuration in multiple packages