Iterative Document Supervision with LangGraph and Extended Thinking

Long-form academic documents have uneven theoretical depth. Some sections are well grounded, with citations to foundational work, while others remain superficial. Single-pass LLM generation has no step that checks for these gaps, and manual review is time-consuming and inconsistent.

This pattern implements an iterative supervision loop. It uses Opus with extended thinking to analyze documents for theoretical gaps, triggers targeted research on each identified issue, and integrates the findings. The loop repeats until the supervisor approves the document or an iteration limit is reached.

The Core Insight

The key idea is to use extended thinking, with an 8,000-token reasoning budget, for gap analysis. This lets the supervisor reason at length about theoretical grounding before committing to a decision. The pattern also tracks explored issues so the same gap is never researched twice. Together, these keep each iteration focused on a new, specific gap.

How It Works

graph TD
    A[START] --> B[analyze_review]
    B --> C{Decision?}
    C -->|research_needed| D[expand_topic]
    C -->|pass_through| G[finalize]
    D --> E[integrate_content]
    E --> F{Continue?}
    F -->|continue| B
    F -->|complete| G
    G --> H[END]

The supervision loop:

  1. Analyze: Opus examines the document for theoretical gaps, identifying one issue per iteration.
  2. Decide: Either flag a gap to address, or pass through (approve the document).
  3. Expand: If a gap is found, run targeted research to find relevant sources.
  4. Integrate: Merge new findings into the document with full restructuring allowed.
  5. Loop: Continue until pass-through or max iterations reached.
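The five steps can be sketched as a plain Python loop, independent of LangGraph. This is a minimal sketch, not the pattern's actual implementation. The three callables are hypothetical stand-ins for the LLM-backed nodes:

- `analyze` returns the topic of one new gap, or `None` to pass through.
- `expand` returns research findings for a topic.
- `integrate` returns the revised document.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the LLM-backed steps.
Analyze = Callable[[str, list[str]], Optional[str]]  # gap topic or None
Expand = Callable[[str], str]                         # findings for a topic
Integrate = Callable[[str, str], str]                 # revised document


def supervise(
    document: str,
    analyze: Analyze,
    expand: Expand,
    integrate: Integrate,
    max_iterations: int,
) -> tuple[str, list[str]]:
    """Run analyze -> expand -> integrate until pass-through or the limit."""
    explored: list[str] = []
    for _ in range(max_iterations):
        topic = analyze(document, explored)  # 1-2. analyze and decide
        if topic is None:  # pass-through: the supervisor approves
            break
        explored.append(topic)  # remember it so it is not re-explored
        findings = expand(topic)  # 3. targeted research
        document = integrate(document, findings)  # 4. merge (may restructure)
    return document, explored  # 5. loop ended
```

The `explored` list is passed back into `analyze` on every pass. This is how the supervisor avoids flagging the same gap twice.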

Implementation

Structured Decision Output

The supervisor returns a structured decision that either approves the document or identifies a specific gap:

from pydantic import BaseModel, Field, ConfigDict
from typing import Literal
 
 
class IdentifiedIssue(BaseModel):
    """A theoretical gap identified by the supervisor."""
 
    model_config = ConfigDict(extra="forbid")
 
    topic: str = Field(description="The specific topic lacking depth")
    issue_type: Literal[
        "underlying_theory",
        "methodological_foundation",
        "unifying_threads",
        "foundational_concepts",
    ] = Field(description="Category of theoretical gap")
    rationale: str = Field(description="Why this gap matters")
    research_query: str = Field(description="Query to find relevant papers")
    integration_guidance: str = Field(
        description="How to integrate findings into the document"
    )
 
 
class SupervisorDecision(BaseModel):
    """Supervisor's decision after analysis."""
 
    model_config = ConfigDict(extra="forbid")
 
    action: Literal["research_needed", "pass_through"] = Field(
        description="Whether more research is needed"
    )
    reasoning: str = Field(description="Explanation for the decision")
    issue: IdentifiedIssue | None = Field(
        default=None,
        description="Identified issue if action is research_needed",
    )

Analysis Node with Extended Thinking

The analysis node uses Opus with extended thinking for deep gap detection:

from typing import Any

# get_llm, ModelTier, SUPERVISOR_SYSTEM, SUPERVISOR_USER, and format_explored
# are project helpers defined elsewhere in the codebase.


async def analyze_review_node(state: dict[str, Any]) -> dict[str, Any]:
    """Analyze the document for theoretical gaps."""
    current_review = state.get("current_review", "")
    issues_explored = state.get("issues_explored", [])
    iteration = state.get("iteration", 0)

    # Use Opus with extended thinking for deep analysis. Anthropic's API requires
    # the thinking budget to be below max_tokens. If get_llm forwards both values
    # unchanged, max_tokens must cover the reasoning plus the structured answer.
    llm = get_llm(
        tier=ModelTier.OPUS,
        thinking_budget=8000,  # 8K tokens for reasoning
        max_tokens=12000,  # 8K thinking + ~4K for the decision itself
    )

    structured_llm = llm.with_structured_output(SupervisorDecision)

    messages = [
        {"role": "system", "content": SUPERVISOR_SYSTEM},
        {"role": "user", "content": SUPERVISOR_USER.format(
            final_review=current_review,
            issues_explored=format_explored(issues_explored),
            iteration=iteration + 1,
        )},
    ]

    decision = await structured_llm.ainvoke(messages)

    updates: dict[str, Any] = {
        "decision": decision.model_dump(),
        "iteration": iteration + 1,
    }

    if decision.action == "pass_through":
        updates["is_complete"] = True
    elif decision.issue:
        # Return only the new topic. The `add` reducer on issues_explored
        # appends it to the existing list, which prevents re-exploration.
        updates["issues_explored"] = [decision.issue.topic]

    return updates
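`get_llm` and `ModelTier` are project helpers whose internals aren't shown. For reference, Anthropic's Messages API enables extended thinking with a `thinking` parameter of the form `{"type": "enabled", "budget_tokens": N}`. The budget must be at least 1,024 tokens and less than `max_tokens`.

The hypothetical helper `build_thinking_params` below builds those request parameters so that the constraint holds by construction. The model ID is left to the caller.

```python
def build_thinking_params(model: str, thinking_budget: int, answer_tokens: int) -> dict:
    """Build Messages API keyword arguments with extended thinking enabled.

    Hypothetical helper: max_tokens is sized as the thinking budget plus room
    for the visible answer, because the API requires budget_tokens < max_tokens.
    """
    if thinking_budget < 1024:
        raise ValueError("extended thinking requires a budget of at least 1024 tokens")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return {
        "model": model,
        "max_tokens": thinking_budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
    }
```

The result can be unpacked into `anthropic.Anthropic().messages.create(**params, messages=[...])`. Sizing `max_tokens` as budget plus answer room is one reasonable choice. The API itself only requires the strict inequality.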

State with Proper Reducers

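By default, LangGraph overwrites a state key with whatever value a node returns for it. For fields that should accumulate across iterations, attach a reducer with Annotated so that lists are concatenated and dicts are merged instead: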

from operator import add
from typing import Annotated, Any, TypedDict


def merge_dicts(a: dict, b: dict) -> dict:
    """Reducer: shallow-merge dict updates; keys from b win."""
    return {**a, **b}


class SupervisionState(TypedDict, total=False):
    current_review: str
    iteration: int
    max_iterations: int
    is_complete: bool
    decision: dict[str, Any]  # written by analyze_review, read by the router

    # Accumulating fields need reducers (plain fields are overwritten)
    issues_explored: Annotated[list[str], add]
    supervision_expansions: Annotated[list[dict], add]

    # Dict fields that merge
    paper_corpus: Annotated[dict[str, Any], merge_dicts]
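To see why the reducers matter, here is a simplified, stdlib-only model of how a LangGraph-style state update is applied. Keys with a reducer are combined with the existing value, and all other keys are overwritten. This imitates the behavior; it is not LangGraph's internal code, and `apply_update` and `REDUCERS` are illustrative names.

```python
from operator import add
from typing import Any, Callable


def merge_dicts(a: dict, b: dict) -> dict:
    return {**a, **b}


# Reducers keyed by state field, mirroring the Annotated fields above.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "issues_explored": add,
    "supervision_expansions": add,
    "paper_corpus": merge_dicts,
}


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with one node's partial update applied."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in new_state:
            new_state[key] = reducer(new_state[key], value)  # accumulate
        else:
            new_state[key] = value  # plain field or first write: overwrite
    return new_state
```

Without the `add` reducer, an update of `["b"]` would replace an existing `["a"]`. The supervisor would then lose track of the first explored topic.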

Graph Construction

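from langgraph.graph import END, START, StateGraph
from langgraph.graph.state import CompiledStateGraph

# The node functions and the two routing functions are defined elsewhere.


def create_supervision_graph(state_class: type) -> CompiledStateGraph: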
    builder = StateGraph(state_class)
 
    builder.add_node("analyze_review", analyze_review_node)
    builder.add_node("expand_topic", expand_topic_node)
    builder.add_node("integrate_content", integrate_content_node)
    builder.add_node("finalize", finalize_node)
 
    builder.add_edge(START, "analyze_review")
 
    builder.add_conditional_edges(
        "analyze_review",
        route_after_analysis,
        {"expand": "expand_topic", "finalize": "finalize"},
    )
 
    builder.add_edge("expand_topic", "integrate_content")
 
    builder.add_conditional_edges(
        "integrate_content",
        should_continue_supervision,
        {"continue": "analyze_review", "complete": "finalize"},
    )
 
    builder.add_edge("finalize", END)
 
    return builder.compile()
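`route_after_analysis` is referenced above but not shown. Here is a minimal sketch. It assumes the router reads the `decision` dict that `analyze_review_node` stores via `model_dump()`, and that it returns one of the keys in the path map passed to `add_conditional_edges`. The choice to finalize when an issue is missing is an assumption.

```python
from typing import Any, Literal


def route_after_analysis(state: dict[str, Any]) -> Literal["expand", "finalize"]:
    """Map the supervisor's decision to a key in the conditional-edge path map."""
    decision = state.get("decision") or {}
    # Expand only when a gap was flagged *and* described. A research_needed
    # decision without an issue has nothing to research, so finalize instead.
    if decision.get("action") == "research_needed" and decision.get("issue"):
        return "expand"
    return "finalize"
```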

Quality Tier Integration

The pattern supports quality tiers that control iteration bounds:

Quality tier     Max iterations   Use case
quick            1                Fast feedback, minor improvements
standard         2                Balanced quality and speed
comprehensive    3                Thorough review
high_quality     5                Maximum depth
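One way to wire the quality tiers into the graph is a lookup table that seeds `max_iterations` in the initial state. This is a sketch: `QUALITY_TIER_MAX_ITERATIONS` and `initial_supervision_state` are hypothetical names, and the limits are taken from the quality-tier table.

```python
# Iteration limits from the quality-tier table.
QUALITY_TIER_MAX_ITERATIONS: dict[str, int] = {
    "quick": 1,
    "standard": 2,
    "comprehensive": 3,
    "high_quality": 5,
}


def initial_supervision_state(document: str, quality: str) -> dict:
    """Build a starting state for the supervision graph (hypothetical helper)."""
    try:
        max_iterations = QUALITY_TIER_MAX_ITERATIONS[quality]
    except KeyError:
        raise ValueError(f"unknown quality tier: {quality!r}") from None
    return {
        "current_review": document,
        "iteration": 0,
        "max_iterations": max_iterations,
        "is_complete": False,
        "issues_explored": [],
    }
```

With `graph = create_supervision_graph(SupervisionState)`, a run could then start with `await graph.ainvoke(initial_supervision_state(draft, "comprehensive"))`.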

Key Design Decisions

One issue per iteration: Rather than identifying all gaps at once, the supervisor identifies one issue per iteration and re-analyzes the revised document each time. This avoids spending research on gaps that an earlier expansion may already have closed.

Issue tracking: Previously explored topics are passed to the supervisor prompt to prevent re-exploration of the same gaps.
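The analysis node calls `format_explored` to render these topics into the prompt, but the helper is not shown. Below is a plausible sketch; its output format is an assumption.

```python
def format_explored(issues_explored: list[str]) -> str:
    """Render already-explored topics for the supervisor prompt.

    Hypothetical helper: topics are listed so the model can skip them, with an
    explicit placeholder when nothing has been explored yet.
    """
    if not issues_explored:
        return "None yet."
    # dict.fromkeys removes duplicates while preserving first-seen order
    unique = list(dict.fromkeys(issues_explored))
    return "\n".join(f"- {topic}" for topic in unique)
```

Deduplicating is a defensive choice. The `add` reducer concatenates blindly, so a repeated topic would otherwise appear twice in the prompt.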

Full restructuring: The integration node is allowed to restructure the document, not just append. This produces more coherent results when new content changes the document’s narrative flow.

Conservative supervision: The prompt emphasizes being conservative. The goal is quality assurance, not endless expansion.

Termination Conditions

The loop terminates when:

  1. Pass-through: Supervisor approves current quality.
  2. Max iterations: Iteration limit reached (configurable by quality tier).
  3. Circuit breaker: Two consecutive failures stop the loop (graceful degradation).
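Here is a sketch of `should_continue_supervision` that covers all three conditions. It assumes a state field not shown in `SupervisionState`: `consecutive_failures`, which failing nodes would increment and successful ones reset to zero (hypothetical). It also assumes `max_iterations` has been set from the quality tier. The threshold of two follows the circuit-breaker rule above.

```python
from typing import Any, Literal

MAX_CONSECUTIVE_FAILURES = 2  # circuit-breaker threshold


def should_continue_supervision(
    state: dict[str, Any],
) -> Literal["continue", "complete"]:
    """Decide whether to loop back to analysis or finalize."""
    if state.get("is_complete"):
        return "complete"  # the supervisor passed the document through
    # Assumption: default to a single pass if max_iterations is unset.
    if state.get("iteration", 0) >= state.get("max_iterations", 1):
        return "complete"  # iteration limit for the quality tier reached
    if state.get("consecutive_failures", 0) >= MAX_CONSECUTIVE_FAILURES:
        return "complete"  # circuit breaker: degrade gracefully
    return "continue"
```

`analyze_review_node` increments `iteration` before integration runs. A `quick` run (limit 1) therefore performs exactly one analyze, expand, and integrate cycle before finalizing.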

Trade-offs

Benefits:

  • Targeted improvement (only researches specific gaps).
  • Quality assurance (Opus-level analysis catches subtle issues).
  • Bounded iteration (quality settings control effort).

Costs:

  • Multiple Opus calls (analysis and integration per iteration).
  • Latency (each iteration adds research and integration time).
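For budgeting, the worst case per tier follows from the graph. Every iteration that finds a gap runs one analysis and one integration, so a run that hits a limit of N makes 2N Opus calls. The analysis calls alone can use up to N × 8,000 thinking tokens. The sketch below does that arithmetic. It counts thinking tokens only for the analysis step, where the budget is documented, and ignores the research calls in `expand_topic`.

```python
THINKING_BUDGET = 8000  # per analysis call, as in analyze_review_node


def worst_case_cost(max_iterations: int) -> dict[str, int]:
    """Upper bound on Opus usage when every iteration finds a new gap."""
    return {
        "opus_calls": 2 * max_iterations,  # analysis + integration each iteration
        "analysis_thinking_tokens": THINKING_BUDGET * max_iterations,
    }
```

At the other extreme, a document that passes through on the first analysis costs a single Opus call.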