Two-Pass LLM Planning for Consistent Multi-Image Generation

When you ask an LLM to plan illustrations for a long article, a single call must simultaneously analyze the document, decide a visual style, pick locations, and write generation briefs. The result: each brief invents its own style, locations cluster awkwardly, and the briefs themselves are shallow. This post describes a two-pass approach that fixes all three problems.

The Problem With Single-Pass Planning

A monolithic planning call must juggle four concerns at once:

Analyze the document’s tone, themes, and structure.
Decide a visual style that unifies all images.
Identify the best locations for images.
Write detailed generation briefs for each location.

The failure modes are predictable:

Style drift. Each brief independently invents its own style. One image is watercolor, the next is photorealistic, the third is flat vector.
Cognitive overload. Trying to do everything in one call produces shallow results across the board.
No cross-location awareness. Briefs don’t account for variety or pacing.
Wasted context. The model re-derives the same stylistic decisions for every brief instead of building on prior analysis.

The Fix: Separate Strategy From Execution

Split the planning into two sequential LLM calls with distinct roles.

Pass 1: Creative Direction (The Art Director)

The LLM reads the full document and produces three things:

Visual Identity—palette, mood, style, lighting, avoid-list.
Image Opportunity Map—N+2 candidate locations (more than needed, for selection).
Editorial Notes—tone, pacing, variety guidance.

This pass thinks like a magazine art director. It establishes constraints; it does not write briefs.

Deterministic Selection (Code, Not LLM)

Between passes, code selects which opportunities to brief. Strong opportunities take priority over stretch ones. The header slot is always included when configured. This step is fast, cheap, predictable, and easy to test.

Pass 2: Brief Planning (The Staff Writer)

The LLM reads the document again, plus the visual identity and selected opportunities from Pass 1. It produces:

Up to two candidate briefs per location—genuinely different approaches.
Visual identity references in every brief.
Cross-location variety enforcement.

This pass thinks like a staff writer given a creative brief.

graph TD
    A["Document + Config"] --> B["Pass 1: Creative Direction<br/>(art director)"]
    B --> C["VisualIdentity + ImageOpportunity[] + editorial_notes"]
    C --> D["Selection<br/>(code: filter strong > stretch, cap at target)"]
    D --> E["Pass 2: Plan Briefs<br/>(staff writer)"]
    E --> F["CandidateBrief[] per location"]
    F --> G["Fan-out to generation"]

Schemas: Typed Contracts Between Passes

Pydantic models enforce clean boundaries. Pass 1 produces a CreativeDirectionResult; Pass 2 consumes its fields and produces a PlanBriefsResult.

from typing import Literal
from pydantic import BaseModel, Field
 
 
class VisualIdentity(BaseModel):
    """Consistent visual style across all images in one article."""
 
    primary_style: str = Field(
        description="e.g., 'editorial watercolor illustration'"
    )
    color_palette: list[str] = Field(
        description="3-5 descriptive colors: ['warm amber', 'deep teal', 'ivory']"
    )
    mood: str = Field(
        description="e.g., 'contemplative, intellectual, accessible'"
    )
    lighting: str = Field(
        description="e.g., 'soft diffused natural light'"
    )
    avoid: list[str] = Field(
        description="e.g., ['photorealistic faces', 'neon colors']"
    )
 
 
class ImageOpportunity(BaseModel):
    """A candidate location for an image, identified in Pass 1."""
 
    location_id: str
    insertion_after_header: str
    purpose: Literal["header", "illustration", "diagram"]
    suggested_type: Literal["generated", "public_domain", "diagram"]
    strength: Literal["strong", "stretch"]
    rationale: str
 
 
class CreativeDirectionResult(BaseModel):
    """Full output of Pass 1."""
 
    document_title: str
    visual_identity: VisualIdentity
    image_opportunities: list[ImageOpportunity]
    editorial_notes: str
 
 
class CandidateBrief(BaseModel):
    """A single brief for one candidate at a location, from Pass 2."""
 
    location_id: str
    candidate_index: int = Field(ge=1, le=2)
    image_type: Literal["generated", "public_domain", "diagram"]
    brief: str
    relationship_to_text: Literal[
        "literal", "metaphorical", "explanatory", "evocative"
    ]
    visual_identity_references: str = Field(
        description="How this brief uses the palette/mood/style from Pass 1"
    )
 
 
class PlanBriefsResult(BaseModel):
    """Full output of Pass 2."""
 
    candidate_briefs: list[CandidateBrief]
    brief_strategy_notes: str

The visual_identity_references field on CandidateBrief is the key enforcement mechanism. It forces the LLM to articulate how each brief connects to the shared identity, rather than silently ignoring it.

Node Implementations

Pass 1: Creative direction

async def creative_direction_node(state: dict) -> dict:
    """Pass 1: Establish visual identity and identify image opportunities."""
    document = state["input"]["markdown_document"]
    target_count = state["config"]["target_image_count"]
    extra_count = target_count + 2  # Overgenerate for selection
 
    result = await invoke(
        tier=ModelTier.SONNET,
        system=CREATIVE_DIRECTION_SYSTEM,
        user=(
            f"Plan visual identity for this article.\n\n"
            f"<document>\n{document}\n</document>\n\n"
            f"Target: {target_count} images. "
            f"Identify {extra_count} opportunities."
        ),
        schema=CreativeDirectionResult,
    )
 
    return {
        "visual_identity": result.visual_identity,
        "image_opportunities": result.image_opportunities,
        "editorial_notes": result.editorial_notes,
    }

Note that the document is wrapped in <document> tags. This prevents the LLM from treating article content as instructions—an important defense when processing untrusted documents.

Deterministic opportunity selection

Between passes, code picks which opportunities to brief:

def select_opportunities(
    opportunities: list[ImageOpportunity],
    target_count: int,
    *,
    include_header: bool = True,
) -> list[ImageOpportunity]:
    """Prefer 'strong' over 'stretch'. Always include header if configured."""
    selected: list[ImageOpportunity] = []
 
    header_opps = [o for o in opportunities if o.purpose == "header"]
    non_header = [o for o in opportunities if o.purpose != "header"]
 
    if include_header and header_opps:
        selected.append(header_opps[0])
 
    remaining = target_count - len(selected)
    strong = [o for o in non_header if o.strength == "strong"]
    stretch = [o for o in non_header if o.strength == "stretch"]
 
    selected.extend(strong[:remaining])
    remaining = target_count - len(selected)
    if remaining > 0:
        selected.extend(stretch[:remaining])
 
    return selected

This step is code, not an LLM call, because it is faster, cheaper, deterministic, config-driven, and trivially testable. The +2 overgeneration in Pass 1 gives this step room to drop weak locations.

Pass 2: Brief planning

async def plan_briefs_node(state: dict) -> dict:
    """Pass 2: Generate candidate briefs for each selected opportunity."""
    document = state["input"]["markdown_document"]
    visual_identity = state["visual_identity"]
    opportunities = state["image_opportunities"]
    editorial_notes = state.get("editorial_notes", "")
    target_count = state["config"]["target_image_count"]
 
    selected = select_opportunities(opportunities, target_count)
 
    result = await invoke(
        tier=ModelTier.SONNET,
        system=PLAN_BRIEFS_SYSTEM,
        user=(
            f"Write candidate briefs for these opportunities.\n\n"
            f"<document>\n{document}\n</document>\n\n"
            f"## Visual Identity\n"
            f"{build_visual_identity_context(visual_identity)}\n\n"
            f"## Selected Opportunities\n"
            f"{json.dumps([o.model_dump() for o in selected], indent=2)}\n\n"
            f"## Editorial Notes\n{editorial_notes}"
        ),
        schema=PlanBriefsResult,
    )
 
    return {"candidate_briefs": result.candidate_briefs}

Visual Identity Propagation

The identity flows downstream into every generation prompt:

def build_visual_identity_context(
    vi: VisualIdentity | None,
    *,
    for_imagen: bool = False,
) -> str:
    """Inject visual identity into generation prompts.
 
    for_imagen=True omits the 'avoid' list because Imagen has no
    negative_prompt parameter and embedding 'avoid X' in positive
    prompts paradoxically causes generation of X.
    """
    if not vi:
        return ""
    parts = [
        "\n## Visual Identity (apply to this image)",
        f"- Style: {vi.primary_style}",
        f"- Color palette: {', '.join(vi.color_palette)}",
        f"- Mood: {vi.mood}",
        f"- Lighting: {vi.lighting}",
    ]
    if not for_imagen:
        parts.append(f"- AVOID: {', '.join(vi.avoid)}")
    return "\n".join(parts) + "\n"

The for_imagen flag deserves explanation. Imagen 4 has no negative_prompt parameter. Including “avoid photorealistic faces” in a positive prompt causes the model to attend to “photorealistic faces” and then generate exactly that. The flag strips the avoid-list for Imagen while preserving it for LLM-consumed prompts (diagram briefs, review context).

LangGraph Wiring

In a LangGraph workflow, the two passes wire up sequentially, then fan out to parallel generation:

from langgraph.graph import StateGraph, START, END
from langgraph.types import Send
 
 
def route_to_generation(state: dict) -> list[Send] | str:
    """Fan out to parallel generation nodes with visual identity."""
    briefs = state.get("candidate_briefs", [])
    if not briefs:
        return "finalize"
 
    visual_identity = state.get("visual_identity")
    sends = []
    for brief in briefs:
        if brief.candidate_index != 1:
            continue  # Use primary candidate
        sends.append(Send("generate_image", {
            "brief": brief,
            "visual_identity": visual_identity,
        }))
    return sends
 
 
builder = StateGraph(IllustrateState)
builder.add_edge(START, "creative_direction")
builder.add_edge("creative_direction", "plan_briefs")
builder.add_conditional_edges(
    "plan_briefs", route_to_generation,
    ["generate_image", "finalize"],
)
# ... generation, review, retry, finalize nodes
graph = builder.compile()

The visual identity is passed through Send() data to every parallel generation node. Each branch receives the same identity, producing images that feel like they come from the same source.

Why This Works

Aspect	Single pass	Two passes
Style consistency	Each brief invents own style	All briefs reference shared identity
Location selection	Fixed, no filtering	Overgenerate + filter by strength
Cross-location variety	Hope the LLM remembers	Explicit editorial notes
Brief quality	Shallow (cognitive overload)	Deep (focused task)
Cost	One Sonnet call	Two Sonnet calls (~2x planning cost)
Latency	Lower	Higher (sequential, ~3-5 seconds added)

The cost tradeoff is favorable. Planning is a small fraction of total workflow cost (image generation, vision review, and retries dominate). Two Sonnet calls for planning is a rounding error compared to the quality improvement.

The General Principle

This pattern applies whenever you need N outputs that should feel like they come from the same source. Establish the identity first, then execute within it.

Multi-chapter writing. Pass 1 establishes voice, themes, and narrative arc. Pass 2 writes individual chapters.

Multi-slide presentations. Pass 1 defines visual theme and storyline. Pass 2 designs individual slides.

Multi-email campaigns. Pass 1 defines brand voice and campaign strategy. Pass 2 writes individual emails.

Course curriculum. Pass 1 defines learning outcomes and pedagogical approach. Pass 2 designs individual lessons.

The key structural elements that transfer:

Pass 1 produces a shared identity artifact (visual identity, narrative voice, brand guide).
Pass 1 overproduces opportunities (N+2 candidates with strength ratings).
Deterministic code filters between passes (not another LLM call).
Pass 2 receives the identity as context and must explicitly reference it.
The identity propagates downstream into every parallel execution branch.

GitHub Gist—Simplified schemas and node implementations
Pattern documentation—Full production pattern with tradeoff analysis

about thala

Explorer

The Problem With Single-Pass Planning

The Fix: Separate Strategy From Execution

Pass 1: Creative Direction (The Art Director)

Deterministic Selection (Code, Not LLM)

Pass 2: Brief Planning (The Staff Writer)

Schemas: Typed Contracts Between Passes

Node Implementations

Pass 1: Creative direction

Deterministic opportunity selection

Pass 2: Brief planning

Visual Identity Propagation

LangGraph Wiring

Why This Works

The General Principle

Table of Contents

about thala

Explorer

The Problem With Single-Pass Planning

The Fix: Separate Strategy From Execution

Pass 1: Creative Direction (The Art Director)

Deterministic Selection (Code, Not LLM)

Pass 2: Brief Planning (The Staff Writer)

Schemas: Typed Contracts Between Passes

Node Implementations

Pass 1: Creative direction

Deterministic opportunity selection

Pass 2: Brief planning

Visual Identity Propagation

LangGraph Wiring

Why This Works

The General Principle

Related Resources

Table of Contents