Unified Quality Tier System for LLM Workflows

Multi-workflow systems need consistent quality configuration. Without standardization, different workflows use different tier names, users don’t know what max_iterations=4 means, and wrapper workflows can’t easily pass quality settings through.

This pattern provides a 5-tier named quality system with workflow-specific preset mappings and standardized API signatures.

The Problem

Quality configuration challenges in multi-workflow systems:

  1. Inconsistent naming: Different workflows use “fast” vs “quick” vs “rapid.”
  2. Parameter confusion: Users don’t know what max_papers=150 means in terms of results.
  3. No semantic meaning: Numeric parameters don’t convey quality expectations.
  4. Cross-workflow orchestration: Wrapper workflows can’t easily pass quality settings.
  5. Scattered validation: Each workflow validates quality independently.

The Solution

Implement a unified quality system with:

  • Single QualityTier type shared across all workflows
  • Descriptive tier names with clear time/scope expectations
  • Workflow-specific preset mappings
  • Standardized API signatures and return structures

Quality Tiers

from typing import Literal
 
QualityTier = Literal["test", "quick", "standard", "comprehensive", "high_quality"]
 
QUALITY_TIER_DESCRIPTIONS = {
    "test": "Minimal processing for testing (~one min)",
    "quick": "Fast results with limited depth (~five min)",
    "standard": "Balanced quality and speed (~15 min)",
    "comprehensive": "Thorough processing (~30 min)",
    "high_quality": "Maximum depth and quality (45+ min)",
}

Tier            Use Case                         Typical Duration
test            CI/CD, quick validation          ~1 min
quick           Interactive exploration, demos   ~5 min
standard        Production use (default)         ~15 min
comprehensive   Important research tasks         ~30 min
high_quality    Publication-quality output       45+ min
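
Tier validation can live in one shared helper so each workflow does not re-implement it. A minimal sketch, assuming unknown values should fail loudly rather than silently degrade:

from typing import cast, get_args

def validate_quality_tier(quality: str) -> QualityTier:
    """Normalize a user-supplied tier and reject unknown values."""
    valid_tiers = get_args(QualityTier)  # ("test", "quick", ..., "high_quality")
    normalized = quality.strip().lower()
    if normalized not in valid_tiers:
        raise ValueError(
            f"Unknown quality tier {quality!r}; expected one of {valid_tiers}"
        )
    return cast(QualityTier, normalized)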

Workflow-Specific Presets

Each workflow defines what quality means for its domain:

from typing import TypedDict
 
class AcademicQualitySettings(TypedDict):
    max_stages: int              # diffusion engine stages
    max_papers: int              # papers to process
    target_word_count: int       # final review length
    min_citations_filter: int    # citation threshold
    saturation_threshold: float  # coverage delta for termination
 
ACADEMIC_PRESETS: dict[QualityTier, AcademicQualitySettings] = {
    "test": {
        "max_stages": 1,
        "max_papers": 5,
        "target_word_count": 500,
        "min_citations_filter": 0,
        "saturation_threshold": 0.5,
    },
    "standard": {
        "max_stages": 4,
        "max_papers": 150,
        "target_word_count": 6000,
        "min_citations_filter": 5,
        "saturation_threshold": 0.15,
    },
    "high_quality": {
        "max_stages": 8,
        "max_papers": 400,
        "target_word_count": 15000,
        "min_citations_filter": 10,
        "saturation_threshold": 0.05,
    },
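    # "quick" and "comprehensive" presets follow the same pattern (omitted here)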
}

Different workflows have different quality dimensions:

  • Academic: Papers count, citation threshold, word count
  • Web research: Iterations, recursion depth
  • Books: Recommendations per category, model choice

A unified mapping would be either too generic or too complex.
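
For contrast, a web-research workflow might shape its presets around iteration depth instead. The following is a hypothetical sketch; the field names and values are assumptions, not the actual web workflow's settings:

class WebQualitySettings(TypedDict):
    max_iterations: int   # search-and-refine loops
    max_depth: int        # recursion depth for follow-up queries

WEB_PRESETS: dict[QualityTier, WebQualitySettings] = {
    "test": {"max_iterations": 1, "max_depth": 1},
    "standard": {"max_iterations": 4, "max_depth": 2},
    "high_quality": {"max_iterations": 8, "max_depth": 3},
}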

Standardized API Signatures

All workflow entry points use consistent parameter order:

from typing import Any

async def academic_lit_review(
    topic: str,                                # primary input
    research_questions: list[str],             # secondary input
    quality: QualityTier = "standard",         # quality tier
    language: str = "en",                      # language
) -> dict[str, Any]:
    # 1. validate quality tier (fail early on unknown values)
    quality = validate_quality_tier(quality)

    # 2. load preset settings; these parameterize the workflow's initial state
    quality_settings = ACADEMIC_PRESETS[quality]

    # 3. run the compiled workflow graph with quality observability
    #    (initial_state combines the inputs above with quality_settings)
    result = await graph.ainvoke(
        initial_state,
        config={
            "run_name": f"lit_review:{topic[:30]}",
            "tags": [f"quality:{quality}"],
            "metadata": {"quality_tier": quality},
        },
    )
 
    return standardize_result(result)
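
A caller then selects quality by name rather than by raw parameters (illustrative values):

result = await academic_lit_review(
    topic="retrieval-augmented generation",
    research_questions=["How does chunk size affect answer faithfulness?"],
    quality="high_quality",
)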

Quality Passthrough for Wrappers

Wrapper workflows pass quality through unchanged:

async def multi_lang_research(
    topic: str,
    workflow: Literal["web", "academic", "books"] = "web",
    quality: QualityTier = "standard",
) -> dict:
    """Multi-language research wrapper."""
    if workflow == "academic":
        result = await academic_lit_review(
            topic=topic,
            research_questions=[],  # wrapper supplies no extra questions here
            quality=quality,        # pass through unchanged
        )
    elif workflow == "web":
        result = await deep_research(
            query=topic,
            quality=quality,  # pass through unchanged
        )
    else:
        # the "books" branch follows the same pattern; omitted here for brevity
        raise NotImplementedError(f"workflow {workflow!r} not shown in this example")

    return result

The wrapper doesn’t reinterpret quality. It delegates to the underlying workflow, which interprets the tier using its own presets.
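
For example (illustrative call), a caller of the wrapper picks a tier once and the academic workflow resolves it against ACADEMIC_PRESETS:

result = await multi_lang_research(
    topic="protein structure prediction",
    workflow="academic",
    quality="high_quality",  # resolved by the academic workflow, not the wrapper
)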

Standardized Return Structure

All workflows return a consistent structure:

{
    "final_report": str,                              # main output
    "status": Literal["success", "partial", "failed"], # status enum
    "langsmith_run_id": str,                          # tracing ID
    "errors": list[dict],                             # error log
    "source_count": int,                              # resources used
    "started_at": datetime,                           # start time
    "completed_at": datetime,                         # end time
}
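
A minimal sketch of how standardize_result could produce this structure, assuming the raw workflow output already carries these fields and using a hypothetical WorkflowResult name:

from datetime import datetime, timezone
from typing import Any, Literal, TypedDict

class WorkflowResult(TypedDict):  # hypothetical name for the shared structure
    final_report: str
    status: Literal["success", "partial", "failed"]
    langsmith_run_id: str
    errors: list[dict]
    source_count: int
    started_at: datetime
    completed_at: datetime

def standardize_result(raw: dict[str, Any]) -> WorkflowResult:
    """Map raw workflow output onto the shared return structure (field names assumed)."""
    now = datetime.now(timezone.utc)
    return {
        "final_report": raw.get("final_report", ""),
        "status": raw.get("status", "success"),
        "langsmith_run_id": raw.get("langsmith_run_id", ""),
        "errors": raw.get("errors", []),
        "source_count": raw.get("source_count", 0),
        "started_at": raw.get("started_at", now),
        "completed_at": raw.get("completed_at", now),
    }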

Design Decisions

Why Named Tiers Instead of Numeric?

# numeric tiers lack semantic meaning
quality: int = 3  # what does "3" mean?
 
# named tiers are self-documenting
quality: QualityTier = "standard"  # clear expectation

Named tiers with semantic meaning—test, quick, standard—communicate intent. Users immediately understand that “comprehensive” means thorough processing.

Why Default to Standard?

The “standard” tier balances quality and speed for most use cases. Users can opt up—comprehensive, high_quality—for important tasks or opt down—quick, test—for exploration or CI/CD.

Why Workflow-Specific Presets?

Different workflows have fundamentally different quality dimensions. Academic research cares about paper count and citation threshold. Web research cares about iteration depth. A unified preset dictionary would require either a bloated config with all possible fields or complex inheritance.

Observability Integration

Quality tier is captured in LangSmith for cost analysis:

config = {
    "run_name": f"lit_review:{topic[:30]}",
    "tags": [f"quality:{quality}"],
    "metadata": {"quality_tier": quality},
}

This enables filtering traces by quality tier and analyzing cost per tier.
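
Traces can then be sliced by tier. A sketch using the LangSmith SDK's run filter (the project name is an assumption):

from langsmith import Client

client = Client()
standard_runs = client.list_runs(
    project_name="research-workflows",        # assumed project name
    filter='has(tags, "quality:standard")',   # filter runs by the quality tag
)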

Trade-offs

Benefits:

  • Clear semantics where “high_quality” conveys intent better than max_papers=300
  • Consistent API with all workflows using the same quality parameter
  • Easy orchestration where wrapper workflows pass quality through transparently
  • Observability with quality tier visible in LangSmith tags and metadata
  • Testability where the test tier enables fast CI/CD runs

Costs:

  • Preset rigidity where users can’t easily customize individual parameters
  • Workflow maintenance where each workflow must maintain its preset table
  • Approximate time estimates, where actual duration depends on content volume and network conditions