Unified Quality Tier System for LLM Workflows
Multi-workflow systems need consistent quality configuration. Without standardization, different workflows use different tier names, users don’t know what max_iterations=4 means, and wrapper workflows can’t easily pass quality settings through.
This pattern provides a 5-tier named quality system with workflow-specific preset mappings and standardized API signatures.
The Problem
Quality configuration challenges in multi-workflow systems:
- Inconsistent naming: Different workflows use “fast” vs “quick” vs “rapid.”
- Parameter confusion: Users don’t know what max_papers=150 means in terms of results.
- No semantic meaning: Numeric parameters don’t convey quality expectations.
- Cross-workflow orchestration: Wrapper workflows can’t easily pass quality settings.
- Scattered validation: Each workflow validates quality independently.
The Solution
Implement a unified quality system with:
- Single QualityTier type shared across all workflows
- Descriptive tier names with clear time/scope expectations
- Workflow-specific preset mappings
- Standardized API signatures and return structures
Quality Tiers
```python
from typing import Literal

QualityTier = Literal["test", "quick", "standard", "comprehensive", "high_quality"]

QUALITY_TIER_DESCRIPTIONS = {
    "test": "Minimal processing for testing (~1 min)",
    "quick": "Fast results with limited depth (~5 min)",
    "standard": "Balanced quality and speed (~15 min)",
    "comprehensive": "Thorough processing (~30 min)",
    "high_quality": "Maximum depth and quality (45+ min)",
}
```

| Tier | Use Case | Typical Duration |
|---|---|---|
| test | CI/CD, quick validation | ~1 min |
| quick | Interactive exploration, demos | ~5 min |
| standard | Production use (default) | ~15 min |
| comprehensive | Important research tasks | ~30 min |
| high_quality | Publication-quality output | 45+ min |
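Workflow entry points validate the requested tier before loading presets (the validate_quality_tier call appears in the API example later). As a minimal sketch, assuming the validator simply normalizes case and rejects unknown values:

```python
from typing import cast, get_args


def validate_quality_tier(quality: str) -> QualityTier:
    """Normalize a user-supplied tier name and reject unknown values (a sketch)."""
    normalized = quality.strip().lower()
    valid_tiers = get_args(QualityTier)  # the Literal values defined above
    if normalized not in valid_tiers:
        raise ValueError(
            f"Unknown quality tier {quality!r}; expected one of {valid_tiers}"
        )
    return cast(QualityTier, normalized)
```

Deriving the valid values with get_args(QualityTier) keeps the validator in sync with the Literal definition, so adding a tier requires no change here.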
Workflow-Specific Presets
Each workflow defines what quality means for its domain:
```python
from typing import TypedDict


class AcademicQualitySettings(TypedDict):
    max_stages: int              # diffusion engine stages
    max_papers: int              # papers to process
    target_word_count: int       # final review length
    min_citations_filter: int    # citation threshold
    saturation_threshold: float  # coverage delta for termination


ACADEMIC_PRESETS: dict[QualityTier, AcademicQualitySettings] = {
    "test": {
        "max_stages": 1,
        "max_papers": 5,
        "target_word_count": 500,
        "min_citations_filter": 0,
        "saturation_threshold": 0.5,
    },
    "standard": {
        "max_stages": 4,
        "max_papers": 150,
        "target_word_count": 6000,
        "min_citations_filter": 5,
        "saturation_threshold": 0.15,
    },
    "high_quality": {
        "max_stages": 8,
        "max_papers": 400,
        "target_word_count": 15000,
        "min_citations_filter": 10,
        "saturation_threshold": 0.05,
    },
}
```

Different workflows have different quality dimensions:
- Academic: Papers count, citation threshold, word count
- Web research: Iterations, recursion depth
- Books: Recommendations per category, model choice
A unified mapping would be either too generic or too complex.
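For example, a web-research workflow might define its own preset table over entirely different knobs. The field names and numbers below are illustrative assumptions, not the actual workflow's settings; only the pattern mirrors ACADEMIC_PRESETS:

```python
from typing import TypedDict


class WebQualitySettings(TypedDict):
    max_iterations: int       # search-and-summarize loops
    max_recursion_depth: int  # how far to follow linked sources
    results_per_query: int    # hits fetched per search query


WEB_PRESETS: dict[QualityTier, WebQualitySettings] = {
    "test": {"max_iterations": 1, "max_recursion_depth": 1, "results_per_query": 3},
    "standard": {"max_iterations": 4, "max_recursion_depth": 2, "results_per_query": 10},
    "high_quality": {"max_iterations": 8, "max_recursion_depth": 3, "results_per_query": 20},
}
```

The shared QualityTier keys are what make cross-workflow orchestration possible; only the values behind them differ per workflow.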
Standardized API Signatures
All workflow entry points use consistent parameter order:
```python
from typing import Any


async def academic_lit_review(
    topic: str,                          # primary input
    research_questions: list[str],       # secondary input
    quality: QualityTier = "standard",   # quality tier
    language: str = "en",                # language
) -> dict[str, Any]:
    # 1. validate quality tier
    quality = validate_quality_tier(quality)

    # 2. load preset settings
    quality_settings = ACADEMIC_PRESETS[quality]

    # 3. run workflow with quality observability
    #    (building initial_state from the inputs and quality_settings is elided here)
    result = await graph.ainvoke(
        initial_state,
        config={
            "run_name": f"lit_review:{topic[:30]}",
            "tags": [f"quality:{quality}"],
            "metadata": {"quality_tier": quality},
        },
    )
    return standardize_result(result)
```

Quality Passthrough for Wrappers
Wrapper workflows pass quality through unchanged:
```python
async def multi_lang_research(
    topic: str,
    workflow: Literal["web", "academic", "books"] = "web",
    quality: QualityTier = "standard",
) -> dict:
    """Multi-language research wrapper."""
    if workflow == "academic":
        result = await academic_lit_review(
            topic=topic,
            research_questions=[],  # wrapper supplies no extra questions
            quality=quality,        # pass through unchanged
        )
    elif workflow == "web":
        result = await deep_research(
            query=topic,
            quality=quality,  # pass through unchanged
        )
    else:
        # the "books" workflow follows the same passthrough pattern (elided here)
        raise NotImplementedError(f"Workflow not shown in this excerpt: {workflow}")
    return result
```

The wrapper doesn’t reinterpret quality. It delegates to the underlying workflow, which interprets the tier using its own presets.
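For illustration, a call through the wrapper might look like the following. The topic string is a placeholder, and the printed fields come from the standardized return structure described in the next section:

```python
import asyncio


async def main() -> None:
    result = await multi_lang_research(
        "retrieval-augmented generation",  # placeholder topic
        workflow="academic",
        quality="high_quality",            # resolved downstream via ACADEMIC_PRESETS
    )
    print(result["status"], result["source_count"])


asyncio.run(main())
```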
Standardized Return Structure
All workflows return consistent structure:
```python
{
    "final_report": str,                                # main output
    "status": Literal["success", "partial", "failed"],  # status enum
    "langsmith_run_id": str,                            # tracing ID
    "errors": list[dict],                               # error log
    "source_count": int,                                # resources used
    "started_at": datetime,                             # start time
    "completed_at": datetime,                           # end time
}
```
Design Decisions
Why Named Tiers Instead of Numeric?
```python
# numeric tiers lack semantic meaning
quality: int = 3  # what does "3" mean?

# named tiers are self-documenting
quality: QualityTier = "standard"  # clear expectation
```

Named tiers with semantic meaning (test, quick, standard) communicate intent. Users immediately understand that “comprehensive” means thorough processing.
Why Default to Standard?
The “standard” tier balances quality and speed for most use cases. Users can opt up—comprehensive, high_quality—for important tasks or opt down—quick, test—for exploration or CI/CD.
Why Workflow-Specific Presets?
Different workflows have fundamentally different quality dimensions. Academic research cares about paper count and citation threshold. Web research cares about iteration depth. A unified preset dictionary would require either a bloated config with all possible fields or complex inheritance.
Observability Integration
Quality tier is captured in LangSmith for cost analysis:
```python
config = {
    "run_name": f"lit_review:{topic[:30]}",
    "tags": [f"quality:{quality}"],
    "metadata": {"quality_tier": quality},
}
```

This enables filtering traces by quality tier and analyzing cost per tier.
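As a rough sketch of that analysis with the LangSmith Python client, assuming its list_runs method and the has(tags, ...) run-filter syntax; the project name is a placeholder:

```python
from langsmith import Client

client = Client()

# Pull runs tagged with a given tier and tally token usage for that tier.
runs = client.list_runs(
    project_name="lit-review",  # placeholder project name
    filter='has(tags, "quality:standard")',
)
total_tokens = sum(run.total_tokens or 0 for run in runs)
print(f"standard tier: {total_tokens} total tokens")
```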
Trade-offs
Benefits:
- Clear semantics where “high_quality” conveys intent better than max_papers=400
- Consistent API with all workflows using the same quality parameter
- Easy orchestration where wrapper workflows pass quality through transparently
- Observability with quality tier visible in LangSmith tags and metadata
- Testability where the test tier enables fast CI/CD runs
Costs:
- Preset rigidity where users can’t easily customize individual parameters
- Workflow maintenance where each workflow must maintain its preset table
- Time estimates are approximate, since actual duration depends on content and network conditions