Literature Review

Produces academic literature reviews by discovering papers through keyword search and citation network traversal, then clustering and synthesizing findings into a structured report.

The workflow:

  1. Discovers papers via keyword search and citation network traversal
  2. Expands the corpus through iterative citation diffusion (following references)
  3. Acquires and processes PDFs into readable markdown
  4. Clusters papers by theme using both statistical and LLM-based methods
  5. Writes a synthesis organized by thematic clusters

When to Use This

This workflow is intended for academic-style literature reviews: surveys of scholarly work on a research topic, with proper citations.

For general web research, use Web Research. For book recommendations, use Book Discovery.

How It Works

flowchart TD
    subgraph discovery["1. Discovery"]
        A[Your Topic + Questions] --> B[Keyword Search]
        B --> C[Find Initial Papers]
        C --> D[Traverse Citation Networks]
    end

    subgraph diffusion["2. Citation Diffusion"]
        D --> E[Follow References & Citations]
        E --> F{Saturation Reached?}
        F -->|No| E
        F -->|Yes| G[Corpus Complete]
    end

    subgraph processing["3. Paper Processing"]
        G --> H[Acquire PDFs]
        H --> I[Convert to Markdown]
        I --> J[Generate Summaries]
    end

    subgraph clustering["4. Thematic Clustering"]
        J --> K[Statistical Clustering]
        K --> L[LLM-Based Clustering]
        L --> M[Synthesize Cluster Definitions]
    end

    subgraph synthesis["5. Synthesis"]
        M --> N[Write Sections by Theme]
        N --> O[Integrate Citations]
        O --> P[Quality Verification]
    end

    P --> Q[Your Literature Review]

    style discovery fill:#e8f4f8
    style diffusion fill:#f0f8e8
    style processing fill:#fff8e8
    style clustering fill:#f8e8f4
    style synthesis fill:#e8f0f8

The Steps Explained

1. Discovery: The workflow starts with keyword searches based on your topic and research questions. It finds an initial set of papers, then begins traversing their citation networks (the papers they cite and the papers that cite them).

2. Citation Diffusion: The corpus expands iteratively by following citation links. Each “stage” goes one level deeper into the citation network. Expansion continues until saturation, the point at which new papers stop adding novel information.
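The stage-by-stage expansion described above can be sketched as a bounded traversal. This is a minimal illustration, not the code in diffusion_engine/: the function name, its parameters, and the stopping rule are all assumptions. In particular, it approximates "new papers stop adding novel information" by the share of previously unseen papers found in a stage, which is only a rough proxy.

```python
from typing import Callable, Iterable


def diffuse_citations(
    seeds: Iterable[str],
    neighbors: Callable[[str], Iterable[str]],
    max_stages: int = 3,
    saturation_threshold: float = 0.1,
) -> tuple[set[str], int]:
    """Expand a corpus one citation level per stage until saturation.

    Hypothetical stopping rule: a stage is saturated when the papers it
    adds amount to less than `saturation_threshold` of the corpus as it
    stood before that stage.
    """
    corpus = set(seeds)
    frontier = set(corpus)
    stages_run = 0
    for _ in range(max_stages):
        if not frontier:
            break
        # Collect every paper linked to the current frontier.
        candidates = {n for paper in frontier for n in neighbors(paper)}
        new_papers = candidates - corpus
        stages_run += 1
        growth = len(new_papers) / len(corpus)
        corpus |= new_papers
        # The next stage only follows links from the newly added papers.
        frontier = new_papers
        if growth < saturation_threshold:
            break
    return corpus, stages_run
```

Here `neighbors` stands in for whatever lookup returns a paper's references and citing papers; `max_stages` plays the role of the diffusion stage limit set by the thoroughness tier.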

3. Paper Processing: PDFs are acquired from multiple sources (open access, institutional access, preprint servers). Each paper is then converted to markdown and summarized.
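Acquiring from multiple sources usually amounts to a fallback chain: try one source, and move on to the next on a miss. The sketch below illustrates that pattern only. The `Fetcher` type, `acquire_pdf`, and the source names in the usage note are hypothetical and are not taken from paper_processor/.

```python
from typing import Callable, Optional, Sequence

# A fetcher takes a paper identifier and returns PDF bytes, or None on a miss.
Fetcher = Callable[[str], Optional[bytes]]


def acquire_pdf(
    paper_id: str,
    fetchers: Sequence[tuple[str, Fetcher]],
) -> tuple[Optional[str], Optional[bytes]]:
    """Try each source in order; return (source_name, pdf_bytes) for the first hit."""
    for name, fetch in fetchers:
        try:
            data = fetch(paper_id)
        except Exception:
            # Treat a failing source as a miss and fall through to the next one.
            continue
        if data:
            return name, data
    # No source produced a PDF.
    return None, None
```

A caller would pass the sources in priority order, for example `[("open_access", fetch_oa), ("institutional", fetch_inst), ("preprint", fetch_preprint)]`, where each fetcher is a placeholder for a real client.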

4. Thematic Clustering: Papers are grouped by theme using two methods: BERTopic (statistical) and LLM-based analysis. A synthesis step then reconciles the two results into coherent thematic clusters.
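Before two groupings can be reconciled, it helps to see where they agree. One simple way to compare them is Jaccard overlap between the paper sets, sketched below. This is a stand-in for illustration, not the workflow's reconciliation step (which the text above does not specify); the function names and the `min_overlap` cutoff are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(
    statistical: dict[str, set[str]],
    llm: dict[str, set[str]],
    min_overlap: float = 0.5,
) -> list[tuple[str, str, float]]:
    """Pair each statistical cluster with its best-overlapping LLM cluster.

    Pairs scoring below `min_overlap` are dropped, leaving those clusters
    for a later reconciliation step to resolve.
    """
    matches = []
    for s_name, s_papers in statistical.items():
        best = max(
            llm.items(),
            key=lambda item: jaccard(s_papers, item[1]),
            default=None,
        )
        if best is None:
            continue
        score = jaccard(s_papers, best[1])
        if score >= min_overlap:
            matches.append((s_name, best[0], score))
    return matches
```

Clusters that match strongly are likely the same theme under two labels; unmatched clusters are the ones where the two methods disagree and a judgment call is needed.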

5. Synthesis: The review is written section by section, organized by thematic cluster. Each section draws on the relevant papers, integrates their citations, and undergoes quality verification.
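One check that a quality-verification pass might include is confirming that every citation in the draft resolves to a paper in the corpus. The sketch below assumes a `[@key]` citation marker, which is an illustrative convention rather than the format the workflow actually emits; `check_citations` is likewise a hypothetical helper.

```python
import re

# Assumed citation marker: [@some_key]. The real report format may differ.
CITATION_PATTERN = re.compile(r"\[@([A-Za-z0-9_:-]+)\]")


def check_citations(text: str, corpus_keys: set[str]) -> dict[str, set[str]]:
    """Compare the keys cited in `text` against the keys in the corpus.

    Returns the keys cited but missing from the corpus ("unknown") and
    the corpus keys that the text never cites ("uncited").
    """
    cited = set(CITATION_PATTERN.findall(text))
    return {
        "unknown": cited - corpus_keys,
        "uncited": corpus_keys - cited,
    }
```

An empty "unknown" set means every citation can be traced to a discovered paper; a large "uncited" set may indicate that a section is drawing on too little of its cluster.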

Inputs

| Input | Description | Example |
| --- | --- | --- |
| Topic | Research area to review | “Transformer architectures in NLP” |
| Research questions | Specific questions to address | [“How have attention mechanisms evolved?”, “What are the scaling limits?”] |
| Thoroughness | How deep to go (see below) | “Standard” for most cases |
| Language (optional) | Primary language for sources | “en” for English |
| Date range (optional) | Limit to papers from this period | (2020, 2024) |
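The inputs above could be bundled and validated along the following lines. This is a hedged sketch: `LitReviewInputs`, its field names, and the lowercase tier identifiers are assumptions made for illustration, and the actual signature of academic_lit_review() may look different.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed identifiers for the thoroughness tiers; the real spellings may differ.
VALID_THOROUGHNESS = {"test", "quick", "standard", "comprehensive", "high_quality"}


@dataclass
class LitReviewInputs:
    """Hypothetical container for the workflow inputs."""

    topic: str
    research_questions: list[str]
    thoroughness: str = "standard"
    language: Optional[str] = None                # e.g. "en"
    date_range: Optional[tuple[int, int]] = None  # e.g. (2020, 2024)

    def __post_init__(self) -> None:
        if not self.topic.strip():
            raise ValueError("topic must not be empty")
        if not self.research_questions:
            raise ValueError("at least one research question is required")
        if self.thoroughness not in VALID_THOROUGHNESS:
            raise ValueError(f"unknown thoroughness: {self.thoroughness!r}")
        if self.date_range is not None:
            start, end = self.date_range
            if start > end:
                raise ValueError("date_range start must not be after end")
```

Validating at construction time means a malformed request fails before any search or PDF acquisition begins.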

Outputs

| Output | Description |
| --- | --- |
| Report | Structured literature review with thematic sections and citations |
| Paper corpus | Metadata for all papers discovered |
| Thematic clusters | Papers grouped by theme |
| PRISMA documentation | Methodology documentation for reproducibility |

Thoroughness Settings

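| Setting | Papers | Diffusion Stages | Intended Use |
| --- | --- | --- | --- |
| Test | ~5 | 1 | Testing the system |
| Quick | ~50 | 2 | Quick survey |
| Standard | ~100 | 3 | General reviews |
| Comprehensive | ~200 | 4 | Thorough coverage |
| High Quality | ~300 | 5 | Publication-quality work |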
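As a rough illustration, the tiers in the table above could be expressed as a preset mapping. The numbers come from the table, with the paper counts treated as approximate targets; the class, field, and key names are hypothetical and are not taken from quality_presets.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThoroughnessPreset:
    """Hypothetical shape of one thoroughness tier."""

    target_papers: int     # approximate corpus size, not a hard limit
    diffusion_stages: int  # number of citation diffusion stages


PRESETS = {
    "test": ThoroughnessPreset(target_papers=5, diffusion_stages=1),
    "quick": ThoroughnessPreset(target_papers=50, diffusion_stages=2),
    "standard": ThoroughnessPreset(target_papers=100, diffusion_stages=3),
    "comprehensive": ThoroughnessPreset(target_papers=200, diffusion_stages=4),
    "high_quality": ThoroughnessPreset(target_papers=300, diffusion_stages=5),
}


def get_preset(name: str) -> ThoroughnessPreset:
    """Look up a tier by display name or key, e.g. "High Quality" or "high_quality"."""
    key = name.strip().lower().replace(" ", "_")
    if key not in PRESETS:
        raise KeyError(f"unknown thoroughness setting: {name!r}")
    return PRESETS[key]
```

Keeping the tiers in one frozen mapping makes it hard for a phase to drift from the documented settings by accident.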

Example

Input:

  • Topic: “Large language model alignment techniques”
  • Research questions: [“What methods exist for aligning LLMs with human preferences?”, “How is alignment evaluated?”]
  • Thoroughness: Standard

Typical output: A review of roughly 10,000 words covering themes such as RLHF approaches, constitutional AI, evaluation benchmarks, and safety considerations. It cites 80-100 papers organized into 4-6 thematic sections.


Developer Reference

Entry point: workflows/research/academic_lit_review/graph/api.py — exposes academic_lit_review() function

Graph construction: workflows/research/academic_lit_review/graph/construction.py — defines the 5-phase linear flow

State: workflows/research/academic_lit_review/state.py (AcademicLitReviewState)

Phase nodes: workflows/research/academic_lit_review/graph/phases/

  • discovery.py — keyword search + initial citation traversal
  • diffusion.py — iterative citation expansion
  • processing.py — PDF acquisition and summarization
  • clustering.py — BERTopic + LLM clustering
  • synthesis.py — report writing
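The 5-phase linear flow can be pictured as five functions applied in sequence to a shared state. The sketch below uses plain Python to show the idea and does not reproduce whatever graph framework construction.py relies on. The phase bodies are stubs, and the state keys (`papers`, `summaries`, `clusters`, `report`, `completed`) are hypothetical rather than fields of AcademicLitReviewState.

```python
from typing import Callable

State = dict
Phase = Callable[[State], State]


def run_linear(phases: list[tuple[str, Phase]], state: State) -> State:
    """Run each phase in order, merging its returned updates into the state."""
    for name, phase in phases:
        updates = phase(state)
        completed = state.get("completed", []) + [name]
        state = {**state, **updates, "completed": completed}
    return state


# Stub phases: each returns only the keys it adds or changes.
def discovery(state: State) -> State:
    return {"papers": ["seed-1", "seed-2"]}


def diffusion(state: State) -> State:
    return {"papers": state["papers"] + ["cited-1"]}


def processing(state: State) -> State:
    return {"summaries": {p: f"summary of {p}" for p in state["papers"]}}


def clustering(state: State) -> State:
    return {"clusters": {"theme-1": list(state["summaries"])}}


def synthesis(state: State) -> State:
    return {"report": f"{len(state['clusters'])} thematic section(s)"}


PIPELINE = [
    ("discovery", discovery),
    ("diffusion", diffusion),
    ("processing", processing),
    ("clustering", clustering),
    ("synthesis", synthesis),
]
```

Calling `run_linear(PIPELINE, {"topic": "..."})` returns the final state; because each phase reads only what earlier phases wrote, the order of the list is the whole control flow.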

Subgraphs:

  • diffusion_engine/ — manages iterative citation expansion with saturation detection
  • paper_processor/ — PDF acquisition, conversion, extraction
  • clustering/ — dual clustering approach (statistical + LLM)
  • synthesis/ — section writing, citation integration, quality checks

Key utilities:

  • keyword_search/ — academic database query generation
  • citation_network/ — graph traversal and scoring
  • citation_graph/ — citation relationship management
  • utils/relevance_scoring/ — paper relevance assessment

Config: workflows/research/academic_lit_review/quality_presets.py — thoroughness tier definitions