Literature Review
Produces academic literature reviews by discovering papers through keyword search and citation network traversal, then clustering and synthesizing findings into a structured report.
The workflow:
- Discovers papers via keyword search and citation network traversal
- Expands the corpus through iterative citation diffusion (following both references and citing papers)
- Acquires and processes PDFs into readable markdown
- Clusters papers by theme using both statistical and LLM-based methods
- Writes a synthesis organized by thematic clusters
When to Use This
This workflow is intended for academic-style literature reviews—surveying scholarly work on a research topic with proper citations.
For general web research, use Web Research. For book recommendations, use Book Discovery.
How It Works
flowchart TD
    subgraph discovery["1. Discovery"]
        A[Your Topic + Questions] --> B[Keyword Search]
        B --> C[Find Initial Papers]
        C --> D[Traverse Citation Networks]
    end
    subgraph diffusion["2. Citation Diffusion"]
        D --> E[Follow References & Citations]
        E --> F{Saturation Reached?}
        F -->|No| E
        F -->|Yes| G[Corpus Complete]
    end
    subgraph processing["3. Paper Processing"]
        G --> H[Acquire PDFs]
        H --> I[Convert to Markdown]
        I --> J[Generate Summaries]
    end
    subgraph clustering["4. Thematic Clustering"]
        J --> K[Statistical Clustering]
        K --> L[LLM-Based Clustering]
        L --> M[Synthesize Cluster Definitions]
    end
    subgraph synthesis["5. Synthesis"]
        M --> N[Write Sections by Theme]
        N --> O[Integrate Citations]
        O --> P[Quality Verification]
    end
    P --> Q[Your Literature Review]
    style discovery fill:#e8f4f8
    style diffusion fill:#f0f8e8
    style processing fill:#fff8e8
    style clustering fill:#f8e8f4
    style synthesis fill:#e8f0f8
The Steps Explained
1. Discovery
Starts with keyword searches derived from your topic and research questions. This finds an initial set of papers, then begins traversing their citation networks (the papers they cite and the papers that cite them).
2. Citation Diffusion
Iteratively expands the corpus by following citation links. Each “stage” goes one level deeper into the citation network. Expansion continues until saturation, the point at which new papers stop adding novel information.
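The stage-by-stage expansion can be pictured with a minimal sketch. Everything here is illustrative: `diffuse`, `neighbors`, and `saturation_threshold` are hypothetical names, and saturation is approximated as "the share of newly seen papers in a stage drops below a threshold", which is an assumption rather than the workflow's actual criterion.

```python
from typing import Callable, Iterable


def diffuse(
    seeds: Iterable[str],
    neighbors: Callable[[str], Iterable[str]],
    max_stages: int = 3,
    saturation_threshold: float = 0.1,
) -> tuple[set[str], int]:
    """Expand a corpus one citation level per stage.

    neighbors(paper_id) is assumed to return the papers it cites plus the
    papers citing it. Saturation is a stand-in heuristic: stop when the
    fraction of papers found in a stage that are new to the corpus falls
    below saturation_threshold.
    """
    corpus = set(seeds)
    frontier = set(corpus)
    stages_run = 0
    for _ in range(max_stages):
        if not frontier:
            break
        found = {n for paper in frontier for n in neighbors(paper)}
        new = found - corpus
        corpus |= new
        stages_run += 1
        novelty = len(new) / len(found) if found else 0.0
        if novelty < saturation_threshold:
            break  # this stage added almost nothing new
        frontier = new  # next stage goes one level deeper
    return corpus, stages_run


# Toy citation graph (hypothetical IDs), used only to exercise the loop.
toy_graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
corpus, stages = diffuse(["A"], lambda p: toy_graph.get(p, []), max_stages=5)
```

In this toy graph the third stage finds only papers already in the corpus, so the loop stops before reaching `max_stages`.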
3. Paper Processing
Acquires PDFs from multiple sources (open access, institutional access, preprint servers), converts each one to markdown, and generates a summary for each paper.
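Trying several sources in turn is a common fallback pattern, sketched below. The `acquire_pdf` helper and the stub fetchers are hypothetical; the order in which the real workflow tries sources, and how it handles failures, is not stated here.

```python
from typing import Callable, Optional, Sequence

# A fetcher takes a paper ID and returns PDF bytes, or None on a miss.
Fetcher = Callable[[str], Optional[bytes]]


def acquire_pdf(
    paper_id: str, sources: Sequence[tuple[str, Fetcher]]
) -> tuple[Optional[str], Optional[bytes]]:
    """Return (source_name, pdf_bytes) from the first source that succeeds."""
    for name, fetch in sources:
        try:
            data = fetch(paper_id)
        except Exception:
            continue  # treat a failing source as a miss and try the next one
        if data:
            return name, data
    return None, None  # no source had the paper


# Stub fetchers standing in for real clients (assumptions for illustration).
def open_access(_paper_id: str) -> Optional[bytes]:
    return None


def institutional(_paper_id: str) -> Optional[bytes]:
    raise RuntimeError("proxy unavailable")


def preprint_server(paper_id: str) -> Optional[bytes]:
    return b"%PDF-" + paper_id.encode()


source_name, pdf = acquire_pdf(
    "paper-1",
    [("open_access", open_access), ("institutional", institutional), ("preprint", preprint_server)],
)
```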
4. Thematic Clustering
Groups papers by theme using two methods: BERTopic (statistical) and LLM-based analysis. A synthesis step then reconciles the two results into coherent thematic clusters.
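One simple way to compare two groupings of the same papers is set overlap. The sketch below pairs each statistical cluster with the LLM cluster it overlaps most, using Jaccard similarity. This is an assumption made for illustration: the workflow's own reconciliation step is not specified here, and `match_clusters` and `min_overlap` are hypothetical names.

```python
from typing import Optional


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(
    statistical: dict[str, set[str]],
    llm: dict[str, set[str]],
    min_overlap: float = 0.5,
) -> dict[str, Optional[str]]:
    """Map each statistical cluster to its best-overlapping LLM cluster.

    Clusters whose best overlap is below min_overlap map to None, flagging
    them as disagreements that need a closer look.
    """
    matches: dict[str, Optional[str]] = {}
    for s_name, s_papers in statistical.items():
        best_name, best_score = None, 0.0
        for l_name, l_papers in llm.items():
            score = jaccard(s_papers, l_papers)
            if score > best_score:
                best_name, best_score = l_name, score
        matches[s_name] = best_name if best_score >= min_overlap else None
    return matches


# Hypothetical cluster assignments over paper IDs p1..p7.
statistical = {"topic_0": {"p1", "p2", "p3"}, "topic_1": {"p4", "p5"}}
llm = {"RLHF": {"p1", "p2"}, "Evaluation": {"p5", "p6", "p7"}}
matches = match_clusters(statistical, llm)
```

Here `topic_0` pairs with `RLHF` (overlap 2/3), while `topic_1` has no sufficiently similar partner and is left unmatched.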
5. Synthesis
Writes the review organized by thematic cluster. Each section draws on the relevant papers, integrates their citations, and undergoes quality verification.
Inputs
| Input | Description | Example |
|---|---|---|
| Topic | Research area to review | “Transformer architectures in NLP” |
| Research questions | Specific questions to address | [“How have attention mechanisms evolved?”, “What are the scaling limits?”] |
| Thoroughness | How deep to go (see below) | “Standard” for most cases |
| Language (optional) | Primary language for sources | “en” for English |
| Date range (optional) | Limit to papers from this period | (2020, 2024) |
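The inputs above could be bundled and validated before a run, as in the following sketch. `ReviewRequest`, its field names, and the lowercase thoroughness keys are assumptions made for illustration; they are not the workflow's actual types or parameter names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed spellings of the thoroughness settings listed in this document.
VALID_THOROUGHNESS = {"test", "quick", "standard", "comprehensive", "high_quality"}


@dataclass(frozen=True)
class ReviewRequest:
    """Hypothetical container for the inputs in the table above."""

    topic: str
    research_questions: list[str]
    thoroughness: str = "standard"
    language: Optional[str] = None  # e.g. "en"
    date_range: Optional[tuple[int, int]] = None  # (start_year, end_year)

    def __post_init__(self) -> None:
        if not self.topic.strip():
            raise ValueError("topic must not be empty")
        if not self.research_questions:
            raise ValueError("at least one research question is required")
        if self.thoroughness not in VALID_THOROUGHNESS:
            raise ValueError(f"unknown thoroughness: {self.thoroughness!r}")
        if self.date_range is not None:
            start, end = self.date_range
            if start > end:
                raise ValueError("date_range start must not exceed end")


request = ReviewRequest(
    topic="Transformer architectures in NLP",
    research_questions=[
        "How have attention mechanisms evolved?",
        "What are the scaling limits?",
    ],
    language="en",
    date_range=(2020, 2024),
)
```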
Outputs
| Output | Description |
|---|---|
| Report | Structured literature review with thematic sections and citations |
| Paper corpus | Metadata for all papers discovered |
| Thematic clusters | Papers grouped by theme |
| PRISMA documentation | Methodology documentation for reproducibility |
Thoroughness Settings
| Setting | Papers | Diffusion Stages | Intended Use |
|---|---|---|---|
| Test | ~5 | 1 | Testing the system |
| Quick | ~50 | 2 | Quick survey |
| Standard | ~100 | 3 | General reviews |
| Comprehensive | ~200 | 4 | Thorough coverage |
| High Quality | ~300 | 5 | Publication-quality work |
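The table above maps naturally onto a small lookup, sketched here. The `Preset` class, the `PRESETS` keys, and `get_preset` are hypothetical and may not match what `quality_presets.py` defines; the paper counts are the approximate targets from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    target_papers: int  # approximate corpus size from the table
    diffusion_stages: int  # how many citation levels to follow


# Values copied from the thoroughness table; key names are assumptions.
PRESETS = {
    "test": Preset(target_papers=5, diffusion_stages=1),
    "quick": Preset(target_papers=50, diffusion_stages=2),
    "standard": Preset(target_papers=100, diffusion_stages=3),
    "comprehensive": Preset(target_papers=200, diffusion_stages=4),
    "high_quality": Preset(target_papers=300, diffusion_stages=5),
}


def get_preset(name: str) -> Preset:
    """Look up a preset by display name, e.g. "High Quality" or "standard"."""
    key = name.strip().lower().replace(" ", "_")
    try:
        return PRESETS[key]
    except KeyError:
        raise ValueError(f"unknown thoroughness setting: {name!r}") from None


preset = get_preset("High Quality")
```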
Example
Input:
- Topic: “Large language model alignment techniques”
- Research questions: [“What methods exist for aligning LLMs with human preferences?”, “How is alignment evaluated?”]
- Thoroughness: Standard
Typical output: A review of roughly 10,000 words covering themes such as RLHF approaches, constitutional AI, evaluation benchmarks, and safety considerations. It cites 80-100 papers organized into 4-6 thematic sections.
Developer Reference
Entry point: workflows/research/academic_lit_review/graph/api.py, which exposes the academic_lit_review() function
Graph construction: workflows/research/academic_lit_review/graph/construction.py — defines the 5-phase linear flow
State: workflows/research/academic_lit_review/state.py — AcademicLitReviewState
Phase nodes: workflows/research/academic_lit_review/graph/phases/
- discovery.py: keyword search + initial citation traversal
- diffusion.py: iterative citation expansion
- processing.py: PDF acquisition and summarization
- clustering.py: BERTopic + LLM clustering
- synthesis.py: report writing
Subgraphs:
- diffusion_engine/: manages iterative citation expansion with saturation detection
- paper_processor/: PDF acquisition, conversion, extraction
- clustering/: dual clustering approach (statistical + LLM)
- synthesis/: section writing, citation integration, quality checks
Key utilities:
- keyword_search/: academic database query generation
- citation_network/: graph traversal and scoring
- citation_graph/: citation relationship management
- utils/relevance_scoring/: paper relevance assessment
Config: workflows/research/academic_lit_review/quality_presets.py — thoroughness tier definitions
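The 5-phase linear flow noted under graph construction can be pictured as plain function chaining over a shared state. This is only a sketch: the real graph is defined in construction.py and may use a graph framework, and `run_linear`, the state keys, and the placeholder phase bodies below are all hypothetical.

```python
from typing import Callable

State = dict
# A phase reads the current state and returns the keys it wants to update.
Phase = Callable[[State], dict]


def run_linear(phases: list[tuple[str, Phase]], initial: State) -> State:
    """Run phases in order, merging each phase's updates into the state."""
    state = dict(initial)
    for name, phase in phases:
        state.update(phase(state))
        state["completed_phases"] = state.get("completed_phases", []) + [name]
    return state


# Placeholder phases standing in for the five phase nodes (illustrative only).
PHASES: list[tuple[str, Phase]] = [
    ("discovery", lambda s: {"papers": ["p1", "p2"]}),
    ("diffusion", lambda s: {"papers": s["papers"] + ["p3"]}),
    ("processing", lambda s: {"summaries": {p: f"summary of {p}" for p in s["papers"]}}),
    ("clustering", lambda s: {"clusters": {"theme_a": list(s["summaries"])}}),
    ("synthesis", lambda s: {"report": f"{len(s['clusters'])} section(s), {len(s['papers'])} papers"}),
]

result = run_linear(PHASES, {"topic": "example topic"})
```

Because each phase only returns the keys it changes, later phases can rely on everything earlier phases produced without the runner knowing the state's shape.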