Shandu is a cutting-edge AI research assistant that performs in-depth, multi-source research on any topic using advanced language models, intelligent web scraping, and iterative exploration to generate comprehensive, well-structured reports with proper citations.
Shandu is an LLM-powered research system that automates the research process, from initial query clarification through content analysis to report generation. Built on LangGraph's state-based workflow, it recursively explores a topic, evaluating sources, extracting content, and synthesizing what it finds.
- Academic Research: Generate literature reviews, background information, and complex topic analyses
- Market Intelligence: Analyze industry trends, competitor strategies, and market opportunities
- Content Creation: Produce well-researched articles, blog posts, and reports with proper citations
- Technology Exploration: Track emerging technologies, innovations, and technical developments
- Policy Analysis: Research regulations, compliance requirements, and policy implications
- Competitive Analysis: Compare products, services, and company strategies across industries
flowchart TB
subgraph Input
Q[User Query]
B[Breadth Parameter]
D[Depth Parameter]
end
DR[Deep Research] -->
SQ[SERP Queries] -->
PR[Process Results]
subgraph Results[Results]
direction TB
NL((Learnings))
ND((Directions))
end
PR --> NL
PR --> ND
DP{depth > 0?}
RD["Next Direction:
- Prior Goals
- New Questions
- Learnings"]
MR[Markdown Report]
%% Main Flow
Q & B & D --> DR
%% Results to Decision
NL & ND --> DP
%% Circular Flow
DP -->|Yes| RD
RD -->|New Context| DR
%% Final Output
DP -->|No| MR
%% Styling
classDef input fill:#7bed9f,stroke:#2ed573,color:black
classDef process fill:#70a1ff,stroke:#1e90ff,color:black
classDef recursive fill:#ffa502,stroke:#ff7f50,color:black
classDef output fill:#ff4757,stroke:#ff6b81,color:black
classDef results fill:#a8e6cf,stroke:#3b7a57,color:black
class Q,B,D input
class DR,SQ,PR process
class DP,RD recursive
class MR output
class NL,ND results
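The loop in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration, not Shandu's implementation: `deep_research`, `ResearchState`, `render_report`, and the `search` callback are hypothetical names, and the sketch assumes that `depth` counts the remaining rounds of follow-up and that `breadth` caps how many new directions are followed per round.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: a search step maps a query to
# (learnings, follow-up directions). Not Shandu's real interface.
SearchFn = Callable[[str], Tuple[List[str], List[str]]]


@dataclass
class ResearchState:
    """Accumulates everything learned across the recursive rounds."""
    learnings: List[str] = field(default_factory=list)
    visited: List[str] = field(default_factory=list)


def deep_research(query: str, depth: int, breadth: int, search: SearchFn,
                  state: Optional[ResearchState] = None) -> ResearchState:
    state = state if state is not None else ResearchState()
    state.visited.append(query)

    # "SERP Queries" and "Process Results" collapsed into one callback.
    learnings, directions = search(query)
    state.learnings.extend(learnings)

    # "depth > 0?": follow at most `breadth` new directions, one level deeper.
    if depth > 0:
        for direction in directions[:breadth]:
            deep_research(direction, depth - 1, breadth, search, state)
    return state


def render_report(query: str, state: ResearchState) -> str:
    """Turn the accumulated learnings into a small Markdown report."""
    lines = [f"# {query}", ""] + [f"- {item}" for item in state.learnings]
    return "\n".join(lines)
```

Under these assumptions the number of queries grows geometrically: `depth=2` with `breadth=2` visits 1 + 2 + 4 = 7 queries, which is why both parameters are worth keeping small.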
- Intelligent State-based Workflow: Leverages LangGraph for a structured, step-by-step research process with clear state transitions
- Iterative Deep Exploration: Recursively explores topics with dynamic depth and breadth, adapting to information discovered
- Multi-source Information Synthesis: Analyzes data from search engines, web content, and structured knowledge bases
- Enhanced Web Scraping: Features dynamic JS rendering, content extraction, and ethical scraping practices
- Smart Source Evaluation: Automatically assesses source credibility, relevance, and information value
- Content Analysis Pipeline: Uses advanced NLP to extract key information, identify patterns, and synthesize findings
- Comprehensive Report Generation: Creates detailed, well-structured reports with proper citations and thematic organization
- Parallel Processing Architecture: Implements concurrent operations for efficient multi-query execution
- Adaptive Search Strategy: Dynamically adjusts search queries based on discovered information and knowledge gaps
- Full Citation Management: Properly attributes all sources with formatted citations in multiple styles
# Install from PyPI
pip install shandu
# Install from source
git clone https://github.com/jolovicdev/shandu.git
cd shandu
pip install -e .
# Configure API settings (supports various LLM providers)
shandu configure
# Run comprehensive research
shandu research "Your research query" --depth 2 --breadth 4 --output report.md
# Quick AI-powered search with web scraping (you don't need Perplexity!)
shandu aisearch "Who is the current sitting president of the United States?" --detailed
# Basic multi-engine search
shandu search "Your search query"
# --depth:   how deep to explore (1-5, default: 2)
# --breadth: how many parallel queries (2-10, default: 4)
# --output:  save to file instead of terminal
# --verbose: show detailed progress
shandu research "Your research query" \
    --depth 3 \
    --breadth 5 \
    --output report.md \
    --verbose
You can find example reports in the examples directory:
- The Intersection of Quantum Computing, Synthetic Biology, and Climate Modeling
shandu research "The Intersection of Quantum Computing, Synthetic Biology, and Climate Modeling" --depth 3 --breadth 3 --output examples/o3-mini-high.md
# --engines:     comma-separated list of search engines
# --max-results: maximum number of results to return
# --output:      save to file instead of terminal
# --detailed:    generate a detailed analysis
shandu aisearch "Your search query" \
    --engines "google,duckduckgo" \
    --max-results 15 \
    --output results.md \
    --detailed
from shandu.agents import ResearchGraph
from langchain_openai import ChatOpenAI
# Initialize with custom LLM if desired
llm = ChatOpenAI(model="gpt-4")
# Initialize the research graph
researcher = ResearchGraph(
    llm=llm,
    temperature=0.5
)
# Perform deep research
results = researcher.research_sync(
    query="Your research query",
    depth=3,       # How deep to go with recursive research
    breadth=4,     # How many parallel queries to explore
    detail_level="high"
)
# Print or save results
print(results.to_markdown())
Shandu implements a sophisticated multi-stage research pipeline:
- Query Clarification: Interactive questions to understand research needs
- Research Planning: Strategic planning for comprehensive topic coverage
- Iterative Exploration:
  - Smart query generation based on knowledge gaps
  - Multi-engine search with parallelized execution
  - Relevance filtering of search results
  - Intelligent web scraping with content extraction
  - Source credibility assessment
  - Information analysis and synthesis
  - Reflection on findings to identify gaps
- Report Generation:
  - Theme extraction and organization
  - Multi-step report enhancement
  - Citation formatting and management
  - Section expansion for comprehensive coverage
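The four stages above can be pictured as nodes in a small state machine, where each node receives the current state and returns an updated copy, and an edge function picks the next node. The sketch below uses only the standard library to show the idea. It is not LangGraph's API and not Shandu's actual graph: the node bodies are placeholders, and names such as `run_workflow`, `NODES`, `EDGES`, and `max_rounds` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
END = "__end__"  # sentinel node name that stops the workflow


def clarify(state: State) -> State:
    # Stand-in for the interactive clarification step.
    return {**state, "topic": state["query"].strip()}


def plan(state: State) -> State:
    # Stand-in for research planning: start with empty findings.
    return {**state, "findings": [], "round": 0}


def explore(state: State) -> State:
    # Stand-in for one search / scrape / analyze / reflect round.
    round_no = state["round"] + 1
    finding = f"finding {round_no} about {state['topic']}"
    return {**state, "findings": state["findings"] + [finding], "round": round_no}


def report(state: State) -> State:
    lines = [f"# {state['topic']}"] + [f"- {f}" for f in state["findings"]]
    return {**state, "report": "\n".join(lines)}


NODES: Dict[str, Callable[[State], State]] = {
    "clarify": clarify, "plan": plan, "explore": explore, "report": report,
}

# Each edge inspects the state and names the next node. The "explore"
# edge is conditional: loop until the round budget is used up.
EDGES: Dict[str, Callable[[State], str]] = {
    "clarify": lambda s: "plan",
    "plan": lambda s: "explore",
    "explore": lambda s: "explore" if s["round"] < s["max_rounds"] else "report",
    "report": lambda s: END,
}


def run_workflow(state: State, start: str = "clarify") -> State:
    current = start
    while current != END:
        state = NODES[current](state)
        current = EDGES[current](state)
    return state
```

Calling `run_workflow({"query": "solar power", "max_rounds": 2})` walks clarify, plan, explore twice, then report. Keeping the routing decision in the edge rather than inside the node is what makes the transitions explicit and easy to inspect.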
- Google Search
- DuckDuckGo
- Wikipedia
- ArXiv (academic papers)
- Custom search engines can be added
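One plausible way to support pluggable engines is a shared interface plus a function that queries all engines concurrently and merges the results. The sketch below is an assumption-laden illustration, not Shandu's extension API: `SearchEngine`, `StaticEngine`, and `multi_search` are hypothetical names, and the toy engine returns canned results instead of calling a real backend.

```python
import asyncio
from typing import Dict, List, Protocol, Sequence


class SearchEngine(Protocol):
    """Hypothetical interface that every engine would implement."""
    name: str

    async def search(self, query: str, max_results: int) -> List[Dict[str, str]]:
        ...


class StaticEngine:
    """Toy engine returning canned results; stands in for a real backend."""

    def __init__(self, name: str, results: List[Dict[str, str]]) -> None:
        self.name = name
        self._results = results

    async def search(self, query: str, max_results: int) -> List[Dict[str, str]]:
        await asyncio.sleep(0)  # yield control, as a network call would
        return self._results[:max_results]


async def multi_search(engines: Sequence[SearchEngine], query: str,
                       max_results: int = 10) -> List[Dict[str, str]]:
    # Query every engine concurrently; gather preserves the input order.
    batches = await asyncio.gather(
        *(engine.search(query, max_results) for engine in engines)
    )
    seen = set()
    merged: List[Dict[str, str]] = []
    for engine, batch in zip(engines, batches):
        for item in batch:
            if item["url"] not in seen:  # drop duplicates across engines
                seen.add(item["url"])
                merged.append({**item, "engine": engine.name})
    return merged[:max_results]
```

A custom engine in this scheme is any object with a `name` and an async `search` method; run the whole thing with `asyncio.run(multi_search(engines, "your query"))`.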
- Dynamic JS Rendering: Handles JavaScript-heavy websites
- Content Extraction: Identifies and extracts main content from web pages
- Parallel Processing: Concurrent execution of searches and scraping
- Caching: Efficient caching of search results and scraped content
- Rate Limiting: Respectful access to web resources
- Robots.txt Compliance: Ethical web scraping practices
- Flexible Output Formats: Markdown, JSON, plain text
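Two of the capabilities above, robots.txt compliance and rate limiting, can be sketched with the standard library alone. This is a hedged illustration rather than Shandu's scraper: `PoliteFetchPolicy`, the `ShanduSketchBot` user agent, and the one-second default interval are all assumptions, and the robots.txt text is passed in directly so the example needs no network access.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteFetchPolicy:
    """Decides whether and when a URL may be fetched (hypothetical helper)."""

    def __init__(self, user_agent: str, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.user_agent = user_agent
        self.min_interval = min_interval  # seconds between hits to one host
        self._clock = clock
        self._sleep = sleep
        self._robots: Dict[str, RobotFileParser] = {}
        self._last_request: Dict[str, float] = {}

    def load_robots(self, host: str, robots_txt: str) -> None:
        # Parse already-downloaded robots.txt text for this host.
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._robots[host] = parser

    def allowed(self, url: str) -> bool:
        parser = self._robots.get(urlparse(url).netloc)
        # Assumption: with no rules loaded for a host, fetching is allowed.
        return True if parser is None else parser.can_fetch(self.user_agent, url)

    def wait_turn(self, url: str) -> float:
        """Sleep just long enough to respect the per-host interval."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0 if last is None else max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = now + delay
        return delay
```

A scraper would call `allowed(url)` first and skip disallowed pages, then call `wait_turn(url)` immediately before each request. The clock and sleep functions are injectable so the policy can be tested without real waiting.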
Note: This is just an idea, and I need contributors. This is a project in which I'm taking a deep dive into LangChain and LangGraph!
This project is licensed under the MIT License - see the LICENSE file for details.