Blog Post Generator with Vector Search
An AI-powered blog post generator that uses LanceDB for semantic similarity search to find the most relevant existing posts as context for generating new, high-quality blog content.
Features
- Semantic Similarity Search: Uses LanceDB and sentence transformers to find the 10 most semantically similar blog posts
- Style Analysis: Analyzes existing posts for tone, structure, and writing patterns
- Iterative Improvement: Grades each draft and revises it until it reaches an A- grade or better
- Smart Context: Combines category-based and semantic search for comprehensive context
- Auto-generated Slugs: Creates SEO-friendly URL slugs
- Dual AI Support: Works with both OpenAI and Ollama
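As an illustration of the auto-generated slugs mentioned above, here is a minimal sketch of one common way to derive an SEO-friendly slug from a title. The function name `slugify` and the `max_length` parameter are assumptions made for this example; the generator's actual slug logic may differ.

```python
import re
import unicodedata


def slugify(title, max_length=60):
    """Turn a title into a lowercase, hyphen-separated, ASCII-only slug.

    Hypothetical helper for illustration; not taken from blog_post_generator.py.
    """
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    # Truncate, then remove any hyphen left dangling at the cut point.
    return slug[:max_length].rstrip("-")


example_slug = slugify("AI Trends in Venture Capital!")
```

A slug built this way can double as the default output filename when --output_file is omitted, for example by appending ".md".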
Installation
# Using uv (recommended)
uv sync
# Or using pip
pip install -r requirements.txt
Usage
Basic Usage
python blog_post_generator.py \
--source_file content.txt \
--prompt "Your blog post idea" \
--output_file my-post.md
With Vector Search (Default)
python blog_post_generator.py \
--source_file transcript.txt \
--prompt "AI trends in venture capital" \
--categories ai startups funding
Disable Vector Search (Category-only)
python blog_post_generator.py \
--source_file content.txt \
--prompt "Blog post idea" \
--no_vector_search
Using Ollama (Local AI)
python blog_post_generator.py \
--source_file content.txt \
--prompt "Blog post idea" \
--use_ollama \
--ollama_model gemma2
How It Works
- Vector Database: On first run, creates embeddings for all blog posts using sentence-transformers
- Semantic Search: Finds the 10 most semantically similar posts to your prompt
- Category Matching: Also finds posts matching specified categories
- Style Analysis: Analyzes writing patterns from both sets of posts
- Content Generation: Creates blog post using combined context
- Iterative Improvement: Grades and refines the content until it reaches the target grade (A- or better)
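To make the semantic search step more concrete, the sketch below ranks posts by cosine similarity to a query vector and keeps the top k. It is a simplified stand-in, not the tool's implementation: the real generator relies on sentence-transformers for embeddings and LanceDB for storage and lookup, whereas this example uses hand-written toy vectors and a brute-force scan. The function names and file names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, post_vecs, k=10):
    """Return the names of the k posts whose vectors are most similar to query_vec."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in post_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Toy 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
toy_posts = {
    "ai-investment.md": [0.9, 0.1, 0.0],
    "gardening-tips.md": [0.0, 0.2, 0.9],
    "vc-market.md": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

nearest = top_k_similar(toy_query, toy_posts, k=2)
print(nearest)
```

A vector database replaces the brute-force scan with an index, so the lookup stays fast as the number of posts grows.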
Vector Search Output
When running with vector search enabled, you’ll see:
🔄 Building vector database from blog posts...
📊 Creating embeddings for 1683 documents...
✅ Vector database created with 1683 documents
🔍 Analyzing existing posts for style...
Found 5 category-relevant posts for style analysis
🧠 Finding semantically similar posts...
Found 10 semantically similar posts
1. ai-investment-2024.md
2. vc-market-ai-2024.md
3. fundraising-compendium-guide.md
4. the-venture-fund-of-the-future.md
5. startup-ecosystem-trends.md
...and 5 more
📊 Using 15 total posts for style analysis
Arguments
- --source_file: Path to source content file (required)
- --prompt: Blog post idea/prompt (required)
- --output_file: Output markdown file (auto-generated if not provided)
- --categories: Categories for style analysis (default: crypto, web3, data analysis)
- --content_dir: Directory containing blog posts (default: ./content/post)
- --no_vector_search: Disable semantic search, use only categories
- --use_ollama: Use Ollama instead of OpenAI
- --ollama_model: Ollama model name (default: gemma2)
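For readers who want to see how these flags fit together, here is a sketch of an `argparse` parser that mirrors the documented arguments and defaults. It is reconstructed from the list above, not copied from blog_post_generator.py, so the help strings, the `build_parser` name, and the choice of `nargs="+"` for --categories are assumptions.

```python
import argparse


def build_parser():
    """Build a parser matching the documented flags (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Generate a blog post from source content."
    )
    parser.add_argument("--source_file", required=True,
                        help="Path to source content file")
    parser.add_argument("--prompt", required=True,
                        help="Blog post idea/prompt")
    parser.add_argument("--output_file", default=None,
                        help="Output markdown file (auto-generated if omitted)")
    # Assumed: categories are passed space-separated, as in the usage examples.
    parser.add_argument("--categories", nargs="+",
                        default=["crypto", "web3", "data analysis"],
                        help="Categories for style analysis")
    parser.add_argument("--content_dir", default="./content/post",
                        help="Directory containing blog posts")
    parser.add_argument("--no_vector_search", action="store_true",
                        help="Disable semantic search, use only categories")
    parser.add_argument("--use_ollama", action="store_true",
                        help="Use Ollama instead of OpenAI")
    parser.add_argument("--ollama_model", default="gemma2",
                        help="Ollama model name")
    return parser


args = build_parser().parse_args([
    "--source_file", "content.txt",
    "--prompt", "Blog post idea",
    "--categories", "ai", "startups",
])
print(args.categories, args.no_vector_search, args.ollama_model)
```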
Dependencies
- OpenAI API key (set the OPENAI_API_KEY environment variable)
- LanceDB for vector storage
- sentence-transformers for embeddings
- Standard Python packages: pandas, numpy, requests
Output Quality
The system aims for blog posts graded A- or better (a score of 90 or higher), with:
- Compelling hooks for broad audiences
- Clear argument development
- Specific data and examples
- Professional but engaging tone
- Strong conclusions that tie back to opening
Example output grades:
📝 Grading attempt 1...
Grade: A (93/100)
✅ Target grade achieved: A
Final Grade: A- (91/100)
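The grade-and-refine cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions: `grade_fn` and `revise_fn` are hypothetical callables standing in for the model calls, the 90-point threshold comes from the A- target stated in this section, and the `max_attempts` cap is an assumption added so the loop always terminates.

```python
def improve_until_target(draft, grade_fn, revise_fn, target_score=90, max_attempts=5):
    """Grade a draft and revise it until it meets target_score or attempts run out.

    grade_fn(draft) -> (score, feedback); revise_fn(draft, feedback) -> new draft.
    Returns the final draft and the list of scores, one per grading attempt.
    """
    scores = []
    for attempt in range(max_attempts):
        score, feedback = grade_fn(draft)
        scores.append(score)
        # Stop on success, or on the last attempt so the returned draft is always graded.
        if score >= target_score or attempt == max_attempts - 1:
            break
        draft = revise_fn(draft, feedback)
    return draft, scores


# Toy stand-ins for the model-backed grader and reviser (hypothetical).
def toy_grade(draft):
    score = min(100, 80 + 5 * draft.count("[revised]"))
    return score, "add more specific examples"


def toy_revise(draft, feedback):
    return draft + " [revised]"


final_draft, scores = improve_until_target("First draft.", toy_grade, toy_revise)
print(scores)
```

Returning the score history alongside the draft makes it easy to log each grading attempt, much like the example output shown above.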