When we set out to build AI-powered search for The SEO Community archive, it sounded straightforward: hook up an LLM to our message database and let it answer questions. But we suspected there'd be much more to it.
What followed was three days of rapid iteration, testing, and learning. We tried six different approaches before landing on something that actually works well. Here's what we learned.
The Problem: Lots of gold, but scattered
The archive has over 18,500 messages across dozens of channels. Traditional keyword search works fine when you know exactly what you're looking for, like "Screaming Frog" or a specific URL. But what about questions like:
Keywords alone can't capture the intent behind questions like these. We needed semantic understanding, not just string matching.
But here's the real challenge we didn't anticipate: consistency. Users asking essentially the same question with different words should get the same answer. "Should I use llms.txt?" and "Should I implement llms.txt?" are the same question. Our search needed to understand that.
The Journey: Six approaches in three days
Approach 1: Algolia Keyword Search + Gemini
The Idea: Worked, But Inconsistent
Use Algolia to find the top 20 messages matching the query keywords, then pass them to Gemini 2.5 Flash to generate a summarized answer with citations.
How It Worked
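In rough outline: fetch the top keyword matches from Algolia, fold them into a prompt, and hand that to Gemini. The network calls are sketched in comments below; the helper name and prompt shape are illustrative, not our exact production code. The one pure, testable piece is turning hits into a citation-friendly prompt:

```javascript
// Approach 1 pipeline (illustrative sketch):
// 1. const { hits } = await algoliaIndex.search(query, { hitsPerPage: 20 })
// 2. const answer = await gemini.generate(buildAnswerPrompt(query, hits))

// Pure helper: fold Algolia hits into a prompt with numbered citations.
function buildAnswerPrompt(query, hits) {
  const context = hits
    .map((hit, i) => `[${i + 1}] (#${hit.channel}) ${hit.text}`)
    .join('\n')
  return `Answer the question using only the messages below.\n` +
         `Cite sources as [n].\n\nMessages:\n${context}\n\n` +
         `Question: ${query}\nAnswer:`
}
```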
The Problem
Keyword search is literal. When a user asked about "GSC," Algolia looked for messages containing the letters "GSC," not messages discussing Google Search Console. Worse, semantically equivalent queries returned completely different results:
"should I use llms.txt?"
"should I implement llms.txt?"
The same question, phrased differently, got different answers. That's a terrible user experience.
Lesson Learned
Keyword search can't handle synonyms, abbreviations, or rephrased questions. For semantic queries, you need semantic search.
Approach 2: Stop Word Filtering
The Idea: Made Things Worse
Strip common question words (what, how, should, about, etc.) before sending queries to Algolia. Keep only the "meaningful" keywords.
The Implementation
const STOP_WORDS = [
  'what', 'how', 'should', 'about', 'do', 'does',
  'can', 'could', 'would', 'is', 'are', 'the', 'a', 'an',
  'we', 'i', 'you', 'they', 'know', 'think', 'use'
]

function extractKeywords(query) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(word => !STOP_WORDS.includes(word))
    .join(' ')
}
The Problem
This was too aggressive. "What do we know about llms.txt?" became just "llms.txt," which sometimes helped but often lost important context. And it didn't solve the core problem: "use" and "implement" are different keywords that mean the same thing.
Worse, the stop word list kept growing. Every edge case needed a new word added. It was a game of whack-a-mole.
Lesson Learned
Rule-based text processing doesn't scale. You can't anticipate every way users will phrase questions. You need a smarter approach.
Approach 3: AI Entity Extraction
The Idea: Better, But Expensive
Instead of a static stop word list, use Gemini to intelligently extract the core topic from each query. Let AI understand what the user is really asking about.
The Implementation
We added a fast "extraction" call before searching:
// First LLM call: Extract topics (fast, low tokens)
const extractionPrompt = `
Extract the core topic/entity from this question.
Return only the key terms, no explanation.

Question: "should we implement llms.txt?"
Answer: llms.txt

Question: "${userQuery}"
Answer:`

const searchTerms = await geminiFlash.generate(extractionPrompt)

// Then search Algolia with extracted terms
const results = await algolia.search(searchTerms)
What Worked
This handled synonyms better. "CWV" correctly extracted to "Core Web Vitals." Questions about "GSC" found messages about "Google Search Console."
What Didn't Work
Two problems: cost and latency. Every search now required two LLM calls instead of one. The extraction call was fast (~200ms), but it added up. And we still had inconsistent results because Algolia was still doing keyword matching on the extracted terms.
Lesson Learned
AI can understand intent better than rules, but adding LLM calls adds latency and cost. Each call should provide significant value.
Approach 4: Query Normalization
The Idea: Overcomplicated
Normalize semantically equivalent queries to a canonical form. Cache results by normalized query so users asking the same thing get cache hits.
The Implementation
const normalizationPrompt = `
Normalize this question to its canonical form.
Remove personal pronouns, use present tense, standardize phrasing.

"should I use llms.txt?" → "using llms.txt"
"should we implement llms.txt?" → "using llms.txt"
"what is llms.txt?" → "llms.txt overview"

Question: "${userQuery}"
Normalized:`
The Problem
This added a third LLM call to each search. And the normalization itself was inconsistent: Gemini might normalize the same query differently on different calls. We were trying to use AI to create consistency, but AI is inherently probabilistic.
We were solving the wrong problem. The issue wasn't caching or normalization. The issue was that keyword search fundamentally can't do semantic matching.
Lesson Learned
Don't add complexity to work around a fundamental limitation. Fix the root cause instead. We were putting bandaids on keyword search when we needed to replace it entirely.
Approach 5: Multi-Query + Reciprocal Rank Fusion
The Idea: 57% Consistency
Run multiple query variations in parallel, then merge results using Reciprocal Rank Fusion (RRF), a technique from information retrieval research.
How RRF Works
RRF combines results from multiple search queries by scoring each document based on its rank across all queries:
// For each document, sum: 1 / (k + rank) across all queries
// k is typically 60
RRF_score = Σ (1 / (60 + rank_in_query_i))
// Document ranked #1 in one query and #5 in another:
// Score = 1/61 + 1/65 = 0.0164 + 0.0154 = 0.0318
The Implementation
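The fusion step itself is a few lines. Here's a minimal sketch (function and variable names are ours for illustration): score each document by summing 1 / (k + rank) across every query's ranked list, then sort by fused score.

```javascript
// Reciprocal Rank Fusion: merge ranked result lists from several query
// variants into one list. k = 60 is the conventional RRF constant.
function fuseResults(resultLists, k = 60) {
  const scores = new Map()
  for (const list of resultLists) {
    list.forEach((docId, index) => {
      const rank = index + 1 // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank))
    })
  }
  // Highest fused score first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId)
}
```

A document that appears in several lists accumulates score from each, so it beats a document that ranks equally well in only one list — exactly the behavior the formula above describes.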
The Results
We built a test suite with 34 pairs of semantically equivalent queries and measured result overlap; multi-query RRF reached 57% consistency.
Better than before, but still not good enough. And we were now making 3 Algolia queries plus 1 LLM call for query generation per search. The complexity was getting out of hand.
Lesson Learned
You can improve keyword search with clever techniques, but you're still limited by the fundamental approach. Sometimes the answer is to change the approach entirely.
Approach 6: Vector Search + Query Expansion
The Final Solution: 72% Consistency, Grade A
Replace keyword search entirely with vector embeddings. Convert every message to a 768-dimensional vector. Convert queries to vectors. Find messages with similar vectors.
The Key Insight
Vector embeddings capture meaning, not just words. When you embed "should I use llms.txt?" and "should I implement llms.txt?", they produce nearly identical vectors because they mean the same thing.
The Implementation
One-Time Setup

1. All 2,070 messages — archive message content
2. text-embedding-004 — generate 768-dimensional vectors
3. Store in Firestore — vector field on each document

Per Query

1. User query — expand SEO terms, then generate an embedding
2. Firestore vector search — findNearest(COSINE)
3. Top 100 messages — most semantically similar
4. Gemini — summary with citations
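The retrieval step boils down to "rank stored vectors by cosine distance to the query vector." Firestore does this server-side via its vector search, but a toy in-memory version (purely illustrative, not our production code) makes the ranking concrete:

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Toy equivalent of findNearest(COSINE): rank docs by cosine
// distance (1 - similarity) to the query embedding, smallest first.
function findNearest(queryVector, docs, limit) {
  return docs
    .map(doc => ({ ...doc, distance: 1 - cosineSimilarity(queryVector, doc.embedding) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, limit)
}
```

In production the vectors are 768-dimensional and the search is a single Firestore query, but the ranking principle is the same.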
Query Expansion for SEO Terms
Vector search handles synonyms well, but abbreviations are trickier. "GSC" and "Google Search Console" have different embeddings because they're different strings. So we added a simple expansion layer:
const QUERY_EXPANSIONS = {
  'GSC': 'Google Search Console',
  'GA4': 'Google Analytics 4',
  'SF': 'Screaming Frog crawler',
  'SGE': 'Search Generative Experience AI Overview',
  'CWV': 'Core Web Vitals',
  'E-E-A-T': 'Experience Expertise Authoritativeness Trustworthiness',
  // ... 70+ mappings
}

// "What's happening in GSC?" becomes
// "What's happening in GSC? Google Search Console"
The original query stays intact (preserving intent), but we append the expanded terms so the embedding captures both the abbreviation and full name.
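The expansion function is short. This sketch uses a trimmed copy of the map above (the function name is ours for illustration) and word-boundary matching so abbreviations don't fire inside other words; hyphenated keys like E-E-A-T would need a looser pattern:

```javascript
// Trimmed copy of the abbreviation map for illustration.
const QUERY_EXPANSIONS = {
  'GSC': 'Google Search Console',
  'CWV': 'Core Web Vitals',
}

// Append full forms after the original query so the embedding
// captures both the abbreviation and its expansion.
function expandQuery(query) {
  const expansions = []
  for (const [abbr, full] of Object.entries(QUERY_EXPANSIONS)) {
    // Word-boundary match: "GSC" matches as a word, not inside "GSCX"
    if (new RegExp(`\\b${abbr}\\b`, 'i').test(query)) expansions.push(full)
  }
  return expansions.length ? `${query} ${expansions.join(' ')}` : query
}
```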
The Results
Here's how specific test cases improved:
| Query Pair | Before | After |
|---|---|---|
| "GSC performance" vs "Google Search Console performance" | 52% | 82% |
| "AI search optimization" vs "SGE optimization" | 16% | 80% |
| "should I use llms.txt" vs "should I implement llms.txt" | 48% | 91% |
| "how to build backlinks" vs "how to create backlinks" | 55% | 88% |
The Final Architecture
Cost Breakdown
One surprise: vector search ended up being cheaper than our earlier approaches.
| Component | Cost | Notes |
|---|---|---|
| Initial embedding generation | ~$0.03 | One-time for 2,070 messages |
| Query embedding | ~$0.0001 | Per query |
| Firestore vector search | Free | Included in Firestore reads |
| Gemini 2.5 Flash response | ~$0.003 | Per query |
| Total per query | ~$0.003 | |
Compare this to Approach 4 (normalization + extraction + response) which cost ~$0.01 per query and still had worse results.
Key Takeaways
1. Semantic search needs semantic technology
We spent two days trying to make keyword search behave semantically. It can't. Vector embeddings solve this at a fundamental level.
2. Measure what matters
Our test suite with 34 query pairs and overlap measurement was invaluable. Without it, we'd still be guessing whether changes helped.
3. Simpler is often better
Our final solution has fewer moving parts than approaches 3, 4, and 5. One embedding call, one vector search, one LLM call. That's it.
4. Domain-specific knowledge still matters
Vector search handles synonyms, but not abbreviations. The query expansion map for SEO terms boosted our score from 67% to 72%. Small additions, big impact.
5. Iterate fast, test everything
We built a test endpoint that let us curl queries and compare results in real time. That rapid iteration loop was essential for getting through six approaches in three days, and it simply wasn't possible for me two or more years ago: I got to experiment with several approaches and learn quickly.
6. Keep learnings in a Markdown file as you work
After finishing this project, I came up with an idea I'll use from now on: when I'm testing several methods and want to carry the learnings forward, I'll have Claude write the learnings from each iteration to a learnings.md file so we have shared context as we build. This helps both Claude and, more importantly, me remember how I arrived at different outcomes.
What's next
The current system works well, but there's always room to improve:
- Hybrid search: Combine vector search with keyword matching for queries that include specific names or URLs
- Conversation context: Use previous questions to inform follow-up queries in the same session
- Source diversity: Ensure answers cite messages from multiple channels and authors, not just the most semantically similar
- Feedback loop: Track which citations users click to understand what's actually helpful
Building AI features is an iterative process. You start with assumptions, hit reality, adjust, and repeat. The key is having good metrics and being willing to throw away approaches that don't work, even if you spent time on them.
Try it yourself
- AI search is live in The SEO Community archive.
- Log in with your Slack credentials.
- Toggle to "Ask AI" mode and try some questions.
- See if it finds what you're looking for.
- And if it doesn't? Let us know in #feedback.
- That's how we'll make it better.