System Overview
QSearch is a production-grade demonstration of Retrieval-Augmented Generation (RAG) and semantic search capabilities, built on the complete Star Trek transcript archive spanning 60 years of television.
Executive Overview
QSearch is a full-stack AI search and chat application that demonstrates enterprise-grade patterns for integrating large language models with structured data. The system combines vector similarity search, hybrid retrieval strategies, and streaming chat interfaces to enable natural language exploration of a corpus containing over 900 episodes across 13 Star Trek series.
This application serves as a reference implementation for organizations looking to build LLM-powered search experiences over their own proprietary data. Key patterns demonstrated include: intelligent query routing, multi-stage retrieval with reranking, embedding caching for cost optimization, and real-time observability.
Data Coverage
The data dashboard summarizes series coverage (11 series), per-chunk entity extraction, and top-character statistics.
System Architecture
Frontend
- Next.js 14 (App Router)
- React 18 + TypeScript
- Tailwind CSS + LCARS Theme
- Vercel AI SDK (Streaming)
Backend
- Next.js API Routes
- Supabase (PostgreSQL)
- pgvector Extension
- HNSW + GIN Indexes
AI/ML
- OpenAI Embeddings (3072d)
- GPT-4o-mini (Routing)
- Claude (Chat/RAG)
- LLM-Based Reranking
Request Flow
Search Capabilities
Semantic Search
Vector similarity search using OpenAI's text-embedding-3-large model with 3072-dimensional embeddings.
- Episode summary embeddings
- Transcript chunk embeddings
- Cosine similarity matching
- HNSW indexes
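The similarity metric behind this matching can be sketched in a few lines. Note that pgvector's `<=>` operator actually returns cosine *distance* (1 minus similarity); this helper computes the similarity itself:

```typescript
// Cosine similarity between two embedding vectors: dot product divided by
// the product of the vector norms. Returns a value in [-1, 1], where 1
// means the vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In production the database computes this via the HNSW index rather than in application code; the sketch is only to show what "cosine similarity matching" means.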
Keyword Search
PostgreSQL full-text search with tsvector indexing for exact phrase matching.
- GIN indexes on content
- Character/location filtering
- Wildcard matching
- Proper noun optimization
Hybrid Search (RRF)
Reciprocal Rank Fusion combines semantic and keyword results with query-type dependent weighting.
- Dynamic weight allocation
- Result deduplication
- Per-query-type optimization
- Excerpt aggregation
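The fusion step can be sketched as follows. Each result contributes `weight / (k + rank)` per list, scores are summed across lists, and duplicates merge by id; the weights and the smoothing constant `k = 60` here are illustrative defaults, not the app's actual configuration:

```typescript
// Reciprocal Rank Fusion over multiple ranked result lists.
type Ranked = { id: string };

function rrfFuse(
  lists: Ranked[][],
  weights: number[],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  lists.forEach((list, listIndex) => {
    list.forEach((item, rank) => {
      // rank is 0-based, so rank + 1 is the item's 1-based position.
      const prev = scores.get(item.id) ?? 0;
      scores.set(item.id, prev + weights[listIndex] / (k + rank + 1));
    });
  });
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

A result appearing near the top of both the semantic and the keyword list accumulates score from each, which is why hybrid search surfaces it above results that rank highly in only one list.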
Query Routing
LLM-powered query classification dynamically selects optimal retrieval strategies.
- 5 query types supported
- Confidence-based fallback
- LRU cache (1000 entries)
- Automatic strategy selection
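A routing decision of this shape might look like the sketch below. The five type names, the strategy labels, and the confidence threshold are all hypothetical; the source only states that five query types exist and that low-confidence classifications fall back to a safe default:

```typescript
// Hypothetical mapping from a classified query type to a retrieval strategy.
type QueryType = "character" | "episode" | "theme" | "quote" | "general";
type Strategy = "keyword" | "semantic" | "hybrid";

function selectStrategy(type: QueryType, confidence: number): Strategy {
  // Confidence-based fallback: distrust uncertain classifications.
  if (confidence < 0.6) return "hybrid";
  switch (type) {
    case "quote":
    case "character":
      return "keyword"; // proper nouns and exact phrases favor full-text search
    case "theme":
    case "general":
      return "semantic"; // abstract concepts favor embeddings
    case "episode":
      return "hybrid"; // mixed signals: fuse both
  }
}
```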
Advanced Retrieval
LLM-Based Reranking
Two-stage retrieval first fetches a broad candidate set, then reranks it using GPT-4o-mini relevance scoring.
HyDE
Generates hypothetical transcript excerpts to embed, improving recall for abstract concept searches.
Contextual Enrichment
Each chunk is prefixed with episode metadata before embedding for improved semantic matching.
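A minimal sketch of this prefixing step is below; the metadata field names and bracket format are assumptions for illustration, not the app's exact template:

```typescript
// Contextual enrichment: prepend episode metadata to each chunk before
// embedding, so the resulting vector carries series/episode context.
interface EpisodeMeta {
  series: string;
  title: string;
  season: number;
  episode: number;
}

function enrichChunk(meta: EpisodeMeta, chunk: string): string {
  return `[${meta.series} S${meta.season}E${meta.episode} | "${meta.title}"]\n${chunk}`;
}
```

Because the prefix is embedded along with the chunk text, a query mentioning a series or episode can match chunks whose raw dialogue never names either.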
Scene-Level Chunks
Extended chunks capture full scene context for arc and theme queries.
Entity Extraction
Per-chunk character and location arrays enable filtered search by speaker or setting.
Two-Phase Retrieval
Episode-first search narrows the scope, then chunks are searched only within the matched episodes, yielding faster results.
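The two phases compose as in the sketch below; the retriever signatures are assumptions, standing in for whatever episode- and chunk-level search functions the app actually uses:

```typescript
// Two-phase retrieval: phase 1 finds candidate episodes, phase 2 searches
// chunks only within those episodes, shrinking the vector search space.
type Chunk = { episodeId: string; text: string; score: number };

async function twoPhaseSearch(
  query: string,
  searchEpisodes: (q: string) => Promise<string[]>,
  searchChunks: (q: string, episodeIds: string[]) => Promise<Chunk[]>,
  topEpisodes = 5,
): Promise<Chunk[]> {
  const episodeIds = (await searchEpisodes(query)).slice(0, topEpisodes);
  if (episodeIds.length === 0) return []; // nothing matched at episode level
  return searchChunks(query, episodeIds);
}
```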
Chat & RAG Pipeline
The chat interface demonstrates a complete RAG pipeline using Claude for response generation with real-time streaming.
Pipeline Sequence
1. Extract the latest user message for retrieval
2. Route the query through the classification system
3. Execute the optimal retrieval strategy
4. Fetch episode metadata and transcript chunks
5. Build the context string for LLM injection
6. Stream the response with debug metadata
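The context-building step of the pipeline can be sketched as below; the record shape and the "Source N" labeling are illustrative, not the app's exact prompt format:

```typescript
// Assemble retrieved excerpts into a single context string for injection
// into the LLM prompt, labeling each excerpt with its episode provenance.
interface Retrieved {
  series: string;
  episodeTitle: string;
  excerpt: string;
}

function buildContext(results: Retrieved[]): string {
  return results
    .map(
      (r, i) =>
        `Source ${i + 1} (${r.series}, "${r.episodeTitle}"):\n${r.excerpt}`,
    )
    .join("\n\n");
}
```

Labeling sources this way also feeds the debug panel: the same provenance shown to the model can be surfaced to the user alongside the streamed answer.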
Features
- Real-time streaming via Vercel AI SDK
- Debug panel showing retrieved context
- Query routing metadata visualization
- Episode-specific system prompts
- Markdown rendering for responses
- Conversation history management
Production Hardening
Caching Strategy
- Embedding LRU cache (1h TTL)
- Query classification cache (24h)
- Automatic cache invalidation
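A cache combining both policies (LRU eviction plus TTL expiry) can be sketched with a plain `Map`, whose insertion order doubles as a recency list. Sizes and TTLs are constructor parameters here rather than the app's actual values:

```typescript
// Minimal LRU cache with per-entry TTL. Map iteration order is insertion
// order, so the first key is always the least recently used.
class LruTtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();

  constructor(private maxSize: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.store.delete(key); // TTL expiry: automatic invalidation
      return undefined;
    }
    this.store.delete(key); // re-insert to refresh recency
    this.store.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.store.size >= this.maxSize && !this.store.has(key)) {
      const oldest = this.store.keys().next().value; // least recently used
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

For embeddings this pays off directly: a cache hit skips an OpenAI API call entirely, which is where the cost optimization mentioned earlier comes from.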
Rate Limiting
- 30 req/min per IP
- 100 req/min global
- Sliding window algorithm
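The sliding-window algorithm can be sketched by keeping recent request timestamps per key and discarding those older than the window:

```typescript
// Sliding-window rate limiter: a request is allowed only if fewer than
// `limit` requests for the same key fall within the trailing window.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Keying by client IP gives the per-IP limit; a single shared key gives the global one.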
Timeout Handling
- 3s timeout on retrievers
- Partial results on timeout
- Graceful degradation
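A timeout wrapper of this kind races the retriever against a deadline and substitutes a fallback (for example, an empty result list) instead of failing the whole request; the helper below is a sketch, not the app's actual utility:

```typescript
// Race `work` against a timeout; resolve with `fallback` if the deadline
// fires first, so callers degrade gracefully to partial results.
async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  fallback: T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // avoid a dangling timer
  }
}
```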
Observability
- Latency percentiles (p50/p95/p99)
- Volume by search type
- Error rate tracking
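The percentile figures can be computed from raw latency samples with the nearest-rank method, sketched here:

```typescript
// Nearest-rank percentile: sort the samples and pick the value at
// ceil(p/100 * n). p50/p95/p99 are just three calls to this.
function percentile(latenciesMs: number[], p: number): number {
  if (latenciesMs.length === 0) throw new Error("no samples");
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

The gap between p50 and p99 is usually the interesting signal: a low median with a high p99 points at tail latency (cold caches, LLM reranking calls, slow queries) rather than uniformly slow retrieval.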
Slow Query Logging
- Configurable threshold (2s)
- Query and timing capture
- Performance debugging
Diagnostics
- /api/diagnostics endpoint
- Component health checks
- Per-component timing
Access Terminal
Experience the power of semantic search and RAG-powered chat on the complete Star Trek transcript archive.