QSearch

System Overview

QSearch is a production-grade demonstration of Retrieval-Augmented Generation (RAG) and semantic search capabilities, built on the complete Star Trek transcript archive spanning 60 years of television.


Executive Overview

QSearch is a full-stack AI search and chat application that demonstrates enterprise-grade patterns for integrating large language models with structured data. The system combines vector similarity search, hybrid retrieval strategies, and streaming chat interfaces to enable natural language exploration of a corpus containing over 900 episodes across 11 Star Trek series.

This application serves as a reference implementation for organizations looking to build LLM-powered search experiences over their own proprietary data. Key patterns demonstrated include: intelligent query routing, multi-stage retrieval with reranking, embedding caching for cost optimization, and real-time observability.

  • 909 Episodes Indexed
  • 11,780 Searchable Chunks
  • 21,563 Extracted Entities

Data Coverage

Series Coverage (11 Series)

  • TOS: The Original Series (1966-1969)
  • TAS: The Animated Series (1973-1974)
  • TNG: The Next Generation (1987-1994)
  • DS9: Deep Space Nine (1993-1999)
  • VOY: Voyager (1995-2001)
  • ENT: Enterprise (2001-2005)
  • DIS: Discovery (2017-present)
  • PIC: Picard (2020-2023)
  • LD: Lower Decks (2020-present)
  • PRO: Prodigy (2021-present)
  • SNW: Strange New Worlds (2022-present)

Entity Extraction

  • 8,594 Characters
  • 6,564 Locations
  • 4,692 Species
  • 1,713 Starships

Top Characters

  1. Worf (268)
  2. Picard (184)
  3. Riker (182)
  4. Data (176)
  5. Sisko (173)

System Architecture

Frontend

  • Next.js 14 (App Router)
  • React 18 + TypeScript
  • Tailwind CSS + LCARS Theme
  • Vercel AI SDK (Streaming)

Backend

  • Next.js API Routes
  • Supabase (PostgreSQL)
  • pgvector Extension
  • HNSW + GIN Indexes

AI/ML

  • OpenAI Embeddings (3072d)
  • GPT-4o-mini (Routing)
  • Claude (Chat/RAG)
  • LLM-Based Reranking

Request Flow

User Query → Classification → Embedding → Hybrid Retrieval → Reranking → Results

Search Capabilities

Semantic Search

Vector similarity search using OpenAI's text-embedding-3-large model with 3072-dimensional embeddings.

  • Episode summary embeddings
  • Transcript chunk embeddings
  • Cosine similarity matching
  • HNSW indexes
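The similarity metric behind this search can be sketched in a few lines. In production, pgvector computes cosine distance in-database (with HNSW indexes for approximate nearest-neighbor lookup); this standalone TypeScript version only illustrates the scoring:

```typescript
// Cosine similarity between two embedding vectors.
// pgvector evaluates this in-database; this is an illustrative
// standalone version of the same scoring.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Scores range from -1 to 1, with 1 meaning the query and chunk embeddings point in the same direction.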

Keyword Search

PostgreSQL full-text search with tsvector indexing for exact phrase matching.

  • GIN indexes on content
  • Character/location filtering
  • Wildcard matching
  • Proper noun optimization

Hybrid Search (RRF)

Reciprocal Rank Fusion combines semantic and keyword results with query-type dependent weighting.

  • Dynamic weight allocation
  • Result deduplication
  • Per-query-type optimization
  • Excerpt aggregation
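The fusion step can be sketched as standard Reciprocal Rank Fusion: each document scores `weight / (k + rank)` summed across the lists it appears in, with the conventional constant k = 60. The default weights here are illustrative stand-ins for the query-type dependent weighting described above:

```typescript
// Reciprocal Rank Fusion over two ranked id lists (best first).
const RRF_K = 60;

function fuseRRF(
  semantic: string[],
  keyword: string[],
  semanticWeight = 0.5, // assumption: real weights vary by query type
  keywordWeight = 0.5,
): string[] {
  const scores = new Map<string, number>();
  const add = (ids: string[], w: number) =>
    ids.forEach((id, i) =>
      scores.set(id, (scores.get(id) ?? 0) + w / (RRF_K + i + 1)));
  add(semantic, semanticWeight);
  add(keyword, keywordWeight);
  // Deduplication falls out naturally: each id gets one fused score.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document ranked in both lists (e.g. second semantically and first by keyword) outranks one that tops only a single list.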

Query Routing

LLM-powered query classification dynamically selects optimal retrieval strategies.

  • 5 query types supported
  • Confidence-based fallback
  • LRU cache (1000 entries)
  • Automatic strategy selection
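A classification cache of this shape can be built on a plain `Map`, which preserves insertion order, so the first key is always the least recently used. The capacity matches the 1,000-entry figure above; everything else is a generic sketch, not the application's actual implementation:

```typescript
// Minimal LRU cache, e.g. for query -> classification results.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity = 1000) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.capacity) {
      // Evict least recently used: the first key in insertion order.
      this.map.delete(this.map.keys().next().value!);
    }
    this.map.set(key, value);
  }
}
```

Caching classifications avoids paying an LLM round-trip for repeated or popular queries.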

Advanced Retrieval

🎯 LLM-Based Reranking

Two-stage retrieval first fetches candidates, then reranks them using GPT-4o-mini relevance scoring.

💭 HyDE

Generates hypothetical transcript excerpts to embed, improving recall for abstract concept searches.

📍 Contextual Enrichment

Each chunk is prefixed with episode metadata before embedding for improved semantic matching.
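The enrichment step amounts to a string transform before embedding. The metadata fields and header format below are illustrative assumptions, not the application's actual schema:

```typescript
// Contextual enrichment: prefix a transcript chunk with episode
// metadata so its embedding carries series/episode context.
// Field names are hypothetical.
interface ChunkMeta {
  series: string;
  episodeTitle: string;
  season: number;
  episode: number;
}

function enrichChunk(meta: ChunkMeta, chunkText: string): string {
  const header =
    `[${meta.series} S${meta.season}E${meta.episode} "${meta.episodeTitle}"]`;
  return `${header}\n${chunkText}`;
}
```

The embedding model then sees, for example, that a line of dialogue belongs to a specific TNG episode rather than floating free of context.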

🎬 Scene-Level Chunks

Extended chunks capture full scene context for arc and theme queries.

👥 Entity Extraction

Per-chunk character and location arrays enable filtered search by speaker or setting.
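In SQL this filtering would use array-containment predicates over the entity columns; the in-memory sketch below shows the same logic with hypothetical field names:

```typescript
// Entity-filtered search: keep only chunks whose character/location
// arrays include the requested entities. Shapes are illustrative.
interface EntityChunk {
  id: string;
  characters: string[];
  locations: string[];
}

function filterByEntities(
  chunks: EntityChunk[],
  opts: { character?: string; location?: string },
): EntityChunk[] {
  return chunks.filter(
    (c) =>
      (!opts.character || c.characters.includes(opts.character)) &&
      (!opts.location || c.locations.includes(opts.location)),
  );
}
```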

Two-Phase Retrieval

Episode-first search narrows the scope; chunk search then runs only within the matched episodes, yielding faster results.
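The two phases can be sketched in-memory as a stand-in for what the application does against Postgres; all types and cutoffs here are assumptions:

```typescript
// Two-phase retrieval: phase 1 picks the best-matching episodes,
// phase 2 searches chunks only within that episode scope.
interface EpisodeHit { episodeId: string; score: number }
interface ChunkRow { id: string; episodeId: string; score: number }

function twoPhaseRetrieve(
  episodeHits: EpisodeHit[],
  allChunks: ChunkRow[],
  topEpisodes = 5,  // hypothetical cutoffs
  topChunks = 10,
): ChunkRow[] {
  const scope = new Set(
    [...episodeHits]
      .sort((a, b) => b.score - a.score)
      .slice(0, topEpisodes)
      .map((e) => e.episodeId),
  );
  return allChunks
    .filter((c) => scope.has(c.episodeId))
    .sort((a, b) => b.score - a.score)
    .slice(0, topChunks);
}
```

Restricting phase 2 to a handful of episodes shrinks the candidate set by orders of magnitude before any chunk scoring happens.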

Chat & RAG Pipeline

The chat interface demonstrates a complete RAG pipeline using Claude for response generation with real-time streaming.

Pipeline Sequence

  1. Extract the latest user message for retrieval
  2. Route the query through the classification system
  3. Execute the optimal retrieval strategy
  4. Fetch episode metadata and transcript chunks
  5. Build the context string for LLM injection
  6. Stream the response with debug metadata
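The six steps above can be sketched as a pipeline with injected stages, so the LLM and database calls can be swapped for stubs in tests. Every type and stage signature here is an assumption, not the application's actual interface:

```typescript
// RAG pipeline sketch with pluggable stages (all hypothetical).
interface RagDeps {
  classify: (query: string) => Promise<string>;           // step 2
  retrieve: (query: string, route: string) => Promise<string[]>; // steps 3-4
  stream: (prompt: string) => Promise<string>;            // step 6
}

async function answer(
  messages: { role: string; content: string }[],
  deps: RagDeps,
): Promise<string> {
  // 1. Extract the latest user message.
  const query =
    messages.filter((m) => m.role === "user").at(-1)?.content ?? "";
  // 2-3. Route the query, then run the chosen retrieval strategy.
  const route = await deps.classify(query);
  const chunks = await deps.retrieve(query, route);
  // 5. Build the context string injected into the LLM prompt.
  const context = chunks.join("\n---\n");
  // 6. Generate the grounded response (streaming elided here).
  return deps.stream(`Context:\n${context}\n\nQuestion: ${query}`);
}
```

In the real app, step 6 would return a stream via the Vercel AI SDK rather than a resolved string.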

Features

  • Real-time streaming via Vercel AI SDK
  • Debug panel showing retrieved context
  • Query routing metadata visualization
  • Episode-specific system prompts
  • Markdown rendering for responses
  • Conversation history management

Production Hardening

Caching Strategy

  • Embedding LRU cache (1h TTL)
  • Query classification cache (24h)
  • Automatic cache invalidation

Rate Limiting

  • 30 req/min per IP
  • 100 req/min global
  • Sliding window algorithm
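A sliding-window limiter of this kind keeps the timestamps of recent requests per key and admits a request only if fewer than the limit fall inside the window. The sketch below makes the clock injectable for testing; it is a generic illustration, not the application's code:

```typescript
// Sliding-window rate limiter keyed by e.g. client IP.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  constructor(
    private limit: number,      // e.g. 30 requests
    private windowMs: number,   // e.g. 60_000 ms
    private now: () => number = Date.now,
  ) {}

  allow(key: string): boolean {
    const t = this.now();
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter(
      (ts) => t - ts < this.windowMs,
    );
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }
}
```

A second limiter instance without per-key buckets (a single shared key) covers the global cap.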

Timeout Handling

  • 3s timeout on retrievers
  • Partial results on timeout
  • Graceful degradation
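The degradation pattern amounts to racing each retriever against a deadline and keeping whatever finished, so one slow source does not fail the whole search. A minimal sketch, assuming retrievers that resolve to result lists:

```typescript
// Race a promise against a deadline; on timeout, resolve to a fallback.
function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((resolve) => setTimeout(() => resolve(fallback), ms)),
  ]);
}

// Gather partial results: a retriever that misses the deadline
// contributes an empty list instead of rejecting the request.
async function gatherPartial<T>(
  retrievers: Promise<T[]>[],
  timeoutMs = 3000, // matches the 3s retriever timeout above
): Promise<T[]> {
  const results = await Promise.all(
    retrievers.map((r) => withTimeout(r, timeoutMs, [] as T[])),
  );
  return results.flat();
}
```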

Observability

  • Latency percentiles (p50/p95/p99)
  • Volume by search type
  • Error rate tracking
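Percentile tracking can be computed from a sample of request durations with the nearest-rank method (the p-th percentile of n samples is the ceil(p/100 × n)-th smallest value). A minimal sketch:

```typescript
// Nearest-rank percentile over a sample of latencies (ms).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function latencySummary(samples: number[]) {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

Production systems often trade exactness for memory with streaming sketches (e.g. t-digest), but nearest-rank over a bounded sample is a common starting point.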

Slow Query Logging

  • Configurable threshold (2s)
  • Query and timing capture
  • Performance debugging

Diagnostics

  • /api/diagnostics endpoint
  • Component health checks
  • Per-component timing

Access Terminal

Experience the power of semantic search and RAG-powered chat on the complete Star Trek transcript archive.