OZONE
Technical · 25 Nov 2024

Vector Databases Explained: The Infrastructure Behind AI Search

Vector databases are essential infrastructure for modern AI applications. They enable semantic search, power RAG systems, and make AI applications actually useful. Here's what you need to know.

What Problem Do Vector Databases Solve?

Traditional databases excel at exact matching: find all records where customer_id equals 12345. But AI applications need something different. They need similarity search, where you find content that means the same thing even if it uses different words.

Consider searching a knowledge base for "how to reset my password". A traditional database would look for documents containing exactly those words, missing helpful content that uses different terminology. A vector database finds documents about password recovery, account access, credential issues, and login help, even if they never mention "reset" or "password" explicitly.

This semantic understanding is what makes AI search actually useful. It's built on vector embeddings stored in specialised databases optimised for similarity queries rather than exact matches.

Understanding Vectors and Embeddings

Before diving into databases, it helps to understand what they store.

A vector is simply a list of numbers. In AI applications, we use vectors with hundreds or thousands of dimensions: a list like [0.12, -0.34, 0.56, ...] continuing for 384, 768, 1536 or more entries. Each position in the vector captures some aspect of meaning, so similar concepts end up with similar numbers, and their vectors sit close together in the mathematical space.

An embedding is a vector that represents something: text, an image, audio, or other content. Embedding models convert content into these numerical representations. The sentence "The quick brown fox" becomes a specific list of numbers. The semantically similar phrase "A fast russet-coloured canine" becomes a very similar list of numbers, close in the mathematical space. The completely different phrase "Quantum physics equations" becomes a very different list of numbers, far away in the mathematical space. The magic of embeddings is that semantic similarity translates directly to vector similarity: close meanings produce close vectors.

Vector databases find similar vectors using distance metrics. Cosine similarity measures the angle between vectors and is most common for text. Euclidean distance measures straight-line distance between vector points. Dot product provides fast similarity calculation. The choice of metric depends on your embedding model and use case, but cosine similarity is the safe default for most text applications.
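All three metrics are simple to compute directly; a minimal NumPy sketch over two short example vectors:

```python
import numpy as np

a = np.array([0.12, -0.34, 0.56])
b = np.array([0.10, -0.30, 0.60])

# Cosine similarity: the angle between vectors, ignoring magnitude (range -1..1).
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: straight-line distance between the points (0 = identical).
euclidean = np.linalg.norm(a - b)

# Dot product: fastest to compute; ranks identically to cosine similarity
# when vectors are unit-normalised.
dot = a @ b
```

Many embedding models emit unit-length vectors, in which case dot product and cosine similarity produce the same ranking and the cheaper dot product is the obvious choice.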

How Vector Databases Work

Vector databases are optimised for a specific operation: find the K most similar vectors to a query vector. This is called K-nearest neighbours (KNN) search.

The challenge is that comparing a query vector against every stored vector (brute force search) doesn't scale. A million vectors with 1536 dimensions means billions of calculations per query. Response times become unacceptable for interactive applications. Costs scale linearly with data size, making large collections prohibitively expensive.
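Brute-force search is easy to express, which makes its cost easy to see: every query must score every stored vector. A NumPy sketch, assuming unit-normalised embeddings so the dot product serves as the similarity score:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 stored vectors of 1536 dimensions, unit-normalised.
vectors = rng.normal(size=(10_000, 1536))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

def knn_brute_force(query: np.ndarray, k: int = 5) -> np.ndarray:
    # One similarity score per stored vector: 10,000 x 1536 multiply-adds
    # for every single query. At a million vectors this becomes billions.
    scores = vectors @ query
    # Indices of the k highest scores, best first.
    return np.argsort(scores)[::-1][:k]

query = vectors[42]  # a stored vector is its own nearest neighbour
top = knn_brute_force(query)
```

ANN indexes exist precisely to avoid this full scan: they visit only a small fraction of the stored vectors per query.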

The solution is approximate nearest neighbours (ANN) algorithms that find approximately the most similar vectors much faster. HNSW (Hierarchical Navigable Small World) builds graph-based indexes that enable fast search by navigating through connected nodes. IVF (Inverted File Index) clusters vectors into buckets, so search only checks relevant clusters instead of everything. PQ (Product Quantisation) compresses vectors to reduce memory usage while maintaining search quality. These techniques trade a small accuracy reduction for dramatic speed improvements, often 100x faster while maintaining 99%+ recall of the true nearest neighbours.

Vector Database Options

The market has exploded with options. Understanding the categories helps you navigate the choices.

Managed services handle infrastructure for you. Pinecone pioneered the space and offers a fully managed service with a simple API that scales well. Weaviate Cloud provides the managed version of open-source Weaviate with strong hybrid search capabilities. Qdrant Cloud offers managed Qdrant with good price-to-performance ratio. These services are ideal when you want to focus on building applications rather than operating databases.

Self-hosted open source options let you run your own infrastructure. Weaviate is feature-rich with a strong community and good documentation. Qdrant is Rust-based with excellent performance and a growing ecosystem. Milvus is a mature project with enterprise features but more complex to operate. Chroma is simple and Python-native, making it good for development and smaller deployments.

Vector extensions for existing databases add vector capabilities to databases you already use. pgvector brings vector search to PostgreSQL. Elasticsearch has added vector search capabilities. Redis offers a vector search module. MongoDB Atlas provides vector search. These extensions are excellent for smaller scale or when you want to minimise infrastructure complexity by keeping everything in one database.

Choosing the Right Solution

The right choice depends on several factors.

Scale is often the primary driver. Under 100,000 vectors, pgvector or any option works fine, so pick based on convenience. Between 100,000 and 10 million vectors, most options work well, so consider query volume and latency requirements. Above 10 million vectors, dedicated vector databases are recommended because extensions struggle with performance at this scale.

Query patterns affect which solutions work best. Simple semantic search works with any option. Filtered search that combines vector similarity with metadata conditions requires checking how each option handles filtering, as performance varies significantly. Hybrid search combining vector similarity with traditional keyword matching is a strength of Weaviate and Elasticsearch.

Operational considerations matter for real deployments. If database operations aren't your team's strength, managed services often make sense. Data residency requirements may force self-hosting if you need control over where data lives. Cost calculations should compare open source plus cloud compute against managed service pricing. Existing infrastructure may favour extensions if you want to minimise the number of systems you operate.

Key Concepts for Implementation

Several concepts matter when building with vector databases.

Index configuration determines search speed and accuracy. Choose a distance metric that matches your embedding model; the model documentation usually specifies this. Configure the index based on your latency versus accuracy trade-off. Understand that index choice affects insert speed, not just query speed, since some indexes are faster to build than others.

Metadata and filtering are essential because vectors alone aren't enough for most applications. Store source document IDs so you can retrieve the original content. Include categories, dates, permissions, and other attributes needed for filtering. Understand whether your database filters before vector search (pre-filtering) or after (post-filtering) because this significantly affects performance and result quality.
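The pre- versus post-filtering distinction is easiest to see in a toy in-memory example (the category metadata and sizes here are illustrative, not from any particular database):

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(1000, 8))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
categories = np.array(["faq", "blog"] * 500)  # one metadata tag per vector

query = vectors[7]

# Pre-filtering: restrict to matching metadata first, then search only those.
# Guarantees results satisfy the filter, but the index must support it.
mask = categories == "blog"
pre_scores = vectors[mask] @ query
pre_best = np.flatnonzero(mask)[np.argmax(pre_scores)]

# Post-filtering: take the top-k by similarity first, then drop rows that
# fail the filter. If few of the top-k match, you end up with too few results.
k = 10
top_k = np.argsort(vectors @ query)[::-1][:k]
post_hits = [i for i in top_k if categories[i] == "blog"]
```

With a restrictive filter, post-filtering can return far fewer than k results, which is why it matters which strategy your database uses.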

Namespaces and multi-tenancy matter if you're serving multiple customers or use cases. Use namespaces or collections to isolate data between tenants. Consider performance implications of shared indexes versus separate indexes per tenant. Plan for data isolation requirements based on your security and privacy needs.

Embedding model alignment requires consistency throughout your system. Configure the database for your model's dimensions (384, 768, 1536, or whatever your model produces). Use the same model for both indexing documents and embedding queries, because different models produce incompatible vectors. Plan for potential model changes, which may require re-embedding all your content.
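A cheap guard against model mismatch is to validate vector dimensions before anything reaches the index. A sketch (the constant and function name are illustrative):

```python
EXPECTED_DIM = 1536  # must match the embedding model used at indexing time

def validate_embedding(vector: list[float]) -> list[float]:
    # Reject vectors from a mismatched model early: a 768-dimension query
    # can never be meaningfully compared against 1536-dimension documents.
    if len(vector) != EXPECTED_DIM:
        raise ValueError(
            f"expected {EXPECTED_DIM} dimensions, got {len(vector)}; "
            "was a different embedding model used?"
        )
    return vector
```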

Common Patterns

RAG (Retrieval-Augmented Generation) is the most common use case for vector databases. The pattern involves chunking documents into smaller pieces, embedding each chunk and storing it in the vector database, embedding user queries and searching for similar chunks, then passing retrieved chunks to an LLM as context for generating responses. This enables AI systems to work with your specific content rather than just general knowledge.
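The retrieval half of that pattern fits in a few lines. In this self-contained toy, embed() is a stand-in bag-of-words "embedding"; a real system would call an embedding model and a vector database instead:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in: a sparse bag-of-words vector, not a real embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Chunk documents (here: each document is already one chunk) and index them.
docs = ["Reset your password from the account settings page.",
        "Our office is open Monday to Friday."]
index = [(chunk, embed(chunk)) for chunk in docs]

# 2. Embed the user query and retrieve the most similar chunk.
query = "how do I reset my password"
qvec = embed(query)
best_chunk, _ = max(index, key=lambda item: cosine(qvec, item[1]))

# 3. Pass the retrieved chunk to the LLM as context for the answer.
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
```

The structure is the same in production; only the embedding function, the store, and the final LLM call are swapped for real services.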

Semantic search provides search that understands meaning rather than just matching keywords. Embed all searchable content in your knowledge base. When users query, embed their query and return semantically similar results. This is often combined with keyword search in hybrid approaches that capture both exact matches and semantic similarity.

Recommendations use vector similarity to find related items. Embed item descriptions, features, or user behaviour patterns. Find items similar to what a user likes or is currently viewing. This can be combined with collaborative filtering for more sophisticated recommendation systems.

Deduplication finds near-duplicate content that wouldn't match with exact comparison. Embed content and search for highly similar vectors. Flag or merge items that are semantically equivalent even if they differ in wording or formatting.
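At small scale the dedup check is just a pairwise similarity pass with a threshold; a NumPy sketch with toy vectors (the threshold value is illustrative and should be tuned per embedding model):

```python
import numpy as np

# Toy embeddings: each row is one piece of content, unit-normalised.
rng = np.random.default_rng(2)
emb = rng.normal(size=(5, 16))
emb[3] = emb[0] + 0.01 * rng.normal(size=16)  # near-duplicate of row 0
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

THRESHOLD = 0.95  # higher = stricter notion of "duplicate"

# Pairwise cosine similarity; flag distinct pairs above the threshold.
sims = emb @ emb.T
dupes = [(i, j)
         for i in range(len(emb))
         for j in range(i + 1, len(emb))
         if sims[i, j] > THRESHOLD]
```

At large scale you would run the same idea through the vector database itself, querying each item against the index rather than building the full pairwise matrix.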

Performance Optimisation

Several techniques improve production performance.

Embedding generation is often the bottleneck. Batch embedding requests to reduce per-item latency overhead. Cache frequently-queried embeddings rather than regenerating them. Consider local embedding models for high-volume use cases where API latency and costs become significant.

Query optimisation reduces search latency. Retrieve only the top-K results you actually need rather than over-fetching. Use metadata filters to narrow the search space before vector comparison. Understand the trade-offs in approximate search configuration, since tighter approximations are more accurate but slower.

Index tuning balances build time and query performance. Monitor recall to ensure search quality meets your requirements. Rebuild indexes as data grows significantly, because index quality can degrade with many incremental additions. Balance the cost of index building against query performance requirements.

Getting Started

A practical path to begin with vector databases: start simple with pgvector or Chroma for experimentation, which have the lowest setup overhead. Choose an embedding model (OpenAI's text-embedding-3-small is a safe default with good quality and reasonable cost). Build a prototype to get something working before optimising. Measure baseline performance to understand your latency and accuracy starting point. Then optimise as needed, moving to dedicated solutions if your requirements exceed what simpler options can provide.

Need Help with Vector Infrastructure?

We can help you choose, implement, and optimise vector databases for your AI applications.

Get in Touch