Are you navigating the complex world of vector databases in Postgres? Explore PGVector and its indexes: Flat, HNSW, and IVFFlat. Dive into the nuances of accuracy, speed, and scalability to make informed decisions.
Making the right choice can often seem daunting in the vast world of vector databases. With an array of options available, understanding their nuances is crucial to making an informed decision. If you want to learn more about those nuances, check out our previous blog post on vector databases.
If you happen to use Postgres as your vector database of choice, you might have noticed that you can utilize various indexes with it — Flat, HNSW, and IVFFlat. But what do these terms mean? How do they differ from each other, and most importantly, how do they impact your database's performance and efficiency?
This blog post aims to demystify these questions, providing you with a clear understanding of how they work, backed by simple, real-world comparisons.
Before we delve into the main topic, let's revisit some key concepts related to vector databases. If you already feel familiar with the concepts highlighted in bold in the subsequent section, you can skip directly to the flat index.
Vector databases are specialized systems designed to manage vector data types. They are not only equipped to store vectors, but they also handle querying and query optimization, for example by implementing indexes for vector data types. Typically, vector databases are categorized into two types: native and non-native. Native vector databases are specifically engineered from the ground up to handle vectors. On the other hand, non-native vector databases are pre-existing databases that have been enhanced with extensions to accommodate vector data types.
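As a quick illustration of the non-native approach, here is a minimal sketch using the pgvector extension for Postgres. The items table, its three-dimensional embeddings, and the sample values are made up for demonstration:

```sql
-- Enable the pgvector extension (assumes it is installed on the server).
CREATE EXTENSION IF NOT EXISTS vector;

-- A hypothetical table storing 3-dimensional embeddings alongside the raw text.
-- Real embedding models typically produce hundreds or thousands of dimensions.
CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(3)
);

-- Vectors are written as array-style literals.
INSERT INTO items (content, embedding)
VALUES ('first document', '[0.1, 0.2, 0.3]');
```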
The terms embedding and vector are frequently used interchangeably. The term embedding originates from the process through which vectors are created, in which we aim to embed the semantic essence of the input within a vector. This embedding process involves feeding data into an embedding model, which generates a vector of a predetermined dimension in response to the input data.
Executing a similarity search on a vector database involves searching for the vectors that are semantically most similar to our query vector. Even though embeddings encapsulate the semantic essence of the data, to calculate the similarity between two vectors, we need to introduce a function that takes two vectors as input and outputs their similarity. This function is fundamentally based on vector distance, which dictates how we measure the similarity between two vectors.
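In pgvector, this choice of distance function corresponds to a choice of operator. A small sketch, reusing the hypothetical items table from above:

```sql
-- pgvector exposes several distance operators; which one is appropriate
-- depends on how the embeddings were produced.
SELECT content,
       embedding <-> '[0.1, 0.2, 0.3]' AS l2_distance,       -- Euclidean (L2) distance
       embedding <=> '[0.1, 0.2, 0.3]' AS cosine_distance,   -- cosine distance
       embedding <#> '[0.1, 0.2, 0.3]' AS neg_inner_product  -- negative inner product
FROM items;
```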
The concept of similarity search is pivotal because it enables us to retrieve the semantically most relevant data for any given input. This functionality is often associated with the retrieval-augmented generation (RAG) architectural pattern and is frequently implemented in LLM-based applications.
The flat index offers a simple solution without any approximation algorithms. How it works is pretty straightforward. In a flat index, we compare our query vector against every other vector in the index. Although this is the slowest method, it allows for the most accurate search.
In the flat index, when you retrieve the k most relevant vectors, you can be assured that these vectors are indeed the nearest neighbors to your query vector. We refer to this type of search as a k nearest neighbors (kNN) search.
Image 1 - k Nearest Neighbor Search. In the image, it can be seen that our query vector is checked against every other vector in the database. While this approach ensures we always find the true nearest neighbors, it can become slow at larger data scales.
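In pgvector, exact (flat) search is what you get when no ANN index exists on the embedding column; a sketch against our hypothetical items table:

```sql
-- With no ANN index on the embedding column, this ORDER BY ... LIMIT query
-- compares the query vector against every row: an exact kNN search.
SELECT content
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 5;  -- k = 5 nearest neighbors
```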
If quality and accuracy are of critical value, you should consider using a flat index. And if you are concerned about search speed in a flat index: unless you have a large enough dataset (upwards of a million vectors), there is no noticeable slowdown.
When we start working with large volumes of data, the speed of the flat index begins to fall off. If we want to know the k nearest neighbors, there is no shortcut to comparing the query vector against every other vector in the database. That is why we must start introducing trade-offs to increase our search speed.
For our particular case, we will trade accuracy for search speed. We can do this by incorporating approximation into our kNN search, resulting in an approximate nearest neighbor (ANN) search. The challenge lies in where to draw the line in this trade-off.
This concept is based on the idea that we no longer search the entire database, but a reduced version of it. How the search space is reduced depends on the implementation. There are techniques like dimensionality reduction and search scope reduction, but we won't cover them as concepts in this article.
Image 2 - Approximate Nearest Neighbor Search. The image shows the consequences of approximation. With this approach, we may miss the actual nearest neighbors. Concretely, in our case, we missed two out of five of the nearest neighbors. However, the result is still satisfactory, as all the found neighbors are still near enough to our query vector.
The important thing to remember with ANN is that we can no longer be sure that we have the true k nearest neighbors, as we have approximated them.
The Hierarchical Navigable Small World (HNSW) algorithm is a newer addition to the pgvector extension. HNSW is one of the most accurate ANN algorithms currently available and is known to work exceptionally well with high-dimensional vectors. As its name suggests, it uses the concept of navigable small world (NSW) graphs hierarchically — a multilayer graph solution.
To understand how the HNSW algorithm works, we will use the analogy of traveling to a different country for a vacation. At the start, the first thing to look up is the available flights. Air travel allows for fast travel between far-away locations. Once we are at the destination airport, we can't use the plane for further travel (unless we are Taylor Swift). There may have been no direct flight to our destination, so we took a train ride next. Finally, after arriving at the city of our destination, we can use public transit like trams and buses or call a taxi to get to our resort.
If we look at the starting airport as our entry point and the resort as our query vector, this is quite close to how HNSW really works. We have multiple layers, where the upper layers allow for much faster traversal but offer a less dense network of travel destinations than the lower layers.
Image 3 - Hierarchical Navigable Small World Algorithm. A detail we can notice in the image is that the algorithm missed the nearest neighbor. The node to the bottom right of our query vector is ever so slightly closer.
The basis of how HNSW works is that on each layer (plane, train, vehicle), we find the closest vector to the query vector (our resort). We then move to the found vector and progress to the next even denser layer. We repeat this logic for every layer.
NSW graphs are the secret that allows this algorithm to move fast over the vector space. We use them because they enable the algorithm to find the closest vector on that particular layer in just a few hops (think of it as good road infrastructure). However, the approximation in this algorithm comes from the fact that HNSW is greedy and can sometimes miss the true nearest neighbor.
Image 4 - Greedy Algorithm. The image illustrates the possibility of missing the nearest neighbor. Starting from our entry point, the algorithm will always move to the vertex closest to the query vector. Following this logic leads us to the red vertex, which is known as the local minimum in mathematics. The path to reach the actual nearest neighbor is outlined on the second graph, a path missed by this algorithm. The actual nearest neighbor vertex is known as the global minimum.
Another benefit of the HNSW index is that the index is data agnostic. That means that the index performance doesn't depend on the data (vectors inside of the database) it's built upon. If you change the data in the database, it still performs as well as before. Once we construct the index, we only need to bear the cost of maintaining it.
One downside is that the HNSW index is very memory-intensive, and a database running it will require enough RAM to function normally. If we're unable to provide it with enough memory, some other solution may be more suitable.
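In pgvector (version 0.5.0 or newer, which added HNSW support), building and tuning an HNSW index might look like the sketch below; the build parameters shown are the library defaults, and the ef_search value is an illustrative choice rather than a recommendation:

```sql
-- Build an HNSW index; m controls per-node connectivity and
-- ef_construction the candidate list size during the build.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);

-- At query time, ef_search trades speed for recall: a larger candidate
-- list makes the greedy search less likely to get stuck in a local minimum.
SET hnsw.ef_search = 100;

SELECT content
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 5;
```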
The IVFFlat (inverted file with flat compression) index, while slower and less accurate than the HNSW index, offers a decent scalable solution. Its scalability comes from its low memory usage and fast build time. The main strategy of IVFFlat is to reduce the search scope through clustering.
Let's take a look at another example, this time with cities. We know that cities are divided into districts. If we wanted to find the nearest gym, a reasonable restriction would be to first search only inside the district where our home is. This is generally a good rule, but if, for example, we are located near the edge of a district, the closest gym could be in a neighboring one. With this in mind, we should extend the search to a few of the nearest districts.
Image 5 - Inverted File With Flat Compression Algorithm. Unlike cities, where districts can vary based on complex interactions, the clusters in the vector space are obtained through a k-means clustering technique. The end result is the vector space partitioned into Voronoi cells.
That is essentially how IVFFlat works; we divide the search space (city) into clusters (city districts). Then, when we search for an ANN, we find which cluster our query vector (our home) belongs to, and then we search for the nearest vectors (gyms) in that same cluster and a few of the closest clusters.
The downside of IVFFlat is that the index is non-data-agnostic, which means its performance is directly tied to the data it was built upon. With this in mind, we should only create the index after all the data is in our database. If the data is stationary, that is all the maintenance the index needs. However, if the data is dynamic, accuracy will deteriorate once new data skews the distribution enough, and the only solution is to recreate the whole index.
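In pgvector, an IVFFlat setup might look like the following sketch; the lists and probes values are illustrative starting points (the pgvector documentation suggests roughly rows / 1000 lists for smaller datasets), not tuned recommendations:

```sql
-- IVFFlat clusters the existing rows, so build the index only after
-- the data has been loaded; lists is the number of clusters.
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);

-- probes controls how many of the nearest clusters are searched,
-- mirroring the "also check the neighboring districts" idea above.
SET ivfflat.probes = 10;

SELECT content
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 5;
```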
The choice of index involves considering several factors, including the data type, dataset size, and the balance between search speed and accuracy.
The flat index provides a straightforward approach, with a focus on accuracy, making it suitable for smaller datasets or situations where we value precision over everything. On the other hand, the HNSW and IVFFlat indexes introduce approximations to improve search speed, offering solutions for larger datasets.
The choice of the index will largely depend on your specific use case and requirements. As with many aspects of software engineering, there is no one-size-fits-all solution. We should always consider our unique circumstances and requirements when choosing an index for our vector database.
Postgres, through the pgvector extension, offers three index types: Flat, HNSW, and IVFFlat.
The Flat index, although slow, offers the most accurate search — courtesy of the kNN search algorithm. However, as data volumes grow, search speed can fall off.
The HNSW (Hierarchical Navigable Small World) index, a highly accurate ANN (Approximate Nearest Neighbor) algorithm, offers a fast solution that works well with high-dimensional vectors but is memory-intensive. The benefit is that the index is data agnostic.
Lastly, the IVFFlat (Inverted File Flat) index, although slower and less accurate than HNSW, is more scalable due to its low memory usage and fast build time. It reduces the search scope through clustering, which makes it non-data-agnostic.
While learning how to build RAG applications is fun, when it comes to delivering products to production, we need to do more than build applications. That’s why, in our next blog post, we will talk about testing and evaluating RAG.