Exploring the Hypothetical Questions and HyDE strategies, which are designed to boost the accuracy of LLM-based applications by refining the match between user queries and data. This article examines the generation of hypothetical questions and hypothetical answers, emphasizing the contrast between the two approaches.
You may have experimented with RAG in your LLM-based application and been left somewhat disappointed with the accuracy. While setting up a basic RAG pipeline is not hard, the results often fall short of expectations. To unlock its full power, we need to optimize the RAG pipeline. One way to do that is by introducing hierarchies, as previously discussed in hierarchical index retrieval. You might even have hierarchies in place already and still need more accuracy.
Another way to further improve accuracy is to utilize the Hypothetical Questions and HyDE strategies. Why do we cover both strategies in a single article? Because Hypothetical Questions and HyDE are two sides of the same coin: their core concept is to increase the similarity between the chunks stored in the database and the user's queries; they just achieve it from opposite directions.
How do we increase the similarity between chunks and the user query? We either make user queries semantically closer to chunks or make chunks semantically closer to user queries. For example, if we’re building an LLM-based customer service chatbot, most of the user queries will be questions. So, if we’re adapting chunks, we should make them more question-like to increase their similarity.
The Hypothetical Questions strategy uses the LLM to generate a question (or several, if you wish) for each chunk. During the embedding process, these generated questions are transformed into vectors. This means that when retrieving relevant chunks, we test the relevance of the user's question against the hypothetical question embeddings. At this point we have the relevant hypothetical questions, but what we really need are the chunks from our document that carry the domain knowledge. When the user poses a question, we're not going to return more questions; that would be hilarious.
Obviously, we need to return the chunks of the original document, but how do we get these chunks from the hypothetical questions? Since each question is created for a specific chunk, there is a one-to-one mapping (or many-to-one if multiple questions are generated for the same chunk). Of course, the LLM might generate the same hypothetical question for two chunks with slightly different content, but we won't bother with this case. The solution lies in a small tweak to the ingestion process, and here our ally comes in: metadata. Before the hypothetical question entries are stored in the vector database, we store the identifier of the chunk they belong to in the metadata. Now we can find the chunks needed to build the context.
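To make the ingestion side concrete, here is a minimal sketch of the idea. It assumes an OpenAI model for question generation and embeddings and a Chroma collection as the vector database; the model names, prompts, and the in-memory chunk_store are illustrative choices, not requirements of the strategy.

```python
from openai import OpenAI
import chromadb

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
chroma = chromadb.Client()
questions = chroma.create_collection("hypothetical_questions")

# Plain storage for the original chunks; the chunks themselves are never embedded.
chunk_store = {}

chunks = [
    "The 46th and current president of the United States is Joseph R. Biden, Jr. "
    "He was sworn into office on January 20, 2021.",
    # ... the rest of your preprocessed chunks
]

for chunk_id, chunk in enumerate(chunks):
    chunk_store[str(chunk_id)] = chunk

    # 1. Ask the LLM to produce a question this chunk would answer.
    question = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write one question that the following text answers:\n\n{chunk}",
        }],
    ).choices[0].message.content

    # 2. Embed only the generated question, not the chunk.
    vector = llm.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    # 3. Store the question vector with the owning chunk's id in the metadata.
    questions.add(
        ids=[f"q-{chunk_id}"],
        embeddings=[vector],
        documents=[question],  # optional: keep the question text for later inspection
        metadatas=[{"chunk_id": str(chunk_id)}],
    )
```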
To clear up any confusion, let’s take a closer look at the ingestion pipeline on the following schema:
Image 1 - Hypothetical questions schema
To reduce complexity, we omitted chunk preprocessing, building the context from the retrieval result, and sending it to the LLM. These steps still take place, consistent with all previous index retrieval strategies.
The first thing that likely catches your eye is the long arrow leading from the chunk directly to the database, bypassing the embedding model. This introduces something new that wasn't showcased in previous strategies. Since the Hypothetical Questions strategy doesn't require a vector representation of the chunks, we gladly skip the overhead of embedding them. Don't get confused, though: we still need embeddings for the hypothetical questions.
Another point worth noting is that saving the text of the hypothetical questions is optional. Wait a second, we've created a bunch of questions and now we're throwing them away? The questions were already transformed into vectors, and it's the vectors that provide the value for this strategy, since the similarity search is performed on them. Whether to also store the question text depends on your use case. For instance, keeping the hypothetical questions can be useful later for testing purposes, such as adjusting the prompt provided to the LLM. Storing additional data is worthwhile if it proves beneficial later on, especially considering the low cost of storage.
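On the retrieval side, continuing the same sketch, the user's question is embedded, matched against the hypothetical question vectors, and the chunk identifiers stored in the metadata lead us back to the original chunks. The helper below reuses the llm, questions, and chunk_store objects from the previous snippet:

```python
def retrieve_context(user_question: str, top_k: int = 3) -> str:
    # Embed the user's question with the same embedding model used at ingestion.
    query_vector = llm.embeddings.create(
        model="text-embedding-3-small",
        input=user_question,
    ).data[0].embedding

    # Similarity search runs against the hypothetical question vectors only.
    result = questions.query(query_embeddings=[query_vector], n_results=top_k)

    # Follow the chunk_id stored in the metadata back to the original chunks.
    chunk_ids = [m["chunk_id"] for m in result["metadatas"][0]]
    return "\n\n".join(chunk_store[cid] for cid in chunk_ids)

context = retrieve_context("Who is the president of the USA?")
```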
HyDE essentially takes the opposite approach. Instead of converting the chunks to match the user's question, the user's question is converted to match the chunks. This is done by using the LLM to generate a hypothetical answer to the user's question. When retrieving relevant chunks, we therefore test the relevance of the generated answer against the chunk vectors, and the answer should be semantically closer to the chunks than the user's original question. Let's take a detailed look at the following schema:
Image 2 - HyDE schema
First, we notice a simplified database design: compared to Hypothetical Questions, we're left with a single collection. Another difference is the shift in complexity from the ingestion pipeline to the retrieval pipeline. The latter is more complex and significantly impacts the application's responsiveness. One factor adding to this complexity is the embedding model, which is used in both pipelines.
However, this issue is minor, since embedding models are relatively fast. It's the additional LLM call that contributes far more to the complexity and consequently leads to worse response times. The keyword is additional: remember, once the retrieval result is obtained, it's added to the prompt and sent to the LLM to get the final answer. In essence, the HyDE strategy makes two LLM calls for a single question.
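Here is a rough sketch of what a HyDE retrieval pipeline could look like, again assuming OpenAI models and a Chroma collection that already holds the chunk embeddings (and chunk texts) from ingestion. The prompts, model names, and collection name are placeholders, not part of the strategy itself:

```python
from openai import OpenAI
import chromadb

llm = OpenAI()
chroma = chromadb.Client()
docs = chroma.get_or_create_collection("chunks")  # chunk vectors stored during ingestion

def hyde_answer(user_question: str, top_k: int = 3) -> str:
    # First LLM call: draft a hypothetical answer straight from the question, no context.
    hypothetical = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short passage that answers the question:\n\n{user_question}",
        }],
    ).choices[0].message.content

    # Embed the hypothetical answer and search it against the chunk vectors.
    vector = llm.embeddings.create(
        model="text-embedding-3-small",
        input=hypothetical,
    ).data[0].embedding
    retrieved = docs.query(query_embeddings=[vector], n_results=top_k)
    context = "\n\n".join(retrieved["documents"][0])

    # Second LLM call: answer the real question using the retrieved context.
    return llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Answer the question using only this context:\n\n{context}\n\n"
                       f"Question: {user_question}",
        }],
    ).choices[0].message.content
```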
The architecture seems fine, yet you may have noticed something odd: the nature of the hypothetical answer. In the Hypothetical Questions strategy, it makes sense to write possible questions about snippets of text; it's an ordinary task, like writing questions for an exam. The concept of a hypothetical answer, on the other hand, is not trivial to grasp in terms of RAG. With that being said, let's take some more time to explore it.
Knowing that the hypothetical answer is generated directly from the user's question, you might raise a question that casts doubt on the whole purpose of HyDE: how can the LLM know the answer if we didn't provide it with any context? The truth is, it doesn't! The purpose of the hypothetical answer is to capture relevance patterns, not to provide a correct response; these answers are often incorrect. We are not interested in the content of the hypothetical answer or its truthfulness, but in its semantics, i.e. its similarity to the chunks of the original document that may carry the real answer.
Let's walk through an example. Say we have a chunk from the US government website:
“The 46th and current president of the United States is Joseph R. Biden, Jr. He was sworn into office on January 20, 2021.”
The user’s question is:
“Who is the president of the USA?”
Let's pretend that the LLM we're using was trained ten years ago, back in 2014. When prompted to produce a hypothetical answer to the user's question, it responds with:
“Barack Obama is the current President of the United States. He is in his second term, having been re-elected in 2012.”
As you can see, the first sentence of the hypothetical answer is similar to the chunk that contains the relevant information, and this is exactly what we're trying to achieve. Even though the information itself is outdated, the surrounding context increases the chunk's relevance score during the similarity search.
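If you want to see this effect for yourself, a quick way is to compare cosine similarities directly. The sketch below uses the sentence-transformers library with an arbitrarily chosen small model; the exact scores depend on the embedding model, but the expectation is that the hypothetical answer lands closer to the chunk than the bare question does:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

chunk = ("The 46th and current president of the United States is Joseph R. Biden, Jr. "
         "He was sworn into office on January 20, 2021.")
question = "Who is the president of the USA?"
hypothetical = ("Barack Obama is the current President of the United States. "
                "He is in his second term, having been re-elected in 2012.")

# Encode all three texts and compare them pairwise against the chunk.
chunk_vec, question_vec, hypothetical_vec = model.encode(
    [chunk, question, hypothetical], convert_to_tensor=True
)

print("question vs chunk:           ", util.cos_sim(question_vec, chunk_vec).item())
print("hypothetical answer vs chunk:", util.cos_sim(hypothetical_vec, chunk_vec).item())
```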
Although the underlying concepts of these strategies look similar, differing only in whether the chunks are converted to match the user's question or vice versa, their differences are substantial, as the previous schemas show. We cannot determine which strategy is superior based solely on a direct comparison of their architectures. We've considered factors such as responsiveness, but our focus now is on overall accuracy, so the conclusion hinges on testing. However, we'll save that topic for a future blog post.
The exploration of Hypothetical Questions and HyDE (Hypothetical Document Embeddings) strategies presents a nuanced view of enhancing LLM-based applications. Both strategies aim to bridge the semantic gap between user queries and database chunks, albeit through different mechanisms. The Hypothetical Questions strategy focuses on generating questions to match the chunks, while HyDE does the reverse by generating answers to the user’s questions which align with chunk semantics. The choice between these approaches depends on specific application requirements, with considerations extending to system complexity, responsiveness, and the need for additional LLM interactions.
Hypothetical Questions and HyDE are innovative strategies aimed at improving the accuracy of LLM-based applications by ensuring a closer semantic match between user queries and database chunks. Hypothetical Questions involves generating questions from chunks, while HyDE focuses on adapting user queries to match chunk semantics. Despite their similar objectives, they differ significantly in implementation and in their impact on system complexity and responsiveness. HyDE is characterized by a simpler ingestion pipeline and a more complex retrieval pipeline. The effectiveness of each strategy is subject to practical testing.