RAG Strategies - Context Enrichment

Continuing the RAG strategies series, this article explores how chunk size can be used to improve retrieval performance. It presents two popular strategies: sentence window retrieval and auto-merging retrieval.

Development · 7 min read
Franjo Mindek · 2024-03-27
Retrieval Augmented Generation · AI Blog Series · Chunking · Retrieval Strategy · Vector Search

The problem of choosing the right chunk size

The initial phase of implementing Retrieval Augmented Generation (RAG) typically involves basic index retrieval, a straightforward approach that provides a starting point. If you're unfamiliar with the concept, we recommend reading our previous article. However, this method does not guarantee an accurate response. As accuracy is one of the most important metrics for LLM-based applications, starting with this article we will explore various ways to enhance the performance of RAG.

One of the core concepts of RAG is the use of chunks. Chunk size influences RAG's performance metrics in different ways, which is why it's important to consider which size fits your specific use case best.

For example, smaller chunks, by virtue of their lower word count, mention fewer topics and concepts. This is beneficial: the more discrete the chunks, the fewer false positives we get during retrieval.

Bigger chunks, on the other hand, offer a more comprehensive context, providing much-needed background information for the topic in question. However, they may mention a concept without containing any relevant data about it, which again leads to false positives.

This is where context enrichment comes into play. Instead of worrying about optimizing the chunk size for our specific use case, we can combine the benefits of both bigger and smaller chunks in a single solution.

Context enrichment

Context enrichment is a retrieval strategy based on the idea of working with smaller units of data that are then enriched with additional context.

By breaking down the document into smaller chunks, we reduce the surrounding noise, making the retrieval process more precise and accurate. Once we are confident that we are targeting the correct parts of the documents, we can introduce additional context to these chunks for better clarity.

The actual implementation of this strategy can vary. Two popular methods are sentence window retrieval and auto-merging retrieval.

Sentence window retrieval

The name of this retrieval strategy comes from the concept of sliding windows often found in software engineering. It refers to the technique of reducing the scope of information to its minimum requirements, changing its size and location only when necessary.

Sentence window retrieval is very literal in its implementation of the “smaller units to bigger units” concept. The idea is straightforward: start the retrieval by searching the index of smaller chunks (as small as one sentence). Once you find the relevant chunks, retrieve not only them but also some of their neighboring chunks. These smaller chunks are then concatenated into a single larger chunk.

Image 1 - Sentence window retrieval during the retrieval process. In our index, the solid colored blocks represent the chunks retrieved as most similar to the query. The striped chunks represent those chunks' neighbors, additionally retrieved to enrich the context of the original chunks.

The strategy is named sentence window retrieval because the unit of size for its chunks is the sentence, and sentences are easy to work with.

Most chunking strategies have defined overlaps. An overlap means that the beginning of the next chunk contains part of the previous chunk. This overlap reduces the likelihood of awkward chunking borders, allowing for continuity and context between the chunks. The downside is that we must find the overlaps and remove them before concatenating. But when working at the sentence level without overlaps, concatenation becomes trivial.
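To make this concrete, here is a minimal sketch of sentence-level chunking without overlaps. The regex-based splitter is an assumption for illustration; real pipelines often delegate sentence splitting to an NLP library such as spaCy or NLTK.

```python
import re

def sentence_chunks(document: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    # (An illustrative assumption, not a production-grade sentence splitter.)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [s for s in sentences if s]

doc = "RAG uses chunks. Chunk size matters. Context helps."
chunks = sentence_chunks(doc)

# Because chunks don't overlap, rebuilding a larger chunk is a plain join:
window = " ".join(chunks[0:2])
```

With overlapping chunks, the join above would duplicate the shared text, which is exactly the complication sentence-level chunking avoids.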

Despite its simplicity, there are some complications to consider when implementing sentence window retrieval. Ensure that the window does not exceed its boundaries (the first chunk has no predecessor, and the last one has no successor), and also be careful that you concatenate chunks in the correct order.

Image 2 - Sentence window retrieval during the indexing process. The document is chunked in a linear fashion so that each chunk can carry metadata with a sequential index.

The functionality of retrieving neighboring chunks is made possible with metadata. You only need to track the document from which the chunk originates and the index of the chunk within that document. Having this information makes it easy to retrieve a certain number of chunks preceding and succeeding our target chunk.
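The window expansion described above can be sketched in a few lines. This is a minimal illustration assuming the document's chunks are available in the order given by their sequential index metadata; the function and parameter names are illustrative.

```python
def expand_window(doc_chunks: list[str], hit_index: int, window: int = 1) -> str:
    """Concatenate a retrieved chunk with its neighbors.

    `doc_chunks` stands in for the chunks of one document, ordered by the
    sequential index stored in their metadata.
    """
    start = max(0, hit_index - window)                 # the first chunk has no predecessor
    end = min(len(doc_chunks), hit_index + window + 1)  # the last chunk has no successor
    return " ".join(doc_chunks[start:end])             # the slice preserves document order

doc_chunks = ["A.", "B.", "C.", "D."]
expand_window(doc_chunks, 0)  # "A. B." – clamped at the start of the document
expand_window(doc_chunks, 2)  # "B. C. D."
```

Note how the boundary clamping and the ordered slice handle both complications mentioned earlier: window overflow and concatenation order.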

Auto-merging retrieval

Auto-merging retrieval follows the same principles but executes them differently. The core ideas for auto-merging retrieval are parent-child relationships and on-demand enrichment. We can think of sentence window retrieval as the proactive sibling of auto-merging retrieval.

The concept of parent-child relationships is self-explanatory. When we process a document, we initially break it down into large segments, referred to as parent chunks. We then proceed to further divide these parent chunks into smaller segments, known as child chunks. Child chunks are linked to their respective parent chunks. This pattern can be repeated across multiple levels, though with each successive level, the likelihood of merging during retrieval decreases. That's why having more than one or two levels of parent-child relationships is usually excessive.

As for the on-demand aspect of the enrichment, the differences from sentence window retrieval start after the relevant chunks are found. Instead of expanding a chunk window, we try to identify hotspots: localized areas of the document from which many chunks were retrieved. The logic is that if many chunks originate from a nearby area, we should return the entire area. If we refer to the larger areas of the document as parent chunks and the retrieved chunks as child chunks, the theory is that when we retrieve many children of the same parent, we should swap the retrieved child chunks for the parent chunk.
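The swap decision can be sketched as follows. This is a simplified illustration with a fixed merge threshold; the data shapes, names, and threshold are assumptions for the example, not a prescribed implementation.

```python
from collections import defaultdict

def auto_merge(hits: list[tuple[int, str]],
               parents: dict[int, str],
               threshold: int = 2) -> list[str]:
    """hits: (parent_index, child_text) pairs from the similarity search;
    parents: parent_index -> full parent text.
    Swap children for their parent once `threshold` siblings are retrieved.
    (Names and the fixed threshold are illustrative assumptions.)
    """
    by_parent: dict[int, list[str]] = defaultdict(list)
    for parent_index, child in hits:
        by_parent[parent_index].append(child)

    results = []
    for parent_index, children in by_parent.items():
        if len(children) >= threshold:
            results.append(parents[parent_index])  # hotspot: return the whole area
        else:
            results.extend(children)               # too sparse: keep the children
    return results

parents = {0: "Parent 0 full text.", 1: "Parent 1 full text."}
hits = [(0, "child a"), (0, "child b"), (1, "child c")]
auto_merge(hits, parents)  # → ['Parent 0 full text.', 'child c']
```

A more sophisticated variant could replace the fixed threshold with a function of the children's similarity scores or their distance within the parent, as discussed below.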

Image 3 - Auto-merging retrieval during the retrieval process. The blue borders represent child chunks sharing the same parent chunk. If we retrieve enough child chunks, they are merged into the parent chunk instead.

The difficult part is deciding on the parameters of the strategy. How large are the parent chunks compared to the child chunks? How many child chunks are needed to retrieve the parent chunk? Is that a fixed value or a function of some sort (one that takes the distance between chunks into account, for example)?

To achieve the parent-child relationship, we will, as always, rely on metadata. During the chunking process, each child chunk needs to keep the index of its parent chunk within the document being chunked. That's all the metadata we need to allow for the auto-merging process.
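An indexing pass producing that metadata might look like the sketch below. The grouping size and the `parentIndex` field name mirror the figures in this article but are otherwise illustrative assumptions.

```python
def build_index(sentences: list[str], children_per_parent: int = 2):
    """Group sentences into parent chunks and tag each child chunk with
    its parent's index – the only metadata auto-merging needs.
    (The grouping size and field names are illustrative.)
    """
    parents: list[str] = []
    children: list[dict] = []
    for parent_index, start in enumerate(range(0, len(sentences), children_per_parent)):
        group = sentences[start:start + children_per_parent]
        parents.append(" ".join(group))  # the parent is the concatenated group
        for child in group:
            children.append({"text": child, "parentIndex": parent_index})
    return parents, children

parents, children = build_index(["A.", "B.", "C."])
# parents == ["A. B.", "C."]
# children[2] == {"text": "C.", "parentIndex": 1}
```

Here the parent chunks are kept in a separate list, which corresponds to storing them outside the chunk index, one of the two storage options discussed next.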

The problem of storing parent chunks is more open-ended. We can either store them somewhere outside the chunk index or try to recreate them from the chunks via their metadata. The biggest problem with the latter solution is dealing with overlaps, especially if we use advanced chunking strategies.

Image 4 - Auto-merging retrieval during the indexing process. Indexing happens in batches of child chunks that share the same parent, highlighted by a striped blue background, so that the "parentIndex" can be tracked in the metadata.

Problem of relevance

A potential issue that can arise with this approach involves determining the relevance metrics of the retrieved chunks. Let's imagine that through the retrieval process, we retrieved a merged parent chunk. While the retrieval process of the child chunks is known, the parent chunk is a result of a merging process, not a direct product of similarity search. This raises the question: how can we measure the relevance of chunks that were created from external processes?

For that, we will need a testing framework external to our vector database; we will leave that to a future article.

Impact on performance

As the above-mentioned strategies are not that complex, the impact on performance isn't too noticeable - most of it comes from the additional calls to the database.

Sentence window retrieval is an eager strategy that always executes: for each retrieved chunk, we must also retrieve its neighboring chunks. These additional database calls increase the time the user waits for an answer.

Auto-merging retrieval, being an on-demand strategy, makes fewer database calls than sentence window retrieval. Depending on the solution chosen for parent chunk storage, we will need either additional storage or extra logic for the merging process. Either way, the impact of that logic on performance is negligible compared to the database calls.


Context enrichment plays a role in enhancing the performance of RAG. By carefully choosing the appropriate chunk size and implementing strategies like sentence window retrieval and auto-merging retrieval, we can improve the accuracy and precision of our retrieval process.

The choice between strategies should be guided by the nature of your data and the specific requirements of your use case. If your retrievals tend to be clustered, auto-merging retrieval will excel. Likewise, whenever you need more precision per chunk, sentence window retrieval can be introduced.

Regardless of the chosen strategy, the key to successful context enrichment lies in the effective use of metadata to track and manage chunks.


Context enrichment is a retrieval strategy in RAG (Retrieval Augmented Generation) that improves accuracy by working with smaller chunks of data that are enriched with additional context. It relies on the concept that chunk size impacts the RAG performance. Small chunks reduce false positives but may lack relevant data, whereas large chunks provide comprehensive context but can include unnecessary information.

Context enrichment combines the benefits of both. Two main strategies are sentence window retrieval, which starts with small chunks and expands them during retrieval, and auto-merging retrieval, which organizes chunks into parent-child relationships and adds context as needed. Both strategies use metadata to manage chunks.

In our upcoming article, we will talk more about the document scalability problem in RAG, showcasing a strategy that tries to solve the problem by introducing hierarchy.
