Similarity cache

Similarity-based caching in AutoRAG lets you serve responses from Cloudflare’s cache for queries that are similar enough to previous requests, not just exact matches. This speeds up response times and cuts costs by reusing answers for questions that are close in meaning.

How It Works

Unlike basic caching, which only works for identical requests, similarity caching compares prompts based on their content. When a request comes in:

AutoRAG checks if a similar prompt (based on your chosen threshold) has been answered before.
If a match is found, it returns the cached response instantly.
If no match is found, it generates a new response and caches it.
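
The three steps above can be sketched as a small in-memory lookup-or-generate loop. This is only an illustration of the flow, not AutoRAG's implementation: the `SimilarityCache` class, the `answer` function, the word-overlap score, and the numeric threshold are all hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple


class SimilarityCache:
    """Toy in-memory cache that matches prompts by a similarity score (hypothetical)."""

    def __init__(self, similarity: Callable[[str, str], float], threshold: float) -> None:
        self.similarity = similarity
        self.threshold = threshold
        self.entries: List[Tuple[str, str]] = []  # (prompt, response) pairs

    def lookup(self, prompt: str) -> Optional[str]:
        # Return the first cached response whose prompt is similar enough.
        for cached_prompt, response in self.entries:
            if self.similarity(prompt, cached_prompt) >= self.threshold:
                return response
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (a stand-in scorer for this sketch)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def answer(prompt: str, cache: SimilarityCache, generate: Callable[[str], str]) -> Tuple[str, str]:
    """Return (response, status), where status is "HIT" or "MISS"."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, "HIT"       # a similar prompt was answered before
    response = generate(prompt)     # no match: generate a new response...
    cache.store(prompt, response)   # ...and cache it for later requests
    return response, "MISS"
```

Calling `answer` twice with closely worded prompts would generate a response only once; the second call reuses the stored one.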

To see if a response came from the cache, check the cf-aig-cache-status header: HIT for cached and MISS for new.
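
As a minimal sketch of that check, the helper below inspects a plain mapping of response headers. Only the header name and its HIT and MISS values come from this page; the function name `served_from_cache` is hypothetical, and how you obtain the headers depends on your HTTP client.

```python
from typing import Mapping

CACHE_STATUS_HEADER = "cf-aig-cache-status"


def served_from_cache(headers: Mapping[str, str]) -> bool:
    """Return True when the response headers report a cache HIT."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get(CACHE_STATUS_HEADER, "").strip().upper() == "HIT"
```

A missing header is treated as "not cached" here, which is a choice made for this sketch rather than documented behavior.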

Cache Behavior

Volatile Cache: If two similar requests arrive at the same time, the first response might not be cached in time for the second request to use it, resulting in a MISS.
30-Day Cache: Cached responses last 30 days, then expire automatically. Custom durations are not currently supported.
Data Dependency: Cached responses are tied to specific document chunks. If those chunks change or get deleted, the cache clears to keep answers fresh.
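
One way to picture the expiry and data-dependency rules is as a validity check on each cached entry. The sketch below is an assumption-laden model, not AutoRAG's internals: `CacheEntry`, `is_usable`, and the idea of tracking chunk IDs per entry are hypothetical; only the 30-day lifetime and the link to document chunks come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import AbstractSet, FrozenSet

CACHE_TTL = timedelta(days=30)  # fixed lifetime stated above


@dataclass(frozen=True)
class CacheEntry:
    response: str
    created_at: datetime
    chunk_ids: FrozenSet[str]  # chunks the response was built from (assumed bookkeeping)


def is_usable(entry: CacheEntry, now: datetime, changed_chunk_ids: AbstractSet[str]) -> bool:
    """An entry is usable if it has not expired and none of its chunks changed or were deleted."""
    expired = now - entry.created_at >= CACHE_TTL
    stale = bool(entry.chunk_ids & changed_chunk_ids)
    return not expired and not stale
```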

How Similarity Matching Works

Similarity caching in AutoRAG uses MinHash with Locality-Sensitive Hashing (LSH) to detect prompts that are lexically similar.

When a new prompt is received:

The prompt is broken into overlapping token sequences (called shingles), typically 2–3 words each.
These shingles are hashed into a compact fingerprint using the MinHash algorithm. Prompts with more overlapping shingles will have more similar fingerprints.
Fingerprints are grouped into LSH buckets, which allow AutoRAG to quickly find past prompts that are likely to be similar without scanning every cached prompt.
If a prompt in the same bucket meets the configured similarity threshold, its cached response is reused.
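
The steps above can be illustrated with a small, self-contained MinHash and LSH sketch. It is not AutoRAG's code: the signature length, the band count, the 2-word shingle size, the BLAKE2b-based hash family, and the names `shingles`, `minhash`, and `LshIndex` are all assumptions chosen for readability.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NUM_HASHES = 64            # signature length (assumed)
BANDS = 16                 # number of LSH bands (assumed)
ROWS = NUM_HASHES // BANDS  # signature values per band

Signature = Tuple[int, ...]
BandKey = Tuple[int, Tuple[int, ...]]


def shingles(prompt: str, size: int = 2) -> Set[str]:
    """Split a prompt into overlapping word sequences of the given size."""
    words = prompt.lower().split()
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def _hash(seed: int, shingle: str) -> int:
    # One seeded 64-bit hash per signature position, derived via the BLAKE2b salt.
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(shingle_set: Set[str]) -> Signature:
    """Fingerprint: for each seeded hash, keep the minimum value over all shingles."""
    return tuple(min(_hash(seed, s) for s in shingle_set) for seed in range(NUM_HASHES))


def estimated_similarity(a: Signature, b: Signature) -> float:
    """The fraction of matching positions estimates the Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def band_keys(signature: Signature) -> List[BandKey]:
    """Cut the signature into bands; each band is a bucket key."""
    return [(band, signature[band * ROWS:(band + 1) * ROWS]) for band in range(BANDS)]


class LshIndex:
    def __init__(self) -> None:
        self.buckets: Dict[BandKey, List[str]] = defaultdict(list)
        self.signatures: Dict[str, Signature] = {}

    def add(self, prompt: str) -> None:
        signature = minhash(shingles(prompt))
        self.signatures[prompt] = signature
        for key in band_keys(signature):
            self.buckets[key].append(prompt)

    def query(self, prompt: str, threshold: float) -> List[str]:
        signature = minhash(shingles(prompt))
        # Only prompts sharing at least one bucket are compared; the rest are never scanned.
        candidates = {p for key in band_keys(signature) for p in self.buckets.get(key, [])}
        return [
            p for p in candidates
            if estimated_similarity(signature, self.signatures[p]) >= threshold
        ]
```

With more bands and fewer rows per band, more loosely similar prompts land in a shared bucket; fewer bands make candidate selection stricter. The numeric threshold passed to `query` is a stand-in for the named thresholds described in the next section.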

Choosing a Threshold

The similarity threshold decides how close two prompts need to be to reuse a cached response. Here’s what you can pick from:

Threshold	Description	Example Match
Exact	Near-identical matches only	"What’s the weather like today?" matches with "What is the weather like today?"
Strong (default)	High semantic similarity	"What’s the weather like today?" matches with "How’s the weather today?"
Broad	Moderate match, more hits	"What’s the weather like today?" matches with "Tell me today’s weather"
Loose	Low similarity, max reuse	"What’s the weather like today?" matches with "Give me the forecast"

Test these values to see which works best with your application.
