Similarity cache
Similarity-based caching in AutoRAG lets you serve responses from Cloudflare’s cache for queries that are similar enough to previous requests, not just exact matches. This speeds up response times and cuts costs by reusing answers for questions that are close in meaning.
Unlike basic caching, which only works for identical requests, similarity caching compares prompts based on their content. When a request comes in:
- AutoRAG checks if a similar prompt (based on your chosen threshold) has been answered before.
- If a match is found, it returns the cached response instantly.
- If no match is found, it generates a new response and caches it.
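The request flow above can be sketched in a few lines of Python. This is only an illustration, not AutoRAG's implementation: `answer`, `generate`, and the `0.8` threshold are hypothetical names and values, and the standard library's `difflib.SequenceMatcher` stands in for the real similarity check.

```python
from difflib import SequenceMatcher


def answer(prompt, cache, generate, threshold=0.8):
    """Return (response, status) where status is "HIT" or "MISS".

    `cache` is a list of (prompt, response) pairs and `generate` is any
    callable that produces a fresh response. Both are hypothetical stand-ins.
    """
    for cached_prompt, cached_response in cache:
        # Stand-in similarity score between 0.0 and 1.0 (assumption:
        # the real service uses a different comparison).
        score = SequenceMatcher(None, prompt.lower(), cached_prompt.lower()).ratio()
        if score >= threshold:
            return cached_response, "HIT"
    # No similar prompt found: generate a new response and cache it.
    response = generate(prompt)
    cache.append((prompt, response))
    return response, "MISS"
```

Calling `answer` twice with the same prompt generates a response only once; the second call is served from the list.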
To see if a response came from the cache, check the cf-aig-cache-status header: HIT for cached and MISS for new.
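As a small sketch of that check, the helper below inspects a mapping of response headers, such as the one your HTTP client returns. The function name `came_from_cache` is hypothetical, and the request itself is left out because the endpoint and credentials depend on your setup.

```python
def came_from_cache(headers):
    """Return True for HIT, False for any other value, None if the header is absent."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("cf-aig-cache-status")
    if status is None:
        return None
    return status.strip().upper() == "HIT"
```

Returning `None` for a missing header keeps "not cached" separate from "no cache information in this response".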
- Volatile Cache: If two similar requests hit at the same time, the first might not cache in time for the second to use it, resulting in a MISS.
- 30-Day Cache: Cached responses last 30 days, then expire automatically. No custom durations for now.
- Data Dependency: Cached responses are tied to specific document chunks. If those chunks change or get deleted, the cache clears to keep answers fresh.
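The expiry and data-dependency behaviors can be modeled with a toy in-memory cache. Everything here is an assumption made for illustration: the class and method names are hypothetical, and the sketch assumes that only entries tied to a changed chunk are dropped, which the text above does not spell out.

```python
import time

THIRTY_DAYS = 30 * 24 * 60 * 60  # fixed lifetime in seconds


class ResponseCache:
    """Toy cache: entries expire after 30 days or when a source chunk changes."""

    def __init__(self):
        # key -> (response, chunk ids the response was built from, creation time)
        self.entries = {}

    def put(self, key, response, chunk_ids, now=None):
        now = time.time() if now is None else now
        self.entries[key] = (response, frozenset(chunk_ids), now)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(key)
        if entry is None:
            return None
        response, _chunks, created_at = entry
        if now - created_at >= THIRTY_DAYS:
            del self.entries[key]  # expired: behave like a MISS
            return None
        return response

    def invalidate_chunk(self, chunk_id):
        """Drop every entry that depends on a changed or deleted chunk."""
        stale = [k for k, (_r, chunks, _t) in self.entries.items() if chunk_id in chunks]
        for key in stale:
            del self.entries[key]
        return len(stale)
```

Passing `now` explicitly makes the expiry rule easy to exercise without waiting 30 days.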
Similarity caching in AutoRAG uses MinHash with Locality-Sensitive Hashing (LSH) to detect prompts that are lexically similar.
When a new prompt is received:
- The prompt is broken into overlapping token sequences (called shingles), typically 2–3 words each.
- These shingles are hashed into a compact fingerprint using the MinHash algorithm. Prompts with more overlapping shingles will have more similar fingerprints.
- Fingerprints are grouped into LSH buckets, which allow AutoRAG to quickly find past prompts that are likely to be similar without scanning every cached prompt.
- If a prompt in the same bucket meets the configured similarity threshold, its cached response is reused.
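The four steps above can be sketched as a minimal MinHash and LSH cache. This is not AutoRAG's code: the signature length, band count, 2-word shingles, SHA-1 based hash family, and the `SimilarityCache` class are all assumptions chosen to keep the example small.

```python
import hashlib

NUM_HASHES = 64  # signature length (assumed)
BANDS = 16       # LSH bands, so 4 signature rows per band (assumed)


def shingles(prompt, size=2):
    """Split a prompt into overlapping sequences of `size` words."""
    words = prompt.lower().split()
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def _hash(seed, shingle):
    # One member of a seeded hash family, built from SHA-1 for determinism.
    digest = hashlib.sha1(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(shingle_set):
    """For each seeded hash, keep the minimum value over all shingles."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(NUM_HASHES)]


def estimated_similarity(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature):
    """Cut the signature into bands; each band is one LSH bucket key."""
    rows = NUM_HASHES // BANDS
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(BANDS)]


class SimilarityCache:
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []   # list of (signature, response)
        self.buckets = {}   # band key -> indexes into self.entries

    def put(self, prompt, response):
        sig = minhash(shingles(prompt))
        self.entries.append((sig, response))
        for key in band_keys(sig):
            self.buckets.setdefault(key, []).append(len(self.entries) - 1)

    def get(self, prompt):
        sig = minhash(shingles(prompt))
        # Only prompts sharing at least one bucket are compared in full.
        candidates = set()
        for key in band_keys(sig):
            candidates.update(self.buckets.get(key, []))
        best, best_score = None, self.threshold
        for index in candidates:
            cached_sig, response = self.entries[index]
            score = estimated_similarity(sig, cached_sig)
            if score >= best_score:
                best, best_score = response, score
        return best
```

With this layout, a prompt that shares no shingles with any cached prompt lands in no common bucket, so `get` returns `None` without scoring any entry. More bands with fewer rows each make bucket collisions more likely, which trades extra comparisons for fewer missed matches.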
The similarity threshold decides how close two prompts need to be to reuse a cached response. Here’s what you can pick from:
Threshold | Description | Example Match |
---|---|---|
Exact | Near-identical matches only | "What’s the weather like today?" matches with "What is the weather like today?" |
Strong (default) | High semantic similarity | "What’s the weather like today?" matches with "How’s the weather today?" |
Broad | Moderate match, more hits | "What’s the weather like today?" matches with "Tell me today’s weather" |
Loose | Low similarity, max reuse | "What’s the weather like today?" matches with "Give me the forecast" |
Test these values to see which works best with your application.