All Fusion endpoints support response caching for improved performance. Details on what each endpoint caches appear at the end of that endpoint’s documentation article. Here, we’ll cover the general principles that apply to all endpoints.
Why Would I Want to Cache a Response? #
- Performance: cached responses return in a fraction of the time it takes to regenerate a response (up to 20x faster in some cases)
- Consistency: cached responses always return the same response given the same inputs
How Do I Enable Caching? #
Configure the Agent Defaults #
You may select the default cache duration for completions in the Responses … Parameters … Cache Duration field.

You may select the default cache duration for semantic search in the API … Semantic Search … Search Cache Duration field.

Pass a Cache Duration with an API Request #
You may pass a cache_ttl URL parameter on any endpoint, expressed as a number of seconds to cache:
curl \
--header 'x-api-key: apg_xxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "How can a good God allow so much evil in the world?"
}' \
--url 'https://my.gospel.bot/api/v1/chat/completions?cache_ttl=300'
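When requests are built in a script, the cache_ttl parameter has to be joined to the URL with ? or &, depending on whether the URL already carries a query string. The following is a minimal sketch of that step; the helper name with_cache_ttl is hypothetical and not part of the Fusion API.

```shell
# Hypothetical helper (not part of the Fusion API): append cache_ttl to a URL.
# $1 = endpoint URL, $2 = number of seconds to cache
with_cache_ttl() {
  url="$1"
  ttl="$2"
  case "$url" in
    # The URL already has a query string, so join with "&"
    *\?*) printf '%s&cache_ttl=%s\n' "$url" "$ttl" ;;
    # No query string yet, so start one with "?"
    *)    printf '%s?cache_ttl=%s\n' "$url" "$ttl" ;;
  esac
}
```

The result can then be passed to curl in quotes, for example --url "$(with_cache_ttl 'https://my.gospel.bot/api/v1/chat/completions' 300)". Quoting keeps the shell from treating the ? as a glob character.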
You may also pass an x-cache-ttl header with a request to any endpoint, expressed as a number of seconds to cache:
curl \
--header 'x-api-key: apg_xxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
--header 'Content-Type: application/json' \
--header 'x-cache-ttl: 300' \
--data '{
"query": "Who is Jesus?"
}' \
--url https://my.gospel.bot/api/v1/search
Caching Behavior #
In traditional caching, you set a cache expiry when the response is stored, and every subsequent matching call returns the cached result until the cache expires. The drawback is that getting a fresh result before then requires a separate way to invalidate (clear) the cache.
We’ve built our cache to be a bit smarter. For any request that includes a cache TTL parameter, we check the age of any existing cached response. The cached response is used only if it is newer than the supplied cache TTL; otherwise, a fresh response is generated. Hence, getting a fresh (uncached) result is as simple as making the request without a cache TTL parameter. There is no need to invalidate the cache separately.
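The age check can be sketched as a small shell function. This is an illustration under stated assumptions, not the service's actual implementation: the function name cache_decision is hypothetical, and it assumes a cached response is reused only when its age in seconds is strictly less than the TTL supplied with the request.

```shell
# Hypothetical sketch of the per-request cache check (not the real implementation).
# $1 = age of the existing cached response in seconds ("" if nothing is cached)
# $2 = cache TTL supplied with the request in seconds ("" if none was supplied)
cache_decision() {
  age="$1"
  ttl="$2"
  if [ -z "$ttl" ] || [ -z "$age" ]; then
    # No TTL on the request, or nothing cached yet: generate a fresh response
    echo "regenerate"
  elif [ "$age" -lt "$ttl" ]; then
    # Assumption: a cached response younger than the TTL is fresh enough to reuse
    echo "serve-cached"
  else
    # The cached response is too old for this request's TTL
    echo "regenerate"
  fi
}
```

Under these assumptions, a response cached 120 seconds ago is reused by a request with a TTL of 300 but regenerated by a request with a TTL of 60, because each request decides for itself how stale a cached response may be.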