How to cache workflow step outputs
The simplest way to cache the results of tasks within a flow is to set persist_result=True on a task definition.
This will implicitly use the DEFAULT cache policy, which is a composite cache policy defined as INPUTS + TASK_SOURCE + RUN_ID.
This means subsequent calls of a task with identical inputs from within the same parent run will return cached results without executing the body of the function.
The TASK_SOURCE component of the DEFAULT cache policy helps avoid naming collisions between similar tasks that should not share a cache.
Cache based on inputs
To cache the result of a task based only on task inputs, set cache_policy=INPUTS in the task decorator:
The above task will sleep the first time it is called with x=1, but will not sleep for any subsequent calls with the same input.
Prefect ships with several cache policies that can be used to customize caching behavior.
Cache based on a subset of inputs
To cache based on a subset of inputs, you can subtract the names of the keyword arguments you want to ignore from the INPUTS cache policy.
Cache with an expiration
To cache with an expiration, set the cache_expiration parameter on the task decorator to a datetime.timedelta.
Ignore the cache
To ignore the cache regardless of the cache policy, set the refresh_cache parameter on the task decorator to True.
To refresh the cache for all tasks, use the PREFECT_TASKS_REFRESH_CACHE setting.
Setting PREFECT_TASKS_REFRESH_CACHE=true changes the default behavior of all tasks to refresh.
If you have tasks that should not refresh when this setting is enabled, explicitly set refresh_cache to False on them. These tasks will never refresh the cache: if a cache key exists, it will be read, not updated.
If a cache key does not exist yet, these tasks can still write to the cache.
Cache on multiple criteria
Cache policies can be combined using the + operator.
The above task will use a cached result as long as the same inputs and task source are used.
Cache in a distributed environment
By default, Prefect stores results locally in ~/.prefect/storage/. To share the cache across tasks running on different machines, provide a storage block to the result_storage parameter on the task decorator.
Here’s an example of a task that uses an S3 bucket to store cache records:
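The sketch below assumes the prefect_aws package is installed and that a Prefect API is reachable so the block can be saved. The credentials, bucket name, and block name are placeholders, not real values:

```python
from prefect import task
from prefect.cache_policies import INPUTS
from prefect_aws import AwsCredentials, S3Bucket

# Placeholder credentials and bucket name; substitute your own.
s3_bucket = S3Bucket(
    credentials=AwsCredentials(
        aws_access_key_id="my-access-key-id",
        aws_secret_access_key="my-secret-access-key",
    ),
    bucket_name="my-bucket",
)

# Save the block so that it is available on every machine that runs the task.
# "my-cache-bucket" is a hypothetical block name.
s3_bucket.save("my-cache-bucket", overwrite=True)


@task(cache_policy=INPUTS, result_storage=s3_bucket)
def my_cached_task(x: int) -> int:
    return x + 42
```

In practice, credentials are usually better loaded from a saved block or the environment than hard-coded in source.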
When using a storage block from a Prefect integration package, the package the storage block is imported from must be installed in all environments where the task will run.
For example, the prefect_aws package must be installed to use the S3Bucket storage block.
Further reading
For more advanced caching examples, see the advanced caching how-to guide.
For more information on Prefect’s caching system, see the caching concepts page.