# PIN-7: Storage and Execution
Date: April 20, 2019
Authors: Josh Meek & Chris White
Our current environment setup contains multiple classes, some of which can be deployed, others which can't. Moreover, the environment classes tightly couple storage of a flow with execution platform details. This leads to an unsatisfactory interface, and a spreading out of documentation. Additionally, it makes it hard to delineate which environments are appropriate for which use cases and also requires a large number of highly specific classes.
The core of the proposal is to create two separate interfaces: one called
Storage for storing flows, and one called
Environment for encoding execution environment logic and setup. Let's review each of these individually.
The proposed Prefect Storage interface encapsulates logic for storing, serializing and even running Flows. Each storage unit will be able to store multiple flows (possibly with the constraint of name uniqueness within a given unit), and will expose the following methods and attributes:
- a name attribute
flowsattribute that is a dictionary of flow name -> location
add_flow(flow: Flow) -> strmethod for adding flows to Storage, and which will return the location of the given flow in the Storage unit
__contains__(self, obj) -> boolspecial method for determining whether the Storage contains a given Flow
- one of
get_env_runner(flow_location: str)for retrieving a way of interfacing with either
FlowRunnerfor the flow;
get_env_runneris intended for situations where flow execution can only be interacted with via environment variables
build() -> Storagemethod for "building" the storage
serialize() -> dictmethod for serializing the relevant information about this Storage for later re-use.
# Execution Environment
The proposed Execution Environment interface encapsulates logic for setting up and executing a flow run on various infrastructure. The interface for an
Environment is simple:
setup(storage: Storage) -> Nonemethod for setting up any required infrastructure for the execution environment (e.g., spinning up resources)
execute(storage: Storage, flow_location: str) -> Nonemethod that serves as an entrypoint for individual Flow execution
serialize() -> dictmethod for serializing the relevant information about this environment
An environment can require certain types of Storage, but should be able to do all its work via interacting with the Storage interface above. Note that all of environment, storage and flow_location can be serialized within a given Flow.
All environments will need a total rewrite, new storage classes will need to be introduced, agent interfaces will need to be updated and PIN-3 will be partially voided. Additionally, the Flow serialization schema will need to be updated to include storage, environment and