File systems¶
A file system block is an object that lets you read data from and write data to paths. Prefect provides multiple built-in file system types that cover a wide range of use cases.
Additional file system types are available in Prefect Collections.
Local file system¶
The LocalFileSystem block enables interaction with the files in your current development environment.
LocalFileSystem properties include:
Property | Description |
---|---|
basepath | String path to the location of files on the local filesystem. Access to files outside of the base path will not be allowed. |
from prefect.filesystems import LocalFileSystem
fs = LocalFileSystem(basepath="/foo/bar")
Limited access to local file system
Be aware that LocalFileSystem access is limited to the exact path provided. This file system may not be ideal for some use cases. The execution environment for your workflows may not have the same file system as the environment you are writing and deploying your code on.
Use of this file system can limit the availability of results after a flow run has completed or prevent the code for a flow from being retrieved successfully at the start of a run.
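The base path restriction described above can be pictured with a short standard-library sketch. This is not Prefect's implementation: `resolve_within` is a hypothetical helper, and the example assumes the restriction amounts to rejecting any path that resolves outside the base directory.

```python
from pathlib import Path


def resolve_within(basepath: str, relative: str) -> Path:
    """Resolve `relative` under `basepath`, rejecting paths that escape it.

    Hypothetical helper for illustration only, not part of Prefect.
    """
    base = Path(basepath).resolve()
    candidate = (base / relative).resolve()
    try:
        # relative_to raises ValueError when candidate is not under base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(
            f"{relative!r} is outside of the base path {str(base)!r}"
        ) from None
    return candidate


base = "/foo/bar"
inside = resolve_within(base, "data/results.json")

try:
    resolve_within(base, "../secrets.txt")
    escaped = True
except ValueError:
    escaped = False  # the parent-directory traversal was rejected
```

Resolving both paths before comparing them means `..` segments and absolute paths are handled the same way as ordinary relative paths.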
Remote file system¶
The RemoteFileSystem block enables interaction with arbitrary remote file systems. Under the hood, RemoteFileSystem uses fsspec and supports any file system that fsspec supports.
RemoteFileSystem properties include:
Property | Description |
---|---|
basepath | String path to the location of files on the remote filesystem. Access to files outside of the base path will not be allowed. |
settings | Dictionary containing extra parameters required to access the remote file system. |
The file system is specified using a protocol:
- s3://my-bucket/my-folder/ will use S3
- gcs://my-bucket/my-folder/ will use GCS
- az://my-bucket/my-folder/ will use Azure
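To make the protocol convention concrete, here is a small sketch that splits a base path into its protocol, bucket, and folder using only the standard library. The `describe_basepath` function and the `PROTOCOL_BACKENDS` mapping are hypothetical names for illustration. fsspec resolves protocols through its own registry, which this sketch does not reproduce.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Mapping taken from the three protocols listed above (illustrative only).
PROTOCOL_BACKENDS = {"s3": "S3", "gcs": "GCS", "az": "Azure"}


def describe_basepath(basepath: str) -> tuple[str, str, str]:
    """Return (backend, bucket, folder) for a protocol-prefixed base path."""
    parts = urlsplit(basepath)
    backend = PROTOCOL_BACKENDS.get(parts.scheme, "unknown")
    return backend, parts.netloc, parts.path.lstrip("/")


print(describe_basepath("s3://my-bucket/my-folder/"))
# ('S3', 'my-bucket', 'my-folder/')
```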
For example, to use it with Amazon S3:
from prefect.filesystems import RemoteFileSystem
block = RemoteFileSystem(basepath="s3://my-bucket/folder/")
block.save("dev")
You may need to install additional libraries to use some remote storage types.
RemoteFileSystem examples¶
How can we use RemoteFileSystem to store our flow code?
The following is a use case where we use MinIO as a storage backend:
from prefect.filesystems import RemoteFileSystem
minio_block = RemoteFileSystem(
    basepath="s3://my-bucket",
    settings={
        "key": "MINIO_ROOT_USER",
        "secret": "MINIO_ROOT_PASSWORD",
        "client_kwargs": {"endpoint_url": "http://localhost:9000"},
    },
)
minio_block.save("minio")
S3¶
The S3 file system block enables interaction with Amazon S3. Under the hood, S3 uses s3fs.
S3 properties include:
Property | Description |
---|---|
bucket_path | String path to the location of files on the remote filesystem. Access to files outside of the bucket path will not be allowed. |
aws_access_key_id | AWS Access Key ID |
aws_secret_access_key | AWS Secret Access Key |
To create a block:
from prefect.filesystems import S3
block = S3(bucket_path="my-bucket/folder/")
block.save("dev")
To use it in a deployment:
prefect deployment build path/to/flow.py:flow_name --name deployment_name --tag dev -sb s3/dev
You need to install s3fs to use it.
GCS¶
The GCS file system block enables interaction with Google Cloud Storage. Under the hood, GCS uses gcsfs.
GCS properties include:
Property | Description |
---|---|
bucket_path | String path to the location of files on the remote filesystem. Access to files outside of the bucket path will not be allowed. |
service_account_info | The contents of a service account keyfile as a JSON string. |
project | The project the GCS bucket resides in. If not provided, the project will be inferred from the credentials or environment. |
To create a block:
from prefect.filesystems import GCS
block = GCS(bucket_path="my-bucket/folder/")
block.save("dev")
To use it in a deployment:
prefect deployment build path/to/flow.py:flow_name --name deployment_name --tag dev -sb gcs/dev
You need to install gcsfs to use it.
Azure¶
The Azure file system block enables interaction with Azure Datalake and Azure Blob Storage. Under the hood, Azure uses adlfs.
Azure properties include:
Property | Description |
---|---|
bucket_path | String path to the location of files on the remote filesystem. Access to files outside of the bucket path will not be allowed. |
azure_storage_connection_string | Azure storage connection string. |
azure_storage_account_name | Azure storage account name. |
azure_storage_account_key | Azure storage account key. |
To create a block:
from prefect.filesystems import Azure
block = Azure(bucket_path="my-bucket/folder/")
block.save("dev")
To use it in a deployment:
prefect deployment build path/to/flow.py:flow_name --name deployment_name --tag dev -sb az/dev
You need to install adlfs to use it.
Handling credentials for cloud object storage services¶
If you leverage S3, GCS, or Azure storage blocks, and you don't explicitly configure credentials on the respective storage block, those credentials will be inferred from the environment. Make sure to set those either explicitly on the block or as environment variables, configuration files, or IAM roles within both the build and runtime environment for your deployments.
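The order of precedence described above (credentials set on the block first, otherwise whatever the environment provides) can be sketched as a small check to run in both the build and runtime environments. This is a simplified illustration for the S3 case, not how Prefect or s3fs resolves credentials internally. `aws_credential_source` is a hypothetical helper; `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS environment variable names.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def aws_credential_source(
    block_key: Optional[str],
    block_secret: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Report where S3 credentials would most likely come from.

    Hypothetical helper: it only inspects its inputs and never contacts AWS.
    """
    if block_key and block_secret:
        return "block"
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # Nothing explicit was found, so the underlying library would have to
    # fall back to configuration files or an IAM role.
    return "configuration files or IAM role"


# Credentials set explicitly on the block take precedence.
print(aws_credential_source("example-key-id", "example-secret", env={}))
# block
```

If the reported source differs between the machine that builds the deployment and the one that runs it, that mismatch is a likely cause of access errors.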
Filesystem-specific package dependencies in Docker images¶
The core package and Prefect base images don't include filesystem-specific package dependencies such as s3fs, gcsfs or adlfs. To solve that problem in dockerized deployments, you can leverage the EXTRA_PIP_PACKAGES environment variable. Those dependencies will be installed at runtime within your Docker container or Kubernetes Job before the flow starts running.
Here is an example showing how you can specify that in your deployment YAML manifest:
infrastructure:
  type: docker-container
  env:
    EXTRA_PIP_PACKAGES: s3fs  # could be gcsfs, adlfs, etc.
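As a rough picture of what happens with that variable at container start, the sketch below turns its value into a pip command. It assumes the value is a space-separated list of package names; `pip_install_command` is a hypothetical helper, and the actual image entrypoint may work differently. The command is only constructed here, never executed.

```python
from __future__ import annotations

import shlex
import sys


def pip_install_command(extra_pip_packages: str) -> list[str]:
    """Build the pip invocation for a space-separated package string."""
    packages = shlex.split(extra_pip_packages)
    if not packages:
        return []  # nothing to install
    return [sys.executable, "-m", "pip", "install", *packages]


command = pip_install_command("s3fs gcsfs")
print(command[1:])
# ['-m', 'pip', 'install', 's3fs', 'gcsfs']
```

Because the installation happens on every container start, it adds startup time to each run; baking the packages into a custom image avoids that cost.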
Saving and loading file systems¶
Configuration for a file system can be saved to the Prefect API. For example:
from prefect.filesystems import RemoteFileSystem
fs = RemoteFileSystem(basepath="s3://my-bucket/folder/")
fs.write_path("foo", b"hello")
fs.save("dev-s3")
This file system can be retrieved for later use with load.
fs = RemoteFileSystem.load("dev-s3")
fs.read_path("foo") # b'hello'
Readable and writable file systems¶
Prefect provides two abstract file system types, ReadableFileSystem and WritableFileSystem.
- All readable file systems must implement read_path, which takes a file path to read content from and returns bytes.
- All writable file systems must implement write_path, which takes a file path and content and writes the content to the file as bytes.
A file system may implement both of these types.
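The two contracts can be illustrated with plain Python abstract base classes. The classes below are hypothetical stand-ins written for this sketch and do not inherit from Prefect's base classes, whose exact signatures may differ (for example, they may be asynchronous). The sketch only shows the shape of `read_path` and `write_path` and how one class can satisfy both.

```python
import abc
from typing import Dict


class ReadableStore(abc.ABC):
    """Stand-in for a readable file system: must provide read_path."""

    @abc.abstractmethod
    def read_path(self, path: str) -> bytes:
        ...


class WritableStore(abc.ABC):
    """Stand-in for a writable file system: must provide write_path."""

    @abc.abstractmethod
    def write_path(self, path: str, content: bytes) -> None:
        ...


class InMemoryStore(ReadableStore, WritableStore):
    """Implements both contracts by keeping file contents in a dictionary."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def read_path(self, path: str) -> bytes:
        return self._files[path]

    def write_path(self, path: str, content: bytes) -> None:
        self._files[path] = bytes(content)


store = InMemoryStore()
store.write_path("foo", b"hello")
print(store.read_path("foo"))  # b'hello'
```

Keeping the two contracts separate lets a read-only source implement only `read_path` without having to stub out writes.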