prefect_gcp.cloud_storage
Tasks for interacting with GCP Cloud Storage.
Functions
acloud_storage_create_bucket
Args:
- bucket: Name of the bucket.
- gcp_credentials: Credentials to use for authentication with GCP.
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- location: Location of the bucket.
- **create_kwargs: Additional keyword arguments to pass to client.create_bucket.
Returns:
- The bucket name.
cloud_storage_create_bucket
Args:
- bucket: Name of the bucket.
- gcp_credentials: Credentials to use for authentication with GCP.
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- location: Location of the bucket.
- **create_kwargs: Additional keyword arguments to pass to client.create_bucket.
Returns:
- The bucket name.
acloud_storage_download_blob_as_bytes
Args:
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- gcp_credentials: Credentials to use for authentication with GCP.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_as_bytes.
Returns:
- A bytes or string representation of the blob object.
cloud_storage_download_blob_as_bytes
Args:
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- gcp_credentials: Credentials to use for authentication with GCP.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_as_bytes.
Returns:
- A bytes or string representation of the blob object.
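The chunk_size constraint above (a multiple of 256 KB) is easy to get wrong; a minimal stdlib-only validation helper, sketched here under an assumed name (validate_chunk_size is not part of the library), can catch a bad value before the API does:

```python
# Hypothetical helper illustrating the chunk_size constraint described
# above: Cloud Storage requires chunk sizes in multiples of 256 KB.
CHUNK_MULTIPLE = 256 * 1024  # 256 KB

def validate_chunk_size(chunk_size: int) -> int:
    """Return chunk_size unchanged if valid, else raise ValueError."""
    if chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError(
            f"chunk_size must be a multiple of {CHUNK_MULTIPLE} bytes, "
            f"got {chunk_size}"
        )
    return chunk_size
```

For example, `validate_chunk_size(5 * 256 * 1024)` passes, while `validate_chunk_size(1000)` raises.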
acloud_storage_download_blob_to_file
Args:
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- path: Downloads the contents to the provided file path; if the path is a directory, automatically joins the blob name.
- gcp_credentials: Credentials to use for authentication with GCP.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
- The path to the blob object.
cloud_storage_download_blob_to_file
Args:
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- path: Downloads the contents to the provided file path; if the path is a directory, automatically joins the blob name.
- gcp_credentials: Credentials to use for authentication with GCP.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
- The path to the blob object.
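The path behavior described above (the blob name is joined automatically when path is a directory) can be sketched with a small stand-in helper; resolve_download_path is a hypothetical name for illustration, and the library's actual resolution logic may differ in detail:

```python
import os

def resolve_download_path(path: str, blob_name: str) -> str:
    """If path is an existing directory, join the blob name onto it;
    otherwise treat path itself as the target file path."""
    if os.path.isdir(path):
        return os.path.join(path, blob_name)
    return path
```

With a directory argument, `resolve_download_path("/tmp", "data.csv")` yields `/tmp/data.csv`; with a file path, the path is used as given.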
acloud_storage_upload_blob_from_string
Args:
- data: String or bytes representation of data to upload.
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- gcp_credentials: Credentials to use for authentication with GCP.
- content_type: Type of content being uploaded.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_string.
Returns:
- The blob name.
cloud_storage_upload_blob_from_string
Args:
- data: String or bytes representation of data to upload.
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- gcp_credentials: Credentials to use for authentication with GCP.
- content_type: Type of content being uploaded.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_string.
Returns:
- The blob name.
acloud_storage_upload_blob_from_file
Args:
- file: Path to data or file-like object to upload.
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- gcp_credentials: Credentials to use for authentication with GCP.
- content_type: Type of content being uploaded.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file or Blob.upload_from_filename.
Returns:
- The blob name.
cloud_storage_upload_blob_from_file
Args:
- file: Path to data or file-like object to upload.
- bucket: Name of the bucket.
- blob: Name of the Cloud Storage blob.
- gcp_credentials: Credentials to use for authentication with GCP.
- content_type: Type of content being uploaded.
- chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
- encryption_key: An encryption key.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file or Blob.upload_from_filename.
Returns:
- The blob name.
cloud_storage_copy_blob
Args:
- source_bucket: Source bucket name.
- dest_bucket: Destination bucket name.
- source_blob: Source blob name.
- gcp_credentials: Credentials to use for authentication with GCP.
- dest_blob: Destination blob name; if not provided, defaults to source_blob.
- timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
- project: Name of the project to use; overrides the gcp_credentials project if provided.
- **copy_kwargs: Additional keyword arguments to pass to Bucket.copy_blob.
Returns:
- Destination blob name.
Classes
DataFrameSerializationFormat
An enumeration class representing different file formats and
compression options for upload_from_dataframe.
Attributes:
- CSV: Representation for 'csv' file format with no compression and its related content type and suffix.
- CSV_GZIP: Representation for 'csv' file format with 'gzip' compression and its related content type and suffix.
- PARQUET: Representation for 'parquet' file format with no compression and its related content type and suffix.
- PARQUET_SNAPPY: Representation for 'parquet' file format with 'snappy' compression and its related content type and suffix.
- PARQUET_GZIP: Representation for 'parquet' file format with 'gzip' compression and its related content type and suffix.
compression
content_type
fix_extension_with
Args:
- gcs_blob_path: The path to the GCS blob to be modified.
Returns:
- The modified path to the GCS blob with the new extension.
format
suffix
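To make the shape of this enumeration concrete, here is an illustrative pure-Python re-implementation of two of the variants and fix_extension_with. This is a sketch, not the library's actual code: the class name SerializationFormat, the tuple layout, and the content-type strings are all assumptions for illustration.

```python
from enum import Enum
from pathlib import PurePosixPath

class SerializationFormat(Enum):
    # Each member bundles (format, compression, content_type, suffix).
    # The content types here are plausible guesses, not confirmed values.
    CSV = ("csv", None, "text/csv", ".csv")
    CSV_GZIP = ("csv", "gzip", "application/x-gzip", ".csv.gz")

    @property
    def format(self) -> str:
        return self.value[0]

    @property
    def compression(self):
        return self.value[1]

    @property
    def content_type(self) -> str:
        return self.value[2]

    @property
    def suffix(self) -> str:
        return self.value[3]

    def fix_extension_with(self, gcs_blob_path: str) -> str:
        """Replace the blob path's extension with this format's suffix."""
        return str(PurePosixPath(gcs_blob_path).with_suffix(self.suffix))
```

For example, `SerializationFormat.CSV_GZIP.fix_extension_with("folder/data.txt")` yields `"folder/data.csv.gz"`.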
GcsBucket
Block used to store data using GCP Cloud Storage Buckets.
Note: GcsBucket in prefect-gcp is a distinct block, separate from the GCS
block in core Prefect. GcsBucket does not use gcsfs under the hood;
instead it uses the google-cloud-storage package and offers more
configuration and functionality.
Attributes:
- bucket: Name of the bucket.
- gcp_credentials: The credentials to authenticate with GCP.
- bucket_folder: A default path to a folder within the GCS bucket to use for reading and writing objects.
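Many of the methods below note that a path "gets prefixed with the bucket_folder". That resolution can be sketched as a small hypothetical helper (resolve_bucket_path is an assumed name; the block's real implementation may normalize slashes differently):

```python
from posixpath import join  # GCS object keys use '/' separators

def resolve_bucket_path(bucket_folder: str, path: str) -> str:
    """Prefix path with the block's configured bucket_folder."""
    if not bucket_folder:
        return path
    return join(bucket_folder, path)
```

So with `bucket_folder="base"`, an upload to `"data.csv"` targets the key `"base/data.csv"`.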
acreate_bucket
Args:
- location: The location of the bucket.
- **create_kwargs: Additional keyword arguments to pass to the create_bucket method.
Returns:
- The bucket object.
adownload_folder_to_path
Args:
- from_folder: The path to the folder to download from; this gets prefixed with the bucket_folder.
- to_folder: The path to download the folder to. If not provided, will default to the current directory.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
- The absolute path that the folder was downloaded to.
adownload_object_to_file_object
Args:
- from_path: The path to the blob to download from; this gets prefixed with the bucket_folder.
- to_file_object: The file-like object to download the blob to.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_file.
Returns:
- The file-like object that the object was downloaded to.
adownload_object_to_path
Args:
- from_path: The path to the blob to download; this gets prefixed with the bucket_folder.
- to_path: The path to download the blob to. If not provided, the blob's name will be used.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
- The absolute path that the object was downloaded to.
aget_bucket
Returns:
- The bucket object.
aget_directory
Args:
- from_path: Path in GCS bucket to download from. Defaults to the block's configured bucket_folder.
- local_path: Local path to download GCS bucket contents to. Defaults to the current working directory.
Returns:
- A list of downloaded file paths.
alist_blobs
Args:
- folder: The folder to list blobs from.
Returns:
- A list of Blob objects.
alist_folders
Args:
- folder: The folder to list all folders and subfolders from.
Returns:
- A list of folders.
aput_directory
Args:
- local_path: Path to the local directory to upload from.
- to_path: Path in GCS bucket to upload to. Defaults to the block's configured bucket_folder.
- ignore_file: Path to a file containing gitignore-style expressions for filepaths to ignore.
Returns:
- The number of files uploaded.
aread_path
Args:
- path: Entire path to (and including) the key.
Returns:
- A bytes or string representation of the blob object.
aupload_from_dataframe
Args:
- df: The Pandas DataFrame to be uploaded.
- to_path: The destination path for the uploaded DataFrame.
- serialization_format: The format to serialize the DataFrame into. When passed as a str, the valid options are: 'csv', 'csv_gzip', 'parquet', 'parquet_snappy', 'parquet_gzip'. Defaults to DataFrameSerializationFormat.CSV_GZIP.
- **upload_kwargs: Additional keyword arguments to pass to the underlying upload_from_dataframe method.
Returns:
- The path that the object was uploaded to.
aupload_from_file_object
Args:
- from_file_object: The file-like object to upload from.
- to_path: The path to upload the object to; this gets prefixed with the bucket_folder.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file.
Returns:
- The path that the object was uploaded to.
aupload_from_folder
Args:
- from_folder: The path to the folder to upload from.
- to_folder: The path to upload the folder to. If not provided, will default to bucket_folder or the base directory of the bucket.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
- The path that the folder was uploaded to.
aupload_from_path
Args:
- from_path: The path to the file to upload from.
- to_path: The path to upload the file to. If not provided, will use the file name of from_path; this gets prefixed with the bucket_folder.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
- The path that the object was uploaded to.
awrite_path
Args:
- path: The key name. Each object in your bucket has a unique key (or key name).
- content: The content to upload to the GCS bucket.
Returns:
- The path that the contents were written to.
basepath
create_bucket
Args:
- location: The location of the bucket.
- **create_kwargs: Additional keyword arguments to pass to the create_bucket method.
Returns:
- The bucket object.
download_folder_to_path
Args:
- from_folder: The path to the folder to download from; this gets prefixed with the bucket_folder.
- to_folder: The path to download the folder to. If not provided, will default to the current directory.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
- The absolute path that the folder was downloaded to.
download_object_to_file_object
Args:
- from_path: The path to the blob to download from; this gets prefixed with the bucket_folder.
- to_file_object: The file-like object to download the blob to.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_file.
Returns:
- The file-like object that the object was downloaded to.
download_object_to_path
Args:
- from_path: The path to the blob to download; this gets prefixed with the bucket_folder.
- to_path: The path to download the blob to. If not provided, the blob's name will be used.
- **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
- The absolute path that the object was downloaded to.
get_bucket
Returns:
- The bucket object.
get_directory
Args:
- from_path: Path in GCS bucket to download from. Defaults to the block's configured bucket_folder.
- local_path: Local path to download GCS bucket contents to. Defaults to the current working directory.
Returns:
- A list of downloaded file paths.
list_blobs
Args:
- folder: The folder to list blobs from.
Returns:
- A list of Blob objects.
list_folders
Args:
- folder: The folder to list all folders and subfolders from.
Returns:
- A list of folders.
put_directory
Args:
- local_path: Path to the local directory to upload from.
- to_path: Path in GCS bucket to upload to. Defaults to the block's configured bucket_folder.
- ignore_file: Path to a file containing gitignore-style expressions for filepaths to ignore.
Returns:
- The number of files uploaded.
read_path
Args:
- path: Entire path to (and including) the key.
Returns:
- A bytes or string representation of the blob object.
upload_from_dataframe
Args:
- df: The Pandas DataFrame to be uploaded.
- to_path: The destination path for the uploaded DataFrame.
- serialization_format: The format to serialize the DataFrame into. When passed as a str, the valid options are: 'csv', 'csv_gzip', 'parquet', 'parquet_snappy', 'parquet_gzip'. Defaults to DataFrameSerializationFormat.CSV_GZIP.
- **upload_kwargs: Additional keyword arguments to pass to the underlying upload_from_dataframe method.
Returns:
- The path that the object was uploaded to.
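To make the default CSV_GZIP option concrete, here is a stdlib-only sketch of what gzip-compressed CSV serialization produces. The real method serializes a Pandas DataFrame; this hypothetical rows_to_csv_gzip helper uses plain dicts with the csv and gzip modules purely for illustration:

```python
import csv
import gzip
import io

def rows_to_csv_gzip(rows: list[dict]) -> bytes:
    """Serialize a list of dicts to gzip-compressed CSV bytes,
    roughly the payload shape the CSV_GZIP format produces."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))
```

Decompressing the returned bytes with gzip.decompress recovers an ordinary CSV document with a header row.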
upload_from_file_object
Args:
- from_file_object: The file-like object to upload from.
- to_path: The path to upload the object to; this gets prefixed with the bucket_folder.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file.
Returns:
- The path that the object was uploaded to.
upload_from_folder
Args:
- from_folder: The path to the folder to upload from.
- to_folder: The path to upload the folder to. If not provided, will default to bucket_folder or the base directory of the bucket.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
- The path that the folder was uploaded to.
upload_from_path
Args:
- from_path: The path to the file to upload from.
- to_path: The path to upload the file to. If not provided, will use the file name of from_path; this gets prefixed with the bucket_folder.
- **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
- The path that the object was uploaded to.
write_path
Args:
- path: The key name. Each object in your bucket has a unique key (or key name).
- content: The content to upload to the GCS bucket.
Returns:
- The path that the contents were written to.