prefect_gcp.cloud_storage

Tasks for interacting with GCP Cloud Storage.

Functions

acloud_storage_create_bucket

acloud_storage_create_bucket(bucket: str, gcp_credentials: GcpCredentials, project: Optional[str] = None, location: Optional[str] = None, **create_kwargs: Dict[str, Any]) -> str
Creates a bucket (async version). Args:
  • bucket: Name of the bucket.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • location: Location of the bucket.
  • **create_kwargs: Additional keyword arguments to pass to client.create_bucket.
Returns:
  • The bucket name.

cloud_storage_create_bucket

cloud_storage_create_bucket(bucket: str, gcp_credentials: GcpCredentials, project: Optional[str] = None, location: Optional[str] = None, **create_kwargs: Dict[str, Any]) -> str
Creates a bucket. Args:
  • bucket: Name of the bucket.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • location: Location of the bucket.
  • **create_kwargs: Additional keyword arguments to pass to client.create_bucket.
Returns:
  • The bucket name.
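For example, a minimal sketch of calling this task from a flow (the credentials block name "my-gcp-creds" is assumed for illustration):
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import cloud_storage_create_bucket

@flow
def example_create_bucket_flow():
    gcp_credentials = GcpCredentials.load("my-gcp-creds")
    # Creates the bucket and returns its name
    return cloud_storage_create_bucket("my-bucket", gcp_credentials)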

acloud_storage_download_blob_as_bytes

acloud_storage_download_blob_as_bytes(bucket: str, blob: str, gcp_credentials: GcpCredentials, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **download_kwargs: Dict[str, Any]) -> bytes
Downloads a blob as bytes (async version). Args:
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_as_bytes.
Returns:
  • The contents of the blob as bytes.

cloud_storage_download_blob_as_bytes

cloud_storage_download_blob_as_bytes(bucket: str, blob: str, gcp_credentials: GcpCredentials, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **download_kwargs: Dict[str, Any]) -> bytes
Downloads a blob as bytes. Args:
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_as_bytes.
Returns:
  • The contents of the blob as bytes.
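For example, a sketch of fetching a blob's contents from a flow (the credentials block name "my-gcp-creds" and object names are illustrative):
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import cloud_storage_download_blob_as_bytes

@flow
def example_download_bytes_flow():
    gcp_credentials = GcpCredentials.load("my-gcp-creds")
    # Returns the blob contents as bytes
    return cloud_storage_download_blob_as_bytes(
        "my-bucket", "my_folder/notes.txt", gcp_credentials
    )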

acloud_storage_download_blob_to_file

acloud_storage_download_blob_to_file(bucket: str, blob: str, path: Union[str, Path], gcp_credentials: GcpCredentials, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **download_kwargs: Dict[str, Any]) -> Union[str, Path]
Downloads a blob to a file path (async version). Args:
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • path: Downloads the contents to the provided file path; if the path is a directory, automatically joins the blob name.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
  • The path to the blob object.

cloud_storage_download_blob_to_file

cloud_storage_download_blob_to_file(bucket: str, blob: str, path: Union[str, Path], gcp_credentials: GcpCredentials, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **download_kwargs: Dict[str, Any]) -> Union[str, Path]
Downloads a blob to a file path. Args:
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • path: Downloads the contents to the provided file path; if the path is a directory, automatically joins the blob name.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
  • The path to the blob object.
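For example, a sketch of downloading a blob to a local file from a flow (bucket, blob, and file names are illustrative):
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import cloud_storage_download_blob_to_file

@flow
def example_download_to_file_flow():
    gcp_credentials = GcpCredentials.load("my-gcp-creds")
    # Downloads the blob to notes.txt and returns that path
    return cloud_storage_download_blob_to_file(
        "my-bucket", "my_folder/notes.txt", "notes.txt", gcp_credentials
    )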

acloud_storage_upload_blob_from_string

acloud_storage_upload_blob_from_string(data: Union[str, bytes], bucket: str, blob: str, gcp_credentials: GcpCredentials, content_type: Optional[str] = None, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads a blob from a string or bytes representation of data (async version). Args:
  • data: String or bytes representation of data to upload.
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • content_type: Type of content being uploaded.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_string.
Returns:
  • The blob name.

cloud_storage_upload_blob_from_string

cloud_storage_upload_blob_from_string(data: Union[str, bytes], bucket: str, blob: str, gcp_credentials: GcpCredentials, content_type: Optional[str] = None, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads a blob from a string or bytes representation of data. Args:
  • data: String or bytes representation of data to upload.
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • content_type: Type of content being uploaded.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_string.
Returns:
  • The blob name.
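For example, a sketch of uploading string data from a flow (bucket and blob names are illustrative):
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import cloud_storage_upload_blob_from_string

@flow
def example_upload_from_string_flow():
    gcp_credentials = GcpCredentials.load("my-gcp-creds")
    # Uploads the string as the blob's contents and returns the blob name
    return cloud_storage_upload_blob_from_string(
        "hello, world", "my-bucket", "my_folder/notes.txt", gcp_credentials
    )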

acloud_storage_upload_blob_from_file

acloud_storage_upload_blob_from_file(file: Union[str, Path, BytesIO], bucket: str, blob: str, gcp_credentials: GcpCredentials, content_type: Optional[str] = None, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads a blob from a file path or file-like object (async version). Passing a file-like object is useful when the data was fetched from the web, since it can be uploaded directly to Cloud Storage without first being written to disk. Args:
  • file: Path to data or file like object to upload.
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • content_type: Type of content being uploaded.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file or Blob.upload_from_filename.
Returns:
  • The blob name.

cloud_storage_upload_blob_from_file

cloud_storage_upload_blob_from_file(file: Union[str, Path, BytesIO], bucket: str, blob: str, gcp_credentials: GcpCredentials, content_type: Optional[str] = None, chunk_size: Optional[int] = None, encryption_key: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads a blob from a file path or file-like object. Passing a file-like object is useful when the data was fetched from the web, since it can be uploaded directly to Cloud Storage without first being written to disk. Args:
  • file: Path to data or file like object to upload.
  • bucket: Name of the bucket.
  • blob: Name of the Cloud Storage blob.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • content_type: Type of content being uploaded.
  • chunk_size: The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key: An encryption key.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file or Blob.upload_from_filename.
Returns:
  • The blob name.
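For example, a sketch of uploading an in-memory buffer from a flow, avoiding a write to disk (bucket and blob names are illustrative):
from io import BytesIO

from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import cloud_storage_upload_blob_from_file

@flow
def example_upload_from_file_flow():
    gcp_credentials = GcpCredentials.load("my-gcp-creds")
    # A local file path works here as well
    buf = BytesIO(b"hello, world")
    return cloud_storage_upload_blob_from_file(
        buf, "my-bucket", "my_folder/notes.txt", gcp_credentials
    )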

cloud_storage_copy_blob

cloud_storage_copy_blob(source_bucket: str, dest_bucket: str, source_blob: str, gcp_credentials: GcpCredentials, dest_blob: Optional[str] = None, timeout: Union[float, Tuple[float, float]] = 60, project: Optional[str] = None, **copy_kwargs: Dict[str, Any]) -> str
Copies data from one Google Cloud Storage bucket to another, without downloading it locally. Args:
  • source_bucket: Source bucket name.
  • dest_bucket: Destination bucket name.
  • source_blob: Source blob name.
  • gcp_credentials: Credentials to use for authentication with GCP.
  • dest_blob: Destination blob name; if not provided, defaults to source_blob.
  • timeout: The number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • project: Name of the project to use; overrides the gcp_credentials project if provided.
  • **copy_kwargs: Additional keyword arguments to pass to Bucket.copy_blob.
Returns:
  • Destination blob name.
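For example, a sketch of copying a blob between buckets from a flow (bucket and blob names are illustrative):
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import cloud_storage_copy_blob

@flow
def example_copy_blob_flow():
    gcp_credentials = GcpCredentials.load("my-gcp-creds")
    # Copies the source blob into the destination bucket under the same name
    return cloud_storage_copy_blob(
        "source-bucket", "dest-bucket", "my_folder/notes.txt", gcp_credentials
    )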

Classes

DataFrameSerializationFormat

An enumeration class representing the file formats and compression options supported by upload_from_dataframe. Attributes:
  • CSV: Representation for ‘csv’ file format with no compression and its related content type and suffix.
  • CSV_GZIP: Representation for ‘csv’ file format with ‘gzip’ compression and its related content type and suffix.
  • PARQUET: Representation for ‘parquet’ file format with no compression and its related content type and suffix.
  • PARQUET_SNAPPY: Representation for ‘parquet’ file format with ‘snappy’ compression and its related content type and suffix.
  • PARQUET_GZIP: Representation for ‘parquet’ file format with ‘gzip’ compression and its related content type and suffix.
Methods:

compression

compression(self) -> Union[str, None]
The compression type of the current instance.

content_type

content_type(self) -> str
The content type of the current instance.

fix_extension_with

fix_extension_with(self, gcs_blob_path: str) -> str
Fix the extension of a GCS blob. Args:
  • gcs_blob_path: The path to the GCS blob to be modified.
Returns:
  • The modified path to the GCS blob with the new extension.

format

format(self) -> str
The file format of the current instance.

suffix

suffix(self) -> str
The suffix of the file format of the current instance.
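For example, a brief sketch of inspecting a format and fixing a blob path's extension (these accessors are assumed to be properties; the commented values are illustrative):
from prefect_gcp.cloud_storage import DataFrameSerializationFormat

fmt = DataFrameSerializationFormat.CSV_GZIP
print(fmt.format)       # e.g. "csv"
print(fmt.compression)  # e.g. "gzip"
print(fmt.suffix)       # e.g. ".csv.gz"
# Swap the path's extension for the format's suffix, e.g. "my_folder/data.csv.gz"
print(fmt.fix_extension_with("my_folder/data.txt"))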

GcsBucket

Block used to store data using GCP Cloud Storage buckets. Note: GcsBucket in prefect-gcp is a distinct block, separate from the GCS block in core Prefect. GcsBucket does not use gcsfs under the hood; it uses the google-cloud-storage package directly and offers more configuration and functionality. Attributes:
  • bucket: Name of the bucket.
  • gcp_credentials: The credentials to authenticate with GCP.
  • bucket_folder: A default path to a folder within the GCS bucket to use for reading and writing objects.
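For example, a minimal sketch of configuring and saving the block (the credentials block name "my-gcp-creds" and the saved block name "my-bucket" are assumptions for illustration):
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket(
    bucket="my-bucket",
    gcp_credentials=GcpCredentials.load("my-gcp-creds"),
    bucket_folder="my_folder",  # optional default folder for reads and writes
)
gcs_bucket.save("my-bucket")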
Methods:

acreate_bucket

acreate_bucket(self, location: Optional[str] = None, **create_kwargs) -> 'Bucket'
Creates a bucket (async version). Args:
  • location: The location of the bucket.
  • **create_kwargs: Additional keyword arguments to pass to the create_bucket method.
Returns:
  • The bucket object.
Examples: Create a bucket.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket(bucket="my-bucket")
await gcs_bucket.acreate_bucket()

adownload_folder_to_path

adownload_folder_to_path(self, from_folder: str, to_folder: Optional[Union[str, Path]] = None, **download_kwargs: Dict[str, Any]) -> Path
Downloads objects within a folder (excluding the folder itself) from the object storage service to a folder (async version). Args:
  • from_folder: The path to the folder to download from; this gets prefixed with the bucket_folder.
  • to_folder: The path to download the folder to. If not provided, will default to the current directory.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
  • The absolute path that the folder was downloaded to.
Examples: Download my_folder to a local folder named my_folder.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.adownload_folder_to_path("my_folder", "my_folder")

adownload_object_to_file_object

adownload_object_to_file_object(self, from_path: str, to_file_object: BinaryIO, **download_kwargs: Dict[str, Any]) -> BinaryIO
Downloads an object from the object storage service to a file-like object (async version), which can be a BytesIO object or a BufferedWriter. Args:
  • from_path: The path to the blob to download from; this gets prefixed with the bucket_folder.
  • to_file_object: The file-like object to download the blob to.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_file.
Returns:
  • The file-like object that the object was downloaded to.
Examples: Download my_folder/notes.txt object to a BytesIO object.
from io import BytesIO
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with BytesIO() as buf:
    await gcs_bucket.adownload_object_to_file_object("my_folder/notes.txt", buf)
Download my_folder/notes.txt object to a BufferedWriter.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with open("notes.txt", "wb") as f:
    await gcs_bucket.adownload_object_to_file_object("my_folder/notes.txt", f)

adownload_object_to_path

adownload_object_to_path(self, from_path: str, to_path: Optional[Union[str, Path]] = None, **download_kwargs: Dict[str, Any]) -> Path
Downloads an object from the object storage service to a path (async version). Args:
  • from_path: The path to the blob to download; this gets prefixed with the bucket_folder.
  • to_path: The path to download the blob to. If not provided, the blob’s name will be used.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
  • The absolute path that the object was downloaded to.
Examples: Download my_folder/notes.txt object to notes.txt.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.adownload_object_to_path("my_folder/notes.txt", "notes.txt")

aget_bucket

aget_bucket(self) -> 'Bucket'
Returns the bucket object (async version). Returns:
  • The bucket object.
Examples: Get the bucket object.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.aget_bucket()

aget_directory

aget_directory(self, from_path: Optional[str] = None, local_path: Optional[str] = None) -> List[Union[str, Path]]
Copies a folder from the configured GCS bucket to a local directory (async version). Defaults to copying the entire contents of the block’s bucket_folder to the current working directory. Args:
  • from_path: Path in GCS bucket to download from. Defaults to the block’s configured bucket_folder.
  • local_path: Local path to download GCS bucket contents to. Defaults to the current working directory.
Returns:
  • A list of downloaded file paths.

alist_blobs

alist_blobs(self, folder: str = '') -> List['Blob']
Lists all blobs in the bucket that are in a folder (async version). Folders are not included in the output. Args:
  • folder: The folder to list blobs from.
Returns:
  • A list of Blob objects.
Examples: Get all blobs from a folder named “prefect”.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.alist_blobs("prefect")

alist_folders

alist_folders(self, folder: str = '') -> List[str]
Lists all folders and subfolders in the bucket (async version). Args:
  • folder: List all folders and subfolders inside the given folder.
Returns:
  • A list of folders.
Examples: Get all folders from a bucket named “my-bucket”.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.alist_folders()
Get all folders from a folder called "years".
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.alist_folders("years")

aput_directory

aput_directory(self, local_path: Optional[str] = None, to_path: Optional[str] = None, ignore_file: Optional[str] = None) -> int
Uploads a directory from a given local path to the configured GCS bucket in a given folder (async version). Defaults to uploading the entire contents of the current working directory to the block’s bucket_folder. Args:
  • local_path: Path to local directory to upload from.
  • to_path: Path in GCS bucket to upload to. Defaults to block’s configured bucket_folder.
  • ignore_file: Path to file containing gitignore style expressions for filepaths to ignore.
Returns:
  • The number of files uploaded.

aread_path

aread_path(self, path: str) -> bytes
Read specified path from GCS and return contents (async version). Provide the entire path to the key in GCS. Args:
  • path: Entire path to (and including) the key.
Returns:
  • The contents of the blob as bytes.

aupload_from_dataframe

aupload_from_dataframe(self, df: 'DataFrame', to_path: str, serialization_format: Union[str, DataFrameSerializationFormat] = DataFrameSerializationFormat.CSV_GZIP, **upload_kwargs: Dict[str, Any]) -> str
Upload a Pandas DataFrame to Google Cloud Storage in various formats (async version). This function uploads the data in a Pandas DataFrame to Google Cloud Storage in a specified format, such as .csv, .csv.gz, .parquet, .parquet.snappy, and .parquet.gz. Args:
  • df: The Pandas DataFrame to be uploaded.
  • to_path: The destination path for the uploaded DataFrame.
  • serialization_format: The format to serialize the DataFrame into. When passed as a str, the valid options are: ‘csv’, ‘csv_gzip’, ‘parquet’, ‘parquet_snappy’, ‘parquet_gzip’. Defaults to DataFrameSerializationFormat.CSV_GZIP.
  • **upload_kwargs: Additional keyword arguments to pass to the underlying upload_from_dataframe method.
Returns:
  • The path that the object was uploaded to.

aupload_from_file_object

aupload_from_file_object(self, from_file_object: BinaryIO, to_path: str, **upload_kwargs) -> str
Uploads an object to the object storage service from a file-like object (async version), which can be a BytesIO object or a BufferedReader. Args:
  • from_file_object: The file-like object to upload from.
  • to_path: The path to upload the object to; this gets prefixed with the bucket_folder.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file.
Returns:
  • The path that the object was uploaded to.
Examples: Upload the local file notes.txt to my_folder/notes.txt from a file object.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with open("notes.txt", "rb") as f:
    await gcs_bucket.aupload_from_file_object(f, "my_folder/notes.txt")
Upload BufferedReader object to my_folder/notes.txt.
from io import BufferedReader
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with open("notes.txt", "rb") as f:
    await gcs_bucket.aupload_from_file_object(
        BufferedReader(f), "my_folder/notes.txt"
    )

aupload_from_folder

aupload_from_folder(self, from_folder: Union[str, Path], to_folder: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads files within a folder (excluding the folder itself) to the object storage service folder (async version). Args:
  • from_folder: The path to the folder to upload from.
  • to_folder: The path to upload the folder to. If not provided, will default to bucket_folder or the base directory of the bucket.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
  • The path that the folder was uploaded to.
Examples: Upload local folder my_folder to the bucket’s folder my_folder.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.aupload_from_folder("my_folder")

aupload_from_path

aupload_from_path(self, from_path: Union[str, Path], to_path: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads an object from a path to the object storage service (async version). Args:
  • from_path: The path to the file to upload from.
  • to_path: The path to upload the file to. If not provided, will use the file name of from_path; this gets prefixed with the bucket_folder.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
  • The path that the object was uploaded to.
Examples: Upload notes.txt to my_folder/notes.txt.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
await gcs_bucket.aupload_from_path("notes.txt", "my_folder/notes.txt")

awrite_path

awrite_path(self, path: str, content: bytes) -> str
Writes to a GCS bucket (async version). Args:
  • path: The key name. Each object in your bucket has a unique key (or key name).
  • content: The data to upload to the GCS bucket.
Returns:
  • The path that the contents were written to.

basepath

basepath(self) -> str
Read-only property that mirrors the bucket folder. Used for deployment.

create_bucket

create_bucket(self, location: Optional[str] = None, **create_kwargs) -> 'Bucket'
Creates a bucket. Args:
  • location: The location of the bucket.
  • **create_kwargs: Additional keyword arguments to pass to the create_bucket method.
Returns:
  • The bucket object.
Examples: Create a bucket.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket(bucket="my-bucket")
gcs_bucket.create_bucket()

download_folder_to_path

download_folder_to_path(self, from_folder: str, to_folder: Optional[Union[str, Path]] = None, **download_kwargs: Dict[str, Any]) -> Path
Downloads objects within a folder (excluding the folder itself) from the object storage service to a folder. Args:
  • from_folder: The path to the folder to download from; this gets prefixed with the bucket_folder.
  • to_folder: The path to download the folder to. If not provided, will default to the current directory.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
  • The absolute path that the folder was downloaded to.
Examples: Download my_folder to a local folder named my_folder.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.download_folder_to_path("my_folder", "my_folder")

download_object_to_file_object

download_object_to_file_object(self, from_path: str, to_file_object: BinaryIO, **download_kwargs: Dict[str, Any]) -> BinaryIO
Downloads an object from the object storage service to a file-like object, which can be a BytesIO object or a BufferedWriter. Args:
  • from_path: The path to the blob to download from; this gets prefixed with the bucket_folder.
  • to_file_object: The file-like object to download the blob to.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_file.
Returns:
  • The file-like object that the object was downloaded to.
Examples: Download my_folder/notes.txt object to a BytesIO object.
from io import BytesIO
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with BytesIO() as buf:
    gcs_bucket.download_object_to_file_object("my_folder/notes.txt", buf)
Download my_folder/notes.txt object to a BufferedWriter.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with open("notes.txt", "wb") as f:
    gcs_bucket.download_object_to_file_object("my_folder/notes.txt", f)

download_object_to_path

download_object_to_path(self, from_path: str, to_path: Optional[Union[str, Path]] = None, **download_kwargs: Dict[str, Any]) -> Path
Downloads an object from the object storage service to a path. Args:
  • from_path: The path to the blob to download; this gets prefixed with the bucket_folder.
  • to_path: The path to download the blob to. If not provided, the blob’s name will be used.
  • **download_kwargs: Additional keyword arguments to pass to Blob.download_to_filename.
Returns:
  • The absolute path that the object was downloaded to.
Examples: Download my_folder/notes.txt object to notes.txt.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.download_object_to_path("my_folder/notes.txt", "notes.txt")

get_bucket

get_bucket(self) -> 'Bucket'
Returns the bucket object. Returns:
  • The bucket object.
Examples: Get the bucket object.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.get_bucket()

get_directory

get_directory(self, from_path: Optional[str] = None, local_path: Optional[str] = None) -> List[Union[str, Path]]
Copies a folder from the configured GCS bucket to a local directory. Defaults to copying the entire contents of the block’s bucket_folder to the current working directory. Args:
  • from_path: Path in GCS bucket to download from. Defaults to the block’s configured bucket_folder.
  • local_path: Local path to download GCS bucket contents to. Defaults to the current working directory.
Returns:
  • A list of downloaded file paths.
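For example, a sketch of pulling the configured bucket_folder into a local directory (the local path is illustrative):
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
# Copy everything under the block's bucket_folder into ./local_copy
downloaded_files = gcs_bucket.get_directory(local_path="local_copy")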

list_blobs

list_blobs(self, folder: str = '') -> List['Blob']
Lists all blobs in the bucket that are in a folder. Folders are not included in the output. Args:
  • folder: The folder to list blobs from.
Returns:
  • A list of Blob objects.
Examples: Get all blobs from a folder named “prefect”.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.list_blobs("prefect")

list_folders

list_folders(self, folder: str = '') -> List[str]
Lists all folders and subfolders in the bucket. Args:
  • folder: List all folders and subfolders inside the given folder.
Returns:
  • A list of folders.
Examples: Get all folders from a bucket named “my-bucket”.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.list_folders()
Get all folders from a folder called "years".
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.list_folders("years")

put_directory

put_directory(self, local_path: Optional[str] = None, to_path: Optional[str] = None, ignore_file: Optional[str] = None) -> int
Uploads a directory from a given local path to the configured GCS bucket in a given folder. Defaults to uploading the entire contents of the current working directory to the block’s bucket_folder. Args:
  • local_path: Path to local directory to upload from.
  • to_path: Path in GCS bucket to upload to. Defaults to block’s configured bucket_folder.
  • ignore_file: Path to file containing gitignore style expressions for filepaths to ignore.
Returns:
  • The number of files uploaded.
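For example, a sketch of uploading the current working directory (the ignore file name is illustrative):
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
# Upload the current directory, skipping paths matched by .prefectignore
num_uploaded = gcs_bucket.put_directory(ignore_file=".prefectignore")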

read_path

read_path(self, path: str) -> bytes
Read specified path from GCS and return contents. Provide the entire path to the key in GCS. Args:
  • path: Entire path to (and including) the key.
Returns:
  • The contents of the blob as bytes.
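For example, a sketch of reading a key and decoding the returned bytes (the key name is illustrative):
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
contents = gcs_bucket.read_path("my_folder/notes.txt")
print(contents.decode("utf-8"))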

upload_from_dataframe

upload_from_dataframe(self, df: 'DataFrame', to_path: str, serialization_format: Union[str, DataFrameSerializationFormat] = DataFrameSerializationFormat.CSV_GZIP, **upload_kwargs: Dict[str, Any]) -> str
Upload a Pandas DataFrame to Google Cloud Storage in various formats. This function uploads the data in a Pandas DataFrame to Google Cloud Storage in a specified format, such as .csv, .csv.gz, .parquet, .parquet.snappy, and .parquet.gz. Args:
  • df: The Pandas DataFrame to be uploaded.
  • to_path: The destination path for the uploaded DataFrame.
  • serialization_format: The format to serialize the DataFrame into. When passed as a str, the valid options are: ‘csv’, ‘csv_gzip’, ‘parquet’, ‘parquet_snappy’, ‘parquet_gzip’. Defaults to DataFrameSerializationFormat.CSV_GZIP.
  • **upload_kwargs: Additional keyword arguments to pass to the underlying upload_from_dataframe method.
Returns:
  • The path that the object was uploaded to.
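For example, a sketch of uploading a small DataFrame as gzip-compressed Parquet (the destination path is illustrative; the suffix is adjusted to match the chosen format):
import pandas as pd

from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
path = gcs_bucket.upload_from_dataframe(
    df, to_path="my_folder/data", serialization_format="parquet_gzip"
)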

upload_from_file_object

upload_from_file_object(self, from_file_object: BinaryIO, to_path: str, **upload_kwargs) -> str
Uploads an object to the object storage service from a file-like object, which can be a BytesIO object or a BufferedReader. Args:
  • from_file_object: The file-like object to upload from.
  • to_path: The path to upload the object to; this gets prefixed with the bucket_folder.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_file.
Returns:
  • The path that the object was uploaded to.
Examples: Upload the local file notes.txt to my_folder/notes.txt from a file object.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with open("notes.txt", "rb") as f:
    gcs_bucket.upload_from_file_object(f, "my_folder/notes.txt")
Upload BufferedReader object to my_folder/notes.txt.
from io import BufferedReader
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
with open("notes.txt", "rb") as f:
    gcs_bucket.upload_from_file_object(
        BufferedReader(f), "my_folder/notes.txt"
    )

upload_from_folder

upload_from_folder(self, from_folder: Union[str, Path], to_folder: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads files within a folder (excluding the folder itself) to the object storage service folder. Args:
  • from_folder: The path to the folder to upload from.
  • to_folder: The path to upload the folder to. If not provided, will default to bucket_folder or the base directory of the bucket.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
  • The path that the folder was uploaded to.
Examples: Upload local folder my_folder to the bucket’s folder my_folder.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.upload_from_folder("my_folder")

upload_from_path

upload_from_path(self, from_path: Union[str, Path], to_path: Optional[str] = None, **upload_kwargs: Dict[str, Any]) -> str
Uploads an object from a path to the object storage service. Args:
  • from_path: The path to the file to upload from.
  • to_path: The path to upload the file to. If not provided, will use the file name of from_path; this gets prefixed with the bucket_folder.
  • **upload_kwargs: Additional keyword arguments to pass to Blob.upload_from_filename.
Returns:
  • The path that the object was uploaded to.
Examples: Upload notes.txt to my_folder/notes.txt.
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
gcs_bucket.upload_from_path("notes.txt", "my_folder/notes.txt")

write_path

write_path(self, path: str, content: bytes) -> str
Writes to a GCS bucket. Args:
  • path: The key name. Each object in your bucket has a unique key (or key name).
  • content: The data to upload to the GCS bucket.
Returns:
  • The path that the contents were written to.
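For example, a sketch of writing raw bytes to a key (the key name is illustrative):
from prefect_gcp.cloud_storage import GcsBucket

gcs_bucket = GcsBucket.load("my-bucket")
# Writes the bytes to the key and returns the written path
written_path = gcs_bucket.write_path("my_folder/notes.txt", b"hello, world")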