# Google Cloud Tasks


Tasks that interface with various components of Google Cloud Platform.

Note that these tasks allow for a wide range of custom usage patterns, such as:

  • Initialize a task with all settings for one time use
  • Initialize a "template" task with default settings and override as needed
  • Create a custom Task that inherits from a Prefect Task and utilizes the Prefect boilerplate

All GCP related tasks can be authenticated using the GCP_CREDENTIALS Prefect Secret. See Third Party Authentication for more information.

# GCSDownload

class

prefect.tasks.gcp.storage.GCSDownload

(bucket, blob=None, project=None, encryption_key_secret=None, **kwargs)[source]

Task template for downloading data from Google Cloud Storage as a string.

Args:

  • bucket (str): default bucket name to download from
  • blob (str, optional): default blob name to download.
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when downloading the Blob
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Note that the design of this task allows you to initialize a template with default settings. Each inidividual occurence of the task in a Flow can overwrite any of these default settings for custom use (for example, if you want to pull different credentials for a given Task, or specify the Blob name at runtime).

methods:                                                                                                                                                       

prefect.tasks.gcp.storage.GCSDownload.run

(bucket=None, blob=None, project=None, credentials=None, encryption_key=None, encryption_key_secret=None)[source]

Run method for this Task. Invoked by calling this Task after initialization within a Flow context.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • bucket (str, optional): the bucket name to upload to
  • blob (str, optional): blob name to download from
  • project (str, optional): Google Cloud project to work within. If not provided here or at initialization, will be inferred from your Google Cloud credentials
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • encryption_key (str, optional): an encryption key
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when uploading the Blob
Returns:
  • str: the data from the blob, as a string
Raises:
  • google.cloud.exception.NotFound: if create_bucket=False and the bucket name is not found
  • ValueError: if blob name hasn't been provided



# GCSUpload

class

prefect.tasks.gcp.storage.GCSUpload

(bucket, blob=None, project=None, create_bucket=False, encryption_key_secret=None, **kwargs)[source]

Task template for uploading data to Google Cloud Storage. Data can be a string or bytes.

Args:

  • bucket (str): default bucket name to upload to
  • blob (str, optional): default blob name to upload to; otherwise a random string beginning with prefect- and containing the Task Run ID will be used
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • create_bucket (bool, optional): boolean specifying whether to create the bucket if it does not exist, otherwise an Exception is raised. Defaults to False.
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when uploading the Blob
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Note that the design of this task allows you to initialize a template with default settings. Each inidividual occurence of the task in a Flow can overwrite any of these default settings for custom use (for example, if you want to pull different credentials for a given Task, or specify the Blob name at runtime).

methods:                                                                                                                                                       

prefect.tasks.gcp.storage.GCSUpload.run

(data, bucket=None, blob=None, project=None, credentials=None, encryption_key=None, create_bucket=False, encryption_key_secret=None, content_type=None, content_encoding=None)[source]

Run method for this Task. Invoked by calling this Task after initialization within a Flow context.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • data (Union[str, bytes]): the data to upload; can be either string or bytes
  • bucket (str, optional): the bucket name to upload to
  • blob (str, optional): blob name to upload to a string beginning with prefect- and containing the Task Run ID will be used
  • project (str, optional): Google Cloud project to work within. Can be inferred from credentials if not provided.
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • encryption_key (str, optional): an encryption key
  • create_bucket (bool, optional): boolean specifying whether to create the bucket if it does not exist, otherwise an Exception is raised. Defaults to False.
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when uploading the Blob.
  • content_type (str, optional): HTTP ‘Content-Type’ header for this object.
  • content_encoding (str, optional): HTTP ‘Content-Encoding’ header for this object.
Raises:
  • google.cloud.exception.NotFound: if create_bucket=False and the bucket name is not found
Returns:
  • str: the blob name that now stores the provided data



# GCSCopy

class

prefect.tasks.gcp.storage.GCSCopy

(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, **kwargs)[source]

Task template for copying data from one Google Cloud Storage bucket to another, without downloading it locally.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • source_bucket (str, optional): default source bucket name.
  • source_blob (str, optional): default source blob name.
  • dest_bucket (str, optional): default destination bucket name.
  • dest_blob (str, optional): default destination blob name.
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Note that the design of this task allows you to initialize a template with default settings. Each inidividual occurence of the task in a Flow can overwrite any of these default settings for custom use (for example, if you want to pull different credentials for a given Task, or specify the Blob name at runtime).

methods:                                                                                                                                                       

prefect.tasks.gcp.storage.GCSCopy.run

(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, credentials=None)[source]

Run method for this Task. Invoked by calling this Task after initialization within a Flow context.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • source_bucket (str, optional): default source bucket name.
  • source_blob (str, optional): default source blob name.
  • dest_bucket (str, optional): default destination bucket name.
  • dest_blob (str, optional): default destination blob name.
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
Returns:
  • str: the name of the destination blob
Raises:
  • ValueError: if source_bucket, source_blob, dest_bucket, or dest_blob are missing or point at the same object.



# BigQueryTask

class

prefect.tasks.gcp.bigquery.BigQueryTask

(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None, **kwargs)[source]

Task for executing queries against a Google BigQuery table and (optionally) returning the results. Note that all initialization settings can be provided / overwritten at runtime.

Args:

  • query (str, optional): a string of the query to execute
  • query_params (list[tuple], optional): a list of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • dry_run_max_bytes (int, optional): if provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a ValueError if the maximum is exceeded
  • dataset_dest (str, optional): the optional name of a destination dataset to write the query results to, if you don't want them returned; if provided, table_dest must also be provided
  • table_dest (str, optional): the optional name of a destination table to write the query results to, if you don't want them returned; if provided, dataset_dest must also be provided
  • to_dataframe (bool, optional): if provided, returns the results of the query as a pandas dataframe instead of a list of bigquery.table.Row objects. Defaults to False
  • job_config (dict, optional): an optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected)
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryTask.run

(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, credentials=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • query (str, optional): a string of the query to execute
  • query_params (list[tuple], optional): a list of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • dry_run_max_bytes (int, optional): if provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a ValueError if the maximum is exceeded
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • dataset_dest (str, optional): the optional name of a destination dataset to write the query results to, if you don't want them returned; if provided, table_dest must also be provided
  • table_dest (str, optional): the optional name of a destination table to write the query results to, if you don't want them returned; if provided, dataset_dest must also be provided
  • to_dataframe (bool, optional): if provided, returns the results of the query as a pandas dataframe instead of a list of bigquery.table.Row objects. Defaults to False
  • job_config (dict, optional): an optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected)
Raises:
  • ValueError: if the query is None
  • ValueError: if only one of dataset_dest / table_dest is provided
  • ValueError: if the query will execeed dry_run_max_bytes
Returns:
  • list: a fully populated list of Query results, with one item per row



# BigQueryStreamingInsert

class

prefect.tasks.gcp.bigquery.BigQueryStreamingInsert

(dataset_id=None, table=None, project=None, location="US", **kwargs)[source]

Task for insert records in a Google BigQuery table via the streaming API. Note that all of these settings can optionally be provided or overwritten at runtime.

Args:

  • dataset_id (str, optional): the id of a destination dataset to write the records to
  • table (str, optional): the name of a destination table to write the records to
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryStreamingInsert.run

(records, dataset_id=None, table=None, project=None, location="US", credentials=None, **kwargs)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • records (list[dict]): the list of records to insert as rows into the BigQuery table; each item in the list should be a dictionary whose keys correspond to columns in the table
  • dataset_id (str, optional): the id of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization
  • table (str, optional): the name of a destination table to write the records to; if not provided here, will default to the one provided at initialization
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • **kwargs (optional): additional kwargs to pass to the insert_rows_json method; see the documentation here: https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html
Raises:
  • ValueError: if all required arguments haven't been provided
  • ValueError: if any of the records result in errors
Returns:
  • the response from insert_rows_json



# CreateBigQueryTable

class

prefect.tasks.gcp.bigquery.CreateBigQueryTable

(project=None, dataset=None, table=None, schema=None, clustering_fields=None, time_partitioning=None, **kwargs)[source]

Ensures a BigQuery table exists; creates it otherwise. Note that most initialization keywords can optionally be provided at runtime.

Args:

  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • dataset (str, optional): the name of a dataset in that the table will be created
  • table (str, optional): the name of a table to create
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • clustering_fields (List[str], optional): a list of fields to cluster the table by
  • time_partitioning (bigquery.TimePartitioning, optional): a bigquery.TimePartitioning object specifying a partitioninig of the newly created table
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.CreateBigQueryTable.run

(project=None, credentials=None, dataset=None, table=None, schema=None)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • dataset (str, optional): the name of a dataset in that the table will be created
  • table (str, optional): the name of a table to create
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
Returns:
  • None
Raises:
  • SUCCESS: a SUCCESS signal if the table already exists



# BigQueryLoadGoogleCloudStorage

class

prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage

(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)[source]

Task for insert records in a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.

Args:

  • uri (str, optional): GCS path to load data from
  • dataset_id (str, optional): the id of a destination dataset to write the records to
  • table (str, optional): the name of a destination table to write the records to
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage.run

(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • uri (str, optional): GCS path to load data from
  • dataset_id (str, optional): the id of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization
  • table (str, optional): the name of a destination table to write the records to; if not provided here, will default to the one provided at initialization
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • **kwargs (optional): additional kwargs to pass to the bigquery.LoadJobConfig; see the documentation here: https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html
Raises:
  • ValueError: if all required arguments haven't been provided
  • ValueError: if the load job results in an error
Returns:
  • google.cloud.bigquery.job.LoadJob: the response from load_table_from_uri



# BigQueryLoadFile

class

prefect.tasks.gcp.bigquery.BigQueryLoadFile

(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)[source]

Task for insert records in a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.

Args:

  • file (Union[str, path-like object], optional): A string or path-like object of the file to be loaded
  • rewind (bool, optional): if True, seek to the beginning of the file handle before reading the file
  • size (int, optional): the number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
  • num_retries (int, optional): the number of max retries for loading the bigquery table from file. Defaults to 6
  • dataset_id (str, optional): the id of a destination dataset to write the records to
  • table (str, optional): the name of a destination table to write the records to
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryLoadFile.run

(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • file (Union[str, path-liike object], optional): A string or path-like object of the file to be loaded
  • rewind (bool, optional): if True, seek to the beginning of the file handle before reading the file
  • size (int, optional): the number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
  • num_retries (int, optional): the number of max retries for loading the bigquery table from file. Defaults to 6
  • dataset_id (str, optional): the id of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization
  • table (str, optional): the name of a destination table to write the records to; if not provided here, will default to the one provided at initialization
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task.
  • **kwargs (optional): additional kwargs to pass to the bigquery.LoadJobConfig; see the documentation here: https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html
Raises:
  • ValueError: if all required arguments haven't been provided or file does not exist
  • IOError: if file can't be opened and read
  • ValueError: if the load job results in an error
Returns:
  • google.cloud.bigquery.job.LoadJob: the response from load_table_from_file



This documentation was auto-generated from commit 4a4acb5
on October 23, 2020 at 16:22 UTC