prefect_gcp.bigquery
Tasks for interacting with GCP BigQuery.
Functions
abigquery_query
Args:
query: String of the query to execute.
gcp_credentials: Credentials to use for authentication with GCP.
query_params: List of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted.
dry_run_max_bytes: If provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a `ValueError` if the maximum is exceeded.
dataset: Name of a destination dataset to write the query results to, if you don't want them returned; if provided, `table` must also be provided.
table: Name of a destination table to write the query results to, if you don't want them returned; if provided, `dataset` must also be provided.
to_dataframe: If provided, returns the results of the query as a pandas DataFrame instead of a list of `bigquery.table.Row` objects.
job_config: Dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected).
project: The project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
result_transformer: Function that can be passed to transform the result of a query before returning. The function will be passed the list of rows returned by BigQuery for the given query.
location: Location of the dataset that will be queried.
Returns:
- A list of rows, or a pandas DataFrame if `to_dataframe` is provided, matching the query criteria.
bigquery_query
Args:
query: String of the query to execute.
gcp_credentials: Credentials to use for authentication with GCP.
query_params: List of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted.
dry_run_max_bytes: If provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a `ValueError` if the maximum is exceeded.
dataset: Name of a destination dataset to write the query results to, if you don't want them returned; if provided, `table` must also be provided.
table: Name of a destination table to write the query results to, if you don't want them returned; if provided, `dataset` must also be provided.
to_dataframe: If provided, returns the results of the query as a pandas DataFrame instead of a list of `bigquery.table.Row` objects.
job_config: Dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected).
project: The project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
result_transformer: Function that can be passed to transform the result of a query before returning. The function will be passed the list of rows returned by BigQuery for the given query.
location: Location of the dataset that will be queried.
Returns:
- A list of rows, or a pandas DataFrame if `to_dataframe` is provided, matching the query criteria.
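Example:
A minimal sketch of calling this task inside a flow; the service-account keyfile path, project name, and query are placeholders:

```python
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.bigquery import bigquery_query


@flow
def example_bigquery_query_flow():
    # Placeholder credentials; swap in your own keyfile and project
    gcp_credentials = GcpCredentials(
        service_account_file="/path/to/service/account/keyfile.json",
        project="project",
    )
    query = """
        SELECT word, word_count
        FROM `bigquery-public-data.samples.shakespeare`
        WHERE corpus = @corpus
        LIMIT 3;
    """
    # query_params is a list of 3-tuples: (name, type, value)
    result = bigquery_query(
        query,
        gcp_credentials,
        query_params=[("corpus", "STRING", "romeoandjuliet")],
    )
    return result


example_bigquery_query_flow()
```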
abigquery_create_table
Args:
time_partitioning: `bigquery.TimePartitioning` object specifying a partitioning of the newly created table.
project: Project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
location: The location of the dataset that will be written to.
external_config: The external data source.
Returns:
- Table name.
bigquery_create_table
Args:
time_partitioning: `bigquery.TimePartitioning` object specifying a partitioning of the newly created table.
project: Project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
location: The location of the dataset that will be written to.
external_config: The external data source.
Returns:
- Table name.
Example:
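A minimal sketch of creating a table; the dataset and table names are placeholders, and the `dataset`, `table`, `schema`, and `gcp_credentials` arguments follow the upstream signature rather than the truncated argument list above:

```python
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.bigquery import bigquery_create_table
from google.cloud.bigquery import SchemaField


@flow
def example_bigquery_create_table_flow():
    gcp_credentials = GcpCredentials(project="project")
    # Define the destination table's column schema
    schema = [
        SchemaField("number", field_type="INTEGER", mode="REQUIRED"),
        SchemaField("text", field_type="STRING", mode="REQUIRED"),
        SchemaField("bool", field_type="BOOLEAN"),
    ]
    result = bigquery_create_table(
        dataset="dataset",
        table="test_table",
        schema=schema,
        gcp_credentials=gcp_credentials,
    )
    return result


example_bigquery_create_table_flow()
```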
abigquery_insert_stream
Args:
dataset: Name of the dataset where the records will be written.
table: Name of the table to write to.
records: The list of records to insert as rows into the BigQuery table; each item in the list should be a dictionary whose keys correspond to columns in the table.
gcp_credentials: Credentials to use for authentication with GCP.
project: The project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
location: Location of the dataset that will be written to.
Returns:
- List of inserted rows.
bigquery_insert_stream
Args:
dataset: Name of the dataset where the records will be written.
table: Name of the table to write to.
records: The list of records to insert as rows into the BigQuery table; each item in the list should be a dictionary whose keys correspond to columns in the table.
gcp_credentials: Credentials to use for authentication with GCP.
project: The project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
location: Location of the dataset that will be written to.
Returns:
- List of inserted rows.
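Example:
A minimal sketch of streaming records into an existing table; the project, dataset, and table names are placeholders:

```python
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.bigquery import bigquery_insert_stream


@flow
def example_bigquery_insert_stream_flow():
    gcp_credentials = GcpCredentials(project="project")
    # Each record is a dict whose keys match the table's columns
    records = [
        {"number": 1, "text": "abc", "bool": True},
        {"number": 2, "text": "def", "bool": False},
    ]
    result = bigquery_insert_stream(
        dataset="dataset",
        table="test_table",
        records=records,
        gcp_credentials=gcp_credentials,
    )
    return result


example_bigquery_insert_stream_flow()
```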
abigquery_load_cloud_storage
Returns:
- The response from `load_table_from_uri`.
bigquery_load_cloud_storage
Returns:
- The response from `load_table_from_uri`.
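Example:
The argument list for this task did not survive extraction; the sketch below assumes it mirrors the other load tasks (a destination dataset and table, a `gs://` URI, and credentials), with placeholder values:

```python
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.bigquery import bigquery_load_cloud_storage


@flow
def example_bigquery_load_cloud_storage_flow():
    gcp_credentials = GcpCredentials(project="project")
    # Load a file already sitting in Cloud Storage into BigQuery
    result = bigquery_load_cloud_storage(
        dataset="dataset",
        table="test_table",
        uri="gs://bucket/path/to/data.csv",
        gcp_credentials=gcp_credentials,
    )
    return result


example_bigquery_load_cloud_storage_flow()
```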
abigquery_load_file
Args:
dataset: ID of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization.
table: Name of a destination table to write the records to; if not provided here, will default to the one provided at initialization.
path: A string or path-like object of the file to be loaded.
gcp_credentials: Credentials to use for authentication with GCP.
schema: Schema to use when creating the table.
job_config: An optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected).
rewind: If True, seek to the beginning of the file handle before reading the file.
size: Number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
project: Project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
location: Location of the dataset that will be written to.
Returns:
- The response from `load_table_from_file`.
bigquery_load_file
Args:
dataset: ID of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization.
table: Name of a destination table to write the records to; if not provided here, will default to the one provided at initialization.
path: A string or path-like object of the file to be loaded.
gcp_credentials: Credentials to use for authentication with GCP.
schema: Schema to use when creating the table.
job_config: An optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected).
rewind: If True, seek to the beginning of the file handle before reading the file.
size: Number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
project: Project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials.
location: Location of the dataset that will be written to.
Returns:
- The response from `load_table_from_file`.
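Example:
A minimal sketch of loading a local file; the project, dataset, table, and file path are placeholders:

```python
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.bigquery import bigquery_load_file


@flow
def example_bigquery_load_file_flow():
    gcp_credentials = GcpCredentials(project="project")
    # Upload a local file's contents into the destination table
    result = bigquery_load_file(
        dataset="dataset",
        table="test_table",
        path="path/to/data.csv",
        gcp_credentials=gcp_credentials,
    )
    return result


example_bigquery_load_file_flow()
```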
Classes
BigQueryWarehouse
A block for querying a database with BigQuery.
Upon instantiation, a connection to BigQuery is established
and maintained for the life of the object until the close method is called.
It is recommended to use this block as a context manager, which will automatically
close the connection and its cursors when the context is exited.
It is also recommended that this block be loaded and consumed within a single task
or flow, because if the block is passed across separate tasks and flows,
the state of the block's connection and cursor could be lost.
Attributes:
gcp_credentials: The credentials to use to authenticate.
fetch_size: The number of rows to fetch at a time when calling fetch_many. Note that this limit is applied on the client side and is not passed to the database. To limit on the server side, add the `LIMIT` clause, or the dialect's equivalent clause, like `TOP`, to the query.
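Example:
A minimal sketch of using the block as a context manager to fetch rows; `"bigquery-block"` is a placeholder name for a previously saved block document, and the `%(corpus)s` binding uses the pyformat parameter style of the BigQuery DB-API:

```python
from prefect_gcp.bigquery import BigQueryWarehouse

# The context manager closes the connection and its cursors on exit
with BigQueryWarehouse.load("bigquery-block") as warehouse:
    operation = """
        SELECT word, word_count
        FROM `bigquery-public-data.samples.shakespeare`
        WHERE corpus = %(corpus)s
        LIMIT 3;
    """
    parameters = {"corpus": "romeoandjuliet"}
    result = warehouse.fetch_many(operation, parameters=parameters, size=3)
```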
aexecute
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
**execution_options: Additional options to pass to `connection.execute`.
aexecute_many
Args:
operation: The SQL query or other operation to be executed.
seq_of_parameters: The sequence of parameters for the operation.
afetch_all
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
**execution_options: Additional options to pass to `connection.execute`.
Returns:
- A list of tuples containing the data returned by the database, where each row is a tuple and each column is a value in the tuple.
afetch_many
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
size: The number of results to return; if None or 0, uses the value of `fetch_size` configured on the block.
**execution_options: Additional options to pass to `connection.execute`.
Returns:
- A list of tuples containing the data returned by the database, where each row is a tuple and each column is a value in the tuple.
afetch_one
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
**execution_options: Additional options to pass to `connection.execute`.
Returns:
- A tuple containing the data returned by the database, where each column is a value in the tuple.
block_initialization
close
execute
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
**execution_options: Additional options to pass to `connection.execute`.
execute_many
Args:
operation: The SQL query or other operation to be executed.
seq_of_parameters: The sequence of parameters for the operation.
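Example:
A minimal sketch combining execute and execute_many to create a table and insert several rows; the `mydataset.trips` table and the saved block name are placeholders:

```python
from prefect_gcp.bigquery import BigQueryWarehouse

with BigQueryWarehouse.load("bigquery-block") as warehouse:
    # One-off DDL statement via execute
    warehouse.execute(
        """
        CREATE TABLE IF NOT EXISTS mydataset.trips (
            trip_id INT64,
            distance FLOAT64,
            fare FLOAT64
        );
        """
    )
    # Repeated parameterized inserts via execute_many
    warehouse.execute_many(
        "INSERT INTO mydataset.trips (trip_id, distance, fare) "
        "VALUES (%(trip_id)s, %(distance)s, %(fare)s);",
        seq_of_parameters=[
            {"trip_id": 1, "distance": 2.5, "fare": 10.0},
            {"trip_id": 2, "distance": 7.1, "fare": 22.5},
        ],
    )
```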
fetch_all
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
**execution_options: Additional options to pass to `connection.execute`.
Returns:
- A list of tuples containing the data returned by the database, where each row is a tuple and each column is a value in the tuple.
fetch_many
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
size: The number of results to return; if None or 0, uses the value of `fetch_size` configured on the block.
**execution_options: Additional options to pass to `connection.execute`.
Returns:
- A list of tuples containing the data returned by the database, where each row is a tuple and each column is a value in the tuple.
fetch_one
Args:
operation: The SQL query or other operation to be executed.
parameters: The parameters for the operation.
**execution_options: Additional options to pass to `connection.execute`.
Returns:
- A tuple containing the data returned by the database, where each column is a value in the tuple.