prefect-dbt
With `prefect-dbt`, you can trigger and observe dbt Cloud jobs, execute dbt Core CLI commands, and incorporate other tools, such as Snowflake, into your dbt runs.
Prefect provides a global view of the state of your workflows and allows you to take action based on state changes.
Prefect integrations may provide pre-built blocks, flows, or tasks for interacting with external systems. Block types in this library allow you to do things such as run a dbt Cloud job or execute a dbt Core command.
Getting started
Prerequisites
- A dbt Cloud account if using dbt Cloud.
Install prefect-dbt
The following command will install a version of `prefect-dbt` compatible with your installed version of `prefect`. If you don't already have `prefect` installed, it will install the newest version of `prefect` as well.
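The install command looks like this:

```shell
pip install prefect-dbt
```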
Upgrade to the latest versions of `prefect` and `prefect-dbt`:
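```shell
pip install -U prefect prefect-dbt
```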
If necessary, see additional installation options for dbt Core with BigQuery, Snowflake, and Postgres.
Register newly installed block types
Register the block types in the `prefect-dbt` module to make them available for use.
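Registration uses Prefect's block registration CLI:

```shell
prefect block register -m prefect_dbt
```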
dbt Cloud
If you have an existing dbt Cloud job, use the pre-built flow `run_dbt_cloud_job` to trigger a job run and wait until the job run is finished. If some nodes fail, `run_dbt_cloud_job` can efficiently retry the unsuccessful nodes. Before running this flow, save your dbt Cloud credentials to a `DbtCloudCredentials` block and create a dbt Cloud Job block.
Save dbt Cloud credentials to a block
Blocks can be created through code or through the UI.
To create a dbt Cloud Credentials block:
- Log into your dbt Cloud account.
- Click API Tokens on the sidebar.
- Copy a Service Token.
- Copy the account ID from the URL: `https://cloud.getdbt.com/settings/accounts/<ACCOUNT_ID>`.
- Create and run the following script, replacing the placeholders:
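A minimal version of that script might look like this, with placeholder values to replace:

```python
from prefect_dbt.cloud import DbtCloudCredentials

DbtCloudCredentials(
    api_key="API-KEY-PLACEHOLDER",        # your dbt Cloud service token
    account_id="ACCOUNT-ID-PLACEHOLDER",  # the account ID from the URL
).save("CREDENTIALS-BLOCK-NAME-PLACEHOLDER")
```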
Create a dbt Cloud job block
- In dbt Cloud, click on Deploy -> Jobs.
- Select a job.
- Copy the job ID from the URL: `https://cloud.getdbt.com/deploy/<ACCOUNT_ID>/projects/<PROJECT_ID>/jobs/<JOB_ID>`.
- Create and run the following script, replacing the placeholders.
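A sketch of that script, reusing the credentials block saved earlier (block names and IDs are placeholders):

```python
from prefect_dbt.cloud import DbtCloudCredentials, DbtCloudJob

dbt_cloud_credentials = DbtCloudCredentials.load("CREDENTIALS-BLOCK-NAME-PLACEHOLDER")

DbtCloudJob(
    dbt_cloud_credentials=dbt_cloud_credentials,
    job_id="JOB-ID-PLACEHOLDER",  # the job ID from the URL
).save("JOB-BLOCK-NAME-PLACEHOLDER")
```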
Run a dbt Cloud job and wait for completion
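A flow that triggers the job and waits for completion might look like the following; the `targeted_retries` argument controls how many times unsuccessful nodes are retried (block name is a placeholder):

```python
from prefect import flow
from prefect_dbt.cloud import DbtCloudJob
from prefect_dbt.cloud.jobs import run_dbt_cloud_job

@flow
def run_dbt_job_flow():
    # Triggers the job run and blocks until it finishes,
    # retrying only the unsuccessful nodes on failure
    return run_dbt_cloud_job(
        dbt_cloud_job=DbtCloudJob.load("JOB-BLOCK-NAME-PLACEHOLDER"),
        targeted_retries=5,
    )

if __name__ == "__main__":
    run_dbt_job_flow()
```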
dbt Core
prefect-dbt 0.7.0 and later
During the prerelease phase of `prefect-dbt==0.7.0`, use the `--pre` flag to install the latest release candidate.
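For example:

```shell
pip install --pre prefect-dbt
```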
Versions 0.7.0 and later of `prefect-dbt` include the `PrefectDbtRunner` class, which provides an improved interface for running dbt Core commands, with better logging, failure handling, and additional features. `PrefectDbtRunner` is inspired by dbt Core's `DbtRunner`, and its `invoke` method accepts the same arguments. Refer to the `DbtRunner` documentation for more information on how to call `invoke`.
Basic usage:
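A minimal sketch, assuming `PrefectDbtRunner` is importable from the top-level `prefect_dbt` package and the flow is run from your dbt project directory:

```python
from prefect import flow
from prefect_dbt import PrefectDbtRunner

@flow
def run_dbt():
    # Equivalent to `dbt build`, with Prefect logging and events
    PrefectDbtRunner().invoke(["build"])

if __name__ == "__main__":
    run_dbt()
```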
dbt settings
The `PrefectDbtSettings` class, based on Pydantic's `BaseSettings` class, automatically detects common `DBT_`-prefixed environment variables, like `DBT_PROFILES_DIR` and `DBT_PROJECT_DIR`. If no environment variables are set, dbt's defaults are used.

Provide a `PrefectDbtSettings` instance to `PrefectDbtRunner` to customize dbt settings or override environment variables.
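A sketch of customized settings; the directory paths are placeholders, and `settings` is assumed to be the `PrefectDbtRunner` keyword argument for this:

```python
from prefect import flow
from prefect_dbt import PrefectDbtRunner, PrefectDbtSettings

@flow
def run_dbt():
    PrefectDbtRunner(
        settings=PrefectDbtSettings(
            project_dir="PROJECT-DIRECTORY-PLACEHOLDER",
            profiles_dir="PROFILES-DIRECTORY-PLACEHOLDER",
        )
    ).invoke(["build"])
```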
profiles.yml templating
The `PrefectDbtRunner` class supports templating in your `profiles.yml` file, allowing you to reference Prefect blocks and variables that are resolved at runtime. This lets you store sensitive credentials securely in Prefect blocks and configure different targets based on the Prefect workspace. For example, a Prefect variable called `target` can have a different value in development (`dev`) and production (`prod`) workspaces, so the same `profiles.yml` file can automatically reference a local DuckDB instance in development and a Snowflake instance in production.
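A sketch of such a templated `profiles.yml`, assuming Prefect's `{{ prefect.variables.<name> }}` and `{{ prefect.blocks.<block-type>.<block-name> }}` template syntax; the variable name, block name, and connection details are illustrative:

```yaml
example:
  target: "{{ prefect.variables.target }}"  # "dev" or "prod" per workspace
  outputs:
    dev:
      type: duckdb
      path: dev.duckdb
      threads: 1
    prod:
      type: snowflake
      account: my-account
      user: my-user
      password: "{{ prefect.blocks.secret.snowflake-password }}"
      database: my-database
      schema: public
      threads: 4
```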
Logging
The `PrefectDbtRunner` class maps all dbt log levels to standard Python logging levels, so filtering for log levels like `WARNING` or `ERROR` in the Prefect UI applies to dbt's logs. It also selects the appropriate logging target based on the invocation context: it logs to a Prefect run logger when executed within a flow or task, and directly to the terminal otherwise.
Failure handling
The `PrefectDbtRunner` class raises exceptions with standardized messages containing detailed information about each failure. It also offers a `raise_on_failure` option to control whether non-exception dbt failures, like failed tests, are raised as exceptions.
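For example, to keep a flow run from failing when a dbt test fails:

```python
from prefect import flow
from prefect_dbt import PrefectDbtRunner

@flow
def run_dbt():
    # Failed tests are reported but not raised as exceptions
    PrefectDbtRunner(raise_on_failure=False).invoke(["build"])
```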
Events
The `PrefectDbtRunner` automatically emits Prefect events for completed dbt nodes, including status information. These events contain both the `node_info` and `run_result` for each processed node in your dbt project, as well as the final status of the node's execution in the primary resource. This enables you to set up alerting for specific node states through Prefect's automations system; for example, an automation can be triggered whenever a given model is skipped.
Resources and lineage
Resources in Prefect Cloud are currently experimental. Enable them by setting the `PREFECT_EXPERIMENTS_LINEAGE_EVENTS_ENABLED` environment variable to `true` wherever flow runs are executed.

The `PrefectDbtRunner` class automatically emits lineage events describing the relationships between the resources in your dbt project. When running in Prefect Cloud, resources for executed nodes can be viewed on the flow run page; click "View graph" to display a lineage graph with nodes directly upstream and downstream of the selected resource.

To describe resources that exist outside of your dbt project and are upstream of a given node, supply the resource in your dbt config. For example, a BigQuery table can be declared as an upstream resource of a dbt model in that model's config.
Native dbt configuration
You can disable these features for all resources in your dbt project config, or for specific resources in their own config.
prefect-dbt 0.6.6 and earlier
`prefect-dbt` supports a couple of ways to run dbt Core commands. A `DbtCoreOperation` block will run the commands as shell commands, while other tasks use dbt's programmatic invocation.

Optionally, specify the `project_dir`. If `profiles_dir` is not set, the `DBT_PROFILES_DIR` environment variable will be used. If `DBT_PROFILES_DIR` is not set, the default directory `$HOME/.dbt/` will be used.
Use an existing profile
If you have an existing dbt `profiles.yml` file, specify the `profiles_dir` where the file is located:
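For example (directory paths are placeholders):

```python
from prefect import flow
from prefect_dbt.cli.commands import DbtCoreOperation

@flow
def trigger_dbt_flow() -> str:
    # Runs each command as a shell command in sequence
    result = DbtCoreOperation(
        commands=["pwd", "dbt debug", "dbt run"],
        project_dir="PROJECT-DIRECTORY-PLACEHOLDER",
        profiles_dir="PROFILES-DIRECTORY-PLACEHOLDER",
    ).run()
    return result

if __name__ == "__main__":
    trigger_dbt_flow()
```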
If you are already using Prefect blocks such as the Snowflake Connector block, you can use those blocks to create a new `profiles.yml` with a `DbtCliProfile` block.
Use environment variables with Prefect secret blocks
If you use environment variables in `profiles.yml`, set a Prefect Secret block as an environment variable:
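A sketch, assuming a Secret block named `DBT_PASSWORD_PLACEHOLDER` has already been saved:

```python
import os

from prefect.blocks.system import Secret

secret_block = Secret.load("DBT_PASSWORD_PLACEHOLDER")

# Expose the stored secret to dbt as an environment variable
os.environ["DBT_PASSWORD"] = secret_block.get()
```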
This example `profiles.yml` file could then access that variable:
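For example, a Postgres profile reading the password through dbt's `env_var` function (connection details are illustrative):

```yaml
profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      user: dbt_user
      # Quote the entire Jinja string
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: postgres
      schema: public
      threads: 4
```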
Create a new profiles.yml file with blocks
If you don't have a `profiles.yml` file, you can use a `DbtCliProfile` block to create `profiles.yml`. Then, specify the `profiles_dir` where `profiles.yml` will be written. Here's example code with placeholders:
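A sketch with placeholder names and paths, assuming a `DbtCliProfile` block has already been saved (the BigQuery example below shows how to create one):

```python
from prefect import flow
from prefect_dbt.cli import DbtCliProfile
from prefect_dbt.cli.commands import DbtCoreOperation

@flow
def trigger_dbt_flow():
    dbt_cli_profile = DbtCliProfile.load("DBT-CLI-PROFILE-BLOCK-NAME-PLACEHOLDER")
    dbt_core_operation = DbtCoreOperation(
        commands=["dbt debug", "dbt run"],
        project_dir="PROJECT-DIRECTORY-PLACEHOLDER",
        profiles_dir="PROFILES-DIRECTORY-PLACEHOLDER",
        # Writes profiles.yml to profiles_dir from the profile block
        dbt_cli_profile=dbt_cli_profile,
    )
    dbt_core_operation.run()

if __name__ == "__main__":
    trigger_dbt_flow()
```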
Supplying the `dbt_cli_profile` argument will overwrite existing `profiles.yml` files

If you already have a `profiles.yml` file in the specified `profiles_dir`, the file will be overwritten. If you do not specify a profiles directory, `profiles.yml` at `~/.dbt/` would be overwritten.
Visit the SDK reference in the side navigation to see other built-in `TargetConfigs` blocks. If the desired service profile is not available, you can build one from the generic `TargetConfigs` class.
Programmatic Invocation
`prefect-dbt` has some pre-built tasks that use dbt's programmatic invocation.
For example:
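One of these tasks might be used like this; the task name `run_dbt_build` and the module path `prefect_dbt.cli.tasks` are assumptions, so verify them against the SDK reference:

```python
from prefect import flow
from prefect_dbt.cli.tasks import run_dbt_build  # assumed module path and task name

@flow
def dbt_build_flow():
    # Runs `dbt build` via dbt's programmatic invocation, not a shell command
    run_dbt_build(
        project_dir="PROJECT-DIRECTORY-PLACEHOLDER",
        profiles_dir="PROFILES-DIRECTORY-PLACEHOLDER",
    )

if __name__ == "__main__":
    dbt_build_flow()
```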
See the SDK docs for other pre-built tasks.
Create a summary artifact
These pre-built tasks can also create artifacts. These artifacts have extra information about dbt Core runs, such as messages and compiled code for nodes that fail or have errors.
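A sketch of enabling a summary artifact on the assumed `run_dbt_build` task via `create_summary_artifact` and `summary_artifact_key` parameters (both assumptions; verify against the SDK reference):

```python
from prefect import flow
from prefect_dbt.cli.tasks import run_dbt_build  # assumed module path and task name

@flow
def dbt_build_with_artifact_flow():
    run_dbt_build(
        project_dir="PROJECT-DIRECTORY-PLACEHOLDER",
        profiles_dir="PROFILES-DIRECTORY-PLACEHOLDER",
        create_summary_artifact=True,              # assumed parameter
        summary_artifact_key="dbt-build-summary",  # assumed parameter
    )
```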
BigQuery CLI profile block example
To create dbt Core target config and profile blocks for BigQuery:
- Save and load a `GcpCredentials` block.
- Determine the schema / dataset you want to use in BigQuery.
- Create a short script, replacing the placeholders.
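That script might look like the following, with placeholder values to replace (`GcpCredentials` comes from the `prefect-gcp` library):

```python
from prefect_gcp.credentials import GcpCredentials
from prefect_dbt.cli import BigQueryTargetConfigs, DbtCliProfile

credentials = GcpCredentials.load("CREDENTIALS-BLOCK-NAME-PLACEHOLDER")

target_configs = BigQueryTargetConfigs(
    schema="SCHEMA-NAME-PLACEHOLDER",  # also known as dataset
    credentials=credentials,
)
target_configs.save("TARGET-CONFIGS-BLOCK-NAME-PLACEHOLDER")

dbt_cli_profile = DbtCliProfile(
    name="PROFILE-NAME-PLACEHOLDER",
    target="TARGET-NAME-PLACEHOLDER",
    target_configs=target_configs,
)
dbt_cli_profile.save("DBT-CLI-PROFILE-BLOCK-NAME-PLACEHOLDER")
```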
To create a dbt Core operation block:
- Determine the dbt commands you want to run.
- Create a short script, replacing the placeholders.
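That script might look like this, reusing the profile block from the previous step (placeholders throughout); `overwrite_profiles=True` lets the block write `profiles.yml` from the profile:

```python
from prefect_dbt.cli import DbtCliProfile
from prefect_dbt.cli.commands import DbtCoreOperation

dbt_cli_profile = DbtCliProfile.load("DBT-CLI-PROFILE-BLOCK-NAME-PLACEHOLDER")

dbt_core_operation = DbtCoreOperation(
    commands=["dbt run"],  # replace with the dbt commands you want to run
    dbt_cli_profile=dbt_cli_profile,
    overwrite_profiles=True,
)
dbt_core_operation.save("DBT-CORE-OPERATION-BLOCK-NAME-PLACEHOLDER")
```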
Load the saved block that holds your credentials:
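For example:

```python
from prefect import flow
from prefect_dbt.cli.commands import DbtCoreOperation

@flow
def dbt_core_flow():
    # Load the saved operation block and run its commands
    DbtCoreOperation.load("DBT-CORE-OPERATION-BLOCK-NAME-PLACEHOLDER").run()

if __name__ == "__main__":
    dbt_core_flow()
```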
Resources
For assistance using dbt, consult the dbt documentation.
Refer to the `prefect-dbt` SDK documentation to explore all the capabilities of the `prefect-dbt` library.
Additional installation options
Additional installation options for dbt Core with BigQuery, Snowflake, and Postgres are shown below.
Additional capabilities for dbt Core and Snowflake profiles
First install the main library compatible with your Prefect version:
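For example:

```shell
pip install -U prefect-dbt
```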
Then install the additional capabilities you need.
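For Snowflake, install the `snowflake` extra:

```shell
pip install -U "prefect-dbt[snowflake]"
```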
Additional capabilities for dbt Core and BigQuery profiles
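Install the `bigquery` extra:

```shell
pip install -U "prefect-dbt[bigquery]"
```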
Additional capabilities for dbt Core and Postgres profiles
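Install the `postgres` extra:

```shell
pip install -U "prefect-dbt[postgres]"
```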
Or, install all of the extras.
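For example:

```shell
pip install -U "prefect-dbt[all_extras]"
```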