prefect-databricks lets a Prefect flow drive Databricks Jobs: authenticate with a credentials block, trigger an existing job or submit a one-time notebook/JAR run, and wait for the run to finish and collect its output. The most common workflows are:
- Trigger an existing Databricks job and wait for it to complete — the flagship use case.
- Submit a one-time notebook (or JAR/Python) run on a new cluster and wait for its output.
- List and inspect jobs and runs with the lower-level task wrappers.
Getting started
Prerequisites
- A Databricks account and a workspace.
- Your workspace instance URL (for example dbc-abc12345-6789.cloud.databricks.com), without the https:// scheme.
- Either a personal access token (PAT) or a service principal (client_id/client_secret) with permission to run the jobs you target.
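Because the instance value is stored without its scheme, a URL copied from the browser needs trimming first. A minimal sketch using only the standard library; normalize_instance is a hypothetical helper for illustration, not part of prefect-databricks:

```python
from urllib.parse import urlsplit


def normalize_instance(url: str) -> str:
    """Return the bare workspace host, without scheme, path, or query."""
    candidate = url.strip()
    # urlsplit only fills in .netloc when a "//" separator is present,
    # so add one for inputs that are already bare hostnames.
    if "://" not in candidate:
        candidate = "//" + candidate
    host = urlsplit(candidate).netloc
    if not host:
        raise ValueError(f"could not find a host in {url!r}")
    return host


print(normalize_instance("https://dbc-abc12345-6789.cloud.databricks.com/?o=123"))
# prints: dbc-abc12345-6789.cloud.databricks.com
```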
Install prefect-databricks
The following installs a version of prefect-databricks compatible with your installed version of prefect. If you don’t already have prefect installed, it installs the newest version as well.
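The usual install command for this integration is pip install "prefect[databricks]". To confirm afterwards which versions ended up in your environment, a small standard-library check such as the following may help; installed_version is a hypothetical helper name, not part of Prefect:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("prefect", "prefect-databricks"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```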
Create a credentials block
Every workflow below loads a DatabricksCredentials block by name, so create one first. Construct it with your workspace instance and token, then call .save() to persist it to your Prefect API.
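A minimal sketch of creating a PAT-based block follows. The block name my-databricks-creds and the environment variable names DATABRICKS_INSTANCE and DATABRICKS_TOKEN are assumptions chosen for illustration; substitute whatever your setup uses.

```python
import os


def save_pat_credentials(block_name: str = "my-databricks-creds") -> str:
    """Create a DatabricksCredentials block from a PAT and save it."""
    # Imported inside the function so that merely defining it does not
    # require prefect-databricks to be importable yet.
    from prefect_databricks import DatabricksCredentials

    credentials = DatabricksCredentials(
        # Host only, without the https:// scheme (assumed env var name).
        databricks_instance=os.environ["DATABRICKS_INSTANCE"],
        # Personal access token (assumed env var name).
        token=os.environ["DATABRICKS_TOKEN"],
    )
    # overwrite=True replaces an existing block that has the same name.
    credentials.save(block_name, overwrite=True)
    return block_name
```

Calling save_pat_credentials() once is enough; later code can load the block by the returned name.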
To authenticate as a service principal instead, provide client_id and client_secret (and tenant_id for Azure Databricks) in place of the token.
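As a sketch of the service principal variant, assuming client_id, client_secret, and tenant_id are accepted as keyword arguments of DatabricksCredentials under the names used above. Both helper functions are hypothetical and exist only to keep the example testable:

```python
from typing import Dict, Optional


def service_principal_fields(
    instance: str,
    client_id: str,
    client_secret: str,
    tenant_id: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the keyword arguments for a service principal block."""
    fields = {
        "databricks_instance": instance,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    # tenant_id is only needed for Azure Databricks.
    if tenant_id is not None:
        fields["tenant_id"] = tenant_id
    return fields


def save_service_principal_block(block_name: str, fields: Dict[str, str]) -> None:
    """Build the credentials block from the collected fields and save it."""
    from prefect_databricks import DatabricksCredentials

    DatabricksCredentials(**fields).save(block_name, overwrite=True)
```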
Trigger an existing job and wait for it to complete
This is the most common workflow: kick off a job you’ve already defined in Databricks (by its job_id) and block until it finishes, polling along the way. jobs_runs_submit_by_id_and_wait_for_completion is an async flow, so run it with asyncio.run.
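The workflow above might look like the following sketch. The wrapper run_existing_job and the block name in the usage comment are assumptions for illustration, and the wait and polling values are passed explicitly rather than relying on the library's defaults.

```python
import asyncio
from typing import Dict, Optional


def run_existing_job(
    block_name: str,
    job_id: int,
    notebook_params: Optional[Dict[str, str]] = None,
    max_wait_seconds: int = 900,
    poll_frequency_seconds: int = 10,
):
    """Trigger an existing Databricks job and block until it finishes."""
    from prefect_databricks import DatabricksCredentials
    from prefect_databricks.flows import (
        jobs_runs_submit_by_id_and_wait_for_completion,
    )

    # Load the saved block in synchronous code, before the event loop starts.
    credentials = DatabricksCredentials.load(block_name)
    # The flow is async, so drive it to completion with asyncio.run.
    return asyncio.run(
        jobs_runs_submit_by_id_and_wait_for_completion(
            databricks_credentials=credentials,
            job_id=job_id,
            notebook_params=notebook_params,
            max_wait_seconds=max_wait_seconds,
            poll_frequency_seconds=poll_frequency_seconds,
        )
    )


# Usage, with a placeholder job ID:
# run_existing_job("my-databricks-creds", job_id=123)
```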
Submit a one-time notebook run and wait for its output
To run a notebook without first defining a job, submit a one-time run on a new cluster. jobs_runs_submit_and_wait_for_completion waits for completion and returns the notebook outputs keyed by task.
For example, given a notebook at /Users/you@example.com/example that reads a name widget, the run supplies a value for name as a notebook parameter.
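The models in prefect_databricks.models.jobs are optional: plain dictionaries with the same fields work too. For example, AutoScale(min_workers=1, max_workers=2) is the same as {"min_workers": 1, "max_workers": 2}.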
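A sketch of a one-time notebook run, written with plain dictionaries in place of the model classes. The helper names, task_key, run_name, and the cluster values (spark_version, node_type_id) are placeholders, and the flow is assumed to be async like jobs_runs_submit_by_id_and_wait_for_completion.

```python
import asyncio
from typing import Any, Dict


def notebook_task_settings(notebook_path: str, **base_parameters: str) -> Dict[str, Any]:
    """Build one task for a one-time run as a plain dictionary."""
    return {
        "task_key": "prefect-task",
        "notebook_task": {
            "notebook_path": notebook_path,
            # Each key is exposed to the notebook as a widget value.
            "base_parameters": base_parameters,
        },
        "new_cluster": {
            # Placeholder values; use ones that are valid in your workspace.
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "m4.large",
            # Same as AutoScale(min_workers=1, max_workers=2).
            "autoscale": {"min_workers": 1, "max_workers": 2},
        },
    }


def run_notebook_once(block_name: str, notebook_path: str, **base_parameters: str):
    """Submit the notebook as a one-time run and wait for its output."""
    from prefect_databricks import DatabricksCredentials
    from prefect_databricks.flows import jobs_runs_submit_and_wait_for_completion

    credentials = DatabricksCredentials.load(block_name)
    return asyncio.run(
        jobs_runs_submit_and_wait_for_completion(
            databricks_credentials=credentials,
            run_name="prefect-job",
            tasks=[notebook_task_settings(notebook_path, **base_parameters)],
        )
    )


# Usage, passing a value for the notebook's name widget:
# run_notebook_once("my-databricks-creds", "/Users/you@example.com/example", name="Marvin")
```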
List and inspect jobs
For finer-grained control, prefect_databricks.jobs wraps individual Databricks Jobs REST endpoints as async tasks (jobs_list, jobs_get, jobs_runs_get, and more). Call them from within a flow.
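As a sketch of calling one of these tasks from a flow: the example below lists a few jobs and reduces the response to (job_id, name) pairs. It assumes jobs_list returns the Jobs API response as a dictionary with a jobs list; summarize_jobs and list_jobs are hypothetical helpers.

```python
import asyncio
from typing import Any, Dict, List, Tuple


def summarize_jobs(response: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Pull (job_id, name) pairs out of a Jobs API list response."""
    return [
        (job["job_id"], job.get("settings", {}).get("name", ""))
        for job in response.get("jobs", [])
    ]


def list_jobs(block_name: str, limit: int = 5) -> List[Tuple[int, str]]:
    """Run a small async flow that calls the jobs_list task."""
    from prefect import flow
    from prefect_databricks import DatabricksCredentials
    from prefect_databricks.jobs import jobs_list

    # Load the block in synchronous code, before the event loop starts.
    credentials = DatabricksCredentials.load(block_name)

    @flow
    async def list_jobs_flow() -> List[Tuple[int, str]]:
        # jobs_list is an async task, so await it inside the flow.
        response = await jobs_list(databricks_credentials=credentials, limit=limit)
        return summarize_jobs(response)

    return asyncio.run(list_jobs_flow())
```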
Resources
For assistance using Databricks, consult the Databricks documentation. Refer to the prefect-databricks SDK reference for the full list of credentials options, flows, and job tasks.