prefect_databricks.jobs
This is a module containing tasks for interacting with:
Databricks jobs
Functions
jobs_runs_export
run_id: The canonical identifier for the run. This field is required.databricks_credentials: Credentials to use for authentication with Databricks.views_to_export: Which views to export (CODE, DASHBOARDS, or ALL). Defaults to CODE.
- Upon success, a dict of the response containing the key
views - (
List["models.ViewItem"]).
jobs_create
databricks_credentials: Credentials to use for authentication with Databricks.name: An optional name for the job.tags: A map of string key-value tags associated with the job, up to a maximum of 25. These are forwarded to jobs clusters as cluster tags.tasks: A list of task specifications (JobTaskSettings) to execute. Each task defines its key, the cluster it runs on, and the work to do (a notebook, JAR, Python, SQL, or dbt task).job_clusters: A list of shared job-cluster specifications (JobCluster) that tasks can reuse. Libraries must be declared per task, not on a shared cluster.email_notifications: Email addresses (JobEmailNotifications) to notify when runs of the job start, succeed, or fail.webhook_notifications: System notification IDs (WebhookNotifications) to call when runs of the job start, succeed, or fail, up to three destinations per event.timeout_seconds: An optional timeout, in seconds, applied to each run of the job. The default is no timeout.schedule: An optional periodic schedule (CronSchedule) expressed with Quartz cron syntax. By default the job runs only when triggered manually or through the API.max_concurrent_runs: The maximum number of concurrent runs allowed. Defaults to 1 and cannot exceed 1000; set to 0 to skip all new runs.git_source: An optional remote Git repository (GitSource) containing the notebooks used by the job’s notebook tasks. This functionality is in Public Preview.format: The format of the job. Ignored in create, update, and reset calls; alwaysMULTI_TASKon the Jobs API 2.1.access_control_list: A list of permissions (AccessControlRequest) to set on the job.parameters: Job-level parameter definitions (JobParameter).
- Upon success, a dict of the response containing the key
job_id - (
int).
jobs_delete
databricks_credentials: Credentials to use for authentication with Databricks.job_id: The canonical identifier of the job to delete. This field is required, e.g.11223344.
- Upon success, an empty dict.
jobs_get
job_id: The canonical identifier of the job to retrieve information about. This field is required.databricks_credentials: Credentials to use for authentication with Databricks.
- Upon success, a dict of the response containing the keys
job_id - (
int),creator_user_name(str),run_as_user_name(str), settings("models.JobSettings"), andcreated_time(int).
jobs_list
databricks_credentials: Credentials to use for authentication with Databricks.limit: The number of jobs to return. This value must be greater than 0 and less or equal to 25. The default value is 20.offset: The offset of the first job to return, relative to the most recently created job.name: A filter on the list based on the exact (case insensitive) job name.expand_tasks: Whether to include task and cluster details in the response.
- Upon success, a dict of the response containing the keys
jobs - (
List["models.Job"]) andhas_more(bool).
jobs_reset
databricks_credentials: Credentials to use for authentication with Databricks.job_id: The canonical identifier of the job to reset. Required.new_settings: The complete new settings (JobSettings) for the job. These entirely replace the job’s existing settings: changes totimeout_secondsapply to active runs, while changes to other fields apply only to future runs.
- Upon success, an empty dict.
jobs_run_now
run_id of the triggered run.
Args:
databricks_credentials: Credentials to use for authentication with Databricks.job_id: The ID of the job to run.idempotency_token: An optional token (at most 64 characters) guaranteeing the idempotency of the request. If a run with the token already exists, the existing run’s ID is returned instead of creating a new run. See the Databricks docs on job idempotency for details.jar_params: A list of command-line parameters for Spark JAR tasks, used to invoke the main class. Cannot be combined withnotebook_params, and its JSON representation cannot exceed 10,000 bytes.notebook_params: A map of key-value parameters for notebook tasks, accessible throughdbutils.widgets.get. Cannot be combined withjar_params, and its JSON representation cannot exceed 10,000 bytes.python_params: A list of command-line parameters for Python tasks. ASCII characters only; its JSON representation cannot exceed 10,000 bytes.spark_submit_params: A list of parameters passed to thespark-submitscript. ASCII characters only; its JSON representation cannot exceed 10,000 bytes.python_named_params: A map of named parameters for Python wheel tasks.pipeline_params: Parameters for Delta Live Tables pipeline tasks, such as whether to trigger a full refresh.sql_params: A map of key-value parameters for SQL tasks. SQL alert tasks do not support custom parameters.dbt_commands: A list of dbt commands to run for dbt tasks, for example["dbt deps", "dbt seed", "dbt run"].job_parameters: A map of job-level parameter overrides for the run.
- Upon success, a dict of the response containing the keys
run_id - (
int) andnumber_in_job(int).
jobs_runs_cancel
databricks_credentials: Credentials to use for authentication with Databricks.run_id: This field is required, e.g.455644833.
- Upon success, an empty dict.
jobs_runs_cancel_all
databricks_credentials: Credentials to use for authentication with Databricks.job_id: The canonical identifier of the job to cancel all runs of. This field is required, e.g.11223344.
- Upon success, an empty dict.
jobs_runs_delete
databricks_credentials: Credentials to use for authentication with Databricks.run_id: The canonical identifier of the run for which to retrieve the metadata, e.g.455644833.
- Upon success, an empty dict.
jobs_runs_get
run_id: The canonical identifier of the run for which to retrieve the metadata. This field is required.databricks_credentials: Credentials to use for authentication with Databricks.include_history: Whether to include the repair history in the response.
- Upon success, a dict describing the run, including its identifiers
- (
job_id,run_id),state,tasks, cluster spec and instance, - scheduling and timing fields, and — when requested — its
repair_history.
jobs_runs_get_output
run_id: The canonical identifier for the run. This field is required.databricks_credentials: Credentials to use for authentication with Databricks.
- Upon success, a dict of the response containing the keys
notebook_output("models.NotebookOutput"),sql_output("models.SqlOutput"),dbt_output("models.DbtOutput"),logs(str),logs_truncated(bool),error(str),error_trace(str), andmetadata("models.Run").
jobs_runs_list
databricks_credentials: Credentials to use for authentication with Databricks.active_only: If active_only istrue, only active runs are included in the results; otherwise, lists both active and completed runs. An active run is a run in thePENDING,RUNNING, orTERMINATING. This field cannot betruewhen completed_only istrue.completed_only: If completed_only istrue, only completed runs are included in the results; otherwise, lists both active and completed runs. This field cannot betruewhen active_only istrue.job_id: The job for which to list runs. If omitted, the Jobs service lists runs from all jobs.offset: The offset of the first run to return, relative to the most recent run.limit: The number of runs to return. This value must be greater than 0 and less than 25. The default value is 25. If a request specifies a limit of 0, the service instead uses the maximum limit.run_type: The type of runs to return. For a description of run types, see Run.expand_tasks: Whether to include task and cluster details in the response.start_time_from: Show runs that started at or after this value. The value must be a UTC timestamp in milliseconds. Can be combined with start_time_to to filter by a time range.start_time_to: Show runs that started at or before this value. The value must be a UTC timestamp in milliseconds. Can be combined with start_time_from to filter by a time range.
- Upon success, a dict of the response containing the keys
runs - (
List["models.Run"]) andhas_more(bool).
jobs_runs_repair
databricks_credentials: Credentials to use for authentication with Databricks.run_id: The job run ID of the run to repair. The run must not be in progress.rerun_tasks: The task keys of the task runs to repair.latest_repair_id: The ID of the latest repair. Not required when repairing a run for the first time, but must be provided on subsequent repair requests for the same run.rerun_all_failed_tasks: If true, repair all failed tasks. Only one ofrerun_tasksorrerun_all_failed_tasksmay be set.jar_params: A list of command-line parameters for Spark JAR tasks, used to invoke the main class. Cannot be combined withnotebook_params, and its JSON representation cannot exceed 10,000 bytes.notebook_params: A map of key-value parameters for notebook tasks, accessible throughdbutils.widgets.get. Cannot be combined withjar_params, and its JSON representation cannot exceed 10,000 bytes.python_params: A list of command-line parameters for Python tasks. ASCII characters only; its JSON representation cannot exceed 10,000 bytes.spark_submit_params: A list of parameters passed to thespark-submitscript. ASCII characters only; its JSON representation cannot exceed 10,000 bytes.python_named_params: A map of named parameters for Python wheel tasks.pipeline_params: Parameters for Delta Live Tables pipeline tasks, such as whether to trigger a full refresh.sql_params: A map of key-value parameters for SQL tasks. SQL alert tasks do not support custom parameters.dbt_commands: A list of dbt commands to run for dbt tasks, for example["dbt deps", "dbt seed", "dbt run"].job_parameters: A map of job-level parameter overrides for the run.
- Upon success, a dict of the response containing the key
repair_id - (
int).
jobs_runs_submit
jobs/runs/get API to check the run state
after the job is submitted.
Args:
databricks_credentials: Credentials to use for authentication with Databricks.tasks: A list of task specifications (RunSubmitTaskSettings) to run. Each task defines its key, the cluster it runs on, and the work to do (a notebook, JAR, Python, SQL, or dbt task).run_name: An optional name for the run. Defaults toUntitled.webhook_notifications: System notification IDs (WebhookNotifications) to call when the run starts, succeeds, or fails, up to three destinations per event.git_source: An optional remote Git repository (GitSource) containing the notebooks used by the run’s notebook tasks. This functionality is in Public Preview.timeout_seconds: An optional timeout, in seconds, applied to the run. The default is no timeout.idempotency_token: An optional token (at most 64 characters) guaranteeing the idempotency of the request. If a run with the token already exists, the existing run’s ID is returned instead of creating a new run. See the Databricks docs on job idempotency for details.access_control_list: A list of permissions (AccessControlRequest) to set on the run.
- Upon success, a dict of the response containing the key
run_id - (
int).
jobs_update
databricks_credentials: Credentials to use for authentication with Databricks.job_id: The canonical identifier of the job to update. Required.new_settings: The settings (JobSettings) to apply. Any top-level field provided is replaced entirely; partial updates of nested fields are not supported. Changes totimeout_secondsapply to active runs, while changes to other fields apply only to future runs.fields_to_remove: An optional list of top-level fields to remove from the job’s settings. Removing nested fields is not supported.
- Upon success, an empty dict.