prefect_databricks.models.jobs
Classes
AutoScale
See source code for the fields’ description.
AwsAttributes
See source code for the fields’ description.
CanManage
Permission to manage the job.
CanManageRun
Permission to run and/or manage runs for the job.
CanView
Permission to view the settings of the job.
ClusterCloudProviderNodeStatus
- NotEnabledOnSubscription: Node type not available for subscription.
- NotAvailableInRegion: Node type not available in region.
ClusterEventType
- CREATING: Indicates that the cluster is being created.
- DID_NOT_EXPAND_DISK: Indicates that a disk is low on space, but adding disks would put it over the max capacity.
- EXPANDED_DISK: Indicates that a disk was low on space and the disks were expanded.
- FAILED_TO_EXPAND_DISK: Indicates that a disk was low on space and disk space could not be expanded.
- INIT_SCRIPTS_STARTING: Indicates that the cluster-scoped init script has started.
- INIT_SCRIPTS_FINISHED: Indicates that the cluster-scoped init script has finished.
- STARTING: Indicates that the cluster is being started.
- RESTARTING: Indicates that the cluster is being restarted.
- TERMINATING: Indicates that the cluster is being terminated.
- EDITED: Indicates that the cluster has been edited.
- RUNNING: Indicates that the cluster has finished being created. Includes the number of nodes in the cluster and a failure reason if some nodes could not be acquired.
- RESIZING: Indicates a change in the target size of the cluster (upsize or downsize).
- UPSIZE_COMPLETED: Indicates that nodes finished being added to the cluster. Includes the number of nodes in the cluster and a failure reason if some nodes could not be acquired.
- NODES_LOST: Indicates that some nodes were lost from the cluster.
- DRIVER_HEALTHY: Indicates that the driver is healthy and the cluster is ready for use.
- DRIVER_UNAVAILABLE: Indicates that the driver is unavailable.
- SPARK_EXCEPTION: Indicates that a Spark exception was thrown from the driver.
- DRIVER_NOT_RESPONDING: Indicates that the driver is up but is not responsive, likely due to GC.
- DBFS_DOWN: Indicates that the driver is up but DBFS is down.
- METASTORE_DOWN: Indicates that the driver is up but the metastore is down.
- NODE_BLACKLISTED: Indicates that a node is not allowed by Spark.
- PINNED: Indicates that the cluster was pinned.
- UNPINNED: Indicates that the cluster was unpinned.
ClusterInstance
See source code for the fields’ description.
ClusterSize
See source code for the fields’ description.
ClusterSource
- UI: Cluster created through the UI.
- JOB: Cluster created by the Databricks job scheduler.
- API: Cluster created through an API call.
ClusterState
- PENDING: Indicates that a cluster is in the process of being created.
- RUNNING: Indicates that a cluster has been started and is ready for use.
- RESTARTING: Indicates that a cluster is in the process of restarting.
- RESIZING: Indicates that a cluster is in the process of adding or removing nodes.
- TERMINATING: Indicates that a cluster is in the process of being destroyed.
- TERMINATED: Indicates that a cluster has been successfully destroyed.
- ERROR: This state is no longer used. It was used to indicate a cluster that failed to be created. TERMINATING and TERMINATED are used instead.
- UNKNOWN: Indicates that a cluster is in an unknown state. A cluster should never be in this state.
ClusterTag
See source code for the fields’ description.
An object with key value pairs. The key length must be between 1 and 127 UTF-8 characters, inclusive. The value length must be less than or equal to 255 UTF-8 characters. For a list of all restrictions, see AWS Tag Restrictions: <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions>
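As a minimal sketch of these limits, the `validate_cluster_tags` helper below is hypothetical and not part of this module; it only illustrates the documented length constraints:

```python
# Hypothetical helper illustrating the documented tag limits.
def validate_cluster_tags(tags: dict) -> None:
    for key, value in tags.items():
        if not 1 <= len(key) <= 127:
            raise ValueError(f"tag key {key!r} must be 1-127 characters")
        if len(value) > 255:
            raise ValueError(f"tag value for key {key!r} must be at most 255 characters")

validate_cluster_tags({"team": "data-eng", "env": "prod"})  # passes
```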
CronSchedule
See source code for the fields’ description.
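For illustration, a sketch of the schedule shape this model captures; the field names (`quartz_cron_expression`, `timezone_id`, `pause_status`) follow the Databricks Jobs API and are assumed here rather than documented on this page:

```python
# Sketch of a cron schedule payload; field names are assumed from the
# Databricks Jobs API, not confirmed by this page.
schedule = {
    "quartz_cron_expression": "0 30 7 * * ?",  # 07:30 every day (Quartz syntax)
    "timezone_id": "America/Chicago",          # Java timezone ID
    "pause_status": "UNPAUSED",
}
```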
DbfsStorageInfo
See source code for the fields’ description.
DbtOutput
See source code for the fields’ description.
DbtTask
See source code for the fields’ description.
DockerBasicAuth
See source code for the fields’ description.
DockerImage
See source code for the fields’ description.
Error
See source code for the fields’ description.
FileStorageInfo
See source code for the fields’ description.
GitSnapshot
See source code for the fields’ description.
Read-only state of the remote repository at the time the job was run. This field is only included on job runs.
GitSource
See source code for the fields’ description.
This functionality is in Public Preview.
An optional specification for a remote repository containing the notebooks used by this job’s notebook tasks.
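For illustration, a sketch of such a remote-repository specification; the field names (`git_url`, `git_provider`, `git_branch`) follow the Databricks Jobs API and are assumptions here, and the repository URL is hypothetical:

```python
# Sketch: point the job's notebook tasks at a remote repo (Public Preview).
# Field names are assumed from the Databricks Jobs API.
git_source = {
    "git_url": "https://github.com/example/notebooks",  # hypothetical repo
    "git_provider": "gitHub",
    "git_branch": "main",
}
```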
GitSource1
See source code for the fields’ description.
GroupName
See source code for the fields’ description.
IsOwner
Permission that represents ownership of the job.
JobEmailNotifications
See source code for the fields’ description.
LibraryInstallStatus
- PENDING: No action has yet been taken to install the library. This state should be very short-lived.
- RESOLVING: Metadata necessary to install the library is being retrieved from the provided repository. For Jar, Egg, and Whl libraries, this step is a no-op.
- INSTALLING: The library is actively being installed, either by adding resources to Spark or executing system commands inside the Spark nodes.
- INSTALLED: The library has been successfully installed.
- SKIPPED: Installation on a Databricks Runtime 7.0 or above cluster was skipped due to Scala version incompatibility.
- FAILED: Some step in installation failed. More information can be found in the messages field.
- UNINSTALL_ON_RESTART: The library has been marked for removal. Libraries can be removed only when clusters are restarted, so libraries that enter this state remain until the cluster is restarted.
ListOrder
- DESC: Descending order.
- ASC: Ascending order.
RuntimeEngine
Decides which runtime engine to use, for example Standard or Photon. If unspecified, the runtime engine is inferred from spark_version.
LogSyncStatus
See source code for the fields’ description.
MavenLibrary
See source code for the fields’ description.
NotebookOutput
See source code for the fields’ description.
NotebookTask
See source code for the fields’ description.
ParameterPair
See source code for the fields’ description.
An object with additional information about why a cluster was terminated. Each key is one of TerminationParameter, and its value is the corresponding termination information.
PermissionLevel
See source code for the fields’ description.
PermissionLevelForGroup
See source code for the fields’ description.
PipelineTask
See source code for the fields’ description.
PoolClusterTerminationCode
- INSTANCE_POOL_MAX_CAPACITY_FAILURE: The pool max capacity has been reached.
- INSTANCE_POOL_NOT_FOUND_FAILURE: The pool specified by the cluster is no longer active or doesn’t exist.
PythonPyPiLibrary
See source code for the fields’ description.
PythonWheelTask
See source code for the fields’ description.
RCranLibrary
See source code for the fields’ description.
RepairRunInput
See source code for the fields’ description.
ResizeCause
- AUTOSCALE: Automatically resized based on load.
- USER_REQUEST: User requested a new size.
- AUTORECOVERY: Autorecovery monitor resized the cluster after it lost a node.
RunLifeCycleState
- PENDING: The run has been triggered. If there is not already an active run of the same job, the cluster and execution context are being prepared. If there is already an active run of the same job, the run immediately transitions into the SKIPPED state without preparing any resources.
- RUNNING: The task of this run is being executed.
- TERMINATING: The task of this run has completed, and the cluster and execution context are being cleaned up.
- TERMINATED: The task of this run has completed, and the cluster and execution context have been cleaned up. This state is terminal.
- SKIPPED: This run was aborted because a previous run of the same job was already active. This state is terminal.
- INTERNAL_ERROR: An exceptional state that indicates a failure in the Jobs service, such as network failure over a long period. If a run on a new cluster ends in the INTERNAL_ERROR state, the Jobs service terminates the cluster as soon as possible. This state is terminal.
- BLOCKED: The run is blocked on an upstream dependency.
- WAITING_FOR_RETRY: The run is waiting for a retry.
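Because TERMINATED, SKIPPED, and INTERNAL_ERROR are the terminal states, a caller typically polls until one of them is reached. A minimal sketch, assuming a hypothetical `get_run` callable that returns the run's state as a dict:

```python
import time

TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def wait_for_run(get_run, run_id, poll_seconds=10):
    """Poll until the run's life_cycle_state is terminal.

    `get_run` is a hypothetical callable returning a state dict such as
    {"life_cycle_state": "RUNNING", "result_state": None}.
    """
    while True:
        state = get_run(run_id)
        if state["life_cycle_state"] in TERMINAL_STATES:
            # result_state (see RunResultState below) is set once terminal
            return state.get("result_state")
        time.sleep(poll_seconds)
```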
RunNowInput
See source code for the fields’ description.
PipelineParams
See source code for the fields’ description.
RunParameters
See source code for the fields’ description.
RunResultState
- SUCCESS: The task completed successfully.
- FAILED: The task completed with an error.
- TIMEDOUT: The run was stopped after reaching the timeout.
- CANCELED: The run was canceled at user request.
RunState
See source code for the fields’ description.
The result and lifecycle state of the run.
RunType
The type of the run.
- JOB_RUN: Normal job run. A run created with Run now.
- WORKFLOW_RUN: Workflow run. A run created with dbutils.notebook.run.
- SUBMIT_RUN: Submit run. A run created with Run Submit.
S3StorageInfo
See source code for the fields’ description.
ServicePrincipalName
See source code for the fields’ description.
SparkConfPair
See source code for the fields’ description.
An arbitrary object where the object key is a configuration property name and the value is a configuration property value.
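For example, such a mapping might look like the following sketch (the property names are standard Spark settings, chosen for illustration):

```python
# Sketch: configuration property name -> property value.
spark_conf = {
    "spark.speculation": "true",
    "spark.sql.shuffle.partitions": "200",
}
```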
SparkEnvPair
See source code for the fields’ description.
An arbitrary object where the object key is an environment variable name and the value is an environment variable value.
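Analogously, an environment-variable mapping might look like this sketch (variable names chosen for illustration):

```python
# Sketch: environment variable name -> value.
spark_env_vars = {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3",
    "MY_FLAG": "1",  # hypothetical variable
}
```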
SparkJarTask
See source code for the fields’ description.
SparkNodeAwsAttributes
See source code for the fields’ description.
SparkPythonTask
See source code for the fields’ description.
SparkSubmitTask
See source code for the fields’ description.
SparkVersion
See source code for the fields’ description.
SqlOutputError
See source code for the fields’ description.
SqlStatementOutput
See source code for the fields’ description.
SqlTaskAlert
See source code for the fields’ description.
SqlTaskDashboard
See source code for the fields’ description.
SqlTaskQuery
See source code for the fields’ description.
TaskDependency
See source code for the fields’ description.
TaskDependencies
See source code for the fields’ description.
An optional array of objects specifying the dependency graph of the task. All tasks specified in this field must complete successfully before executing this task.
The key is task_key, and the value is the name assigned to the dependent task.
This field is required when a job consists of more than one task.
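A sketch of a three-task graph using this shape, where load runs only after transform, which in turn waits on extract (task names are hypothetical):

```python
# Sketch: each depends_on entry is {"task_key": <name of the upstream task>}.
tasks = [
    {"task_key": "extract"},
    {"task_key": "transform", "depends_on": [{"task_key": "extract"}]},
    {"task_key": "load", "depends_on": [{"task_key": "transform"}]},
]
```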
TaskDescription
See source code for the fields’ description.
TaskKey
See source code for the fields’ description.
TerminationCode
- USER_REQUEST: A user terminated the cluster directly. Parameters should include a username field that indicates the specific user who terminated the cluster.
- JOB_FINISHED: The cluster was launched by a job, and terminated when the job completed.
- INACTIVITY: The cluster was terminated since it was idle.
- CLOUD_PROVIDER_SHUTDOWN: The instance that hosted the Spark driver was terminated by the cloud provider. In AWS, for example, AWS may retire instances and directly shut them down. Parameters should include an aws_instance_state_reason field indicating the AWS-provided reason why the instance was terminated.
- COMMUNICATION_LOST: Databricks lost connection to services on the driver instance. For example, this can happen when problems arise in cloud networking infrastructure, or when the instance itself becomes unhealthy.
- CLOUD_PROVIDER_LAUNCH_FAILURE: Databricks experienced a cloud provider failure when requesting instances to launch clusters. For example, AWS limits the number of running instances and EBS volumes. If you ask Databricks to launch a cluster that requires instances or EBS volumes that exceed your AWS limit, the cluster fails with this status code. Parameters should include one of aws_api_error_code, aws_instance_state_reason, or aws_spot_request_status to indicate the AWS-provided reason why Databricks could not request the required instances for the cluster.
- SPARK_STARTUP_FAILURE: The cluster failed to initialize. Possible reasons may include failure to create the environment for Spark or issues launching the Spark master and worker processes.
- INVALID_ARGUMENT: Cannot launch the cluster because the user specified an invalid argument. For example, the user might specify an invalid runtime version for the cluster.
- UNEXPECTED_LAUNCH_FAILURE: While launching this cluster, Databricks failed to complete critical setup steps, terminating the cluster.
- INTERNAL_ERROR: Databricks encountered an unexpected error that forced the running cluster to be terminated. Contact Databricks support for additional details.
- SPARK_ERROR: The Spark driver failed to start. Possible reasons may include incompatible libraries and initialization scripts that corrupted the Spark container.
- METASTORE_COMPONENT_UNHEALTHY: The cluster failed to start because the external metastore could not be reached. Refer to Troubleshooting.
- DBFS_COMPONENT_UNHEALTHY: The cluster failed to start because Databricks File System (DBFS) could not be reached.
- DRIVER_UNREACHABLE: Databricks was not able to access the Spark driver, because it was not reachable.
- DRIVER_UNRESPONSIVE: Databricks was not able to access the Spark driver, because it was unresponsive.
- INSTANCE_UNREACHABLE: Databricks was not able to access instances in order to start the cluster. This can be a transient networking issue. If the problem persists, this usually indicates a networking environment misconfiguration.
- CONTAINER_LAUNCH_FAILURE: Databricks was unable to launch containers on worker nodes for the cluster. Have your admin check your network configuration.
- INSTANCE_POOL_CLUSTER_FAILURE: Pool backed cluster specific failure. Refer to Pools for details.
- REQUEST_REJECTED: Databricks cannot handle the request at this moment. Try again later and contact Databricks if the problem persists.
- INIT_SCRIPT_FAILURE: Databricks cannot load and run a cluster-scoped init script on one of the cluster’s nodes, or the init script terminates with a non-zero exit code. Refer to Init script logs.
- TRIAL_EXPIRED: The Databricks trial subscription expired.
TerminationParameter
See source code for the fields’ description.
TerminationType
- SUCCESS: Termination succeeded.
- CLIENT_ERROR: Non-retriable. Client must fix parameters before reattempting the cluster creation.
- SERVICE_FAULT: Databricks service issue. Client can retry.
- CLOUD_FAILURE: Cloud provider infrastructure issue. Client can retry after the underlying issue is resolved.
TriggerType
- PERIODIC: Schedules that periodically trigger runs, such as a cron scheduler.
- ONE_TIME: One-time triggers that fire a single run. This occurs when you trigger a single run on demand through the UI or the API.
- RETRY: Indicates a run that is triggered as a retry of a previously failed run. This occurs when you request to re-run the job in case of failures.
UserName
See source code for the fields’ description.
ViewType
- NOTEBOOK: Notebook view item.
- DASHBOARD: Dashboard view item.
ViewsToExport
- CODE: Code view of the notebook.
- DASHBOARDS: All dashboard views of the notebook.
- ALL: All views of the notebook.