How to get Azure Databricks notebook run details

Time: 2020-10-06 06:31:18

Tags: apache-spark pyspark azure-data-factory databricks azure-databricks

I am running my Databricks notebooks with Azure Data Factory, which creates a job cluster at run time. Now I want to know the status of those jobs, meaning whether they succeeded or failed. So how can I get the run status using the job ID or run ID?

Note: I have not created any jobs in the Databricks workspace. I am running the notebooks with Azure Data Factory, which creates a job cluster at run time, runs the notebook on top of that cluster, and then terminates the cluster.
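Since the question already has a run ID, the Databricks Jobs API 2.0 `runs/get` endpoint can return the state of that single run directly. Below is a minimal sketch; the workspace URL and token are placeholders you would replace with your own, and the small parsing helper is a hypothetical name introduced here for illustration:

```python
import json
import urllib.request

# Hypothetical values -- substitute your own workspace URL and personal access token.
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi********"

def parse_run_state(payload):
    """Pull the life-cycle and result states out of a runs/get response dict.

    result_state only appears once the run has finished; while the run is
    still executing, only life_cycle_state (e.g. RUNNING) is present.
    """
    state = payload.get("state", {})
    return state.get("life_cycle_state"), state.get("result_state")

def get_run_state(run_id):
    """Call the Jobs API 2.0 runs/get endpoint for a single run."""
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.0/jobs/runs/get?run_id={run_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_run_state(json.load(resp))
```

A finished run returns a pair such as `("TERMINATED", "SUCCESS")`, while a run that is still executing has no `result_state` yet.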

2 answers:

Answer 0 (score: 1)

import requests

# Add your Databricks workspace instance name here,
# e.g. adb-1234567890123456.7.azuredatabricks.net
runs_url = "https://" + databricks_instance_name + "/api/2.0/jobs/runs/list"
headers = {"Authorization": "Bearer ********************"}   # your Databricks access token
response = requests.get(runs_url, headers=headers)

print(response.json())      # all cluster and job-run info, in JSON format

# traverse the runs in the response
for element in response.json()['runs']:
    job_id = element['job_id']
    # result_state is only present once the run has finished
    status = element['state'].get('result_state')
    job_path = element['task']['notebook_task']['notebook_path']
    job_name = job_path.split('/')[-1]
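The traversal above can be factored into a pure helper that turns a `runs/list` response into a compact summary, which also makes the parsing easy to test offline. This is a sketch; `summarize_runs` and the `"PENDING"` placeholder for unfinished runs are names introduced here, not part of the Jobs API:

```python
def summarize_runs(payload):
    """Group a jobs/runs/list response into (notebook_name, result_state) pairs.

    Runs that are still in progress carry no result_state yet and are
    reported with the placeholder string 'PENDING'.
    """
    summary = []
    for run in payload.get("runs", []):
        notebook_path = run["task"]["notebook_task"]["notebook_path"]
        notebook_name = notebook_path.split("/")[-1]        # last path segment
        result = run["state"].get("result_state", "PENDING")
        summary.append((notebook_name, result))
    return summary
```

For example, a response containing one finished and one running notebook run would summarize to something like `[("etl_daily", "SUCCESS"), ("etl_hourly", "PENDING")]`.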

Answer 1 (score: 0)

You have to go to the Monitor page in Azure Data Factory. There you can filter the pipeline runs by runId.

https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook#monitor-the-pipeline-run
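The same information shown on the Monitor page can also be pulled programmatically from the Data Factory REST API (`queryActivityruns` on a pipeline run). A minimal sketch follows; the subscription, resource group, factory name, and AAD token are hypothetical placeholders, and `notebook_activity_states` is a helper name introduced here:

```python
import json
import urllib.request

# Hypothetical identifiers -- substitute your own subscription, resource
# group, factory name, and an Azure AD bearer token for management.azure.com.
SUBSCRIPTION = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "my-rg"
FACTORY = "my-adf"
AAD_TOKEN = "********"

def notebook_activity_states(payload):
    """Pick the Databricks notebook activities out of a queryActivityruns
    response and return (activityName, status) pairs."""
    return [
        (a["activityName"], a["status"])
        for a in payload.get("value", [])
        if a.get("activityType") == "DatabricksNotebook"
    ]

def query_activity_runs(pipeline_run_id, start, end):
    """POST to the ADF 'Activity Runs - Query By Pipeline Run' endpoint."""
    url = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
        f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
        f"/factories/{FACTORY}/pipelineruns/{pipeline_run_id}"
        f"/queryActivityruns?api-version=2018-06-01"
    )
    body = json.dumps({"lastUpdatedAfter": start, "lastUpdatedBefore": end}).encode()
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {AAD_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return notebook_activity_states(json.load(resp))
```

This answers the question from the ADF side (per pipeline run) rather than the Databricks side, which is convenient when the cluster and job only exist for the duration of the ADF-triggered run.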