Question

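I am trying to access the AWS Glue ETL job run ID from within the job's own script. This is the RunID shown in the first column of the AWS Glue console, something like jr_5fc6d4ecf0248150067f2. How can I get it programmatically with pyspark?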

Answer 1

I haven't found this documented anywhere, but it is passed in as a command-line argument.

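import sys
from awsglue.utils import getResolvedOptions

# Only JOB_NAME is requested explicitly; JOB_RUN_ID is not listed here
# but still appears in the resolved arguments.
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
job_run_id = args['JOB_RUN_ID']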
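
For running the same script outside Glue (for example in a local test), where awsglue may not be importable, the run ID can also be read straight from the argument list. This is only a minimal sketch, assuming the ID arrives as `--JOB_RUN_ID <value>` as described above; the helper name `get_job_run_id` is hypothetical and not part of any AWS library.

```python
import argparse
import sys


def get_job_run_id(argv=None):
    """Return the value passed as --JOB_RUN_ID, or None if it is absent."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Assumption: Glue passes the run ID as "--JOB_RUN_ID <value>".
    parser.add_argument("--JOB_RUN_ID", default=None)
    # parse_known_args leaves every other argument (e.g. --JOB_NAME) alone.
    args_to_parse = sys.argv[1:] if argv is None else argv
    known, _unknown = parser.parse_known_args(args_to_parse)
    return known.JOB_RUN_ID
```

Inside a script, `get_job_run_id()` with no arguments reads `sys.argv`; passing an explicit list makes the helper easy to exercise in a unit test.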

Answer 2

You can use the boto3 SDK for Python to access AWS services. Starting a job run through it returns the new run's ID to the caller:

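import boto3

def lambda_handler(event, context):
    # Region and endpoint are the ones used in the original answer.
    glue = boto3.client(service_name='glue', region_name='us-east-2',
              endpoint_url='https://glue.us-east-2.amazonaws.com')
    glue.start_crawler(Name='test_crawler')

    # Assumption: the original referenced an undefined myJob; here the job
    # name is read from a hypothetical 'JobName' key in the Lambda event.
    myNewJobRun = glue.start_job_run(JobName=event['JobName'])
    # start_job_run returns the ID of the run it just started.
    print(myNewJobRun['JobRunId'])
    return myNewJobRun['JobRunId']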
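
Once a run ID is known, whether from the job's arguments or from `start_job_run`, it can be used to look up that run. The following is a hedged sketch rather than a definitive implementation: `get_run_state` is a hypothetical helper, the client is passed in so the function can be tested with a stand-in object, and it assumes the caller has permission to call `get_job_run`.

```python
def get_run_state(glue_client, job_name, run_id):
    """Return the JobRunState string for one run of a Glue job.

    glue_client is expected to behave like boto3.client('glue').
    """
    response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
    return response["JobRun"]["JobRunState"]
```

With a real client this would be called as `get_run_state(boto3.client('glue'), job_name, run_id)`.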

AWS Glue: get job_id from within the script using pyspark