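How to wait for a step in an AWS EMR cluster to complete using Boto3

Asked: 2017-06-28 09:10:13

Tags: amazon-web-services boto boto3 emr amazon-emr

Given a step ID, I want to wait for that AWS EMR step to finish. How can I do this? Is there a built-in function?

At the time of writing, the Boto3 waiters for EMR only support waiting for the cluster-running and cluster-terminated events:

EMR Waiters

4 answers: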

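Answer 0 (score: 4)

There is no built-in function for this in Boto3, but you can write your own waiter.

See: describe_step

Call describe_step with cluster_id and step_id. The response is a dictionary containing details about the step. One of its keys, 'State' (nested under 'Step' and then 'Status'), holds the step's current state. If the step has not finished, wait a few seconds and try again, until it finishes or the total wait exceeds your limit.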

'State': 'PENDING'|'CANCEL_PENDING'|'RUNNING'|'COMPLETED'|'CANCELLED'|'FAILED'|'INTERRUPTED'
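
The polling loop described above could be sketched as follows. This is a minimal sketch, not a built-in Boto3 feature: the function name wait_for_step, the delay and max_attempts defaults, and the injectable sleep parameter are all assumptions made for illustration. It treats COMPLETED, CANCELLED, FAILED and INTERRUPTED from the list above as terminal states.

```python
import time

# Terminal states, taken from the 'State' values listed above.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(emr_client, cluster_id, step_id,
                  delay=30, max_attempts=60, sleep=time.sleep):
    """Poll describe_step until the step reaches a terminal state.

    Returns the final state string. Raises TimeoutError if the step is
    still active after max_attempts polls. The name and defaults are
    hypothetical; only describe_step itself is part of the EMR client.
    """
    state = None
    for attempt in range(1, max_attempts + 1):
        response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = response["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        # Only sleep if another poll is going to follow.
        if attempt < max_attempts:
            sleep(delay)
    raise TimeoutError(
        f"Step {step_id} still in state {state} after {max_attempts} attempts"
    )
```

A caller would pass a client created with boto3.client("emr") and then check whether the returned state is 'COMPLETED' or one of the failure states.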

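Answer 1 (score: 4)

I came up with the following code (if max_attempts is set to 0 or less, it waits until there are no running or pending steps left):

import json
import time


def wait_for_steps_completion(emr_client, emr_cluster_id, max_attempts=0):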
    sleep_seconds = 30
    num_attempts = 0

    while True:
        response = emr_client.list_steps(
            ClusterId=emr_cluster_id,
            StepStates=['PENDING', 'CANCEL_PENDING', 'RUNNING']
        )
        num_attempts += 1
        active_aws_emr_steps = response['Steps']

        if active_aws_emr_steps:
            if 0 < max_attempts <= num_attempts:
                raise Exception(
                    'Max attempts exceeded while waiting for AWS EMR steps completion. Last response:\n'
                    + json.dumps(response, indent=3, default=str)
                )
            time.sleep(sleep_seconds)
        else:
            return

Answer 2 (score: 4)

There is now a waiter available for the step-complete event. It was added in a recent boto3 release.

http://boto3.readthedocs.io/en/latest/reference/services/emr.html#EMR.Waiter.StepComplete

Sample code:

import boto3

client = boto3.client("emr")
waiter = client.get_waiter("step_complete")
waiter.wait(
    ClusterId='the-cluster-id',
    StepId='the-step-id',
    WaiterConfig={
        "Delay": 30,
        "MaxAttempts": 10
    }
)
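
With Delay set to 30 and MaxAttempts set to 10, the sample above gives up after roughly five minutes, at which point the waiter raises an error. For longer steps it may be easier to start from the total time you are willing to wait. The helper below is a small sketch of that arithmetic; the name waiter_config_for_timeout and its parameters are hypothetical and not part of boto3.

```python
import math


def waiter_config_for_timeout(timeout_seconds, delay_seconds=30):
    """Build a WaiterConfig dict that covers roughly timeout_seconds.

    Hypothetical helper: it only computes the number of attempts needed
    at the given polling interval and returns a plain dict.
    """
    if timeout_seconds <= 0 or delay_seconds <= 0:
        raise ValueError("timeout_seconds and delay_seconds must be positive")
    # Round up so the waiter never gives up earlier than requested.
    max_attempts = max(1, math.ceil(timeout_seconds / delay_seconds))
    return {"Delay": delay_seconds, "MaxAttempts": max_attempts}
```

The result can be passed as the WaiterConfig argument of waiter.wait, for example WaiterConfig=waiter_config_for_timeout(3600, delay_seconds=60) to wait up to about an hour.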

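Answer 3 (score: 0)

I wrote a generic status_poller function on GitHub as part of an interactive EMR demo.

The status_poller function loops and calls a function, printing '.' or the new status, until the specified status is returned:

import sys
import time


def status_poller(intro, done_status, func):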
    """
    Polls a function for status, sleeping for 10 seconds between each query,
    until the specified status is returned.
    :param intro: An introductory sentence that informs the reader what we're
                  waiting for.
    :param done_status: The status we're waiting for. This function polls the status
                        function until it returns the specified status.
    :param func: The function to poll for status. This function must eventually
                 return the expected done_status or polling will continue indefinitely.
    """
    status = None
    print(intro)
    print("Current status: ", end='')
    while status != done_status:
        prev_status = status
        status = func()
        if prev_status == status:
            print('.', end='')
        else:
            print(status, end='')
        sys.stdout.flush()
        time.sleep(10)
    print()

To check whether a step has completed, you can call it like this:

status_poller(
    "Waiting for step to complete...",
    'COMPLETED',
    lambda:
    emr_basics.describe_step(cluster_id, step_id, emr_client)['Status']['State'])
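
The emr_basics module used in the call above belongs to the demo and is not shown here. If you only have a plain Boto3 EMR client, a status function for the poller could look something like the sketch below. The name make_step_state_getter is hypothetical, and the sketch assumes the standard describe_step response layout, where the state sits under 'Step', 'Status', 'State'.

```python
def make_step_state_getter(emr_client, cluster_id, step_id):
    """Return a zero-argument function that fetches the step's current state.

    Hypothetical helper: it wraps the client's describe_step call so the
    result can be handed to a poller that expects a no-argument callable.
    """
    def get_state():
        response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        return response["Step"]["Status"]["State"]

    return get_state
```

The returned function can then be passed as the func argument, for example status_poller("Waiting for step to complete...", 'COMPLETED', make_step_state_getter(emr_client, cluster_id, step_id)). As the docstring of status_poller notes, polling continues indefinitely if the step ends in a state other than the one being waited for.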