AWS StepFunctions Task state gets cancelled when tearing down a Google Cloud cluster

Time: 2018-07-10 17:53:30

Tags: amazon-web-services aws-lambda aws-step-functions

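I am using AWS StepFunctions to carry out several tasks on the Google Cloud side: creating a Dataproc cluster, submitting a job to it, and then tearing it down. Each step has its own Task state, plus a "poller" task that checks when the job has finished so the state machine can move on to the next task.

The problem is that for the cluster teardown, the Task goes into a "Cancelled" (grey) state instead of "In Progress" followed by the poller task. Once the cluster-delete Lambda function has executed the cluster delete method, it should proceed to the poller Task.

Here is the cluster-delete Lambda function: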

from pprint import pprint
from google.cloud import storage
import googleapiclient.discovery
from rkstr8.cloud.google import GoogleCloudLambdaAuth
import time

def handler(event, context):

    creds = event['GCP_creds']
    GoogleCloudLambdaAuth(creds).configure_google_creds()

    dataproc = googleapiclient.discovery.build('dataproc', 'v1')
    project_id = event['gcp-administrative']['project']
    zone = event['gcp-administrative']['zone']
    try:
        region_as_list = zone.split('-')[:-1]
        region = '-'.join(region_as_list)
    except (AttributeError, IndexError, ValueError):
        raise ValueError('Invalid zone provided, please check your input.')
    cluster = event['dataproc-administrative']['cluster_name']

    print('Tearing down cluster...')
    request = dataproc.projects().regions().clusters().delete(
        projectId=project_id,
        region=region,
        clusterName=cluster)

    time.sleep(30)

    result = request.execute()

    return result
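
One side note on the zone handling above: for a string input, `zone.split('-')[:-1]` does not raise, so a zone without a hyphen would silently yield an empty region rather than the `ValueError`. A minimal sketch of a stricter helper is below; the name `region_from_zone` is hypothetical and not part of the original function.

```python
def region_from_zone(zone):
    """Derive a region such as 'us-central1' from a zone such as 'us-central1-a'.

    Hypothetical helper: raises ValueError for anything that does not look
    like '<region>-<suffix>'.
    """
    if not isinstance(zone, str):
        raise ValueError('Invalid zone provided, please check your input.')
    # Split on the last hyphen only: everything before it is the region.
    region, sep, suffix = zone.rpartition('-')
    # Require a hyphen, a non-empty region part and a non-empty zone suffix.
    if not sep or not region or not suffix:
        raise ValueError('Invalid zone provided, please check your input.')
    return region
```

With this, `region_from_zone('us-central1-a')` gives `'us-central1'`, while an input such as `'invalid'` or `None` raises `ValueError`.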

The relevant part of the state machine construction code looks like this:

    dproc_submit_state = AsyncPoller(
                    stats_path=DPROC_SUBMIT_POLLER_STATUS_PATH,
                    async_task=Task(
                        name=DPROC_SUBMIT,
                        resource=DPROC_SUBMIT_ARN_VAR,
                        input_path=DPROC_SUBMIT_INPUT_PATH,
                        result_path=DPROC_SUBMIT_RESULT_PATH,
                        next=DPROC_SUBMIT_POLLER
                    ),
                    pollr_task=Task(
                        name=DPROC_SUBMIT_POLLER,
                        resource=DPROC_SUBMIT_POLLER_ARN_VAR,
                        input_path=DPROC_SUBMIT_RESULT_PATH,
                        result_path=DPROC_SUBMIT_POLLER_STATUS_PATH
                    ),
                    faild_task=Fail(
                        name='HailScriptFailed'
                    ),
                    succd_task=DPROC_DELETE,
                    pollr_wait_time=self.conf["POLLER_WAIT_TIME"]
                    ).states()
    dproc_delete_state = AsyncPoller(
                    stats_path=DPROC_DELETE_POLLER_STATUS_PATH,
                    async_task=Task(
                        name=DPROC_DELETE,
                        resource=DPROC_DELETE_ARN_VAR,
                        input_path=DPROC_DELETE_INPUT_PATH,
                        result_path=DPROC_DELETE_RESULT_PATH,
                        next=DPROC_DELETE_POLLER
                    ),
                    pollr_task=Task(
                        name=DPROC_DELETE_POLLER,
                        resource=DPROC_DELETE_POLLER_ARN_VAR,
                        input_path=DPROC_DELETE_RESULT_PATH,
                        result_path=DPROC_DELETE_POLLER_STATUS_PATH
                    ),
                    faild_task=Fail(
                        name='ClusterDeleteFailed'
                    ),
                    succd_task='PipelineSucceeded',
                    pollr_wait_time=self.conf["POLLER_WAIT_TIME"]
                    ).states()
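
For readers unfamiliar with the pattern, here is a rough sketch of the kind of polling loop described above, written as an Amazon States Language definition in a Python dict. This is not the actual output of `AsyncPoller`; the function name, the generated `Wait`/`Choice` state names, and the `'SUCCEEDED'`/`'FAILED'` status strings are all assumptions made for illustration.

```python
import json


def async_poller_states(task_name, task_arn, poller_name, poller_arn,
                        status_path, succeeded_next, failed_name, wait_seconds):
    """Return ASL states for: task -> poller -> choice -> (wait -> poller)*.

    Hypothetical sketch; status strings and derived state names are assumed.
    """
    wait_name = poller_name + 'Wait'
    choice_name = poller_name + 'Choice'
    return {
        # Kicks off the asynchronous work, then hands over to the poller.
        task_name: {'Type': 'Task', 'Resource': task_arn, 'Next': poller_name},
        # Checks the status and stores it under status_path in the state data.
        poller_name: {'Type': 'Task', 'Resource': poller_arn,
                      'ResultPath': status_path, 'Next': choice_name},
        # Routes on the stored status; anything else loops back through Wait.
        choice_name: {
            'Type': 'Choice',
            'Choices': [
                {'Variable': status_path, 'StringEquals': 'SUCCEEDED',
                 'Next': succeeded_next},
                {'Variable': status_path, 'StringEquals': 'FAILED',
                 'Next': failed_name},
            ],
            'Default': wait_name,
        },
        wait_name: {'Type': 'Wait', 'Seconds': wait_seconds, 'Next': poller_name},
        failed_name: {'Type': 'Fail'},
    }


if __name__ == '__main__':
    states = async_poller_states(
        'DprocDelete', 'arn:aws:lambda:REGION:ACCOUNT:function:delete',
        'DprocDeletePoller', 'arn:aws:lambda:REGION:ACCOUNT:function:poll',
        '$.delete_status', 'PipelineSucceeded', 'ClusterDeleteFailed', 30)
    print(json.dumps(states, indent=2))
```

The ARNs in the usage example are placeholders. The point is only that the delete Task is expected to return quickly so that the loop of poller, choice and wait states can take over.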

This is what the state machine looks like:

[image: state machine diagram]

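1 Answer:

Answer 0 (score: 1)

Why are you sleeping for 30 seconds between creating the request and executing it?

The default timeout for a Lambda function is 3 seconds. My guess is that your Lambda is timing out.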
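
If the timeout is indeed the cause, a minimal sketch of the delete step without the sleep could look like the following. The helper name `delete_cluster` is hypothetical, and the `dataproc` argument is assumed to be the client you already build with `googleapiclient.discovery.build('dataproc', 'v1')`.

```python
def delete_cluster(dataproc, project_id, region, cluster):
    """Build the Dataproc cluster delete request and execute it right away.

    Hypothetical helper; `dataproc` is assumed to be a discovery client.
    """
    request = dataproc.projects().regions().clusters().delete(
        projectId=project_id,
        region=region,
        clusterName=cluster)
    # No time.sleep() here: send the request immediately and return the API
    # response, leaving any waiting to the poller Task in the state machine.
    return request.execute()
```

In your handler this would replace the request/sleep/execute lines, for example `return delete_cluster(dataproc, project_id, region, cluster)`. If the function still needs more than a few seconds, the timeout can also be raised in the Lambda function's configuration.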