Question

I've been working with Amazon Web Services (AWS) recently, and I noticed there isn't much documentation on this subject, so I'm adding my solution.

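I'm writing an application that uses Amazon Elastic MapReduce (Amazon EMR). Once the computation finishes, I need to do some work on the files it created, so I need to know when the job flow has finished.

This is how you can check whether a job flow has completed: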

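import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.DescribeJobFlowsRequest;
import com.amazonaws.services.elasticmapreduce.model.JobFlowDetail;
import java.util.List;

// `credentials` is your AWSCredentials instance
AmazonElasticMapReduce mapReduce = new AmazonElasticMapReduceClient(credentials);

// Ask only for job flows that are in the COMPLETED state
DescribeJobFlowsRequest jobAttributes = new DescribeJobFlowsRequest()
    .withJobFlowStates("COMPLETED");

List<JobFlowDetail> jobs = mapReduce.describeJobFlows(jobAttributes).getJobFlows();
if (!jobs.isEmpty()) { // get(0) would throw if nothing has completed yet
    JobFlowDetail detail = jobs.get(0);
    detail.getJobFlowId(); // the id of one of the completed jobs
}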

You can also look up a specific job flow ID with DescribeJobFlowsRequest and then check whether that job flow completed or failed.
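For the second half of that lookup (deciding "completed or failed"), here is a minimal, SDK-free sketch that classifies a job flow state string. It assumes the string is one of the job flow states the SDK reports (with the v1 SDK presumably obtained via `detail.getExecutionStatusDetail().getState()` on a `JobFlowDetail` fetched with `new DescribeJobFlowsRequest().withJobFlowIds(id)`). `JobFlowOutcome` and `classify` are hypothetical names, and grouping TERMINATED with FAILED is a choice of this sketch, not something the SDK defines.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class JobFlowOutcome {

    /** Coarse outcome of a job flow, derived from its state string. */
    enum Outcome { COMPLETED, FAILED_OR_TERMINATED, STILL_ACTIVE }

    // Terminal states that do not mean success. Treating TERMINATED as
    // unsuccessful is an assumption (it usually means the job flow was shut down).
    private static final Set<String> UNSUCCESSFUL =
            new HashSet<>(Arrays.asList("FAILED", "TERMINATED"));

    /** Maps a job flow state such as "COMPLETED" or "RUNNING" to an Outcome. */
    static Outcome classify(String state) {
        if ("COMPLETED".equals(state)) {
            return Outcome.COMPLETED;
        }
        if (UNSUCCESSFUL.contains(state)) {
            return Outcome.FAILED_OR_TERMINATED;
        }
        // Anything else (STARTING, BOOTSTRAPPING, RUNNING, WAITING, ...) is still active
        return Outcome.STILL_ACTIVE;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"RUNNING", "COMPLETED", "FAILED"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```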

I hope this helps someone.

Answer 1

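I ran into this problem too, and this is the solution I've come up with for now. It isn't perfect, but hopefully it helps. For reference, I'm using Java 1.7 and version 1.9.13 of the AWS SDK for Java.

Note that, strictly speaking, this code waits for the cluster to terminate, not for the steps to finish. That's fine if your cluster terminates once all of its steps are done, but it won't help you if your cluster stays alive after the steps complete.

Also note that this code monitors and logs cluster state changes, and it checks whether the cluster terminated because of an error, throwing an exception if it did.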

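import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.*;
import java.util.concurrent.TimeUnit;

// Assumes `emr` (an AmazonElasticMapReduceClient) and `log` (an SLF4J-style logger)
// are fields of the enclosing class.
private void yourMainMethod() throws InterruptedException {
    RunJobFlowRequest request = ...;

    try {
        RunJobFlowResult submission = emr.runJobFlow(request);
        String jobFlowId = submission.getJobFlowId();
        log.info("Submitted EMR job as job flow id {}", jobFlowId);

        // Poll every 90 seconds until the cluster stops or goes idle
        DescribeClusterResult result = 
            waitForCompletion(emr, jobFlowId, 90, TimeUnit.SECONDS);
        diagnoseClusterResult(result, jobFlowId);
    } finally {
        emr.shutdown();
    }
}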

private DescribeClusterResult waitForCompletion(
             AmazonElasticMapReduceClient emr, String jobFlowId,
             long sleepTime, TimeUnit timeUnit)
        throws InterruptedException {
    String state = "STARTING";
    while (true) {
        DescribeClusterResult result = emr.describeCluster(
                new DescribeClusterRequest().withClusterId(jobFlowId)
        );
        ClusterStatus status = result.getCluster().getStatus();
        String newState = status.getState();
        if (!state.equals(newState)) {
            log.info("Cluster id {} switched from {} to {}.  Reason: {}.",
                     jobFlowId, state, newState, status.getStateChangeReason());
            state = newState;
        }

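        // Stop polling once the cluster has shut down (with or without errors)
        // or is idle in WAITING; otherwise sleep and check again.
        switch (state) {
            case "TERMINATED":
            case "TERMINATED_WITH_ERRORS":
            case "WAITING":
                return result;
        }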

        timeUnit.sleep(sleepTime);
    }
}

private void diagnoseClusterResult(DescribeClusterResult result, String jobFlowId) {
    ClusterStatus status = result.getCluster().getStatus();
    ClusterStateChangeReason reason = status.getStateChangeReason();
    ClusterStateChangeReasonCode code = 
        ClusterStateChangeReasonCode.fromValue(reason.getCode());
    switch (code) {
    case ALL_STEPS_COMPLETED:
        log.info("Completed EMR job {}", jobFlowId);
        break;
    default:
        failEMR(jobFlowId, status);
    }
}

private static void failEMR(String jobFlowId, ClusterStatus status) {
    String msg = "EMR cluster run %s terminated with errors.  ClusterStatus = %s";
    throw new RuntimeException(String.format(msg, jobFlowId, status));
}
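The `waitForCompletion` loop above is tied to the AWS client and loops forever if the cluster never reaches a stop state. Here is a minimal sketch of the same polling logic with the state lookup passed in as a `Supplier<String>`, so it can be exercised without AWS, plus a cap on the number of polls. In real use the supplier would be something like `() -> emr.describeCluster(new DescribeClusterRequest().withClusterId(jobFlowId)).getCluster().getStatus().getState()`. `StatePoller`, `pollUntilStopped` and the `maxPolls` cap are hypothetical additions, not part of the SDK, and `Supplier` needs Java 8 rather than the Java 1.7 setup above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class StatePoller {

    // Same stop states as waitForCompletion above.
    static final Set<String> STOP_STATES = new HashSet<>(
            Arrays.asList("TERMINATED", "TERMINATED_WITH_ERRORS", "WAITING"));

    /**
     * Calls fetchState until it reports one of STOP_STATES and returns that state,
     * printing every state change. Throws IllegalStateException after maxPolls
     * calls so a stuck cluster cannot block the caller forever.
     */
    static String pollUntilStopped(Supplier<String> fetchState, int maxPolls,
                                   long sleepTime, TimeUnit unit)
            throws InterruptedException {
        String previous = null;
        for (int poll = 1; poll <= maxPolls; poll++) {
            String state = fetchState.get();
            if (!state.equals(previous)) {
                System.out.println("State changed from " + previous + " to " + state);
                previous = state;
            }
            if (STOP_STATES.contains(state)) {
                return state;
            }
            if (poll < maxPolls) {
                unit.sleep(sleepTime); // no point sleeping after the last poll
            }
        }
        throw new IllegalStateException(
                "Still in state " + previous + " after " + maxPolls + " polls");
    }

    public static void main(String[] args) throws InterruptedException {
        // Canned states standing in for repeated describeCluster(...) calls.
        Iterator<String> states =
                Arrays.asList("STARTING", "RUNNING", "RUNNING", "TERMINATED").iterator();
        String last = pollUntilStopped(states::next, 10, 0, TimeUnit.MILLISECONDS);
        System.out.println("Final state: " + last);
    }
}
```

The cap turns an indefinite wait into an explicit failure, which the caller can handle the same way as a `TERMINATED_WITH_ERRORS` result.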

Answer 2

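Once the job flow finishes, the cluster is shut down and anything stored in its HDFS is lost. To avoid losing data, configure the last step of the job flow to store the results in Amazon S3.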

If the JobFlowInstancesDetail:KeepJobFlowAliveWhenNoSteps parameter is set to TRUE, the job flow transitions to the WAITING state instead of shutting down once its steps have finished.
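When polling, this setting changes which state means "done". A minimal sketch, assuming the state strings are the job flow states mentioned in this thread (COMPLETED, FAILED, TERMINATED, WAITING); `KeepAliveCheck` and `nothingLeftToWaitFor` are hypothetical names. Note that WAITING only tells you the steps stopped running, not that they succeeded.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KeepAliveCheck {

    // Job flow states from which no further steps will run.
    private static final Set<String> TERMINAL =
            new HashSet<>(Arrays.asList("COMPLETED", "FAILED", "TERMINATED"));

    /**
     * Returns true once there is nothing left to wait for. With
     * KeepJobFlowAliveWhenNoSteps = TRUE the job flow parks in WAITING
     * instead of shutting down, so WAITING counts as done only in that case.
     */
    static boolean nothingLeftToWaitFor(String state, boolean keepAliveWhenNoSteps) {
        if (TERMINAL.contains(state)) {
            return true;
        }
        return keepAliveWhenNoSteps && "WAITING".equals(state);
    }

    public static void main(String[] args) {
        System.out.println(nothingLeftToWaitFor("WAITING", true));    // true
        System.out.println(nothingLeftToWaitFor("WAITING", false));   // false
        System.out.println(nothingLeftToWaitFor("COMPLETED", false)); // true
    }
}
```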

A job flow can have at most 256 steps.
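Because the limit applies per job flow, it can be worth checking the step list on the client before submitting. A minimal sketch; `StepLimitCheck` and `checkStepCount` are hypothetical, in practice the list would hold the `StepConfig` objects you pass to `RunJobFlowRequest`, and 256 is the value quoted here, which may not match current EMR limits.

```java
import java.util.Collections;
import java.util.List;

public class StepLimitCheck {

    // Limit quoted in the answer above; verify against current EMR documentation.
    static final int MAX_STEPS_PER_JOB_FLOW = 256;

    /** Throws before submission if the job flow would exceed the step limit. */
    static void checkStepCount(List<?> steps) {
        if (steps.size() > MAX_STEPS_PER_JOB_FLOW) {
            throw new IllegalArgumentException("Job flow has " + steps.size()
                    + " steps; at most " + MAX_STEPS_PER_JOB_FLOW + " are allowed");
        }
    }

    public static void main(String[] args) {
        checkStepCount(Collections.nCopies(256, "step")); // exactly at the limit: accepted
        try {
            checkStepCount(Collections.nCopies(257, "step"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```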

If your job runs for a long time, I recommend saving results periodically.

Long story short: there is no way to know when it has finished. Instead, you need to save your data as part of the job itself.

Answer 3

Use the --wait-for-steps option when creating the job flow.

./elastic-mapreduce --create \
...
 --wait-for-steps \
...

How do I wait for an Elastic MapReduce job flow to complete in a Java application?

3 answers: