Failed to launch an AWS EMR cluster in Frankfurt, but succeeded in N. Virginia

Time: 2016-11-26 10:06:58

Tags: java amazon-web-services amazon-ec2 amazon-emr

I am trying to launch a small EMR cluster through the AWS SDK for Java. The launch fails in Frankfurt (eu-central-1) but succeeds in N. Virginia (us-east-1).

My configuration:

  • macOS 10.12.1
  • Java 1.8.102
  • AWS SDK for Java 1.11.60
  • Hadoop 2.7.3
  • IntelliJ 2016.2.4

I have verified the following:

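  1. The instance type I request (M1Medium) exists in both regions.
  2. The Hadoop version I request for the cluster (2.7.3) is the one included in the EMR release (5.2.0).
  3. I have the appropriate IAM roles to support the cluster (the defaults, EMR_EC2_DefaultRole and EMR_DefaultRole), and they evidently work, since they are used to launch the cluster in N. Virginia.
  4. I have EC2 key pairs for both regions.
  5. I have verified that EMR is available as a service in both regions.
  6. I have verified, in the EC2 dashboard in my web browser, that I use the correct availability zones for both regions and that these zones are healthy.
  7. For each cluster attempt, I use S3 buckets in the same region for the input, the output, and the EMR logs.

This is the code that launches the cluster in Frankfurt: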

    public static void main(String[] args) throws Exception {
        parseArgs(args);
    
        if (environment.equals("local")) {
            // Local machine, single node setup. Used in order to debug the M-R logic.
            String[] p1args = {"input", "output", environment};
            Phase1.main(p1args);
        } else {
            // EMR setup. This is the main intent of this app.
            AWSCredentials credentials = null;
            try {
                credentials = new ProfileCredentialsProvider().getCredentials();
            } catch (Exception e) {
                throw new AmazonClientException(
                        "Cannot load the credentials from the credential profiles file. " +
                                "Please make sure that your credentials file is at the correct " +
                                "location (~/.aws/credentials), and is in valid format.",
                        e);
            }
    
            AmazonElasticMapReduce mapReduce = new AmazonElasticMapReduceClient(credentials);
    
            HadoopJarStepConfig jarStep1 = new HadoopJarStepConfig()
                    .withJar("s3n://skill-finder-eu-central-1/jars/SkillFinder.jar")
                    .withMainClass("Phase1")
                    .withArgs("s3n://skill-finder-eu-central-1/input-10K", "s3n://skill-finder-eu-central-1/output-eu-central-1", environment);
    
            StepConfig step1Config = new StepConfig()
                    .withName("Phase 1")
                    .withHadoopJarStep(jarStep1)
                    .withActionOnFailure("TERMINATE_JOB_FLOW");
    
            JobFlowInstancesConfig instances = new JobFlowInstancesConfig()
                    .withInstanceCount(5)
                    .withMasterInstanceType(InstanceType.M1Medium.toString())
                    .withSlaveInstanceType(InstanceType.M1Medium.toString())
                    .withHadoopVersion("2.7.3")
                    .withEc2KeyName("AWS-EU-CENTRAL-1")
                    .withKeepJobFlowAliveWhenNoSteps(false)
                    .withPlacement(new PlacementType("eu-central-1a"));
    
            RunJobFlowRequest runFlowRequest = new RunJobFlowRequest()
                    .withName("skill-finder")
                    .withInstances(instances)
                    .withSteps(step1Config)
                    .withJobFlowRole("EMR_EC2_DefaultRole")
                    .withServiceRole("EMR_DefaultRole")
                    .withReleaseLabel("emr-5.2.0")
                    .withLogUri("s3n://skill-finder-eu-central-1/logs/")
                    .withBootstrapActions();
    
            System.out.println("Submitting the JobFlow Request to Amazon EMR and running it...");
            RunJobFlowResult runJobFlowResult = mapReduce.runJobFlow(runFlowRequest);
            String jobFlowId = runJobFlowResult.getJobFlowId();
            System.out.println("Ran job flow with id: " + jobFlowId);
        }
    
    }
    

When launching in N. Virginia, I simply replace eu-central-1 with us-east-1.
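Replacing the region by hand means touching the bucket URIs, the key pair name, and the availability zone, so it is easy to miss one. A minimal sketch of deriving all of them from a single region string follows. The class `RegionSettings` and its methods are hypothetical (they are not part of the AWS SDK), and it assumes the naming patterns visible in the code above hold in every region: buckets named `skill-finder-<region>`, key pairs named `AWS-<REGION>`, and the `a` availability zone.

```java
import java.util.Locale;

// Hypothetical helper, not part of the AWS SDK: derives every
// region-dependent value used in the launcher from one region name.
final class RegionSettings {
    private final String region;

    RegionSettings(String region) {
        this.region = region;
    }

    // Bucket naming follows the pattern in the question: skill-finder-<region>.
    String bucket() {
        return "s3n://skill-finder-" + region;
    }

    String jarUri() {
        return bucket() + "/jars/SkillFinder.jar";
    }

    String inputUri() {
        return bucket() + "/input-10K";
    }

    String outputUri() {
        return bucket() + "/output-" + region;
    }

    String logUri() {
        return bucket() + "/logs/";
    }

    // Assumption: key pairs are named AWS-<REGION IN UPPER CASE> in every region.
    String keyName() {
        return "AWS-" + region.toUpperCase(Locale.ROOT);
    }

    // Assumption: the "a" zone is the one to use in each region.
    String availabilityZone() {
        return region + "a";
    }

    public static void main(String[] args) {
        String region = args.length > 0 ? args[0] : "eu-central-1";
        RegionSettings settings = new RegionSettings(region);
        System.out.println(settings.jarUri());
        System.out.println(settings.inputUri());
        System.out.println(settings.outputUri());
        System.out.println(settings.logUri());
        System.out.println(settings.keyName());
        System.out.println(settings.availabilityZone());
    }
}
```

The returned strings would then be passed to calls such as `withJar`, `withLogUri`, `withEc2KeyName`, and `new PlacementType(...)` in place of the hard-coded literals.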

This is the exception:

    Exception in thread "main" com.amazonaws.services.elasticmapreduce.model.AmazonElasticMapReduceException: Specified Availability Zone is not supported. (Service: AmazonElasticMapReduce; Status Code: 400; Error Code: ValidationException; Request ID: 578db9ad-b3bf-11e6-9a57-5179acb16d3f)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1545)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1183)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:964)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:676)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:650)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:633)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:601)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:583)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:447)
    at com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient.doInvoke(AmazonElasticMapReduceClient.java:1469)
    at com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient.invoke(AmazonElasticMapReduceClient.java:1445)
    at com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient.runJobFlow(AmazonElasticMapReduceClient.java:1255)
    at MRTaskLauncher.main(MRTaskLauncher.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
    

1 Answer:

Answer 0 (score: 0)

OK, found the solution: I launched the cluster with M3Xlarge instances instead of M1Medium. Works like a charm!

How I got there:

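  1. Since I managed to launch a cluster in N. Virginia with EMR's default IAM roles, I started to think there might be a problem with my authentication. This was further supported when I managed to launch a cluster in Frankfurt through the CLI (I found an example under "Create and Use IAM Roles with the AWS CLI").
  2. Next, I tried launching the cluster through the SDK again. The cluster failed, but I copied its launch command so that I could launch it through the CLI. To do that, I clicked the cluster in the EMR cluster list (web interface), clicked "View cluster details", and then clicked the "AWS CLI export" button in the top row.
  3. To my surprise, the CLI gave a more specific error message (the web interface only listed a validation error), which showed that the culprit was the instance type! I then checked which instance types are available in Frankfurt and picked one that does not require a VPC (M4 requires one), because I did not have the energy to start messing with that.
  4. Some background: the listed validation error led me to a related question. That question is what prompted me to look into the default IAM roles and to try the CLI.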
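The lesson from step 3 could be turned into a pre-flight check that runs before the request is submitted. The sketch below is only an illustration: the class `InstanceTypeCheck` and its methods are hypothetical, and the table holds nothing but the two combinations this thread reports as working. It is not a list of what AWS offers in each region, so it would have to be filled in from the official documentation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check, not part of the AWS SDK.
final class InstanceTypeCheck {
    // Illustrative data only: the region/instance-type pairs that this
    // thread reports as launching successfully.
    private static final Map<String, Set<String>> KNOWN_GOOD = new HashMap<>();

    static {
        KNOWN_GOOD.put("us-east-1", new HashSet<>(Arrays.asList("m1.medium")));
        KNOWN_GOOD.put("eu-central-1", new HashSet<>(Arrays.asList("m3.xlarge")));
    }

    // Returns true only if the pair is in the table above; an unknown
    // region or type yields false rather than an exception.
    static boolean isKnownGood(String region, String instanceType) {
        Set<String> types = KNOWN_GOOD.get(region);
        return types != null && types.contains(instanceType);
    }

    // Fails fast with a message that names the instance type, which the
    // SDK exception in the question did not.
    static void requireKnownGood(String region, String instanceType) {
        if (!isKnownGood(region, instanceType)) {
            throw new IllegalArgumentException(
                    "Instance type " + instanceType
                            + " is not in the known-good list for region " + region);
        }
    }

    public static void main(String[] args) {
        System.out.println(isKnownGood("eu-central-1", "m3.xlarge"));
        System.out.println(isKnownGood("eu-central-1", "m1.medium"));
    }
}
```

Calling `requireKnownGood("eu-central-1", InstanceType.M1Medium.toString())` before `runJobFlow` would stop the launcher early with an error that points at the instance type.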