AWS EMR cluster (AWS SDK for Java): AmazonS3Exception on a public S3 bucket

Time: 2018-08-23 09:03:19

Tags: amazon-web-services amazon-s3 amazon-emr

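I am trying to launch an AWS EMR cluster with the AWS SDK for Java. The cluster should run a Hive script (a query).

Launching the instances works fine, but as soon as the cluster executes its first step, it fails because it cannot access an S3 bucket (for whatever reason).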

Error

2018-08-23T08:36:14.656Z INFO Ensure step 1 jar file s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
INFO Failed to download: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
java.lang.RuntimeException: Error whilst fetching 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'
    at aws157.instancecontroller.util.S3Wrapper.fetchS3HadoopFileToLocal(S3Wrapper.java:412)
    at aws157.instancecontroller.util.S3Wrapper.fetchHadoopFileToLocal(S3Wrapper.java:351)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner$Runner.<init>(HadoopJarStepRunner.java:243)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:152)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:146)
    at aws157.instancecontroller.master.steprunner.StepExecutor.runStep(StepExecutor.java:136)
    at aws157.instancecontroller.master.steprunner.StepExecutor.run(StepExecutor.java:70)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.enqueueStep(StepExecutionManager.java:248)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.doRun(StepExecutionManager.java:195)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.access$000(StepExecutionManager.java:33)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager$1.run(StepExecutionManager.java:94)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect; Request ID: D47DD597E69F8F57), S3 Extended Request ID: 1PfC3L5vlWlFdMvC8YOEcx8+XCxb9O4P/9d9F2Oh0beDVtDWq6ey5Uuf5voXy8Q66HLG4V2xlaw=
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1389)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:902)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1143)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1021)
    at aws157.instancecontroller.util.S3Wrapper.copyS3ObjectToFile(S3Wrapper.java:303)
    at aws157.instancecontroller.util.S3Wrapper.getFile(S3Wrapper.java:291)
    at aws157.instancecontroller.util.S3Wrapper.fetchS3HadoopFileToLocal(S3Wrapper.java:399)
    ... 10 more

Code

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class Main {
    private static final int INSTANCE_COUNT = 2;
    private static final String INSTANCE_TYPE = "m4.large";
    private static final String SUBNET_ID = "subnet-003e3d9762f04c021";
    private static AmazonElasticMapReduce emr;

    public void startApp() {
        try {
            init();
            runCluster();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /*
     * Initializes the EMR client.
     */
    private static void init() {
        // Builds the EMR client from the AWS IAM credentials that are set as environment variables
        emr = AmazonElasticMapReduceClientBuilder.defaultClient();
    }

    /*
     * Configures the EMR cluster instances.
     */
    private static JobFlowInstancesConfig configInstance() throws Exception {
        return new JobFlowInstancesConfig()
                .withInstanceCount(INSTANCE_COUNT) // 2
                .withKeepJobFlowAliveWhenNoSteps(true)
                .withMasterInstanceType(INSTANCE_TYPE) // "m4.large"
                .withSlaveInstanceType(INSTANCE_TYPE)
                .withEc2SubnetId(SUBNET_ID); // Public subnet with a security group that allows all inbound and outbound traffic
    }

    /*
     * Defines the steps and submits the cluster (job flow) request.
     */
    private static void runCluster() throws Exception {
        StepFactory stepFactory = new StepFactory();

        StepConfig enableDebugging = new StepConfig()
                .withName("Enable debugging")
                .withActionOnFailure(ActionOnFailure.TERMINATE_JOB_FLOW)
                .withHadoopJarStep(stepFactory.newEnableDebuggingStep()); // Fails here

        StepConfig installHive = new StepConfig()
                .withName("Install Hive")
                .withActionOnFailure(ActionOnFailure.TERMINATE_JOB_FLOW)
                .withHadoopJarStep(stepFactory.newInstallHiveStep());

        StepConfig createTablesScript = new StepConfig()
                .withName("Create Hive Tables from DynamoDB")
                .withActionOnFailure(ActionOnFailure.TERMINATE_JOB_FLOW)
                .withHadoopJarStep(stepFactory.newRunHiveScriptStep("../hive/tables/createTables.sql"));

        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("Hive Interactive")
                .withReleaseLabel("emr-5.16.0")
                .withSteps(enableDebugging, installHive, createTablesScript)
                .withLogUri("s3://awsbigdatademo-logs-alexandermiller/logs") // S3 bucket for logs
                .withServiceRole("EMR_DefaultRole")
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(configInstance());

        RunJobFlowResult result = emr.runJobFlow(request);

        System.out.println("JobFlowId: " + result.getJobFlowId()); // Prints the correct ID
    }
}
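  • The EMR cluster is in eu-central-1 and starts successfully, but it shuts down because the first step fails.
  • The IAM roles have an administrator access policy attached.

Does anyone know why the cluster still cannot access the public S3 bucket? Could the problem be that the EMR cluster is in a different region than the bucket?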
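
One reading of the stack trace: the no-argument StepFactory() constructor loads its helper jars from the us-east-1.elasticmapreduce bucket (the path in the log), while the cluster runs in eu-central-1, and S3 answers with the 301 PermanentRedirect shown above. The AWS SDK for Java also has a StepFactory(String bucket) constructor, and its documentation gives the bucket format as "<region>.elasticmapreduce". The sketch below only derives that bucket name and the script-runner URI for a region. The class and method names (EmrBucketNames, regionalBucket, scriptRunnerUri) are hypothetical, and whether passing the regional bucket actually fixes this failure is an assumption that has not been verified here.

```java
public class EmrBucketNames {
    // Key of the script-runner jar inside the bucket, copied from the error log.
    private static final String SCRIPT_RUNNER_KEY = "libs/script-runner/script-runner.jar";

    /** Builds the "<region>.elasticmapreduce" bucket name, e.g. for "eu-central-1". */
    public static String regionalBucket(String region) {
        if (region == null || region.trim().isEmpty()) {
            throw new IllegalArgumentException("region must not be empty");
        }
        return region.trim() + ".elasticmapreduce";
    }

    /** Builds the s3:// URI of the script-runner jar for the given region. */
    public static String scriptRunnerUri(String region) {
        return "s3://" + regionalBucket(region) + "/" + SCRIPT_RUNNER_KEY;
    }

    public static void main(String[] args) {
        String bucket = regionalBucket("eu-central-1");
        System.out.println(bucket);
        System.out.println(scriptRunnerUri("eu-central-1"));
        // Assumed usage in runCluster() (needs the AWS SDK, so it is not compiled here):
        //   StepFactory stepFactory = new StepFactory(bucket);
    }
}
```

With the bucket name derived this way, the only change to the code above would be the StepFactory constructor call; the step definitions themselves stay the same.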

Alex

0 answers:

No answers