I am trying to launch an AWS EMR cluster via the AWS Java SDK that should run a Hive script (query).
Launching the instances works fine, but as soon as the cluster executes its first step it fails because it cannot access the S3 bucket (for whatever reason).
Error:
2018-08-23T08:36:14.656Z INFO Ensure step 1 jar file s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
INFO Failed to download: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
java.lang.RuntimeException: Error whilst fetching 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'
at aws157.instancecontroller.util.S3Wrapper.fetchS3HadoopFileToLocal(S3Wrapper.java:412)
at aws157.instancecontroller.util.S3Wrapper.fetchHadoopFileToLocal(S3Wrapper.java:351)
at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner$Runner.<init>(HadoopJarStepRunner.java:243)
at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:152)
at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:146)
at aws157.instancecontroller.master.steprunner.StepExecutor.runStep(StepExecutor.java:136)
at aws157.instancecontroller.master.steprunner.StepExecutor.run(StepExecutor.java:70)
at aws157.instancecontroller.master.steprunner.StepExecutionManager.enqueueStep(StepExecutionManager.java:248)
at aws157.instancecontroller.master.steprunner.StepExecutionManager.doRun(StepExecutionManager.java:195)
at aws157.instancecontroller.master.steprunner.StepExecutionManager.access$000(StepExecutionManager.java:33)
at aws157.instancecontroller.master.steprunner.StepExecutionManager$1.run(StepExecutionManager.java:94)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect; Request ID: D47DD597E69F8F57), S3 Extended Request ID: 1PfC3L5vlWlFdMvC8YOEcx8+XCxb9O4P/9d9F2Oh0beDVtDWq6ey5Uuf5voXy8Q66HLG4V2xlaw=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1389)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:902)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1143)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1021)
at aws157.instancecontroller.util.S3Wrapper.copyS3ObjectToFile(S3Wrapper.java:303)
at aws157.instancecontroller.util.S3Wrapper.getFile(S3Wrapper.java:291)
at aws157.instancecontroller.util.S3Wrapper.fetchS3HadoopFileToLocal(S3Wrapper.java:399)
... 10 more
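The PermanentRedirect (301) reads like the request for the script-runner jar is being sent to an S3 endpoint in the wrong region. To narrow that down I was planning to reproduce the failing fetch with a plain S3 client from my machine, roughly like this (the region choice is just my guess at what the EMR node uses, not something taken from the logs):

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

public class FetchScriptRunnerCheck {
    public static void main(String[] args) throws Exception {
        // Sketch: fetch the same jar the failing step tries to download,
        // with an explicitly pinned region (us-east-1 is an assumption).
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .build();
        try (S3Object jar = s3.getObject("us-east-1.elasticmapreduce",
                "libs/script-runner/script-runner.jar")) {
            System.out.println("Fetched " + jar.getObjectMetadata().getContentLength() + " bytes");
        }
    }
}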
Code:
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class Main {

    private static final int INSTANCE_COUNT = 2;
    private static final String INSTANCE_TYPE = "m4.large";
    private static final String SUBNET_ID = "subnet-003e3d9762f04c021";

    private static AmazonElasticMapReduce emr;

    public void startApp() {
        try {
            init();
            runCluster();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /*
     * Initializes the EMR client.
     */
    private static void init() {
        // Builds the EMR client from the AWS IAM credentials that are set as environment variables
        emr = AmazonElasticMapReduceClientBuilder.defaultClient();
    }

    /*
     * Configures the EMR cluster instances.
     */
    private static JobFlowInstancesConfig configInstance() throws Exception {
        return new JobFlowInstancesConfig()
                .withInstanceCount(INSTANCE_COUNT)      // 2
                .withKeepJobFlowAliveWhenNoSteps(true)
                .withMasterInstanceType(INSTANCE_TYPE)  // "m4.large"
                .withSlaveInstanceType(INSTANCE_TYPE)
                .withEc2SubnetId(SUBNET_ID);            // Public subnet with a security group that allows all inbound and outbound traffic
    }

    private static void runCluster() throws Exception {
        StepFactory stepFactory = new StepFactory();

        StepConfig enableDebugging = new StepConfig()
                .withName("Enable debugging")
                .withActionOnFailure(ActionOnFailure.TERMINATE_JOB_FLOW)
                .withHadoopJarStep(stepFactory.newEnableDebuggingStep()); // Fails here

        StepConfig installHive = new StepConfig()
                .withName("Install Hive")
                .withActionOnFailure(ActionOnFailure.TERMINATE_JOB_FLOW)
                .withHadoopJarStep(stepFactory.newInstallHiveStep());

        StepConfig createTablesScript = new StepConfig()
                .withName("Create Hive Tables from DynamoDB")
                .withActionOnFailure("TERMINATE_JOB_FLOW")
                .withHadoopJarStep(stepFactory.newRunHiveScriptStep("../hive/tables/createTables.sql"));

        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("Hive Interactive")
                .withReleaseLabel("emr-5.16.0")
                .withSteps(enableDebugging, installHive, createTablesScript)
                .withLogUri("s3://awsbigdatademo-logs-alexandermiller/logs") // S3 bucket for logs
                .withServiceRole("EMR_DefaultRole")
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(configInstance());

        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("JobFlowId: " + result.getJobFlowId()); // Prints the correct ID
    }
}
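For completeness: I don't set a region explicitly anywhere, so both the EMR client and the StepFactory fall back to their defaults. A variant that pins the region on both would look roughly like this (us-east-1 and the regional tools bucket name are assumptions on my side, not something I have verified fixes the problem):

import com.amazonaws.regions.Regions;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class RegionPinnedSetup {
    // Sketch: build the EMR client against an explicit region instead of
    // relying on whatever the environment/default profile resolves to.
    static AmazonElasticMapReduce buildEmrClient() {
        return AmazonElasticMapReduceClientBuilder.standard()
                .withRegion(Regions.US_EAST_1) // assumption: the cluster's region
                .build();
    }

    // Sketch: point the StepFactory at the region-specific EMR tools bucket
    // so script-runner.jar and the Hive installer come from the same region.
    static StepFactory buildStepFactory() {
        return new StepFactory("us-east-1.elasticmapreduce"); // assumption: regional bucket
    }
}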
Does anyone know why the cluster still cannot access the public S3 bucket? Could the problem be that the EMR cluster is running in a different region?
Alex