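Specifying an S3 bucket owned by another user in an EMR job flow

Time: 2013-08-23 09:44:01

Tags: amazon-web-services amazon-s3 elastic-map-reduce amazon-emr

I am trying to use an S3 bucket as the input data for my Elastic MapReduce job flow. The S3 bucket belongs to a different account than the EMR job flow. How and where do I specify the credentials needed to access that bucket? I tried the following format: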

s3n://<Access Key>:<Secret Key>@<BUCKET>

But it gave me the following error:

Exception in thread "main" java.lang.IllegalArgumentException: The bucket name parameter must be specified when listing objects in a bucket
at com.amazonaws.services.s3.AmazonS3Client.assertParameterNotNull(AmazonS3Client.java:2381)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:444)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:785)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.ensureBucketExists(Jets3tNativeFileSystemStore.java:80)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:83)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.fs.s3native.$Proxy1.initialize(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:512)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:352)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:321)
at com.inmobi.appengage.emr.mapreduce.TestSession.main(TestSession.java:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

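How do I specify the credentials correctly?

1 answer:

Answer 0 (score: 3)

You should try adding those credentials to the core-site.xml file. You can either add the S3 credentials on the nodes manually, or set them with a bootstrap action when you launch the cluster.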
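As a rough sketch of what the manual route might look like, the fragment below shows two core-site.xml entries using the property names from the launch command in this answer. The values are placeholders. Stock Apache Hadoop reads fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey for s3n:// paths; whether EMR's own filesystem also needs those variants is an assumption to verify on your cluster.

```xml
<!-- core-site.xml fragment; both values are placeholders for the bucket owner's credentials -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ACCESS_KEY_PLACEHOLDER</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET_KEY_PLACEHOLDER</value>
</property>
```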

You can launch the cluster with something like this:

  

ruby elastic-mapreduce --create --alive --plain-output --master-instance-type m1.xlarge --slave-instance-type m1.xlarge --num-instances 11 --name "My Super Cluster" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args -c,fs.s3.awsAccessKeyId=<access-key>,-c,fs.s3.awsSecretAccessKey=<secret-key>

This should override the defaults that EMR sets based on the account that launched the cluster.
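On the error in the question: one known way the s3n://<Access Key>:<Secret Key>@<BUCKET> form can lose the bucket name is a secret key that contains a "/". Apache Hadoop's s3n store takes the bucket from the URI host, and java.net.URI stops parsing the authority at the first slash. Whether that happened here is not stated in the question, and EMR's filesystem may parse the location differently, so treat this as a diagnostic sketch only. The class name, keys, and bucket below are hypothetical.

```java
import java.net.URI;

public class S3nUriCheck {
    // Returns the host component of the location, which Apache Hadoop's
    // s3n filesystem uses as the bucket name (assumed to apply on EMR too).
    static String bucketOf(String location) {
        return URI.create(location).getHost();
    }

    public static void main(String[] args) {
        // Hypothetical credentials and bucket, for illustration only.
        String plain = "s3n://AKIAEXAMPLE:abcdef@my-bucket/input";
        String withSlash = "s3n://AKIAEXAMPLE:abc/def@my-bucket/input";

        // The secret has no slash, so the authority parses as userinfo@host.
        System.out.println("plain     -> " + bucketOf(plain));

        // The slash ends the authority early, so no host can be extracted.
        System.out.println("withSlash -> " + bucketOf(withSlash));
    }
}
```

If the second call reports no host for your real URI, moving the credentials into the configuration, as suggested above, avoids the problem because the keys no longer pass through URI parsing.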