Spark cannot access a client's S3 bucket from EMR

Asked: 2020-02-20 10:17:10

Tags: amazon-web-services apache-spark

I am trying to connect to an S3 bucket in another account. The strange part is that I can connect to the bucket through the S3 API:

import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import scala.collection.JavaConverters._

def listFilesAndRename(key: String): Unit = {
  val clientBucket = "test"
  val credentials = new BasicAWSCredentials("access key", "secret key")
  val s3 = AmazonS3ClientBuilder.standard
    .withRegion(Regions.US_EAST_1)
    .withCredentials(new AWSStaticCredentialsProvider(credentials))
    .build

  val result = s3.listObjectsV2(clientBucket, key)
  for {
    os <- result.getObjectSummaries.asScala
    fileName = os.getKey
    if !fileName.endsWith("_SUCCESS")
  } println("* " + fileName)
}

But when I try to connect through Spark, it does not work. I tried the following settings:

spark.conf.set("spark.hadoop.fs.s3a.access.key", "access key")
spark.conf.set("spark.hadoop.fs.s3a.secret.key", "secret key")
spark.conf.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
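For what it's worth, on EMR a plain `s3://` path is handled by EMRFS (the shaded `com.amazon.ws.emr.hadoop.fs` classes visible in the stack trace), which authenticates with the cluster's instance profile and ignores the `fs.s3a.*` keys. A minimal sketch that actually routes the read through `S3AFileSystem` would set the keys at session build time and use the `s3a://` scheme explicitly (bucket name and credentials here are placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: "access key", "secret key", and the bucket are placeholders.
val spark = SparkSession.builder()
  .appName("cross-account-s3a")
  .config("spark.hadoop.fs.s3a.access.key", "access key")
  .config("spark.hadoop.fs.s3a.secret.key", "secret key")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .getOrCreate()

// The s3a:// scheme ensures S3AFileSystem (and these credentials) is used,
// rather than EMRFS resolving a plain s3:// URI with the instance profile.
val df = spark.read.text("s3a://test/some/prefix/")
```

Note that calling `spark.conf.set` on `spark.hadoop.*` keys after the session exists may not propagate to the underlying Hadoop configuration; setting them at build time, or directly on `spark.sparkContext.hadoopConfiguration`, is more reliable.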

Caused by:

com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 799F611A3D7BB5E1; S3 Extended Request ID: USTzJmBtWt27ccvF+YuTogwDSknuvB2tBLHxqPySLwftHXfmPpPYVjqVubzeVnqjJS89fzIjJw8=)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)

Clearly this is not a connectivity error, yet Spark still cannot access the S3 bucket.
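For context, a 403 on a cross-account bucket typically means the request authenticated successfully, but as a principal (here, most likely the EMR cluster's instance-profile role) that the bucket owner has not granted access. One common remedy is a bucket policy in the owning account; a hypothetical sketch, where the account ID, role name, and bucket name are all placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowEmrClusterRole",
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole" },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::test", "arn:aws:s3:::test/*"]
  }]
}
```

Alternatively, the owning account can expose a role for the cluster to assume instead of granting its principal directly.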

0 Answers:

There are no answers.