配置EMR以使用s3a而不是s3进行spark.sql调用

时间:2018-12-21 11:14:39

标签: amazon-web-services apache-spark amazon-iam amazon-emr aws-iam

我对spark.sql(“”)的所有调用均因以下堆栈跟踪(1)中的错误而失败

更新-2 我已经解决了这个问题,它是sts:AssumeRule的AccessDenied,感谢任何潜在客户

set(CMAKE_CXX_STANDARD 11)

使用

访问相同位置时
User: arn:aws:sts::00000000000:assumed-role/EMR_EC2_XXXXX_XXXXXX_POLICY/i-3232131232131232 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::00000000000:role/EMR_XXXXXX_XXXXXX_POLICY

我能够读取记录。

但是,当使用 s3:而不是 s3a:方案访问时,相同的堆栈跟踪(1)重新浮出水面

spark.read.parquet("s3a://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")

所以我该如何在EMR上配置Spark以使用 s3a:或运行 s3:而不会拒绝访问,这是假定的,因为它可能未使用适当的凭据链

(1)

spark.read.parquet("s3://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")

更新-1 尝试设置机密和访问密钥不起作用

Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Access denied (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: xxxxx-xxxx-xxxx-xxxx-xxxxxxxx)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1658)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1322)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1072)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:745)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:719)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:701)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:669)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:651)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:515)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1369)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1338)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1327)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:488)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:460)

1 个答案:

答案 0 :(得分:0)

此堆栈跟踪显示“ amazon EMR S3客户端”;而不是Apache ASF,因此设置和错误消息不同。

关于“假定角色”的错误消息提示您正在EC2 VM中运行(是吗?),并且“假定角色”实际上是EC2 VM部署时使用的IAM角色。在这种情况下(a)没有其他凭据被获取,并且(b)VM没有访问角色的权限。修复:确定设置以获取凭据,增加EC2 IAM角色权限或创建具有不同角色的VM