我对spark.sql(“”)的所有调用均因以下堆栈跟踪(1)中的错误而失败
更新-2 我已经解决了这个问题,它是sts:AssumeRule的AccessDenied,感谢任何潜在客户
set(CMAKE_CXX_STANDARD 11)
使用
访问相同位置时User: arn:aws:sts::00000000000:assumed-role/EMR_EC2_XXXXX_XXXXXX_POLICY/i-3232131232131232 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::00000000000:role/EMR_XXXXXX_XXXXXX_POLICY
我能够读取记录。
但是,当使用 s3:而不是 s3a:方案访问时,相同的堆栈跟踪(1)重新浮出水面
spark.read.parquet("s3a://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")
所以我该如何在EMR上配置Spark以使用 s3a:或运行 s3:而不会拒绝访问,这是假定的,因为它可能未使用适当的凭据链
(1)
spark.read.parquet("s3://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")
更新-1 尝试设置机密和访问密钥不起作用
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Access denied (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: xxxxx-xxxx-xxxx-xxxx-xxxxxxxx)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1658)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1322)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1072)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:745)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:719)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:701)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:669)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:651)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:515)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1369)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1338)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1327)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:488)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:460)
答案 0 :(得分:0)
此堆栈跟踪显示“ amazon EMR S3客户端”;而不是Apache ASF,因此设置和错误消息不同。
关于“假定角色”的错误消息提示您正在EC2 VM中运行(是吗?),并且“假定角色”实际上是EC2 VM部署时使用的IAM角色。在这种情况下(a)没有其他凭据被获取,并且(b)VM没有访问角色的权限。修复:确定设置以获取凭据,增加EC2 IAM角色权限或创建具有不同角色的VM