Spark Streaming connection to S3 gives Forbidden error

Date: 2018-03-12 23:39:28

Tags: scala hadoop apache-spark amazon-s3 spark-streaming

I am running a Spark Streaming application locally to read data from an S3 bucket.

I set the S3 authentication parameters via the hadoop-aws jar, following https://hadoop.apache.org/docs/r3.0.0/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_with_S3
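For reference, a minimal sketch of pulling that module in with sbt (assuming sbt as the build tool; the versions below are illustrative, and the hadoop-aws version must match the Hadoop version your Spark distribution was built against):

// build.sbt (sketch): hadoop-aws bundles the S3A filesystem plus the
// AWS SDK it was compiled against; keep its version in line with Spark's Hadoop.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.3.0",
  "org.apache.hadoop" % "hadoop-aws" % "3.0.0"
)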

This is the 'Forbidden' error message:

org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: #####, AWS Error Code: null, AWS Error Message: Forbidden
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - HTTP Status Code: 403
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - AWS Error Code: null
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Type: Client
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Request ID: #####
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Class Name: com.amazonaws.services.s3.model.AmazonS3Exception

The code that reads from the bucket:

val sc: SparkContext = createSparkContext(scName)
val hadoopConf = sc.hadoopConfiguration
// Route s3a:// URIs to the S3A filesystem implementation from hadoop-aws
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
val ssc = new StreamingContext(sc, Seconds(time))
val lines = ssc.textFileStream("s3a://foldername/subfolder/")
lines.print()

I have set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables in my terminal, but it still gives me 'Forbidden'.

I can access S3 from the terminal (using an AWS profile), so I am not sure why it does not work when I go through Spark. Any ideas are appreciated.
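Worth noting: AWS_SESSION_TOKEN implies temporary credentials, and depending on the hadoop-aws version the default S3A credential chain may not pick the token up, so requests get signed without it and S3 answers 403. A minimal sketch of passing the session credentials explicitly (property and class names are from the hadoop-aws documentation; hadoopConf is the configuration object from the snippet above):

// Sketch: tell S3A to use temporary (session) credentials, read here from
// the same environment variables the AWS CLI uses.
hadoopConf.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
hadoopConf.set("fs.s3a.session.token", sys.env("AWS_SESSION_TOKEN"))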

1 Answer:

Answer 0 (score: 0)

To avoid hard-coding the keys as plain-text passwords, you can add a core-site.xml file to the classpath containing the keys:

<property>
    <name>fs.s3a.access.key</name>
    <value>...</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>...</value>
</property>
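If you run from sbt or an IDE, placing core-site.xml under src/main/resources usually puts it on the classpath; with spark-submit, one common approach is to add the directory containing it via --driver-class-path.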

Alternatively, if you do not mind putting the keys directly in your code:

sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")

The recommended approach is to use a Java JCEKS credential file.
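A sketch of that approach (the keystore path is illustrative; the command and property names come from the Hadoop credential provider docs):

// Create the keystore once from a shell; each command prompts for the secret:
//   hadoop credential create fs.s3a.access.key -provider jceks://file/home/user/s3.jceks
//   hadoop credential create fs.s3a.secret.key -provider jceks://file/home/user/s3.jceks
// Then point Hadoop at the keystore instead of embedding keys anywhere:
sc.hadoopConfiguration.set(
  "hadoop.security.credential.provider.path",
  "jceks://file/home/user/s3.jceks")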