I'm running a Spark Streaming application locally to read data from an S3 bucket.
I set the S3 authentication parameters using the Hadoop-AWS jar - https://hadoop.apache.org/docs/r3.0.0/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_with_S3
Here is the 'Forbidden' error message:
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: #####, AWS Error Code: null, AWS Error Message: Forbidden
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - HTTP Status Code: 403
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - AWS Error Code: null
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Type: Client
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Request ID: #####
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Class Name: com.amazonaws.services.s3.model.AmazonS3Exception
The code that reads from the bucket:
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc: SparkContext = createSparkContext(scName)
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
val ssc = new StreamingContext(sc, Seconds(time))
val lines = ssc.textFileStream("s3a://foldername/subfolder/")
lines.print()
I set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN variables in my terminal, but it still gives me 'Forbidden'.
I can access S3 from the terminal (using an AWS profile), so I'm not sure why it doesn't work when I go through Spark. Any ideas are appreciated.
Answer 0 (score: 0):
To avoid exposing the keys as plain-text passwords, you can add a core-site.xml file containing the keys to the classpath:
<property>
<name>fs.s3a.access.key</name>
<value>...</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>...</value>
</property>
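(In a typical sbt or Maven project, placing core-site.xml under src/main/resources is one way to get it onto the classpath.)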
Alternatively, if you don't mind putting the keys directly in your code:
sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")