Unable to load AWS credentials from any provider in the chain when writing to S3 with ParquetWriter

Asked: 2020-09-02 23:38:24

Tags: java apache-spark hadoop amazon-s3 parquet

I am using the following Java code to try to write my objects to S3.

import java.util.ArrayList;

import org.apache.avro.reflect.ReflectData;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.spark.api.java.JavaRDD;

import static org.apache.parquet.hadoop.ParquetFileWriter.Mode.OVERWRITE;
import static org.apache.parquet.hadoop.metadata.CompressionCodecName.SNAPPY;

JavaRDD<String> filePaths = objJavaRDD.map(rdd -> {
    ArrayList<MyEntity> entityResult = rdd.getObjectResult();

    String filePath = "s3a://myBucket/test.parquet";
    Path dataFile = new Path(filePath);

    // Tell s3a to use the default AWS credential chain
    // (environment variables, system properties, instance profile).
    Configuration config = new Configuration();
    config.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");

    try (ParquetWriter<MyEntity> writer = AvroParquetWriter.<MyEntity>builder(dataFile)
            .withSchema(ReflectData.AllowNull.get().getSchema(MyEntity.class))
            .withDataModel(ReflectData.get())
            .withConf(config)
            .withCompressionCodec(SNAPPY)
            .withWriteMode(OVERWRITE)
            .build()) {
        for (MyEntity d : entityResult) {
            writer.write(d);
        }
    } catch (Exception e) {
        System.err.println("Failed to write to the file.\n" + e.getMessage());
    }

    return filePath;
});
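To narrow down where the chain lookup fails, a small check (just a sketch, using only classes from the aws-java-sdk 1.7.4 line I depend on) would be to resolve the default chain directly inside the map, so it runs on the executor JVM:

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;

// If this throws, the executor JVM sees no env vars, system properties,
// or instance-profile credentials at all.
AWSCredentials creds = new DefaultAWSCredentialsProviderChain().getCredentials();
System.out.println("Resolved access key id: " + creds.getAWSAccessKeyId());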

I did try exporting the credentials as AWS suggests:

export AWS_ACCESS_KEY_ID=myAccesskey
export AWS_SECRET_ACCESS_KEY=myAccessScrete
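Those exports are done on the machine that submits the job. Since the write itself runs inside the RDD map, i.e. on the executors, I am not sure the environment variables are visible to those JVMs at all. One thing I have not tried yet is setting the keys directly on the Hadoop Configuration that the writer uses; fs.s3a.access.key / fs.s3a.secret.key are the standard s3a property names in hadoop-aws 2.7.x, and the values below are just placeholders, not my real keys:

Configuration config = new Configuration();
// Placeholder values for illustration only
config.set("fs.s3a.access.key", "myAccessKey");
config.set("fs.s3a.secret.key", "mySecretKey");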

The exception I get is:

Unable to load AWS credentials from any provider in the chain

Here are my dependencies:

compile ('com.amazonaws:aws-java-sdk:1.7.4') {
    exclude group: 'org.apache.httpcomponents', module: 'httpclient'
    exclude group: 'org.apache.httpcomponents', module: 'httpcore'
}
compile 'org.apache.httpcomponents:httpclient:4.5'
compile 'org.apache.httpcomponents:httpcore:4.4.3'
compile 'org.apache.parquet:parquet-avro:1.7.0'
compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '2.7.1'
compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '2.7.1'

Notes

  • Our company is still on Spark 1.6.2, which also pulls in Hadoop 2.2.0 bits. I am not sure whether that causes trouble.
  • Also, our EMR release is quite old (4.8.2), which keeps us from using dependencies that are too new, e.g. com.amazonaws:aws-java-sdk-s3:1.10.75.

0 Answers