I have an Apache Spark application writing to an S3 folder. Because the Spark application partitions the data as it writes to S3, it adds an EQUAL sign to the path, like this:
s3://biops/XXX/YYY/entryDateYear=2018/entryDateMonth=07
I fully understand that S3 does not allow creating a bucket_name containing "=", but Spark streaming creates each partition as field_name followed by "=" and then the value.
Please advise how to access an S3 folder whose name contains an equal sign.
// The actual path in S3 is --> biops/XXX/YYY/royalty_raw_json/entryDateYear=2018/
String bucket = "biops";
String without_equal = "XXX/YYY/royalty_raw_json/";
String with_equal = "XXX/YYY/royalty_raw_json/entryDateYear=2018";
String with_equal_encoding = "XXX/YYY/royalty_raw_json/entryDateYear%3D2018";
AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard()
        .withCredentials(getCredentialsProvider(credentials))
        .withRegion("us-east-1")
        .build();
amazonS3.doesObjectExist(bucket, without_equal);       // works
amazonS3.doesObjectExist(bucket, with_equal);          // does not work
amazonS3.doesObjectExist(bucket, with_equal_encoding); // does not work either
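The equal sign is likely a red herring (an assumption on my part, since it depends on how the keys were created): `doesObjectExist` issues a HEAD request for one exact key, and a Hive-style partition "directory" normally exists only as a shared key prefix, not as an object of its own. A sketch of the two failure modes, using the same `amazonS3`, `bucket`, and `with_equal` variables as above:

```java
// Assumption: the "without_equal" call succeeds only because a zero-byte
// folder-marker object with that exact key happens to exist (e.g. created via
// the S3 console or a Hadoop committer); Spark did not create one for the
// partition directory. If such a marker did exist, the exact key (with its
// trailing slash) would match:
amazonS3.doesObjectExist(bucket, with_equal + "/"); // true only if a marker object exists

// Percent-encoding can never help: the SDK sends the key verbatim, so
// "entryDateYear%3D2018" is looked up as the literal characters % 3 D,
// not as a decoded "=".
```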
UPDATE: I managed to solve it by listing the objects as shown below, to check whether the folder exists.
ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName(bucket).withPrefix(with_equal);
ObjectListing bucketListing = amazonS3.listObjects(listObjectsRequest);
if (bucketListing != null && bucketListing.getObjectSummaries() != null && bucketListing.getObjectSummaries().size() > 0)
    System.out.println("Folder present with files");
else
    System.out.println("Folder present with zero files, or folder not present");
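If only a yes/no answer is needed, the listing can be capped at a single key so S3 does not return the whole partition. A sketch against the same SDK v1 client and variables as above; `withMaxKeys` is a standard `ListObjectsRequest` option:

```java
// Ask S3 for at most one key under the prefix; a single hit is enough to
// prove the "folder" exists. The "=" in the prefix is passed through
// verbatim -- no URL-encoding is required for S3 object keys.
ListObjectsRequest probe = new ListObjectsRequest()
        .withBucketName(bucket)
        .withPrefix(with_equal)
        .withMaxKeys(1);
boolean folderExists = !amazonS3.listObjects(probe).getObjectSummaries().isEmpty();
```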