I am using the following snippet to list the objects in a bucket.
ObjectListing objectListing = client.listObjects(bucketname);
do {
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        System.out.printf(" - %s (size: %d)\n", objectSummary.getKey(), objectSummary.getSize());
    }
    objectListing = client.listNextBatchOfObjects(objectListing);
} while (objectListing.isTruncated());
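I am unable to get the last batch of objects. I did some research on this, and the usual suggestion is to save the batches in a list. But I cannot use a list to hold all the objects, because there are millions of them and that sometimes causes heap memory problems. How can I get all the objects? Thanks!
Update:
I am running: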
BasicAWSCredentials credentials = new BasicAWSCredentials("foo", "bar");
client = AmazonS3ClientBuilder
        .standard()
        .withCredentials(new AWSStaticCredentialsProvider(credentials))
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("http://localhost:" + port, null))
        .withPathStyleAccessEnabled(true)
        .withChunkedEncodingDisabled(true)
        .build();
ObjectListing listing = client.listObjects("bucketname");
System.out.println("Listing size " + listing.getObjectSummaries().size());
System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
System.out.println("At 999 index " + listing.getObjectSummaries().get(999).getKey());
while (listing.isTruncated()) {
    System.out.println("-----------------------------------------------");
    listing = client.listNextBatchOfObjects(listing);
    System.out.println("Listing size " + listing.getObjectSummaries().size());
    System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
    System.out.println("At 1000 index " + listing.getObjectSummaries().get(1000).getKey());
}
I get the following result:
Listing size 1000
At 0 index folder1/a.gz
At 999 index folder1/b.gz
---------------------------------------------------------------
Listing size 1001
At 0 index folder1/b.gz
At 1000 index folder1/d.gz
---------------------------------------------------------------
Listing size 1001
At 0 index folder1/d.gz
At 1000 index folder1/e.gz
Answer 0 (score: 1)
Simple and straightforward:
ObjectListing listing = s3.listObjects(bucketName, prefix);
List<S3ObjectSummary> summaries = listing.getObjectSummaries();
while (listing.isTruncated()) {
    listing = s3.listNextBatchOfObjects(listing);
    summaries.addAll(listing.getObjectSummaries());
}
Or:
ObjectListing listing = s3.listObjects(bucketName, prefix);
doSomeProcessing(listing);
while (listing.isTruncated()) {
    listing = s3.listNextBatchOfObjects(listing);
    doSomeProcessing(listing);
}
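The second form is the one that fits the memory constraint in the question, since each batch is handled and then discarded instead of being accumulated. It also handles the current batch before looking at the truncation flag, which is what keeps the final batch from being skipped. The loop shape can be sketched without the SDK as follows. This is only an illustration: Page and FakeClient are hypothetical in-memory stand-ins for ObjectListing and the S3 client, not AWS classes, and the keys are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class PagingSketch {
    // Hypothetical stand-in for ObjectListing: one page of keys plus a truncation flag.
    static final class Page {
        final List<String> keys;
        final boolean truncated;
        final int nextStart; // where the following page begins

        Page(List<String> keys, boolean truncated, int nextStart) {
            this.keys = keys;
            this.truncated = truncated;
            this.nextStart = nextStart;
        }
    }

    // Hypothetical stand-in for the S3 client: serves fixed-size pages from memory.
    static final class FakeClient {
        private final List<String> all;
        private final int pageSize;

        FakeClient(List<String> all, int pageSize) {
            this.all = all;
            this.pageSize = pageSize;
        }

        Page listObjects() {
            return pageFrom(0);
        }

        Page listNextBatch(Page previous) {
            return pageFrom(previous.nextStart);
        }

        private Page pageFrom(int start) {
            int end = Math.min(start + pageSize, all.size());
            return new Page(new ArrayList<>(all.subList(start, end)), end < all.size(), end);
        }
    }

    // Handle the current page first, then fetch another only if the current one was truncated.
    // Only one page is held in memory at a time. Returns the number of keys handled.
    static int forEachKey(FakeClient client, Consumer<String> handler) {
        int count = 0;
        Page page = client.listObjects();
        while (true) {
            for (String key : page.keys) {
                handler.accept(key);
                count++;
            }
            if (!page.truncated) {
                break; // the last page has already been handled above
            }
            page = client.listNextBatch(page);
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            keys.add("folder1/" + i + ".gz");
        }
        int seen = forEachKey(new FakeClient(keys, 10), key -> { });
        System.out.println("Processed " + seen + " keys");
    }
}
```

With 25 keys and a page size of 10, the third page is not truncated but is still processed before the loop exits.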
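Update:
Regarding the comment below about duplicate elements:
"Yes, I am getting the objects, but the 1000th and 1001st objects are repeated, and so are the 2001st and 2002nd, and so on. How can I avoid this with the second approach, @raevilman? Thanks."
I ran the following code: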
public static void main(String[] args) {
    int i = 0;
    System.out.println("start");
    ObjectListing listing = s3Client.listObjects("emr-logs");
    System.out.println("Listing size " + listing.getObjectSummaries().size());
    System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
    System.out.println("At 999 index " + listing.getObjectSummaries().get(999).getKey());
    while (listing.isTruncated()) {
        if (i > 3) break;
        System.out.println("========================================================================");
        listing = s3Client.listNextBatchOfObjects(listing);
        System.out.println("Listing size " + listing.getObjectSummaries().size());
        System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
        System.out.println("At 999 index " + listing.getObjectSummaries().get(999).getKey());
        i++;
    }
    System.out.println("end");
}
I got the following result, with no duplicate elements:
start
Listing size 1000
At 0 index j-10HD9DMBVVTJL/containers/application_1507189355052_0001/container_1507189355052_0001_01_000001/stderr.gz
At 999 index j-156WGS0LMKA2I/node/i-00085367e194fc02a/daemons/instance-state/instance-state.log-2017-11-16-05-15.gz
========================================================================
Listing size 1000
At 0 index j-156WGS0LMKA2I/node/i-00085367e194fc02a/daemons/instance-state/instance-state.log-2017-11-16-05-30.gz
At 999 index j-182UIXOOU8GZ6/node/i-061ffd1d1ae11da74/provision-node/0d1707a0-71dd-4dd5-a1dc-ab226ee2d150/stdout.gz
========================================================================
Listing size 1000
At 0 index j-182UIXOOU8GZ6/node/i-061ffd1d1ae11da74/provision-node/apps-phase/stderr.gz
At 999 index j-1BW9J554DDY15/containers/application_1521803257216_0002/container_1521803257216_0002_01_000002/stderr.gz
========================================================================
Listing size 1000
At 0 index j-1BW9J554DDY15/containers/application_1521803257216_0002/container_1521803257216_0002_01_000002/stdout.gz
At 999 index j-1EKRPTSEXCTB5/node/i-0576a3c452d00384b/applications/hadoop/steps/s-2B5LZ2PC741FD/controller.gz
========================================================================
Listing size 1000
At 0 index j-1EKRPTSEXCTB5/node/i-0576a3c452d00384b/applications/hadoop/steps/s-2B5LZ2PC741FD/stderr.gz
At 999 index j-1G6AYY5EMTR94/node/i-02363f6ac11c89135/daemons/instance-state/instance-state.log-2017-10-29-14-15.gz
end
Process finished with exit code 0
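In the question's output, the key at the end of one batch reappears at the start of the next (folder1/b.gz, then folder1/d.gz), whereas the run above against emr-logs shows no such overlap. If the endpoint you list against does repeat the boundary key, one possible workaround is to remember the last key handled and skip a batch's first entry when it matches. The sketch below assumes keys arrive in listing order and that only the boundary key is repeated. It works on plain lists of strings rather than ObjectListing, and the class name, method name, and sample keys are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class BoundaryDedupSketch {
    // Passes each key to the handler, skipping a batch's first key when it
    // equals the last key handled from the previous batch.
    // Only one key is remembered, so memory use does not grow with the listing.
    static int forEachUniqueKey(List<List<String>> batches, Consumer<String> handler) {
        int count = 0;
        String lastKey = null;
        for (List<String> batch : batches) {
            for (int i = 0; i < batch.size(); i++) {
                String key = batch.get(i);
                if (i == 0 && key.equals(lastKey)) {
                    continue; // boundary key already handled in the previous batch
                }
                handler.accept(key);
                lastKey = key;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up batches that mimic the overlap pattern in the question's output.
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("folder1/a.gz", "folder1/b.gz"),
                Arrays.asList("folder1/b.gz", "folder1/c.gz", "folder1/d.gz"),
                Arrays.asList("folder1/d.gz", "folder1/e.gz"));
        int unique = forEachUniqueKey(batches, System.out::println);
        System.out.println(unique + " unique keys");
    }
}
```

In a real loop the same check would sit inside doSomeProcessing, with the last key kept in a variable that outlives each batch.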