How do I fetch all S3ObjectSummary objects from an S3 bucket using Scala and aws-java-sdk?

Time: 2019-04-25 15:28:50

Tags: scala amazon-s3 aws-sdk

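I have a Scala project in which I am trying to implement a service that needs access to an Amazon S3 bucket.

I want to get a list of all objects in the bucket, but the result set of s3Client.listObjects is paginated at 1000 items per page.

To obtain all results, one has to fetch multiple ObjectListing pages.

I found an example Java implementation, but it relies on mutability (objectListing is overwritten inside a while loop):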

AmazonS3 s3Client = AmazonS3Provider.getS3Client();
ListObjectsRequest req = new ListObjectsRequest().withBucketName(realBucket).withPrefix(!preprefix.equals("") ? preprefix + "/" + prefix : prefix);
ObjectListing objectListing = s3Client.listObjects(req);
List<S3ObjectSummary> summaries = objectListing.getObjectSummaries();

while (objectListing.isTruncated()) {
    objectListing = s3Client.listNextBatchOfObjects(objectListing);
    summaries.addAll(objectListing.getObjectSummaries());
}

While I could translate this to Scala directly, I would prefer a more idiomatic Scala approach.

How can I fetch all pages of a bucket listing using Scala?

1 Answer:

Answer 0: (score: 0)

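I am now using a recursive approach that adds each page's summaries to the result on every iteration. Once the last page is reached, it returns the final collection.

The relevant part happens in the getAllSummaries method; I kept the other implementation details so that others can get it working more easily. (My AmazonS3Config is a basic case class containing my S3 credentials.)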

import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary}
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

import scala.collection.JavaConverters._

object Starter extends App with Configurable {

  private lazy val client: AmazonS3 = createAmazonClient(this.config.s3)

  val objects = getAllObjects()

  def getAllObjects(): Seq[S3ObjectSummary] = {
    val bucket = "YOUR_BUCKET_NAME"
    val prefix = ""

    val objectListing: ObjectListing = client.listObjects(bucket, prefix)

    getAllSummaries(objectListing)
  }

  // Appends the summaries of the current page to the accumulator, then
  // recurses into the next page for as long as the listing is truncated.
  @scala.annotation.tailrec
  private def getAllSummaries(list: ObjectListing,
                              res: Seq[S3ObjectSummary] = Seq.empty[S3ObjectSummary]): Seq[S3ObjectSummary] = {
    val acc = res ++ list.getObjectSummaries.asScala
    if (list.isTruncated) getAllSummaries(this.client.listNextBatchOfObjects(list), acc)
    else acc
  }

  private def createAmazonClient(config: AmazonS3Config): AmazonS3 = {
    val region = Regions.valueOf(config.region)
    val awsCredentials = new BasicAWSCredentials(config.accessKey, config.secretKey)

    AmazonS3ClientBuilder
      .standard()
      .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
      .withRegion(region)
      .build()
  }
}
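
As a possible alternative to explicit recursion, the page-following logic can also be expressed as a lazy iterator. The following is only a minimal sketch under stated assumptions: Page, fetchFirst, fetchNext, allPages and allItems are hypothetical names, and the in-memory pages stand in for the S3 listing so that the snippet runs without the AWS SDK or credentials.

```scala
object PaginationSketch {
  // Hypothetical stand-in for a paged response such as ObjectListing.
  final case class Page(items: List[String], nextIndex: Option[Int])

  // Hypothetical in-memory "service" that serves three pages.
  private val pages: Vector[Page] = Vector(
    Page(List("a", "b"), Some(1)),
    Page(List("c", "d"), Some(2)),
    Page(List("e"), None)
  )

  def fetchFirst(): Page = pages(0)

  // Returns the following page, or None once the last page was reached.
  def fetchNext(page: Page): Option[Page] = page.nextIndex.map(i => pages(i))

  // Lazily walks all pages: starts at `first` and keeps applying `next`
  // until it yields None.
  def allPages[P](first: P)(next: P => Option[P]): Iterator[P] =
    Iterator
      .iterate(Option(first))(_.flatMap(next))
      .takeWhile(_.isDefined)
      .collect { case Some(page) => page }

  def allItems(): List[String] =
    allPages(fetchFirst())(fetchNext).flatMap(_.items).toList

  def main(args: Array[String]): Unit =
    println(allItems()) // prints List(a, b, c, d, e)

  // Assumed (untested) wiring for the S3 client from the answer above:
  //   allPages(client.listObjects(bucket, prefix)) { l =>
  //     if (l.isTruncated) Some(client.listNextBatchOfObjects(l)) else None
  //   }.flatMap(_.getObjectSummaries.asScala).toList
}
```

Because the iterator is lazy, a caller that only needs the first few objects can stop early without requesting the remaining pages; calling toList forces all of them, as the recursive version does.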