Out-of-memory error when using the AWS S3 multipart upload API

Date: 2019-06-24 15:43:11

Tags: scala amazon-web-services amazon-s3 aws-sdk awss3transfermanager

I am trying to upload a file of about 14 GB from Spark using AWS multipart upload through the AWS SDK, but I get an out-of-memory error on this line: val bytes: Array[Byte] = IOUtils.toByteArray(is)
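For context on why a multipart upload should never need the whole file in memory at once: S3 allows at most 10,000 parts per multipart upload, and every part except the last must be at least 5 MiB. The sketch below picks a part size under those limits, assuming (as I understand the AWS SDK for Java v1 TransferManager defaults) a 5 MiB minimum part size; PartSizeSketch and its methods are hypothetical names, not SDK API.

```scala
object PartSizeSketch {
  // Assumed limits: S3 allows at most 10,000 parts; 5 MiB assumed as the minimum part size
  val MaxParts: Long = 10000L
  val MinPartSize: Long = 5L * 1024 * 1024

  // Smallest part size that keeps the part count within MaxParts, but never below MinPartSize
  def partSize(contentLength: Long): Long =
    math.max((contentLength + MaxParts - 1) / MaxParts, MinPartSize)

  // Number of parts needed at that size (the last part may be smaller)
  def partCount(contentLength: Long): Long = {
    val size = partSize(contentLength)
    (contentLength + size - 1) / size
  }

  def main(args: Array[String]): Unit = {
    val fileSize = 14L * 1024 * 1024 * 1024 // ~14 GiB, as in the question
    println(s"part size:  ${partSize(fileSize)} bytes")
    println(s"part count: ${partCount(fileSize)}")
  }
}
```

For a 14 GiB file this gives 5 MiB parts (2,868 of them), so no single request body comes anywhere near the size of the file.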

I tried increasing the driver memory and executor memory to 100 GB, and also tried several other Spark optimizations.

Below is the code I am trying:

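import java.io.{ByteArrayInputStream, InputStream}
import com.amazonaws.services.s3.model.{CannedAccessControlList, ObjectMetadata, PutObjectRequest, SSEAwsKeyManagementParams}
import com.amazonaws.services.s3.transfer.TransferManagerBuilder
import com.amazonaws.util.IOUtils
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val tm = TransferManagerBuilder.standard.withS3Client(s3Client).build
val fs = FileSystem.get(new Configuration())
val filePath = new Path(hdfsFilePath)
val is: InputStream = fs.open(filePath)
val om = new ObjectMetadata()
val bytes: Array[Byte] = IOUtils.toByteArray(is) // OutOfMemoryError is thrown here
om.setContentLength(bytes.length)
val byteArrayInputStream: ByteArrayInputStream = new ByteArrayInputStream(bytes)
val request = new PutObjectRequest(bucketName, keyName, byteArrayInputStream, om)
  .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKey))
  .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl)
val upload = tm.upload(request)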

Here is the exception I get:

java.lang.OutOfMemoryError
                at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
                at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
                at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
                at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
                at com.amazonaws.util.IOUtils.toByteArray(IOUtils.java:45)
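The top frame, ByteArrayOutputStream.hugeCapacity, suggests why raising the driver and executor memory cannot help: IOUtils.toByteArray copies the whole stream into a single byte array, and a JVM array is indexed by Int, so it can hold at most Int.MaxValue (about 2 GiB) bytes. In the JDK 8 implementation, when ByteArrayOutputStream tries to grow past that, the requested capacity overflows and hugeCapacity throws OutOfMemoryError regardless of heap size. A quick arithmetic check, treating the file as exactly 14 GiB for illustration (ArrayLimitCheck is a hypothetical name):

```scala
object ArrayLimitCheck {
  // ~14 GiB, the approximate file size from the question (assumed exact for illustration)
  val fileSizeBytes: Long = 14L * 1024 * 1024 * 1024

  // JVM arrays are indexed by Int, so one Array[Byte] holds at most Int.MaxValue bytes
  // (in practice slightly fewer, depending on the JVM)
  val maxArrayBytes: Long = Int.MaxValue.toLong

  def main(args: Array[String]): Unit = {
    println(s"file size:         $fileSizeBytes bytes")
    println(s"max array length:  $maxArrayBytes bytes")
    println(s"fits in one array: ${fileSizeBytes <= maxArrayBytes}")
  }
}
```

The file is roughly seven times larger than the largest possible byte array, so any approach that reads it fully into an Array[Byte] will fail, however much memory is configured.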

1 answer:

Answer 0 (score: 0)

PutObjectRequest also has a constructor that takes a File instead of an InputStream:

public PutObjectRequest(String bucketName, String key, File file)

Something like the following should work (although I have not tested it):

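import java.io.File
import com.amazonaws.services.s3.model.{CannedAccessControlList, PutObjectRequest, SSEAwsKeyManagementParams}
import com.amazonaws.services.s3.transfer.TransferManagerBuilder
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// java.io.File only sees the local disk (and cannot wrap a Hadoop Path), so copy the HDFS file there first
val localFile = File.createTempFile("s3-upload-", ".tmp")
FileSystem.get(new Configuration())
  .copyToLocalFile(new Path(hdfsFilePath), new Path(localFile.getAbsolutePath))

// With a File, TransferManager reads each part from disk instead of buffering the whole file
val upload = TransferManagerBuilder.standard.withS3Client(s3Client)
  .build
  .upload(
    new PutObjectRequest(bucketName, keyName, localFile)
      .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKey))
      .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl)
  )
upload.waitForCompletion()
localFile.delete() // remove the temporary local copy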
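If the file is on HDFS and a local copy is not an option, a minimal sketch that keeps the question's streaming approach but drops the in-memory byte array: take the content length from HDFS metadata (FileSystem.getFileStatus(path).getLen) and pass the HDFS input stream straight to PutObjectRequest. This is untested and assumes the AWS SDK for Java v1 and the Hadoop FileSystem API are on the classpath; HdfsToS3Streaming and upload are hypothetical names. With a known content length the SDK should not need to buffer the whole stream, but parts are then read sequentially from one stream, so this is likely slower than the File-based upload, and a retried part may fail if the stream cannot be reset.

```scala
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.{CannedAccessControlList, ObjectMetadata, PutObjectRequest, SSEAwsKeyManagementParams}
import com.amazonaws.services.s3.transfer.TransferManagerBuilder
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: streams an HDFS file to S3 without loading it into a byte array
object HdfsToS3Streaming {
  def upload(s3Client: AmazonS3, hdfsFilePath: String, bucketName: String,
             keyName: String, kmsKey: String): Unit = {
    val path = new Path(hdfsFilePath)
    val fs = path.getFileSystem(new Configuration())

    // Content length comes from HDFS metadata, so no bytes are read up front
    val om = new ObjectMetadata()
    om.setContentLength(fs.getFileStatus(path).getLen)

    val is = fs.open(path)
    val tm = TransferManagerBuilder.standard.withS3Client(s3Client).build
    try {
      val request = new PutObjectRequest(bucketName, keyName, is, om)
        .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKey))
        .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl)
      // Block until all parts are uploaded (throws if the upload fails)
      tm.upload(request).waitForCompletion()
    } finally {
      is.close()
      // Shut down the TransferManager's threads but leave s3Client usable
      tm.shutdownNow(false)
    }
  }
}
```

It would be called as HdfsToS3Streaming.upload(s3Client, hdfsFilePath, bucketName, keyName, kmsKey) with the same values as in the question.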