Streaming a TarArchiveEntry to an S3 bucket while processing a .tar.gz file

Asked: 2018-10-01 23:36:16

Tags: amazon-s3 aws-lambda java-stream tar gunzip

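I am using AWS Lambda to decompress and iterate over tar.gz files, then upload the entries back to S3 uncompressed, preserving the original directory structure.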

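I am running into a problem streaming a TarArchiveEntry to an S3 bucket via a PutObjectRequest. The first entry is streamed successfully, but the next call to getNextTarEntry() on the TarArchiveInputStream throws a NullPointerException because the inflater of the underlying GzipCompressorInputStream is null. It had a valid value before the s3.putObject(new PutObjectRequest(...)) call.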

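I have not been able to find any documentation on how or why the inflater attribute of the gz input stream is set to null after part of the stream has been sent to S3.

EDIT: Further investigation shows that the AWS call appears to close the input stream once it has uploaded the specified content length. I have not yet found a way to prevent this behavior.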
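The behavior described in the edit can be reproduced in miniature using only JDK classes. This is a sketch under stated assumptions, not AWS SDK code: `uploadAndClose` is a hypothetical stand-in for `putObject` that reads the declared number of bytes and then closes the stream, and the JDK's `GZIPInputStream` stands in for `GzipCompressorInputStream`. The JDK class fails with an `IOException` after being closed rather than a `NullPointerException`, but the cause is the same: the consumer closed a stream the caller still needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ClosedStreamDemo {

    // Hypothetical stand-in for an upload call: reads up to `length` bytes,
    // then closes the stream it was handed (try-with-resources does the closing).
    static byte[] uploadAndClose(InputStream in, int length) throws IOException {
        try (InputStream closing = in) {
            byte[] buf = new byte[length];
            int off = 0;
            while (off < length) {
                int n = closing.read(buf, off, length - off);
                if (n < 0) {
                    break; // stream ended before `length` bytes were available
                }
                off += n;
            }
            return Arrays.copyOf(buf, off);
        }
    }

    // Helper that gzip-compresses a byte array in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Returns true if reading after the simulated upload fails
    // because the gzip stream has been closed by the consumer.
    static boolean secondReadFails(byte[] gzipped, int firstLength) throws IOException {
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
        uploadAndClose(in, firstLength);
        try {
            in.read();
            return false;
        } catch (IOException e) {
            return true; // the closed gzip stream refuses further reads
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gzipped = gzip("first-entry|second-entry".getBytes(StandardCharsets.UTF_8));
        System.out.println("second read fails: " + secondReadFails(gzipped, 11));
    }
}
```

Only the first 11 bytes are consumed, yet the remainder of the stream becomes unreachable, which mirrors what happens to the second tar entry in the question.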

Below is essentially what my code looks like. Thanks in advance for your help, comments, and suggestions.

public String handleRequest(S3Event s3Event, Context context) {

    // Declared outside the try block so that the finally block can close it.
    TarArchiveInputStream tarInput = null;

    try {
        S3Event.S3EventNotificationRecord s3EventRecord = s3Event.getRecords().get(0);
        String bucketName = s3EventRecord.getS3().getBucket().getName();

        // Object key may have spaces or unicode non-ASCII characters.
        String srcKeyInput = s3EventRecord.getS3().getObject().getKey();

        System.out.println("Received valid request from bucket: " + bucketName + " with srckey: " + srcKeyInput);

        String bucketFolder = srcKeyInput.substring(0, srcKeyInput.lastIndexOf('/') + 1);
        System.out.println("File parent directory: " + bucketFolder);

        final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();

        // getObjectContent(...) is a helper (not shown) that returns the S3 object's content stream.
        tarInput = new TarArchiveInputStream(new GzipCompressorInputStream(getObjectContent(s3Client, bucketName, srcKeyInput)));

        TarArchiveEntry currentEntry = tarInput.getNextTarEntry();

        while (currentEntry != null) {
            String fileName = currentEntry.getName();
            System.out.println("For path = " + fileName);

            // checking if looking at a file (vs a directory)
            if (currentEntry.isFile()) {

                System.out.println("Copying " + fileName + " to " + bucketFolder + fileName + " in bucket " + bucketName);
                ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentLength(currentEntry.getSize());

                s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, tarInput, metadata)); // contents are properly and successfully sent to s3
                System.out.println("Done!");
            }

            currentEntry = tarInput.getNextTarEntry(); // NPE here because the underlying gzip inflater is null
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(tarInput);
    }
    return "Done"; // return value assumed; the posted snippet omitted it
}

1 Answer:

Answer 0 (score: 1)

Yes, AWS closes the InputStream provided to the PutObjectRequest, and I don't know of a way to instruct AWS not to do so.

However, you can wrap the TarArchiveInputStream in a CloseShieldInputStream from Commons IO, like this:

InputStream shieldedInput = new CloseShieldInputStream(tarInput);
s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, shieldedInput, metadata));

When AWS closes the provided CloseShieldInputStream, the underlying TarArchiveInputStream will remain open.
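To make the close-shield idea concrete, here is a minimal JDK-only sketch of what such a wrapper does. It is an illustration, not the Commons IO implementation: `NonClosingInputStream`, `TrackingInputStream`, and `readAndClose` are hypothetical names introduced here, and `readAndClose` merely imitates a consumer that closes its input the way the question reports for `putObject`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class CloseShieldSketch {

    // Hypothetical wrapper: forwards every call to the wrapped stream
    // except close(), which is deliberately swallowed.
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty so the wrapped stream stays open.
        }
    }

    // Hypothetical source stream that records whether close() ever reached it.
    static final class TrackingInputStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingInputStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical consumer that reads up to `length` bytes and then closes its input.
    static String readAndClose(InputStream in, int length) throws IOException {
        try (InputStream closing = in) {
            byte[] buf = new byte[length];
            int off = 0;
            while (off < length) {
                int n = closing.read(buf, off, length - off);
                if (n < 0) {
                    break; // stream ended early
                }
                off += n;
            }
            return new String(buf, 0, off, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingInputStream source = new TrackingInputStream("aaabbb".getBytes(StandardCharsets.UTF_8));
        // A fresh shield per consumer call, like one shield per tar entry.
        System.out.println(readAndClose(new NonClosingInputStream(source), 3));
        System.out.println(readAndClose(new NonClosingInputStream(source), 3));
        System.out.println("source closed: " + source.closed);
    }
}
```

Creating a new shield for each entry keeps the loop simple: the consumer may close the shield freely, and the code that owns the real stream closes it once, in the finally block.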


PS. I don't know what [expression lost in extraction] in the question's code does, but it looks strange. I have ignored it for the purposes of this answer.