I need to download photos from S3 (they are not all in the same directory), zip them, and then upload the ZIP back to S3 using the AWS S3 Java SDK. The ZIP file can be several GB in size. I'm currently running on AWS Lambda, whose temporary storage is limited to roughly 500 MB, so I don't want to write the ZIP to disk. Instead I'd like to stream the ZIP (created on the fly from the photos downloaded from S3) directly to S3. It has to be done with the AWS S3 Java SDK.
Thanks!
Answer 0 (score: 1)
The basic idea is to use streaming, so you don't wait for the ZIP to be generated on the file system but start uploading as soon as the ZIP algorithm produces any data. Some data will obviously be buffered in memory, but there is still no need to wait for the whole ZIP to land on disk. We'll use stream composition and a PipedInputStream / PipedOutputStream pair in two threads: one reads the data, the other zips the content.
Here is a version for aws-java-sdk:
final AmazonS3 client = AmazonS3ClientBuilder.defaultClient();

final PipedOutputStream pipedOutputStream = new PipedOutputStream();
final PipedInputStream pipedInputStream = new PipedInputStream(pipedOutputStream);

final Thread s3In = new Thread(() -> {
    try (final ZipOutputStream zipOutputStream = new ZipOutputStream(pipedOutputStream)) {
        S3Objects
            // It's just a convenient way to list all the objects. Replace with your own logic.
            .inBucket(client, "bucket")
            .forEach((S3ObjectSummary objectSummary) -> {
                try {
                    if (objectSummary.getKey().endsWith(".png")) {
                        System.out.println("Processing " + objectSummary.getKey());

                        final ZipEntry entry = new ZipEntry(
                            UUID.randomUUID().toString() + ".png" // I'm too lazy to extract the file name from the objectSummary
                        );

                        zipOutputStream.putNextEntry(entry);
                        IOUtils.copy(
                            client.getObject(
                                objectSummary.getBucketName(),
                                objectSummary.getKey()
                            ).getObjectContent(),
                            zipOutputStream
                        );
                        zipOutputStream.closeEntry();
                    }
                } catch (final Exception all) {
                    all.printStackTrace();
                }
            });
    } catch (final Exception all) {
        all.printStackTrace();
    }
});
final Thread s3Out = new Thread(() -> {
    try {
        client.putObject(
            "another-bucket",
            "previews.zip",
            pipedInputStream,
            new ObjectMetadata()
        );
        pipedInputStream.close();
    } catch (final Exception all) {
        all.printStackTrace();
    }
});

s3In.start();
s3Out.start();
s3In.join();
s3Out.join();
Note, however, that it will print a warning:
WARNING: No content length specified for stream data. Stream contents will be buffered in memory and could result in out of memory errors.
That's because S3 needs to know the size of the data before the upload, and there is no way to know the size of the generated ZIP in advance. You could try your luck with multipart uploads, but the code gets trickier. The idea is similar, though: one thread reads the data and sends the content into the ZIP stream, while the other reads the zipped data and uploads it as parts of a multipart upload. Once all the entries (parts) are uploaded, the multipart upload is completed.
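For the record, here is a rough, untested sketch of what that multipart uploading thread could look like with aws-java-sdk, reusing the client and pipedInputStream from the example above; the 10 MB part size and the bucket/key names are placeholders, and aborting the upload on failure is omitted:
// Hypothetical replacement for the s3Out thread above: reads the zipped bytes from the
// pipe in fixed-size chunks and uploads each chunk as one part of a multipart upload.
final Thread s3OutMultipart = new Thread(() -> {
    final String bucket = "another-bucket";
    final String key = "previews.zip";
    try {
        final String uploadId = client
            .initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key))
            .getUploadId();
        final List<PartETag> etags = new ArrayList<>();
        final byte[] part = new byte[10 * 1024 * 1024]; // every part but the last must be at least 5 MB
        int partNumber = 1;
        int filled = 0;
        int read;
        while ((read = pipedInputStream.read(part, filled, part.length - filled)) != -1) {
            filled += read;
            if (filled == part.length) { // buffer full: ship it as the next part
                etags.add(client.uploadPart(new UploadPartRequest()
                    .withBucketName(bucket)
                    .withKey(key)
                    .withUploadId(uploadId)
                    .withPartNumber(partNumber++)
                    .withInputStream(new ByteArrayInputStream(part, 0, filled))
                    .withPartSize(filled)).getPartETag());
                filled = 0;
            }
        }
        if (filled > 0) { // whatever is left becomes the final (possibly smaller) part
            etags.add(client.uploadPart(new UploadPartRequest()
                .withBucketName(bucket)
                .withKey(key)
                .withUploadId(uploadId)
                .withPartNumber(partNumber)
                .withInputStream(new ByteArrayInputStream(part, 0, filled))
                .withPartSize(filled)).getPartETag());
        }
        client.completeMultipartUpload(
            new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
    } catch (final Exception all) {
        all.printStackTrace();
    }
});
With this in place S3 no longer needs the total size up front, because each part carries its own length.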
And here is an example for aws-java-sdk-2.x:
final S3Client client = S3Client.create();

final PipedOutputStream pipedOutputStream = new PipedOutputStream();
final PipedInputStream pipedInputStream = new PipedInputStream(pipedOutputStream);

final Thread s3In = new Thread(() -> {
    try (final ZipOutputStream zipOutputStream = new ZipOutputStream(pipedOutputStream)) {
        client.listObjectsV2Paginator(
            ListObjectsV2Request
                .builder()
                .bucket("bucket")
                .build()
        )
        .contents()
        .forEach((S3Object object) -> {
            try {
                if (object.key().endsWith(".png")) {
                    System.out.println("Processing " + object.key());

                    final ZipEntry entry = new ZipEntry(
                        UUID.randomUUID().toString() + ".png" // I'm too lazy to extract the file name from the object
                    );

                    zipOutputStream.putNextEntry(entry);
                    client.getObject(
                        GetObjectRequest
                            .builder()
                            .bucket("bucket")
                            .key(object.key())
                            .build(),
                        ResponseTransformer.toOutputStream(zipOutputStream)
                    );
                    zipOutputStream.closeEntry();
                }
            } catch (final Exception all) {
                all.printStackTrace();
            }
        });
    } catch (final Exception all) {
        all.printStackTrace();
    }
});
final Thread s3Out = new Thread(() -> {
    try {
        client.putObject(
            PutObjectRequest
                .builder()
                .bucket("another-bucket")
                .key("previews.zip")
                .build(),
            RequestBody.fromBytes(
                IOUtils.toByteArray(pipedInputStream)
            )
        );
    } catch (final Exception all) {
        all.printStackTrace();
    }
});

s3In.start();
s3Out.start();
s3In.join();
s3Out.join();
It suffers from the same problem: the ZIP needs to be prepared in memory before the upload.
If you're interested, I have prepared a demo project so you can play with the code.
Answer 1 (score: 0)
The problem is that the AWS Java SDK for S3 does not support a way to stream writes to an OutputStream. The snippet below implements an 'S3OutputStream', which extends OutputStream and automatically performs either a 'putObject' or an 'initiateMultipartUpload', depending on the size. This allows you to pass the S3OutputStream to the constructor of ZipOutputStream, e.g. new ZipOutputStream(new S3OutputStream(s3Client, "my_bucket", "path")).
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.CannedAccessControlList;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class S3OutputStream extends OutputStream {

    private static final Logger LOG = LoggerFactory.getLogger(S3OutputStream.class);

    /** Default chunk size is 10MB */
    protected static final int BUFFER_SIZE = 10000000;

    /** The bucket-name on Amazon S3 */
    private final String bucket;

    /** The path (key) name within the bucket */
    private final String path;

    /** The temporary buffer used for storing the chunks */
    private final byte[] buf;

    /** The position in the buffer */
    private int position;

    /** Amazon S3 client. TODO: support KMS */
    private final AmazonS3 s3Client;

    /** The unique id for this upload */
    private String uploadId;

    /** Collection of the etags for the parts that have been uploaded */
    private final List<PartETag> etags;

    /** Indicates whether the stream is still open / valid */
    private boolean open;

    /**
     * Creates a new S3 OutputStream
     *
     * @param s3Client the AmazonS3 client
     * @param bucket   name of the bucket
     * @param path     path within the bucket
     */
    public S3OutputStream(AmazonS3 s3Client, String bucket, String path) {
        this.s3Client = s3Client;
        this.bucket = bucket;
        this.path = path;
        this.buf = new byte[BUFFER_SIZE];
        this.position = 0;
        this.etags = new ArrayList<>();
        this.open = true;
    }

    /**
     * Write an array to the S3 output stream.
     *
     * @param b the byte-array to append
     */
    @Override
    public void write(byte[] b) {
        write(b, 0, b.length);
    }

    /**
     * Writes an array to the S3 Output Stream
     *
     * @param byteArray the array to write
     * @param o         the offset into the array
     * @param l         the number of bytes to write
     */
    @Override
    public void write(final byte[] byteArray, final int o, final int l) {
        this.assertOpen();
        int ofs = o, len = l;
        int size;
        while (len > (size = this.buf.length - position)) {
            System.arraycopy(byteArray, ofs, this.buf, this.position, size);
            this.position += size;
            flushBufferAndRewind();
            ofs += size;
            len -= size;
        }
        System.arraycopy(byteArray, ofs, this.buf, this.position, len);
        this.position += len;
    }

    /**
     * Flushes the buffer by uploading a part to S3.
     */
    @Override
    public synchronized void flush() {
        this.assertOpen();
        LOG.debug("Flush was called");
    }

    protected void flushBufferAndRewind() {
        if (uploadId == null) {
            LOG.debug("Starting a multipart upload for {}/{}", this.bucket, this.path);
            final InitiateMultipartUploadRequest request = new InitiateMultipartUploadRequest(this.bucket, this.path)
                    .withCannedACL(CannedAccessControlList.BucketOwnerFullControl);
            InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(request);
            this.uploadId = initResponse.getUploadId();
        }
        uploadPart();
        this.position = 0;
    }

    protected void uploadPart() {
        LOG.debug("Uploading part {}", this.etags.size());
        UploadPartResult uploadResult = this.s3Client.uploadPart(new UploadPartRequest()
                .withBucketName(this.bucket)
                .withKey(this.path)
                .withUploadId(this.uploadId)
                .withInputStream(new ByteArrayInputStream(buf, 0, this.position))
                .withPartNumber(this.etags.size() + 1)
                .withPartSize(this.position));
        this.etags.add(uploadResult.getPartETag());
    }

    @Override
    public void close() {
        if (this.open) {
            this.open = false;
            if (this.uploadId != null) {
                if (this.position > 0) {
                    uploadPart();
                }
                LOG.debug("Completing multipart");
                this.s3Client.completeMultipartUpload(new CompleteMultipartUploadRequest(bucket, path, uploadId, etags));
            } else {
                LOG.debug("Uploading object at once to {}/{}", this.bucket, this.path);
                final ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentLength(this.position);
                final PutObjectRequest request = new PutObjectRequest(this.bucket, this.path, new ByteArrayInputStream(this.buf, 0, this.position), metadata)
                        .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl);
                this.s3Client.putObject(request);
            }
        }
    }

    public void cancel() {
        this.open = false;
        if (this.uploadId != null) {
            LOG.debug("Aborting multipart upload");
            this.s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(this.bucket, this.path, this.uploadId));
        }
    }

    @Override
    public void write(int b) {
        this.assertOpen();
        if (position >= this.buf.length) {
            flushBufferAndRewind();
        }
        this.buf[position++] = (byte) b;
    }

    private void assertOpen() {
        if (!this.open) {
            throw new IllegalStateException("Closed");
        }
    }
}
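For reference, here is a minimal sketch of how this class could be wired to the original question; the bucket names, the .png filter, and the listing via S3Objects are placeholders borrowed from the first answer, not part of this class:
// Hypothetical end-to-end wiring: list photos in a source bucket and stream them,
// zipped, into a single object in a target bucket, with no local disk involved.
final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
try (final ZipOutputStream zip = new ZipOutputStream(
        new S3OutputStream(s3Client, "target-bucket", "photos.zip"))) {
    for (final S3ObjectSummary summary : S3Objects.inBucket(s3Client, "source-bucket")) {
        if (!summary.getKey().endsWith(".png")) {
            continue;
        }
        zip.putNextEntry(new ZipEntry(summary.getKey()));
        try (final InputStream in = s3Client
                .getObject(summary.getBucketName(), summary.getKey())
                .getObjectContent()) {
            IOUtils.copy(in, zip); // each photo is streamed straight into its ZIP entry
        }
        zip.closeEntry();
    }
}
Closing the ZipOutputStream closes the S3OutputStream, which either completes the multipart upload or falls back to a single putObject if everything fit into one buffer.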