I'm currently responsible for migrating a site from its current server to EC2. One part of the project is done and working well; the other part is what I'm struggling with. The site currently has nearly 400K images, all sorted into different folders under a main userimg folder, and the client wants all of these images stored on S3. My main problem is how to transfer nearly 400,000 images from the server to S3. I've been using http://s3tools.org/s3cmd, which is brilliant, but if I transfer the userimg folder with s3cmd it will take nearly 3 days, and if the connection drops or a similar problem occurs, I'll have some images on S3 and some not, with no way to resume the process...
Can anyone suggest a solution? Has anyone run into a problem like this before?
Answer 0 (score: 3)
I suggest you write a simple Java utility (or have someone write one for you):
I did this for one of our clients. It's less than 200 lines of Java code and very reliable. Below is the part that performs the multipart upload; the part that reads the file structure is straightforward (a sketch of it follows the snippet).
// Imports needed by this snippet (AWS SDK for Java v1 and JetS3t):
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

import org.jets3t.service.S3Service;
import org.jets3t.service.S3ServiceException;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Bucket;

// Class-level members assumed by this snippet (not shown in the original):
//   log                         - a logger such as org.apache.log4j.Logger
//   CHUNK_SIZE                  - the part size, e.g. 5 * 1024 * 1024 (5 MB is the S3 minimum)
//   CloudUtilsService.MAX_RETRY - the per-part retry limit

/**
 * Uploads a file to Amazon S3. Creates the specified bucket if it does not exist.
 * The upload is done in chunks of CHUNK_SIZE bytes (multipart upload).
 * Attempts to handle upload exceptions gracefully, retrying up to MAX_RETRY times per chunk.
 *
 * @param accessKey     - Amazon account access key
 * @param secretKey     - Amazon account secret key
 * @param directoryName - directory path where the file resides
 * @param keyName       - the name of the file to upload
 * @param bucketName    - the name of the bucket to upload to
 * @throws Exception - in case something goes wrong
 */
public void uploadFileToS3(String accessKey,
                           String secretKey,
                           String directoryName,
                           String keyName,  // the object key that will exist after the upload completes
                           String bucketName) throws Exception {
    // Create a credentials object and a client to access the S3 account
    AWSCredentials myCredentials = new BasicAWSCredentials(accessKey, secretKey);
    String filePath = directoryName
            + System.getProperty("file.separator")
            + keyName;
    log.info("uploadFileToS3 is about to upload file [" + filePath + "]");
    AmazonS3 s3Client = new AmazonS3Client(myCredentials);

    // Collect the part ETags; one is returned for each uploaded part.
    List<PartETag> partETags = new ArrayList<PartETag>();

    // Make sure the bucket exists
    createBucketIfNotExists(bucketName, accessKey, secretKey);

    // Delete the object from the bucket if it already exists there
    s3Client.deleteObject(bucketName, keyName);

    // Step 1: Initiate the multipart upload.
    InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, keyName);
    InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest);

    File file = new File(filePath);
    long contentLength = file.length();
    long partSize = CHUNK_SIZE; // e.g. 5 MB
    int numOfParts = 1;
    if (contentLength > CHUNK_SIZE) {
        if (contentLength % CHUNK_SIZE != 0) {
            numOfParts = (int) ((contentLength / partSize) + 1);
        } else {
            numOfParts = (int) (contentLength / partSize);
        }
    }

    try {
        // Step 2: Upload the parts.
        long filePosition = 0;
        for (int i = 1; filePosition < contentLength; i++) {
            // The last part can be smaller than CHUNK_SIZE. Adjust the part size.
            partSize = Math.min(partSize, (contentLength - filePosition));
            log.info("Start uploading part[" + i + "] of [" + numOfParts + "]");

            // Create the request to upload one part.
            UploadPartRequest uploadRequest = new UploadPartRequest()
                    .withBucketName(bucketName).withKey(keyName)
                    .withUploadId(initResponse.getUploadId()).withPartNumber(i)
                    .withFileOffset(filePosition)
                    .withFile(file)
                    .withPartSize(partSize);

            // Repeat the upload until it succeeds or the retry limit is reached
            boolean anotherPass;
            int retryCount = 0;
            do {
                anotherPass = false; // assume everything is OK
                try {
                    log.info("Uploading part[" + i + "]");
                    // Upload the part and add its ETag to our list.
                    partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
                    log.info("Finished uploading part[" + i + "] of [" + numOfParts + "]");
                } catch (Exception e) {
                    log.error("Failed uploading part[" + i + "] due to exception. Will retry... Exception: ", e);
                    anotherPass = true; // repeat
                    retryCount++;
                }
            } while (anotherPass && retryCount < CloudUtilsService.MAX_RETRY);

            // Fail fast if the part never succeeded within the retry limit;
            // otherwise completeMultipartUpload would fail later with a missing ETag.
            if (anotherPass) {
                throw new Exception("Part[" + i + "] failed after " + retryCount + " retries");
            }

            filePosition += partSize;
            log.info("filePosition=[" + filePosition + "]");
        }
        log.info("Finished uploading file");

        // Step 3: Complete the multipart upload.
        CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(
                bucketName,
                keyName,
                initResponse.getUploadId(),
                partETags);
        s3Client.completeMultipartUpload(compRequest);
        log.info("multipart upload completed. upload id=[" + initResponse.getUploadId() + "]");
    } catch (Exception e) {
        // Abort the upload so the orphaned parts do not keep accruing storage charges
        s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
                bucketName, keyName, initResponse.getUploadId()));
        log.error("Failed to upload due to Exception:", e);
        throw e;
    }
}

/**
 * Creates a new bucket with the specified name if it does not exist.
 *
 * @param bucketName - the name of the bucket to retrieve or create
 * @param accessKey  - Amazon account access key
 * @param secretKey  - Amazon account secret key
 * @throws S3ServiceException - if something goes wrong
 */
public void createBucketIfNotExists(String bucketName, String accessKey, String secretKey) throws S3ServiceException {
    try {
        // Create a credentials object and a JetS3t service to access the S3 account
        org.jets3t.service.security.AWSCredentials myCredentials =
                new org.jets3t.service.security.AWSCredentials(accessKey, secretKey);
        S3Service service = new RestS3Service(myCredentials);

        // Get the bucket, creating it if it does not exist yet
        S3Bucket zeBucket = service.getOrCreateBucket(bucketName);
        log.info("the bucket [" + zeBucket.getName() + "] was created (if it did not exist yet...)");
    } catch (S3ServiceException e) {
        log.error("Failed to get or create bucket[" + bucketName + "] due to exception:", e);
        throw e;
    }
}
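The directory-walking part isn't shown above. Here is a minimal sketch of how it might look, assuming the uploadFileToS3 method and the log field from the snippet are in scope; the method name uploadDirectoryToS3 and the doesObjectExist resume check (available in newer releases of the AWS SDK for Java v1) are illustrative additions, not part of the original utility:

// A sketch only: walks a directory tree and uploads every file that is not
// already in the bucket, so an interrupted transfer can be restarted cheaply.
public void uploadDirectoryToS3(AmazonS3 s3Client,
                                File dir,
                                String accessKey,
                                String secretKey,
                                String bucketName) throws Exception {
    File[] entries = dir.listFiles();
    if (entries == null) {
        return; // not a directory, or not readable
    }
    for (File entry : entries) {
        if (entry.isDirectory()) {
            // Recurse into subfolders (e.g. the folders under userimg)
            uploadDirectoryToS3(s3Client, entry, accessKey, secretKey, bucketName);
        } else if (s3Client.doesObjectExist(bucketName, entry.getName())) {
            // Uploaded by a previous (possibly interrupted) run - skip it
            log.info("Skipping existing object [" + entry.getName() + "]");
        } else {
            uploadFileToS3(accessKey, secretKey, entry.getParent(), entry.getName(), bucketName);
        }
    }
}

Because objects already in the bucket are skipped, a run that dies after a dropped connection can simply be restarted and will pick up where it left off. Note that, like uploadFileToS3 itself, this keys each object by its bare file name, so files with the same name in different userimg subfolders would need a distinct key parameter to avoid overwriting each other.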
Answer 1 (score: 1)
Sounds like a job for rsync. I've never used it in combination with S3, but S3Sync seems to be what you need.
Answer 2 (score: 1)
If you don't want to actually upload all the files (or deal with managing the upload), you could use AWS Import/Export, which basically amounts to just shipping Amazon a hard drive.
Answer 3 (score: 1)
You could use Super Flexible File Synchronizer. It's a commercial product, but the Linux version is free.
It can compare and sync folders, and it can transfer multiple files in parallel. It's fast. The interface is perhaps not the simplest, but that's mainly because it has a million configuration options.
Note: I'm not affiliated with this product in any way, but I have used it.
Answer 4 (score: 1)