There is a large dataset (~0.5TB, multi-part) on a public server that I want to copy into my own S3 bucket. It seems that aws s3 cp only works with local files or files already in an S3 bucket?

How can I copy that file (single or multi-part) into S3? Can I use the AWS CLI, or do I need something else?
Answer:
There is no way to upload it to S3 directly from the remote location. But you can stream the contents of the remote files through your machine and on into S3. This means you still download the full 0.5TB of data, but your machine only ever holds a small chunk of it in memory at a time (and it is never persisted to disk either). Here is a simple implementation in JavaScript:
const request = require('request')
const async = require('async')
const AWS = require('aws-sdk')

const s3 = new AWS.S3()
const Bucket = 'nyu_depth_v2'
const baseUrl = 'http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/'
const parallelLimit = 5

// Every part of the multi-part dataset, fetched relative to baseUrl.
const parts = [
  'basements.zip',
  'bathrooms_part1.zip',
  'bathrooms_part2.zip',
  'bathrooms_part3.zip',
  'bathrooms_part4.zip',
  'bedrooms_part1.zip',
  'bedrooms_part2.zip',
  'bedrooms_part3.zip',
  'bedrooms_part4.zip',
  'bedrooms_part5.zip',
  'bedrooms_part6.zip',
  'bedrooms_part7.zip',
  'bookstore_part1.zip',
  'bookstore_part2.zip',
  'bookstore_part3.zip',
  'cafe.zip',
  'classrooms.zip',
  'dining_rooms_part1.zip',
  'dining_rooms_part2.zip',
  'furniture_stores.zip',
  'home_offices.zip',
  'kitchens_part1.zip',
  'kitchens_part2.zip',
  'kitchens_part3.zip',
  'libraries.zip',
  'living_rooms_part1.zip',
  'living_rooms_part2.zip',
  'living_rooms_part3.zip',
  'living_rooms_part4.zip',
  'misc_part1.zip',
  'misc_part2.zip',
  'office_kitchens.zip',
  'offices_part1.zip',
  'offices_part2.zip',
  'playrooms.zip',
  'reception_rooms.zip',
  'studies.zip',
  'study_rooms.zip'
]

// Upload up to `parallelLimit` parts concurrently. `request(...)` returns a
// readable stream, so each file is piped straight from the HTTP response into
// the managed S3 upload without ever being written to disk.
async.eachLimit(parts, parallelLimit, (Key, cb) => {
  s3.upload({
    Key,
    Bucket,
    Body: request(baseUrl + Key)
  }, cb)
}, (err) => {
  if (err) console.error(err)
  else console.log('Done')
})
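If you are setting this up today, note that the request package has since been deprecated. Below is a minimal sketch of the same stream-through idea using AWS SDK v3; the streamToS3 helper is hypothetical, and it assumes Node 18+ (for the global fetch) plus the @aws-sdk/client-s3 and @aws-sdk/lib-storage packages.

const { S3Client } = require('@aws-sdk/client-s3')
const { Upload } = require('@aws-sdk/lib-storage')
const { Readable } = require('stream')

const client = new S3Client()

// Hypothetical helper: GET the remote file and pipe the response body
// straight into a managed multipart upload. Data flows through memory
// in chunks and is never written to disk.
async function streamToS3 (url, Bucket, Key) {
  const res = await fetch(url)
  if (!res.ok) throw new Error(`GET ${url} failed with status ${res.status}`)
  const upload = new Upload({
    client,
    params: { Bucket, Key, Body: Readable.fromWeb(res.body) }
  })
  return upload.done()
}

Either way, keeping the concurrency limit small is the point of the design: each in-flight file only buffers a few upload chunks in memory, so a handful of concurrent streams stays well within the RAM of a small machine even though the whole dataset is ~0.5TB.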