Question

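I have a large number of files on a server that I want to upload to S3. The files are stored with a .data extension, but they are really just a bunch of JPEGs, PNGs, ZIPs, or PDFs.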

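I have already written a short script that finds the MIME type of each file and uploads it to S3. It works, but it is slow. Is there any way to make the following run using GNU parallel?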

#!/bin/bash

for n in $(find -name "*.data") 
do 
        data=".data" 
        extension=`file $n | cut -d ' ' -f2 | awk '{print tolower($0)}'` 
        mimetype=`file --mime-type $n | cut -d ' ' -f2`
        fullpath=`readlink -f $n`

        changed="${fullpath/.data/.$extension}"

        filePathWithExtensionChanged=${changed#*internal_data}

        s3upload="s3cmd put -m $mimetype --acl-public $fullpath s3://tff-xenforo-data"$filePathWithExtensionChanged     

        response=`$s3upload`
        echo $response 

done

Also, I am sure this code could be greatly improved :) Feedback and tips would be much appreciated.

Answer 1

You are clearly good at writing shell, and you are very close to a solution:

s3upload_single() {
    n=$1
    # Real type from the file contents, lower-cased: jpeg, png, zip, pdf
    extension=$(file -b "$n" | cut -d ' ' -f1 | awk '{print tolower($0)}')
    mimetype=$(file -b --mime-type "$n")
    fullpath=$(readlink -f "$n")

    # Swap the .data extension for the detected one
    changed="${fullpath/.data/.$extension}"

    # Keep only the part of the path after "internal_data"
    filePathWithExtensionChanged=${changed#*internal_data}

    s3cmd put -m "$mimetype" --acl-public "$fullpath" "s3://tff-xenforo-data$filePathWithExtensionChanged"
}
# Export the function so the shells started by parallel can see it
export -f s3upload_single
find . -name "*.data" | parallel s3upload_single
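
The two parameter expansions in the function are what turn a local path into the S3 key. As a minimal sketch, that step can be isolated in a helper; the name s3_key_for and the sample path in the comment are made up for illustration and are not part of the original script. This variant strips only a trailing .data (with %.data) rather than the first .data found anywhere in the path, which is an assumption about what the question intends.

```shell
#!/bin/bash
# Map a local ".data" path to the S3 key layout used in the question.
#   $1: absolute path ending in .data
#   $2: real extension detected from the file contents (e.g. "jpeg")
# s3_key_for is a hypothetical helper name, not part of the original script.
s3_key_for() {
    local fullpath="$1" extension="$2"
    # Remove only a trailing ".data", then append the detected extension.
    local changed="${fullpath%.data}.$extension"
    # Drop everything up to and including the first "internal_data".
    printf '%s\n' "${changed#*internal_data}"
}

# Example call (hypothetical path):
#   s3_key_for /var/www/internal_data/attachments/0/12-abc.data jpeg
```

If the helper is called from inside s3upload_single, it has to be exported as well (export -f s3_key_for s3upload_single) so that the shells started by parallel can see it.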

Answer 2

You can use the modified s3cmd (s3cmd-modification), which lets you run put/get/sync in parallel with multiple workers:

$ git clone https://github.com/pcorliss/s3cmd-modification.git
$ cd s3cmd-modification
$ python setup.py install
$ s3cmd --parallel --workers=4 sync /source/path s3://target/path

Answer 3

Use the AWS CLI. It supports uploading files in parallel, and it is very fast for both uploads and downloads.

http://docs.aws.amazon.com/cli/latest/reference/s3/
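
For the files in the question, a per-file upload with the AWS CLI might look roughly like the sketch below. Because the .data names hide the real type, it passes the MIME type explicitly. The function name aws_upload_single and its argument layout are assumptions made for illustration; the bucket name is the one from the question.

```shell
#!/bin/bash
# Upload one file with an explicit Content-Type and a public-read ACL.
#   $1: local path, $2: S3 key (leading slash included), $3: MIME type
# aws_upload_single is a hypothetical name; the bucket is the one from the question.
aws_upload_single() {
    local src="$1" key="$2" mimetype="$3"
    aws s3 cp "$src" "s3://tff-xenforo-data$key" \
        --content-type "$mimetype" --acl public-read
}

# Optional: raise the number of concurrent requests the CLI uses for transfers
# (the default is 10).
#   aws configure set default.s3.max_concurrent_requests 20
```

The CLI's built-in concurrency applies within a single command, such as a sync or a recursive cp. Since each file here needs its own key and content type, running one aws s3 cp per file would probably still need something like GNU parallel to keep several uploads in flight at once.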

Answer 4

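Try s3-cli: a command-line utility front end to node-s3-client. It is inspired by s3cmd and attempts to be a drop-in replacement.

Paraphrasing from https://erikzaadi.com/2015/04/27/s3cmd-is-dead-long-live-s3-cli/:

It is a drop-in replacement for s3cmd, written in Node (yaay!), that works flawlessly with your existing s3cmd configuration and (among other awesome things) uploads to S3 in parallel, saving LOADS of time.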
-        system "s3cmd sync --delete-removed . s3://yourbucket.com/"
+        system "s3-cli sync --delete-removed . s3://yourbucket.com/"

Uploading files to S3 in parallel using s3cmd
