Question

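We need to transfer 15TB of data from one server to another as fast as we can. We're currently using rsync, but we only get about 150Mb/s when our network is capable of 900+Mb/s (tested with iperf). I've tested the disks, the network, etc., and concluded that rsync is only transferring one file at a time, which is causing the slowdown.

I found a script that runs a separate rsync for each folder in a directory tree (and lets you limit the number to x), but I can't get it working; it still runs only one rsync at a time.

I found the script here (copied below).

Our directory tree looks like this: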

/main
   - /files
      - /1
         - 343
            - 123.wav
            - 76.wav
         - 772
            - 122.wav
         - 55
            - 555.wav
            - 324.wav
            - 1209.wav
         - 43
            - 999.wav
            - 111.wav
            - 222.wav
      - /2
         - 346
            - 9993.wav
         - 4242
            - 827.wav
      - /3
         - 2545
            - 76.wav
            - 199.wav
            - 183.wav
         - 23
            - 33.wav
            - 876.wav
         - 4256
            - 998.wav
            - 1665.wav
            - 332.wav
            - 112.wav
            - 5584.wav

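So what I want is to start one rsync for each of the directories in /main/files, up to a maximum of, say, 5 at a time. In this case, 3 rsyncs would run, for /main/files/1, /main/files/2 and /main/files/3.

I tried it like this, but it only runs 1 rsync at a time, for the /main/files/2 folder: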

#!/bin/bash

# Define source, target, maxdepth and cd to source
source="/main/files"
target="/main/filesTest"
depth=1
cd "${source}"

# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5

# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while read dir
do
    # Make sure to ignore the parent folder
    if [ `echo "${dir}" | awk -F'/' '{print NF}'` -gt ${depth} ]
    then
        # Strip leading dot slash
        subfolder=$(echo "${dir}" | sed 's@^\./@@g')
        if [ ! -d "${target}/${subfolder}" ]
        then
            # Create destination folder and set ownership and permissions to match source
            mkdir -p "${target}/${subfolder}"
            chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
            chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
        fi
        # Make sure the number of rsync threads running is below the threshold
        while [ `ps -ef | grep -c [r]sync` -gt ${maxthreads} ]
        do
            echo "Sleeping ${sleeptime} seconds"
            sleep ${sleeptime}
        done
        # Run rsync in background for the current subfolder and move on to the next one
        nohup rsync -a "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
    fi
done

# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -a --files-from=- --from0 ./ "${target}/"

Answer 1

This seems simpler:

ls /srv/mail | parallel -v -j8 rsync -raz --progress {} myserver.com:/srv/mail/{}

Answer 2

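rsync transfers files over the network as fast as it can. For example, try using it to copy one large file that doesn't exist at all on the destination. That speed is the maximum rate at which rsync can transfer data. Compare it with the speed of scp (for example). rsync is even slower at raw transfer when the destination file already exists, because both sides have to hold a two-way conversation about which parts of the file have changed, but it pays for itself by identifying data that doesn't need to be transferred.

A simpler way to run rsync in parallel is to use parallel. The command below runs up to 5 rsyncs in parallel, each copying one directory. Be aware that the bottleneck might not be your network but the speed of your CPUs and disks, in which case running things in parallel just makes them all slower, not faster.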

run_rsync() {
    # e.g. copies /main/files/blah to /main/filesTest/blah
    rsync -av "$1" "/main/filesTest/${1#/main/files/}"
}
export -f run_rsync
parallel -j5 run_rsync ::: /main/files/*
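Before tuning anything, it may help to see what is at stake. The sketch below is only back-of-the-envelope arithmetic using the figures from the question (15 TB, about 150 Mb/s observed, 900 Mb/s measured with iperf). The function name eta_days is made up for illustration, and it assumes decimal units (1 TB = 10^12 bytes, 1 Mb = 10^6 bits) and a perfectly sustained rate.

```shell
#!/usr/bin/env bash
# eta_days SIZE_TB RATE_MBPS: rough transfer time in days (hypothetical helper).
# Assumes decimal units and a constant rate, which real transfers rarely achieve.
eta_days() {
    awk -v tb="$1" -v mbps="$2" 'BEGIN {
        bits = tb * 1e12 * 8           # total payload in bits
        secs = bits / (mbps * 1e6)     # seconds at the given rate
        printf "%.1f\n", secs / 86400  # 86400 seconds per day
    }'
}

eta_days 15 150   # about 9.3 days at the observed rsync speed
eta_days 15 900   # about 1.5 days at the measured link speed
```

If the disks or CPUs turn out to be the limit, the second figure is not reachable no matter how many rsyncs run at once.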

Answer 3

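There are a number of alternative tools and approaches for this listed around the web. For example:

The NCSA Blog describes using xargs and find to parallelize rsync without having to install any new software on most *nix systems.
parsync provides a feature-rich Perl wrapper for parallel rsync.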

Answer 4

You can use xargs, which supports running several processes at a time. For your case it would be:

ls -1 /main/files | xargs -I {} -P 5 -n 1 rsync -avh /main/files/{} /main/filesTest/
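One caveat with piping ls into xargs is that names containing spaces or newlines get split. A slightly more defensive sketch, assuming a find and xargs that support -print0, -0 and -P (GNU and BSD both do), is shown below. The function name sync_subdirs and its optional trailing command are my own additions for illustration; by default it runs rsync -a as in the question.

```shell
#!/usr/bin/env bash
# sync_subdirs SRC DST JOBS [CMD...]  (hypothetical helper)
# Copies each top-level entry of SRC into DST, running up to JOBS copies at once.
# CMD defaults to "rsync -a"; it is overridable only so the sketch can be
# tried on a machine without rsync (for example with "cp -R").
sync_subdirs() {
    local src="$1" dst="$2" jobs="$3"
    shift 3
    local -a cmd=("$@")
    [ "${#cmd[@]}" -gt 0 ] || cmd=(rsync -a)
    mkdir -p "$dst"
    # -print0 / -0 keep unusual filenames intact; -P caps the parallel processes.
    find "$src" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -P "$jobs" -I {} "${cmd[@]}" {} "$dst/"
}

# Example, using the paths from the question:
# sync_subdirs /main/files /main/filesTest 5
```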

Answer 5

Have you tried using rclone.org?

With rclone you could do something like:

rclone copy "${source}/${subfolder}/" "${target}/${subfolder}/" --progress --multi-thread-streams=N

where --multi-thread-streams=N is the number of threads you wish to spawn.
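As far as I understand rclone's options, --multi-thread-streams sets how many streams are used for a single large file, while --transfers sets how many files are copied at once. For a tree of many separate .wav files, --transfers is probably the more relevant knob. A sketch follows; the wrapper name rclone_copy_parallel and the RCLONE_BIN variable are hypothetical conveniences, not part of rclone.

```shell
#!/usr/bin/env bash
# rclone_copy_parallel SRC DST N  (hypothetical wrapper)
# Copies SRC to DST with up to N files in flight at once.
# RCLONE_BIN is an assumed override so the generated command line can be
# inspected without running rclone, e.g. RCLONE_BIN=echo.
rclone_copy_parallel() {
    local src="$1" dst="$2" n="$3"
    "${RCLONE_BIN:-rclone}" copy "$src" "$dst" --transfers "$n" --progress
}

# Example, using the paths from the question:
# rclone_copy_parallel /main/files /main/filesTest 5
```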

Answer 6

I've developed a Python package called parallel_sync:

https://pythonhosted.org/parallel_sync/pages/examples.html

Here is sample code showing how to use it:

from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds)

The default parallelism is 10; you can increase it:

from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds, parallelism=20)

Note, however, that ssh typically sets MaxSessions to 10 by default, so to go beyond 10 you'll have to modify your ssh settings.

Answer 7

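The simplest way I've found is to use background jobs in the shell:

for d in /main/files/*; do
    rsync -a "$d" remote:/main/files/ &
done

Beware that it doesn't limit the number of jobs! If you're network-bound this isn't really a problem, but if you're waiting on spinning rust this will thrash the disk.

You could add a check inside the loop that waits while too many background jobs are running, for a primitive form of job control.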
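A primitive job limit of the kind described above could look like the following sketch. It assumes bash (for the jobs builtin); the helper name throttle and the limit of 5 are illustrative choices, not something the answer specifies.

```shell
#!/usr/bin/env bash
# throttle MAX: block until fewer than MAX background jobs of this shell
# are still running. (hypothetical helper; polls once per second)
throttle() {
    local max="$1"
    while [ "$(jobs -rp | wc -l)" -ge "$max" ]; do
        sleep 1
    done
}

# Example, using the loop from the answer:
# for d in /main/files/*; do
#     throttle 5
#     rsync -a "$d" remote:/main/files/ &
# done
# wait   # let the remaining transfers finish
```

Polling with sleep is crude but portable across bash versions; on bash 4.3 or later, wait -n could replace the sleep so the loop resumes as soon as any job exits.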

Answer 8

The shortest version I found uses the --cat option of parallel, as shown below. This version avoids xargs and relies only on features of parallel:

cat files.txt | \
  parallel -n 500 --lb --pipe --cat rsync --files-from={} user@remote:/dir /dir -avPi

#### Arg explainer
# -n 500           :: split input into chunks of 500 entries
#
# --cat            :: create a tmp file referenced by {} containing the 500
#                     entry content for each process
#
# user@remote:/dir :: the root relative to which entries in files.txt are considered
#
# /dir             :: local root relative to which files are copied

Sample content of files.txt:

/dir/file-1
/dir/subdir/file-2
....

Note that this doesn't use -j 50 for the job count; that didn't work on my end. Instead, I used -n 500 for the record count per job, calculated as a reasonable number given the total number of records.
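The answer doesn't show how the file list or the chunk size were produced. One possible way is sketched below. Both helper names are hypothetical, and the list uses paths relative to the source root, which is what I understand rsync's --files-from to expect.

```shell
#!/usr/bin/env bash
# list_files ROOT: print every regular file under ROOT as a path relative
# to ROOT, one per line. (hypothetical helper)
list_files() {
    ( cd "$1" && find . -type f | sed 's|^\./||' )
}

# chunk_size TOTAL JOBS: entries per chunk so that TOTAL entries are spread
# over roughly JOBS chunks (ceiling division). (hypothetical helper)
chunk_size() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: 25000 listed files spread over 50 chunks gives 500 per chunk.
# list_files /dir > files.txt
# chunk_size "$(wc -l < files.txt)" 50
```

The resulting number would then be passed to parallel as the -n value.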

Speed up rsync with simultaneous/concurrent file transfers?
