How to copy files as fast as possible?

Asked: 2014-04-07 04:43:06

Tags: linux bash unix ubuntu scp

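I am running my shell script on machineA, which copies files from machineB and machineC to machineA.

If the file is not there on machineB, then it should be on machineC. So I will try copying from machineB first, and if it is not there on machineB, I will go to machineC to copy the same file.

On machineB and machineC there will be a folder named in the YYYYMMDD format inside this folder: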

/data/pe_t1_snapshot

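So whichever date is the latest in the YYYYMMDD format inside the above folder, I will pick that folder as the full path from which I need to start copying the files.

So suppose 20140317 is the latest date folder inside /data/pe_t1_snapshot; then this will be my full path: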

/data/pe_t1_snapshot/20140317

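That is where I need to start copying the files from on machineB and machineC. I need to copy around 400 files from machineB and machineC to machineA, and each file is 1.5 GB in size.

Currently my shell script below works fine using scp, but somehow it takes ~2 hours to copy the 400 files to machineA, which I guess is too long for me. :(

Below is my shell script: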

#!/bin/bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9) # this will have more file numbers around 200
SECONDARY_PARTITION=(1 2 4 6 8) # this will have more file numbers around 200

dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)

echo $dir1
echo $dir2

if [ "$dir1" = "$dir2" ]
then
    # delete all the files first
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.
    done

    # delete all the files first
    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.
    done
fi

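I am copying the PRIMARY_PARTITION files into the PRIMARY folder and the SECONDARY_PARTITION files into the SECONDARY folder on machineA.

Is there any way to move the files to machineA faster? Can I copy 10 files at a time, or 5 files at a time in parallel, to speed up this process, or is there any other approach?

NOTE: machineA is running on SSD.

UPDATE:

Below is the parallel shell script I tried; the top portion of the script is the same as shown above.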

if [ "$dir1" = "$dir2" ] && [ "$length1" -gt 0 ] && [ "$length2" -gt 0 ]
then
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.) &
          WAITPID="$WAITPID $!"        
    done

    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.) &
          WAITPID="$WAITPID $!"        
    done
     wait $WAITPID
     echo "All files done copying."
fi

Errors I got with the parallel shell script:

channel 24: open failed: administratively prohibited: open failed
channel 25: open failed: administratively prohibited: open failed
channel 26: open failed: administratively prohibited: open failed
channel 28: open failed: administratively prohibited: open failed
channel 30: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 32: open failed: administratively prohibited: open failed
channel 36: open failed: administratively prohibited: open failed
channel 37: open failed: administratively prohibited: open failed
channel 38: open failed: administratively prohibited: open failed
channel 40: open failed: administratively prohibited: open failed
channel 46: open failed: administratively prohibited: open failed
channel 47: open failed: administratively prohibited: open failed
channel 49: open failed: administratively prohibited: open failed
channel 52: open failed: administratively prohibited: open failed
channel 54: open failed: administratively prohibited: open failed
channel 55: open failed: administratively prohibited: open failed
channel 56: open failed: administratively prohibited: open failed
channel 57: open failed: administratively prohibited: open failed
channel 59: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 61: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 64: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 68: open failed: administratively prohibited: open failed
channel 72: open failed: administratively prohibited: open failed
channel 74: open failed: administratively prohibited: open failed
channel 76: open failed: administratively prohibited: open failed
channel 78: open failed: administratively prohibited: open failed

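7 answers:

Answer 0 (score: 19)

You can try the rsync command.

From man rsync you will see: "The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package."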
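
As a rough illustration of how the scp calls in the question might be swapped for rsync, here is a minimal sketch. The helper names copy_file and fetch_with_fallback are hypothetical and not from the question; the david@ user and the host names are taken from the question's script. Keep in mind that the difference-only transfer saves work only when an older copy of the file already exists at the destination, so it would not help if the destination folder is emptied first, as the question's script does.

```shell
#!/bin/bash

# Hypothetical wrapper around the transfer tool, so it can be swapped out.
# -a preserves file attributes; --partial keeps a partially transferred file
# so that a rerun after an interruption can build on it.
copy_file() {
    rsync -a --partial "$1" "$2"
}

# Try each host in order and stop at the first one that has the file.
# Usage: fetch_with_fallback REMOTE_PATH DEST_DIR HOST...
# Prints the host that served the file; returns 1 if every host failed.
fetch_with_fallback() {
    local remote_path="$1"
    local dest_dir="$2"
    local host
    shift 2
    for host in "$@"; do
        if copy_file "david@${host}:${remote_path}" "${dest_dir}/"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Example (not run here), using the variables from the question's script:
# fetch_with_fallback "$dir1/t1_weekly_1680_0_200003_5.data" "$PRIMARY" machineB machineC
```

Since the question's script only proceeds when dir1 and dir2 are equal, a single remote path can be tried on both hosts.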

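Answer 1 (score: 7)

You can try HPN-SSH (High Performance SSH/SCP): http://www.psc.edu/index.php/hpn-ssh or http://hpnssh.sourceforge.net/

The HPN-SSH project is a set of patches for OpenSSH (scp is part of it) that tunes various TCP and internal buffers better. There is also a "none" cipher ("None Cipher Switching") that disables encryption, which may help you too (if you do not send the data over public networks).

Both compression and encryption consume CPU time, and 10 Gbit Ethernet can sometimes transfer an uncompressed file faster than the CPU can compress and encrypt it.

You can profile your setup:

  • Measure the network bandwidth between the machines with iperf or netperf. Compare it with the actual network (network card capabilities, switches). With a good setup you should get more than 80-90% of the declared speed.
  • Using the speed from iperf or netperf, calculate the data volume and the time needed to transfer that much data over your network. Compare it with the actual transfer time; is there a huge difference?
    • If your CPU is fast, the data is compressible and the network is slow, compression will help you.
  • Take a look at top, vmstat and iostat.
    • Are there CPU cores loaded at 100% (run top and press 1 to see the cores)?
    • Are there too many interrupts (in) in vmstat 1? What about context switches (cs)?
    • What is the file reading speed in iostat 1? Are your hard drives fast enough to read the data, and to write the data on the receiver?
  • You can try full-system profiling with perf top or perf record -a. Is a lot of computing done by scp or by the network stack in Linux? If you can install dtrace or ktap, try off-cpu profiling as well.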
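
The second step above (turning a measured bandwidth into an expected transfer time) can be sketched as a small calculation. The function name expected_seconds is hypothetical, and the 1000 Mbit/s figure is only an assumed reading; substitute whatever iperf or netperf actually reports. Protocol overhead is ignored, so treat the result as a lower bound.

```shell
#!/bin/bash

# Estimate the ideal transfer time in whole seconds.
# $1: data volume in megabytes
# $2: measured link speed in megabits per second
expected_seconds() {
    local total_mb="$1"
    local link_mbit_per_s="$2"
    echo $(( total_mb * 8 / link_mbit_per_s ))
}

# 400 files of 1.5 GB each (1536 MB per file) over an assumed 1000 Mbit/s link
expected_seconds $(( 400 * 1536 )) 1000
```

With these assumed inputs the estimate is 4915 seconds, roughly 82 minutes, which can then be compared with the 2 hours observed in the question.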

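Answer 2 (score: 6)

You have 1.5 GB * 400 = 600 GB of data. Unrelated to the answer, I suggest that the machine setup looks wrong if you need to transfer this amount of data. You probably should have generated this data on machine A in the first place.

Transferring 600 GB of data in 2 hours is a transfer rate of ~85 MB/s, which means you have probably reached the transfer limit of either your disk drives or (almost) the network. I believe you will not be able to transfer faster with any other command.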
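
The ~85 MB/s figure can be reproduced with a short calculation. This is only a sketch: the function name average_mb_per_s is hypothetical, and it assumes 1 GB = 1024 MB, which is the convention that yields the number quoted above.

```shell
#!/bin/bash

# Average transfer rate in MB/s, rounded down.
# $1: number of files, $2: size of each file in MB, $3: elapsed seconds
average_mb_per_s() {
    echo $(( $1 * $2 / $3 ))
}

# 400 files of 1536 MB each, copied in 2 hours (7200 seconds)
average_mb_per_s 400 1536 7200
```

This prints 85, since 614400 MB divided by 7200 s is about 85.3 MB/s.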

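If the machines are close to each other, the fastest way to copy, I believe, is to physically remove the storage from machines B and C, put it into machine A, and then copy locally without going over the network. The time for this is the time to move the storage around plus the disk transfer time. I am afraid, however, that the copy will not be much faster than 85 MB/s.

The fastest network transfer command, I believe, is netcat, because it has no encryption overhead. Additionally, if the files are not media files, you have to compress them with a compressor that compresses faster than 85 MB/s. I know that lzop and lz4 are reputed to be faster than this rate. So my command line for transferring a single directory would be (BSD netcat syntax):

Machine A: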

$ nc -l 2000 | lzop -d | tar x

Machine B or C (can be executed from machine A with the help of ssh):

$ tar c directory | lzop | nc machineA 2000

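Remove the compressor if you are transferring media files, which are already compressed.

The commands to organize your directory structure are irrelevant in terms of speed, so I did not write them here, but you can reuse your own code.

This is the fastest method I can think of, but, again, I do not believe this command will be much faster than what you already have.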

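Answer 3 (score: 1)

You definitely want to try rclone. This thing is crazy fast:

sudo rclone sync /usr /home/fred/temp -P -L --transfers 64

Transferred:   17.929G / 17.929 GBytes, 100%, 165.692 MBytes/s, ETA 0s
Errors:        75 (retrying may help)
Checks:        691078 / 691078, 100%
Transferred:   345539 / 345539, 100%
Elapsed time:  1m50.8s

This is a local copy from and to a LITEONIT LCS-256 (256GB) SSD.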

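Answer 4 (score: 0)

rsync optionally compresses its data. That typically makes the transfer go much faster.

You didn't mention SCP, but SCP -C also compresses.

Do note that compression might make the transfer go faster or slower, depending on the speed of your CPU and of your network link.

Slower links and faster CPUs make compression a good idea; faster links and slower CPUs make compression a bad idea.

As with any optimization, measure the results in your own environment.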
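
One simple way to take such a measurement is to time the same copy with and without compression. The sketch below is only an illustration: elapsed_seconds is a hypothetical helper, and the file path in the commented example is assembled from the question's script. Whole-second resolution is coarse, so it is best suited to a large test file.

```shell
#!/bin/bash

# Run a command and print how many whole seconds it took.
# The command's output is discarded and its exit status is ignored,
# so confirm separately that the copy actually succeeded.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $(( end - start ))
}

# Example (not run here): compare one file with and without scp compression
# src=david@machineB:/data/pe_t1_snapshot/20140317/t1_weekly_1680_0_200003_5.data
# echo "plain:      $(elapsed_seconds scp "$src" /tmp/)"
# echo "compressed: $(elapsed_seconds scp -C "$src" /tmp/)"
```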

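Also, I think ftp is another option for you: in my transfer speed tests for large files (>10M), FTP worked faster than SCP and even rsync (it depends on the file format and compression rate).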

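Answer 5 (score: 0)

rsync is a good answer, but if you care about security then you should consider using:

rdist

Some details on the differences between rsync and rdist can be found here: rdist vs rsync. A blog post about how to set it up using ssh can be found here: non root remote updating

Finally, you could use the infamous tar pipe tar pattern, with a sprinkle of ssh.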

tar zcvf - /wwwdata | ssh root@dumpserver.nixcraft.in "cat > /backup/wwwdata.tar.gz"

This example is discussed here: tar copy over secure network

Answer 6 (score: 0)

The remote does not support ssh multiplexing.

To silence the message:

mux_client_request_session: session request failed: Session open refused by peer

Change your ~/.ssh/config file:

Host destination.hostname.com
  ControlMaster no

Host *
  ControlMaster auto
  ControlPersist yes
  ControlPath ~/.ssh/socket-%r@%h:%p

More details and notes can be found here
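
A related sketch, which is not part of this answer: sshd's MaxSessions option, which defaults to 10, caps the number of sessions on one network connection. If that is the limit behind the errors in the question (an assumption, since the server configuration is not shown), one option is to give each scp its own connection and run only a few copies at a time. The helper names copy_one and run_limited are hypothetical; PRIMARY, dir1, dir2 and PRIMARY_PARTITION are the variables from the question's script.

```shell
#!/bin/bash

# Copy one partition file, trying machineB first and then machineC.
# ControlPath=none disables connection sharing, so each scp opens
# its own connection instead of a session on a shared one.
copy_one() {
    local el="$1"
    scp -o ControlMaster=no -o ControlPath=none \
        "david@machineB:$dir1/t1_weekly_1680_${el}_200003_5.data" "$PRIMARY/." ||
    scp -o ControlMaster=no -o ControlPath=none \
        "david@machineC:$dir2/t1_weekly_1680_${el}_200003_5.data" "$PRIMARY/."
}

# Read one item per line from stdin and run copy_one on each in the
# background, starting at most $1 copies before waiting for that whole
# batch to finish. Failures of individual copies are not reported.
run_limited() {
    local max_jobs="$1"
    local running=0
    local item
    while IFS= read -r item; do
        copy_one "$item" &
        running=$(( running + 1 ))
        if [ "$running" -ge "$max_jobs" ]; then
            wait
            running=0
        fi
    done
    wait
}

# Example (not run here): five copies at a time for the primary partitions
# printf '%s\n' "${PRIMARY_PARTITION[@]}" | run_limited 5
```

Waiting for a whole batch is simpler than keeping exactly N jobs busy, at the cost of some idle time when one file in the batch finishes late.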