HDFS data transfer from one cluster to another does not work with distcp

Date: 2018-10-09 10:32:28

Tags: hadoop hdfs cloudera

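I need to transfer HDFS data from one cluster to another. I read that the `distcp` command is meant for exactly this, but it does not work in my case. In both clusters, the NameNode is connected to its DataNodes only over a private network, so I have two proxy machines that expose the NameNodes publicly. For example, in HAProxy I expose the NameNode's port 8070 on port 20000. I can now ping the NameNodes of both clusters, so I tried distcp. The MapReduce job for the transfer starts running, but it never completes.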
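
Note that ping uses ICMP, so a successful ping does not show that the proxied NameNode port accepts TCP connections. A minimal Python sketch of a direct TCP check, assuming you run it on a node of each cluster; the function name `tcp_port_open` is hypothetical and not part of any Hadoop tooling:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout.

    ping only tests ICMP reachability; this exercises the TCP transport
    that HDFS RPC actually uses.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `tcp_port_open("YY.YY.YY.YY", 20000)` (with the real proxy address substituted) reports whether the HAProxy front end for that NameNode is reachable from the node where it runs.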

For reference, here is part of the job log:

[hdfs@ip-20-0-42-252 ~]$ hadoop distcp  hdfs://YY.YY.YY.YY:20000/user/ce_prasith/filter.txt  hdfs://xx.xx.xx.xx:20000/user/gl_qauser
18/10/09 10:12:15 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs:/user/ce_prasith/filter.txt], targetPath=hdfs://xx.xx.xx.xx:20000/user/gl_qauser, targetPathExists=true, filtersFile='null'}
18/10/09 10:12:16 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
18/10/09 10:12:16 INFO tools.SimpleCopyListing: Build file listing completed.
18/10/09 10:12:16 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
18/10/09 10:12:16 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
18/10/09 10:12:16 INFO tools.DistCp: Number of paths in the copy list: 1
18/10/09 10:12:16 INFO tools.DistCp: Number of paths in the copy list: 1
18/10/09 10:12:16 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm97
18/10/09 10:12:16 INFO mapreduce.JobSubmitter: number of splits:1
18/10/09 10:12:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539063069030_0003
18/10/09 10:12:16 INFO impl.YarnClientImpl: Submitted application application_1539063069030_0003
18/10/09 10:12:17 INFO mapreduce.Job: The url to track the job: http://ip-20-0-21-94.ec2.internal:8088/proxy/application_1539063069030_0003/
18/10/09 10:12:17 INFO tools.DistCp: DistCp job-id: job_1539063069030_0003
18/10/09 10:12:17 INFO mapreduce.Job: Running job: job_1539063069030_0003
18/10/09 10:12:22 INFO mapreduce.Job: Job job_1539063069030_0003 running in uber mode : true
18/10/09 10:12:22 INFO mapreduce.Job:  map 0% reduce 0%
18/10/09 10:13:22 INFO mapreduce.Job:  map 100% reduce 0%

I'm stuck here. Does anyone have an idea how to get past this?

1 answer:

Answer 0 (score: 0)

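Every node in the source cluster must be able to reach every node in the target cluster. The proxy exposes only the NameNode, but HDFS clients (including the DistCp map tasks) get only metadata and block locations from the NameNode and then read and write block data directly to and from the DataNodes. If the target DataNodes are reachable only on their private network, the copy tasks cannot connect to them.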
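
To check this condition from a source-cluster node, one could list the target cluster's DataNodes and try a TCP connection to each. A minimal Python sketch, assuming the `Name: host:port (hostname)` lines that `hdfs dfsadmin -report` prints for each DataNode; the helper names `datanode_endpoints` and `unreachable` are hypothetical. The data-transfer port comes from `dfs.datanode.address` (default 50010 in Hadoop 2, 9866 in Hadoop 3), so it depends on your configuration.

```python
import re
import socket
from typing import Iterable, List, Tuple

# Assumed line format in `hdfs dfsadmin -report` output, e.g.
#   Name: 10.0.0.12:50010 (ip-10-0-0-12.ec2.internal)
_NAME_LINE = re.compile(r"^Name:\s*([^\s:]+):(\d+)", re.MULTILINE)


def datanode_endpoints(report_text: str) -> List[Tuple[str, int]]:
    """Extract (host, port) pairs for each DataNode listed in the report."""
    return [(host, int(port)) for host, port in _NAME_LINE.findall(report_text)]


def unreachable(endpoints: Iterable[Tuple[str, int]],
                timeout: float = 3.0) -> List[Tuple[str, int]]:
    """Return the endpoints that refuse or time out on a TCP connect."""
    failed = []
    for host, port in endpoints:
        try:
            # Only the TCP handshake is tested; no HDFS protocol is spoken.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append((host, port))
    return failed
```

Run on each source node that can execute map tasks, any endpoint returned by `unreachable(datanode_endpoints(report))` is a DataNode that DistCp tasks on that node cannot reach.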