Hadoop distcp command not working

Time: 2014-01-28 14:29:03

Tags: hadoop

Hello, I am trying to move my data from a cluster running CDH4.3 to a cluster running CDH4.5. I am running the following command:

hadoop distcp -update hftp://server1:50070/hbase/test/x hdfs://server2:8020/copy/

After running it, I get the following error:

14/01/28 19:42:43 INFO tools.DistCp: srcPaths=[hftp://server1:50070/hbase/test/x]
14/01/28 19:42:43 INFO tools.DistCp: destPath=hdfs://server2:8020/copy
14/01/28 19:42:45 INFO tools.DistCp: sourcePathsCount=1
14/01/28 19:42:45 INFO tools.DistCp: filesToCopyCount=1
14/01/28 19:42:45 INFO tools.DistCp: bytesToCopyCount=1
14/01/28 19:42:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/28 19:42:47 INFO mapred.JobClient: Running job: job_201401101918_0008
14/01/28 19:42:48 INFO mapred.JobClient:  map 0% reduce 0%
14/01/28 19:43:05 INFO mapred.JobClient:  map 100% reduce 0%
14/01/28 19:43:07 INFO mapred.JobClient: Task Id : attempt_201401101918_0008_m_000000_0, Status : FAILED
14/01/28 19:43:08 INFO mapred.JobClient:  map 0% reduce 0%
14/01/28 19:43:19 INFO mapred.JobClient:  map 100% reduce 0%
14/01/28 19:43:22 INFO mapred.JobClient: Task Id : attempt_201401101918_0008_m_000000_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)

14/01/28 19:43:23 INFO mapred.JobClient:  map 0% reduce 0%
14/01/28 19:43:33 INFO mapred.JobClient:  map 100% reduce 0%
14/01/28 19:43:35 INFO mapred.JobClient: Task Id : attempt_201401101918_0008_m_000000_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)

14/01/28 19:43:36 INFO mapred.JobClient:  map 0% reduce 0%
14/01/28 19:43:46 INFO mapred.JobClient:  map 100% reduce 0%
14/01/28 19:43:50 INFO mapred.JobClient:  map 0% reduce 0%
14/01/28 19:43:53 INFO mapred.JobClient: Job complete: job_201401101918_0008
14/01/28 19:43:53 INFO mapred.JobClient: Counters: 6
14/01/28 19:43:53 INFO mapred.JobClient:   Job Counters
14/01/28 19:43:53 INFO mapred.JobClient:     Failed map tasks=1
14/01/28 19:43:53 INFO mapred.JobClient:     Launched map tasks=4
14/01/28 19:43:53 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=64095
14/01/28 19:43:53 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/01/28 19:43:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/01/28 19:43:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/28 19:43:53 INFO mapred.JobClient: Job Failed: NA
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1388)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

You have new mail in /var/spool/mail/root
[hdfs@sdl1039 root]$  hadoop distcp -update hftp://server1:50070/hbase/test/x hdfs://server2:8020/copy hadoop distcp -update hftp://server1:50070/hbase/test/x hdfs://server2:8020/copy
14/01/28 19:46:09 INFO tools.DistCp: srcPaths=[hftp://server1:50070/hbase/test/x, hdfs://server2:8020/copy, hadoop, distcp, hftp://server1:50070/hbase/test/x]
14/01/28 19:46:09 INFO tools.DistCp: destPath=hdfs://server2:8020/copy
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source hadoop does not exist.
Input source distcp does not exist.
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

Please tell me where I am going wrong.

2 answers:

Answer 0: (score: 1)

I have found a workaround for now:

hadoop distcp -update hdfs://server1:8020/hbase/test/x hdfs://server2:8020/copy/

But I would definitely like to know why hftp did not work for me.
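One way to narrow down the hftp failure might be to list the source through the same hftp URI before starting the copy: if the listing fails, the copy will most likely fail for the same reason. The sketch below is only an illustration. The hosts and paths are the ones from the question, while the function name check_and_copy and the HADOOP variable are hypothetical helpers introduced here. Since hftp is a read-only file system, the copy is normally launched from the destination cluster.

```shell
#!/bin/sh
# Sketch only: check_and_copy and HADOOP are hypothetical names for illustration.
# HADOOP defaults to the real CLI; set HADOOP=echo to print the commands
# instead of running them (dry run).
HADOOP="${HADOOP:-hadoop}"

check_and_copy() {
    src="$1"
    dst="$2"
    # Step 1: list the source through the same URI scheme distcp will use.
    # Stop here if the source cannot be read.
    "$HADOOP" fs -ls "$src" || return 1
    # Step 2: copy only files that are missing or differ on the destination.
    "$HADOOP" distcp -update "$src" "$dst"
}

# Example call, assuming it is run on the destination (CDH4.5) cluster:
# check_and_copy hftp://server1:50070/hbase/test/x hdfs://server2:8020/copy/
```

If the listing in step 1 fails, the message it prints is usually more specific than the "Copied: 0 Skipped: 0 Failed: 1" summary from the map task.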

Answer 1: (score: -1)

I think the port number for hftp is wrong. 50070 is the default port of the NameNode web UI.

Try:

hadoop distcp -update hftp://server1/hbase/test/x hdfs://server2:8020/copy/