I am using distcp to copy all files from one Hadoop cluster to another. The first attempt copied all the data, but the second attempt failed with DuplicateFileException (record would cause duplicates). For more details, please check the log stack below.
I have tried:

./bin/hadoop distcp -update hdfs://XXXXXXXXX:8020/* hdfs://XXXXXXXXX:9000/
./bin/hadoop distcp -p -log -i -overwrite hdfs://XXXXXXXXX:8020/* hdfs://XXXXXXXXX:9000/
ERROR tools.DistCp: Duplicate files in input path:
org.apache.hadoop.tools.CopyListing$DuplicateFileException: File hdfs://192.168.1.22:8020/original/10000 Sales Records and hdfs://192.168.1.22:8020/sample/10000 Sales Records would cause duplicates. Aborting
at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:160)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91)
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
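For what it's worth, the exception message suggests the collision comes from the /* glob: it expands to multiple source paths (/original and /sample here), and two files named "10000 Sales Records" would land under the same name in the target listing. A minimal workaround sketch, assuming the goal is to mirror the whole namespace while preserving the directory structure (hosts and ports are the placeholders from the commands above):

```shell
# Hypothetical workaround sketch: copy the source root itself instead of
# expanding it with /*, so /original and /sample keep their own subtrees
# at the destination rather than being flattened into one copy listing.
# Adjust the host placeholders and ports to your clusters.
hadoop distcp -update hdfs://XXXXXXXXX:8020/ hdfs://XXXXXXXXX:9000/
```

This is a cluster-dependent CLI invocation, not something I have run against your setup; whether -update is the right mode depends on whether the second run should skip files that already exist at the target.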