I can't run distcp from S3 to HDFS on a freshly installed CDH4 system on EC2. I also can't run -ls on an S3 directory.
ubuntu@ip-10-145-227-232:~$ hadoop distcp s3://access_key:secret_key@bucket/logs hdfs://ip-10-145-227-232.ec2.internal:8020/tmp
13/05/20 19:07:45 INFO tools.DistCp: srcPaths=[s3://access_key:secret_key@bucket/logs]
13/05/20 19:07:45 INFO tools.DistCp: destPath=hdfs://ip-10-145-227-232.ec2.internal:8020/tmp
13/05/20 19:07:48 WARN httpclient.RestS3Service: Response '/%2Flogs' - Unexpected response code 404, expected 200
13/05/20 19:07:48 WARN httpclient.RestS3Service: Response '/%2Flogs' - Received error response with XML message
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source s3://access_key:secret_key@bucket/logs does not exist.
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
At the same time, I can list and copy with a similar command on an HBase EMR cluster.
hadoop@ip-10-165-7-106:~$ hadoop distcp s3://access_key:secret_key@bucket/logs/ hdfs://10.165.7.106:9000/test/
13/05/20 19:01:50 INFO tools.DistCp: srcPaths=[s3://access_key:secret_key@bucket/logs]
13/05/20 19:01:50 INFO tools.DistCp: destPath=hdfs://10.165.7.106:9000/test
13/05/20 19:04:47 INFO tools.DistCp: sourcePathsCount=11149
13/05/20 19:04:47 INFO tools.DistCp: filesToCopyCount=7816
13/05/20 19:04:47 INFO tools.DistCp: bytesToCopyCount=443.9m
13/05/20 19:04:47 INFO mapred.JobClient: Default number of map tasks: 1
13/05/20 19:04:47 INFO mapred.JobClient: Default number of reduce tasks: 0
13/05/20 19:04:47 INFO security.ShellBasedUnixGroupsMapping: add hadoop to shell userGroupsCache
13/05/20 19:04:47 INFO mapred.JobClient: Setting group to hadoop
13/05/20 19:04:48 INFO mapred.JobClient: Running job: job_201305201846_0001
13/05/20 19:04:49 INFO mapred.JobClient: map 0% reduce 0%
13/05/20 19:05:10 INFO mapred.JobClient: map 1% reduce 0%
13/05/20 19:05:22 INFO mapred.JobClient: map 2% reduce 0%
13/05/20 19:05:31 INFO mapred.JobClient: map 3% reduce 0%
13/05/20 19:05:40 INFO mapred.JobClient: map 4% reduce 0%
Please help! Thank you very much!
Update: after replacing "s3" with "s3n" on the CDH cluster, I can list the files, but distcp still fails.
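To be clear about the change, here is a rough sketch of the scheme swap. It only prints the commands instead of running them, and it uses the same placeholder credentials, bucket, and namenode address as the commands above. The variable names (SRC, DEST, SRC_S3N) are just for illustration.

```shell
# Dry-run sketch: rewrite the s3:// source URI to s3n:// and print the commands.
# SRC and DEST are the placeholder values from the failing CDH4 command.
SRC="s3://access_key:secret_key@bucket/logs"
DEST="hdfs://ip-10-145-227-232.ec2.internal:8020/tmp"

# Swap only the scheme prefix; everything after "s3://" is kept unchanged.
SRC_S3N="s3n://${SRC#s3://}"

# Print rather than execute, so this is safe to run without a cluster.
echo "hadoop fs -ls $SRC_S3N"
echo "hadoop distcp $SRC_S3N $DEST"
```

Dropping the echo in front of each line would run the real commands; the first is the listing that now works for me, the second is the copy that still does not.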