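I am trying to push some log files from HDFS to an S3 bucket.
I am using the distcp command for this, but it keeps retrying for a long time without making progress. Please help me troubleshoot.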
sudo -u hdfs hadoop distcp -Dfs.s3a.access.key="xxxxxxxxxx" -Dfs.s3a.secret.key="xxxxxxxxxxxxxx" hdfs://prod1/data/exchange/inventory_snapshot/20160610 s3a://test-inventory-snapshot/test/
18/11/27 15:01:41 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://prod1/data/exchange/inventory_snapshot/20160610], targetPath=s3a://test-inventory-snapshot/test, targetPathExists=true, preserveRawXattrs=false}
18/11/27 15:01:41 INFO client.RMProxy: Connecting to ResourceManager at xxxx.xxxx.com/xx.xx.xx.x:8032
18/11/27 15:01:47 INFO client.RMProxy: Connecting to ResourceManager at xxxx.xxxx.com/xx.xx.xx.x:8032
18/11/27 15:01:48 INFO ipc.Client: Retrying connect to server: xxxx.xxxx.com/xx.xx.xx.x:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/11/27 15:01:49 INFO ipc.Client: Retrying connect to server: xxxx.xxxx.com/xx.xx.xx.x:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/11/27 15:01:50 INFO ipc.Client: Retrying connect to server: xxxx.xxxx.com/xx.xx.xx.x. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/11/27 15:01:51 INFO ipc.Client: Retrying connect to server: xxxx.xxxx.com/xx.xx.xx.x:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/11/27 15:01:52 INFO ipc.Client: Retrying connect to server: xxxx.xxxx.com/xx.xx.xx.x:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
When I run an ls command it works, so I suspect something is wrong with the way I am passing the AWS credentials:
sudo -u hdfs hadoop fs -ls hdfs://prod1/data/exchange/inventory_snapshot/20160610
Found 1 items
drwxr-xr-x - user hdfs 0 2016-06-10 12:30 hdfs://prod1/data/exchange/inventory_snapshot/20160610/.metadata
hadoop version
Hadoop 2.7.1.2.3.2.0-2950
Subversion git@github.com:hortonworks/hadoop.git -r 5cc60e0003e33aa98205f18bccaeaf36cb193c1c
Compiled by jenkins on 2015-09-30T18:08Z
Compiled with protoc 2.5.0
From source with checksum 69a3bf8c667267c2c252a54fbbf23d
This command was run using /usr/hdp/2.3.2.0-2950/hadoop/lib/hadoop-common-2.7.1.2.3.2.0-2950.jar
Answer 0 (score: 2)
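Connecting to ResourceManager at xxxx.xxxx.com/xx.xx.xx.x:8032
It looks like you cannot reach the YARN ResourceManager from the host where you are trying to run the distcp job. "hadoop fs -ls" works because it talks only to HDFS and does not involve YARN, whereas distcp runs as a MapReduce job and has to submit it through the ResourceManager. The repeated "Retrying connect to server" messages therefore point to a connectivity problem, not to the S3 credentials.
There can be several reasons for this. Check that the ResourceManager is actually running on node xxxx.xxxx.com/xx.xx.xx.x:8032, and check that you can reach that host and port from the machine where you run distcp. Also try running another MapReduce job (for example, Pi from hadoop-examples) to see whether it fails in the same way.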
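The host/port check can be sketched as a small shell helper. This is only a rough sketch: `check_port` is a hypothetical function name, not part of Hadoop, and it assumes bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command are available on the client host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): returns 0 if a TCP connection
# to host:port can be opened within 3 seconds, non-zero otherwise.
check_port() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP socket to host:port;
  # timeout bounds the wait in case packets are silently dropped.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (substitute the real ResourceManager host for the masked one in the log):
#   check_port xxxx.xxxx.com 8032 && echo "reachable" || echo "NOT reachable"
```

If the port turns out to be unreachable, the usual suspects are a stopped ResourceManager, a firewall between the two hosts, or a client-side configuration that points at the wrong address. If the port is reachable, running `yarn application -list` from the same host is a quick follow-up, since that command also has to contact the ResourceManager.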