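I am new to Hadoop, and I am running a MapReduce job to compute the revenue of different stores. The mapper and reducer programs work perfectly, and I have double-checked the files and the destinations.
When I run the MapReduce command: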
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce1/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar \
-mapper mapper.py \
-reducer reducer.py \
-input /home/anwarvic \
-output /joboutput
it gives the following output:
17/04/30 05:48:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/30 05:48:14 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob7598928362555913238.jar tmpDir=null
17/04/30 05:48:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/30 05:48:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/30 05:48:21 INFO mapred.FileInputFormat: Total input paths to process : 5
17/04/30 05:48:21 INFO net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
17/04/30 05:48:24 INFO mapreduce.JobSubmitter: number of splits:6
17/04/30 05:48:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493523215757_0002
17/04/30 05:48:27 INFO impl.YarnClientImpl: Submitted application application_1493523215757_0002
17/04/30 05:48:28 INFO mapreduce.Job: The url to track the job: http://anwar-computer:8088/proxy/application_1493523215757_0002/
17/04/30 05:48:28 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
17/04/30 05:48:28 INFO streaming.StreamJob: Running job: job_1493523215757_0002
17/04/30 05:48:28 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/30 05:48:29 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:08 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:10 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:41 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:42 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:43 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:45 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:50:07 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:50:08 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:50:37 INFO streaming.StreamJob: map 100% reduce 100%
17/04/30 05:50:41 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/30 05:50:41 ERROR streaming.StreamJob: Job not successful. Error: Task failed task_1493523215757_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/04/30 05:50:41 INFO streaming.StreamJob: killJob...
17/04/30 05:50:41 INFO impl.YarnClientImpl: Killed application application_1493523215757_0002
Streaming Command Failed!
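The output basically says that the job was not successful, even though the map and reduce progress both reached 100%.
Following what is stated in this answer and in this one, I added the shebang line to both mapper.py and reducer.py: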
#!/usr/bin/env python
By the way, this answer did not work for me!
I have been stuck on this problem for about 20 hours, so any help would be greatly appreciated.
Answer 0 (score: 0)
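I suggest the following steps:
1. Open the job tracking URL (http://anwar-computer:8088/proxy/application_1493523215757_0002/).
2. Go to the failed map task (Job failed as tasks failed. failedMaps:1 failedReduces:0). There you can see the exception trace.
3. Follow the logs link and analyze the logs; you will most likely find the root cause there.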
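Possible root causes:
1. The Mapper is failing on some of the input.
2. The input differs from what the Mapper expects.
I suspect point 1, because the process tried to run the mapper several times and failed: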
17/04/30 05:48:29 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:08 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:10 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:41 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:42 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:43 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:45 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:50:07 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:50:08 INFO streaming.StreamJob: map 0% reduce 0%
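Both suspected causes can be checked on your own machine before resubmitting the job. The following is only a minimal sketch, assuming Python 3.7+ is available locally and that mapper.py and reducer.py read stdin and write stdout as Hadoop Streaming expects. The function names are made up for illustration, and sorting whole lines is only a rough stand-in for the real shuffle phase.

```python
import subprocess
import sys


def run_streaming_script(script_path, sample_text, interpreter=sys.executable):
    """Feed sample_text to a streaming script on stdin.

    Returns (exit_code, stdout, stderr). A non-zero exit code here is the
    local equivalent of a failed map or reduce task attempt.
    """
    result = subprocess.run(
        [interpreter, script_path],
        input=sample_text,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


def simulate_job(mapper_path, reducer_path, sample_text):
    """Roughly emulate: cat sample | mapper | sort | reducer."""
    code, mapped, err = run_streaming_script(mapper_path, sample_text)
    if code != 0:
        raise RuntimeError("mapper failed:\n" + err)
    # Sorting whole lines approximates the sort-by-key done by the shuffle.
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    code, reduced, err = run_streaming_script(reducer_path, shuffled)
    if code != 0:
        raise RuntimeError("reducer failed:\n" + err)
    return reduced
```

Calling simulate_job("mapper.py", "reducer.py", text) with a few hundred lines copied from the real input should surface the same traceback that the task logs contain. If your scripts target a different interpreter, pass its path as interpreter. Note that this runs the scripts through an interpreter directly, so it does not check the shebang line or the execute permission.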
In addition, you can add more logging to the Mapper to get more details.
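As an illustration of what such logging could look like, here is a sketch of a defensive mapper. The input layout (six tab-separated fields, with the store in the third column and the sale amount in the fifth) is purely an assumption, since the question does not show the data, so adjust the indices to your file. Anything written to stderr ends up in the task log, and lines of the form reporter:counter:group,counter,amount on stderr are picked up by Hadoop Streaming as job counters.

```python
#!/usr/bin/env python
import sys


def map_line(line):
    """Return (store, amount) for a well-formed line, otherwise None.

    ASSUMPTION: six tab-separated fields, store name in column 3 and
    sale amount in column 5. This layout is hypothetical.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        return None
    try:
        return fields[2], float(fields[4])
    except ValueError:
        return None


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    for number, line in enumerate(stdin, start=1):
        pair = map_line(line)
        if pair is None:
            # This message shows up in the task log behind the logs link.
            stderr.write("skipping malformed line %d: %r\n" % (number, line))
            # Hadoop Streaming turns this stderr line into a job counter.
            stderr.write("reporter:counter:mapper,malformed_lines,1\n")
            continue
        stdout.write("%s\t%s\n" % pair)
```

To use it as mapper.py, end the file with a call to main(). The code avoids Python 3 only syntax, so it should run under whichever interpreter `#!/usr/bin/env python` resolves to on the cluster nodes. Skipping bad lines instead of raising keeps a single malformed record from failing the whole task, while the counter still tells you how many records were dropped.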
Alternatively, you can enable debug logging by adding the --loglevel DEBUG argument to the hadoop command, e.g.:
hadoop \
--loglevel DEBUG \
jar /usr/local/hadoop/share/hadoop/mapreduce1/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar \
-mapper mapper.py \
-reducer reducer.py \
-input /home/anwarvic \
-output /joboutput
Reference (Hadoop 2.7.0 Commands Manual): https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/CommandsManual.html