I am running this code example:
#!/usr/bin/env python
from mrjob.job import MRJob


class MRWordCount(MRJob):

    def mapper(self, _, line):
        for word in line.split():
            yield (word, 1)

    def reducer(self, word, counts):
        yield (word, sum(counts))


if __name__ == '__main__':
    MRWordCount.run()
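To separate the job logic from the cluster setup, the same mapper and reducer can be exercised without Hadoop or mrjob. This is only a minimal sketch in plain Python: `run_word_count` is a hypothetical helper (not part of mrjob) that emulates the shuffle step by grouping values per key.

```python
from collections import defaultdict


def mapper(line):
    # Same logic as MRWordCount.mapper: emit (word, 1) for each token.
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    # Same logic as MRWordCount.reducer: sum the counts for one word.
    yield word, sum(counts)


def run_word_count(lines):
    # Hypothetical helper: group mapper output by key, then reduce each group.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    result = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            result[key] = total
    return result


if __name__ == "__main__":
    print(run_word_count(["a b a", "b c"]))
```

If this produces the expected counts, the word-count logic itself is unlikely to be the cause of a launch failure.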
The MR job fails with:
Error launching job , bad input path : File does not exist: /mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/job_local797180279_0001/files/mr-job.py#mr-job.py
Streaming Command Failed!
Full error dump:
(my_root)archmangler@ec2:~/code/hadoop-streaming$ python mr-job.py -r hadoop hdfs:///user/archmangler/input.txt
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /usr/local/lol/hadoop/bin...
Found hadoop binary: /usr/local/lol/hadoop/bin/hadoop
Using Hadoop version 2.8.2
Looking for Hadoop streaming jar in /usr/local/lol/hadoop...
Found Hadoop streaming jar: /usr/local/lol/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.2.jar
Creating temp directory /tmp/mr-job.archmangler.20180304.182124.965439
Copying local files to hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/...
Running step 1 of 1...
session.id is deprecated. Instead, use dfs.metrics.session-id
Initializing JVM Metrics with processName=JobTracker, sessionId=
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
Cleaning up the staging area file:/mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/job_local797180279_0001
Error launching job , bad input path : File does not exist: /mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/job_local797180279_0001/files/mr-job.py#mr-job.py
Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['/usr/local/lol/hadoop/bin/hadoop', 'jar', '/usr/local/lol/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.2.jar', '-files', 'hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/mr-job.py#mr-job.py,hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/mrjob.zip#mrjob.zip,hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/archmangler/input.txt', '-output', 'hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/output', '-mapper', 'sh -ex setup-wrapper.sh python mr-job.py --step-num=0 --mapper', '-reducer', 'sh -ex setup-wrapper.sh python mr-job.py --step-num=0 --reducer']' returned non-zero exit status 512
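The dump above names both the staging area and the file Hadoop could not find. A rough sketch for pulling those values out of such a log follows. `summarize_failure` and `SAMPLE_LOG` are hypothetical names introduced here, and the patterns assume each log line is unwrapped, with no spaces inside paths. The `#mr-job.py` suffix is the link name from the `-files` syntax (`path#name`), so it is split off from the file path.

```python
import re

# Patterns for two lines that appear in the error dump.
STAGING_RE = re.compile(r"Cleaning up the staging area (\S+)")
MISSING_RE = re.compile(r"File does not exist: (\S+)")


def summarize_failure(log_text):
    """Hypothetical helper: collect the paths reported in the failure."""
    summary = {"staging_area": None, "job_id": None,
               "missing_path": None, "link_name": None}
    staging = STAGING_RE.search(log_text)
    if staging:
        summary["staging_area"] = staging.group(1)
        # In this dump the last component of the staging area is the job ID.
        summary["job_id"] = staging.group(1).rstrip("/").rsplit("/", 1)[-1]
    missing = MISSING_RE.search(log_text)
    if missing:
        # Split "path#name" into the file path and the link name.
        path, _, link = missing.group(1).partition("#")
        summary["missing_path"] = path
        summary["link_name"] = link or None
    return summary


# Two lines copied from the dump, with wrapped paths rejoined.
SAMPLE_LOG = (
    "Cleaning up the staging area file:/mnt/hdfs01/hdfs-tmp-dir/"
    "hadoop-archmangler/mapred/staging/archmangler797180279/.staging/"
    "job_local797180279_0001\n"
    "Error launching job , bad input path : File does not exist: "
    "/mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/"
    "archmangler797180279/.staging/job_local797180279_0001/files/"
    "mr-job.py#mr-job.py\n"
)

if __name__ == "__main__":
    for key, value in summarize_failure(SAMPLE_LOG).items():
        print(key, "=", value)
```

The sketch only restates what the log already says. It does make one detail easier to see: the staging area is a `file:` URI, while the `-files` arguments in the failed command are `hdfs:///` URIs.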
However, the path does exist, at least up to:
/mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/
So it looks like, for some reason, the last part of the path and the temporary files were never created?
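The check described above (how far down the path actually exists) can be scripted rather than done by hand. A minimal sketch, assuming the path lives on the local filesystem of the machine running the check; `deepest_existing_prefix` is a hypothetical helper name:

```python
import os


def deepest_existing_prefix(path):
    """Return (existing_prefix, missing_components) for a local path."""
    current = os.path.normpath(path)
    missing = []
    # Walk up from the full path until a component that exists is found.
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the root or an empty relative path
            break
        missing.append(os.path.basename(current))
        current = parent
    return current, list(reversed(missing))


if __name__ == "__main__":
    target = ("/mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/"
              "archmangler797180279/.staging/job_local797180279_0001/"
              "files/mr-job.py")
    prefix, missing = deepest_existing_prefix(target)
    print("exists up to:", prefix)
    print("missing:", "/".join(missing))
```

Since the log shows Hadoop cleaning up the staging area after the failure, the job directory may already be gone by the time this runs, so the result says what exists now rather than what existed at launch time.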
Question:
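Note: I don't believe this is a problem with my Hadoop / HDFS site configuration, but if anyone knows exactly why this happens, or has run into this issue before, I would be interested to hear about it.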
Many thanks in advance! Traiano