Hadoop streaming with mrjob: "Error launching job , bad input path : File does not exist:"

Asked: 2018-03-05 02:48:55

Tags: python-2.7 hadoop hdfs

I am running this code sample:

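    #!/usr/bin/env python
    from mrjob.job import MRJob


    class MRWordCount(MRJob):

        def mapper(self, _, line):
            # Emit (word, 1) for every whitespace-separated word in the line.
            for word in line.split():
                yield (word, 1)

        def reducer(self, word, counts):
            # Sum the counts emitted by the mapper for this word.
            yield (word, sum(counts))


    if __name__ == '__main__':
        MRWordCount.run()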
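
To rule out the job logic itself, the same map and reduce steps can be exercised without Hadoop or mrjob. This is only a minimal sketch in plain Python, assuming the same whitespace tokenisation as the mapper above; the names `map_line` and `word_count` are hypothetical helpers, not part of mrjob.

```python
from collections import defaultdict


def map_line(line):
    """Mirror of the mapper: emit (word, 1) per whitespace-separated word."""
    for word in line.split():
        yield word, 1


def word_count(lines):
    """Group the mapper output by word and sum the counts, like the reducer."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_line(line):
            grouped[word].append(one)
    return {word: sum(counts) for word, counts in grouped.items()}


if __name__ == '__main__':
    print(word_count(["a b a", "b c"]))
```

If this produces sensible counts for a sample of the input, the failure is more likely in how the job is launched than in the mapper or reducer.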

The MR job fails with:

    Error launching job , bad input path : File does not exist: /mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/job_local797180279_0001/files/mr-job.py#mr-job.py
    Streaming Command Failed!

  • This happens even though the directory path does exist, at least up to its last components.

Full error dump:

    (my_root)archmangler@ec2:~/code/hadoop-streaming$ python mr-job.py -r hadoop hdfs:///user/archmangler/input.txt
    No configs found; falling back on auto-configuration
    No configs specified for hadoop runner
    Looking for hadoop binary in /usr/local/lol/hadoop/bin...
    Found hadoop binary: /usr/local/lol/hadoop/bin/hadoop
    Using Hadoop version 2.8.2
    Looking for Hadoop streaming jar in /usr/local/lol/hadoop...
    Found Hadoop streaming jar: /usr/local/lol/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.2.jar
    Creating temp directory /tmp/mr-job.archmangler.20180304.182124.965439
    Copying local files to hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/...
    Running step 1 of 1...
      session.id is deprecated. Instead, use dfs.metrics.session-id
      Initializing JVM Metrics with processName=JobTracker, sessionId=
      Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      Cleaning up the staging area file:/mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/job_local797180279_0001
      Error launching job , bad input path : File does not exist: /mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/job_local797180279_0001/files/mr-job.py#mr-job.py
    Streaming Command Failed!
    Attempting to fetch counters from logs...
    Can't fetch history log; missing job ID
    No counters found
    Scanning logs for probable cause of failure...
    Can't fetch history log; missing job ID
    Can't fetch task logs; missing application ID
    Step 1 of 1 failed: Command '['/usr/local/lol/hadoop/bin/hadoop', 'jar', '/usr/local/lol/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.2.jar', '-files', 'hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/mr-job.py#mr-job.py,hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/mrjob.zip#mrjob.zip,hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/archmangler/input.txt', '-output', 'hdfs:///user/archmangler/tmp/mrjob/mr-job.archmangler.20180304.182124.965439/output', '-mapper', 'sh -ex setup-wrapper.sh python mr-job.py --step-num=0 --mapper', '-reducer', 'sh -ex setup-wrapper.sh python mr-job.py --step-num=0 --reducer']' returned non-zero exit status 512

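However, the path does exist at least as far as:

    /mnt/hdfs01/hdfs-tmp-dir/hadoop-archmangler/mapred/staging/archmangler797180279/.staging/

So it looks as though, for some reason, the last part of the path (the job_local797180279_0001/files directory) and the temporary files in it are not being created?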

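Questions:

  • Is this a problem with the mrjob library?
  • How can I troubleshoot this further?

Note: I don't believe this is a problem with my hadoop / hdfs site configuration, but if anyone has run into this before and knows exactly why it happens, I would be interested to hear it.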
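
One detail in the dump above may be worth checking before ruling out configuration: the job ID has the form job_local..., and the staging area is given as a file: URI. As far as I understand, that is what Hadoop's LocalJobRunner produces when mapreduce.framework.name is left at its default of local, which would mean the streaming job is not being submitted to YARN even though the -files entries point at hdfs:// paths. This is a hypothesis, not a confirmed cause. A minimal sketch for spotting those hints in a saved log follows; the function name `local_runner_hints` and the regular expressions are my own assumptions about the log format shown above, not part of mrjob or Hadoop.

```python
import re

# Assumption: job IDs of the form job_local<digits>_<digits> come from
# Hadoop's LocalJobRunner rather than from a YARN cluster.
LOCAL_JOB_ID = re.compile(r"\bjob_local\d+_\d+\b")
# Assumption: a staging area on the local filesystem is logged with a file: URI.
LOCAL_STAGING = re.compile(r"staging area (file:\S+)")


def local_runner_hints(log_text):
    """Collect hints that a job was run by the LocalJobRunner."""
    job_ids = sorted(set(LOCAL_JOB_ID.findall(log_text)))
    staging_dirs = LOCAL_STAGING.findall(log_text)
    return {
        "local_job_ids": job_ids,
        "local_staging_dirs": staging_dirs,
        "looks_local": bool(job_ids or staging_dirs),
    }


if __name__ == '__main__':
    sample = (
        "Cleaning up the staging area file:/mnt/hdfs01/hdfs-tmp-dir/"
        "hadoop-archmangler/mapred/staging/archmangler797180279/.staging/"
        "job_local797180279_0001\n"
        "Streaming Command Failed!\n"
    )
    print(local_runner_hints(sample))
```

If `looks_local` comes back true for the real log, the next thing to inspect would be the value of mapreduce.framework.name in mapred-site.xml on the machine that launches the job.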

Many thanks in advance! Traiano

0 answers:

No answers yet