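I'm new to mrjob and Hadoop. After setting up my Hadoop cluster, I tried to submit a job to Hadoop with mrjob, but it failed with the error "returned non-zero exit status 256". More details below: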
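from mrjob.job import MRJob
import re

# Matches runs of word characters and apostrophes
WORD_RE = re.compile(r"[\w']+")


class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every word on the input line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner(self, word, counts):
        # Pre-aggregate counts on the map side
        yield (word, sum(counts))

    def reducer(self, word, counts):
        # Sum the partial counts for each word
        yield (word, sum(counts))


if __name__ == '__main__':
    MRWordFreqCount.run()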
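One way to rule out the job logic itself before looking at the cluster is to run the same map and reduce steps in plain Python, with no mrjob or Hadoop involved. This is only a sketch: `count_words` is a hypothetical helper that is not part of mrjob, and it mimics the mapper and reducer above rather than calling them.

```python
import re
from collections import defaultdict

# Same pattern as in the job above
WORD_RE = re.compile(r"[\w']+")


def count_words(lines):
    """Hypothetical helper: apply the mapper and reducer logic in-process."""
    counts = defaultdict(int)
    for line in lines:
        # "Map" step: one lowercase word at a time
        for word in WORD_RE.findall(line):
            # "Reduce" step: accumulate the count per word
            counts[word.lower()] += 1
    return dict(counts)


if __name__ == '__main__':
    print(count_words(["The cat's hat", "the hat"]))
```

If this gives sensible counts, the failure is more likely in the Hadoop environment than in the word-count logic. Running the job without `-r hadoop` (mrjob's default inline runner) is a similar check that exercises the real `MRJob` class.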
python test.py -r hadoop --python-bin=/root/.pyenv/versions/2.7.9/bin/python ./pg20417.txt
```
HADOOP: Job not successful!
HADOOP: Streaming Command Failed!
Job failed with return code 256: ['/diskb/dxb/code/hadoop-2.7.1/bin/hadoop', 'jar', '/diskb/dxb/code/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar', '-files', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/test.py#test.py,hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/setup-wrapper.sh#setup-wrapper.sh', '-archives', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/mrjob.tar.gz#mrjob.tar.gz', '-input', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/pg20417.txt', '-output', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/output', '-mapper', 'sh -ex setup-wrapper.sh /root/.pyenv/versions/2.7.9/bin/python test.py --step-num=0 --mapper', '-combiner', 'sh -ex setup-wrapper.sh /root/.pyenv/versions/2.7.9/bin/python test.py --step-num=0 --combiner', '-reducer', 'sh -ex setup-wrapper.sh /root/.pyenv/versions/2.7.9/bin/python test.py --step-num=0 --reducer']
Scanning logs for probable cause of failure
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    MRWordFreqCount.run()
  File "/root/.pyenv/versions/2.7.9/lib/python2.7/site-packages/mrjob/job.py", line 461, in run
    mr_job.execute()
  File "/root/.pyenv/versions/2.7.9/lib/python2.7/site-packages/mrjob/job.py", line 479, in execute
    super(MRJob, self).execute()
  File "/root/.pyenv/versions/2.7.9/lib/python2.7/site-packages/mrjob/launch.py", line 151, in execute
    self.run_job()
  File "/root/.pyenv/versions/2.7.9/lib/python2.7/site-packages/mrjob/launch.py", line 214, in run_job
    runner.run()
  File "/root/.pyenv/versions/2.7.9/lib/python2.7/site-packages/mrjob/runner.py", line 464, in run
    self._run()
  File "/root/.pyenv/versions/2.7.9/lib/python2.7/site-packages/mrjob/hadoop.py", line 237, in _run
    self._run_job_in_hadoop()
  File "/root/.pyenv/versions/2.7.9/lib/python2.7/site-packages/mrjob/hadoop.py", line 372, in _run_job_in_hadoop
    raise CalledProcessError(returncode, step_args)
subprocess.CalledProcessError: Command '['/diskb/dxb/code/hadoop-2.7.1/bin/hadoop', 'jar', '/diskb/dxb/code/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar', '-files', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/test.py#test.py,hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/setup-wrapper.sh#setup-wrapper.sh', '-archives', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/mrjob.tar.gz#mrjob.tar.gz', '-input', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/files/pg20417.txt', '-output', 'hdfs:///user/root/tmp/mrjob/test.root.20150723.011910.649661/output', '-mapper', 'sh -ex setup-wrapper.sh /root/.pyenv/versions/2.7.9/bin/python test.py --step-num=0 --mapper', '-combiner', 'sh -ex setup-wrapper.sh /root/.pyenv/versions/2.7.9/bin/python test.py --step-num=0 --combiner', '-reducer', 'sh -ex setup-wrapper.sh /root/.pyenv/versions/2.7.9/bin/python test.py --step-num=0 --reducer']' returned non-zero exit status 256
```
Hadoop 2.7.1
Python 2.7.9
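A note on reading "exit status 256": a process exit code normally fits in 0 to 255, so 256 looks like a raw POSIX wait status rather than a plain exit code. The sketch below decodes such a value under that assumption (that this mrjob version reports the status the way `os.waitpid` returns it, which is not confirmed by the output above). `decode_wait_status` is a hypothetical helper, and the bit layout used is the conventional Linux one.

```python
def decode_wait_status(status):
    """Hypothetical helper: split a raw wait status into (exit_code, signal).

    Assumes the conventional Linux encoding: the exit code sits in the
    high byte and the terminating signal number in the low 7 bits.
    """
    exit_code = (status >> 8) & 0xFF
    signal_number = status & 0x7F
    return exit_code, signal_number


if __name__ == '__main__':
    exit_code, signal_number = decode_wait_status(256)
    print("exit code: %d, signal: %d" % (exit_code, signal_number))
```

Under this reading the `hadoop jar` command simply exited with code 1, which is a generic failure. The actual cause would then have to come from the Hadoop task logs rather than from the number itself.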