I'm running Apache Hadoop 3.1.0 in pseudo-distributed mode with the default configuration from the wiki.
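By "default configuration" I mean the minimal single-node setup from the Hadoop documentation; my site files look roughly like this (paraphrased from memory, so treat the exact values as approximate):

etc/hadoop/core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>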
I wrote a simple Python program (posted below) that counts the <article> tags in the dblp.xml file:
from mrjob.job import MRJob

class MRArticleCount(MRJob):

    # Count the closing </article> tags on each line of input
    def mapper(self, _, line):
        yield "articles", line.count('</article>')

    # Sum the per-line counts into a single total
    def reducer(self, key, counts):
        yield key, sum(counts)

if __name__ == '__main__':
    MRArticleCount.run()
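As a sanity check, the job runs correctly with mrjob's inline runner on a small local sample (sample.xml here is just a stand-in name for my test excerpt of the dataset):

python articleCounter.py -r inline sample.xml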
I run it with:

python articleCounter.py -r hadoop hdfs:///user/hadoop/dblp/dblp.xml

which returns:
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in $PATH...
Found hadoop binary: /home/hadoop/hadoop/bin/hadoop
Using Hadoop version 3.1.0
Looking for Hadoop streaming jar in /home/hadoop/hadoop...
Found Hadoop streaming jar: /home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar
Creating temp directory /tmp/articleCounter.hadoop.20180416.013824.692915
Copying local files to hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/...
Running step 1 of 1...
loaded properties from hadoop-metrics2.properties
Scheduled Metric snapshot period at 10 second(s).
JobTracker metrics system started
JobTracker metrics system already initialized!
Cleaning up the staging area file:/tmp/hadoop/mapred/staging/hadoop890329391/.staging/job_local890329391_0001
Error launching job , bad input path : File does not exist: /tmp/hadoop/mapred/staging/hadoop890329391/.staging/job_local890329391_0001/files/articleCounter.py#articleCounter.py
Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['/home/hadoop/hadoop/bin/hadoop', 'jar', '/home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar', '-files', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/articleCounter.py#articleCounter.py,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/mrjob.zip#mrjob.zip,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/hadoop/dblp/dblp.xml', '-output', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/output', '-mapper', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --mapper', '-reducer', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --reducer']' returned non-zero exit status 512
Running it with verbose output gives me this monster:
Looking for configs in /home/hadoop/.mrjob.conf
Looking for configs in /etc/mrjob.conf
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Active configuration:
{'bootstrap_mrjob': None,
'bootstrap_spark': None,
'check_input_paths': True,
'cleanup': ['ALL'],
'cleanup_on_failure': ['NONE'],
'cmdenv': {},
'hadoop_bin': None,
'hadoop_extra_args': [],
'hadoop_log_dirs': [],
'hadoop_streaming_jar': None,
'hadoop_tmp_dir': 'tmp/mrjob',
'interpreter': None,
'jobconf': {},
'label': None,
'libjars': [],
'local_tmp_dir': '/tmp',
'owner': 'hadoop',
'py_files': [],
'python_bin': None,
'setup': [],
'sh_bin': ['sh', '-ex'],
'spark_args': [],
'spark_master': 'yarn',
'spark_submit_bin': None,
'steps_interpreter': None,
'steps_python_bin': None,
'task_python_bin': None,
'upload_archives': [],
'upload_dirs': [],
'upload_files': []}
Looking for hadoop binary in $PATH...
Found hadoop binary: /home/hadoop/hadoop/bin/hadoop
> /home/hadoop/hadoop/bin/hadoop fs -ls hdfs:///user/hadoop/dblp/dblp.xml
STDOUT: -rw-r--r-- 1 hadoop supergroup 2257949018 2018-04-15 04:23 hdfs:///user/hadoop/dblp/dblp.xml
> /home/hadoop/hadoop/bin/hadoop version
Using Hadoop version 3.1.0
> /usr/bin/python /home/hadoop/articleCounter.py --steps
Looking for Hadoop streaming jar in /home/hadoop/hadoop...
Found Hadoop streaming jar: /home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar
Creating temp directory /tmp/articleCounter.hadoop.20180416.014112.103990
archiving /home/hadoop/.local/lib/python2.7/site-packages/mrjob -> /tmp/articleCounter.hadoop.20180416.014112.103990/mrjob.zip as mrjob/
Writing wrapper script to /tmp/articleCounter.hadoop.20180416.014112.103990/setup-wrapper.sh
WRAPPER: # store $PWD
WRAPPER: __mrjob_PWD=$PWD
WRAPPER:
WRAPPER: # obtain exclusive file lock
WRAPPER: exec 9>/tmp/wrapper.lock.articleCounter.hadoop.20180416.014112.103990
WRAPPER: python -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
WRAPPER:
WRAPPER: # setup commands
WRAPPER: {
WRAPPER: export PYTHONPATH=$__mrjob_PWD/mrjob.zip:$PYTHONPATH
WRAPPER: } 0</dev/null 1>&2
WRAPPER:
WRAPPER: # release exclusive file lock
WRAPPER: exec 9>&-
WRAPPER:
WRAPPER: # run task from the original working directory
WRAPPER: cd $__mrjob_PWD
WRAPPER: "$@"
> /home/hadoop/hadoop/bin/hadoop fs -mkdir -p hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/
Copying local files to hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/...
/tmp/articleCounter.hadoop.20180416.014112.103990/mrjob.zip -> hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip
> /home/hadoop/hadoop/bin/hadoop fs -put /tmp/articleCounter.hadoop.20180416.014112.103990/mrjob.zip hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip
/home/hadoop/articleCounter.py -> hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py
> /home/hadoop/hadoop/bin/hadoop fs -put /home/hadoop/articleCounter.py hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py
/tmp/articleCounter.hadoop.20180416.014112.103990/setup-wrapper.sh -> hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh
> /home/hadoop/hadoop/bin/hadoop fs -put /tmp/articleCounter.hadoop.20180416.014112.103990/setup-wrapper.sh hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh
Running step 1 of 1...
> /home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar -files 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py#articleCounter.py,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip#mrjob.zip,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh#setup-wrapper.sh' -input hdfs:///user/hadoop/dblp/dblp.xml -output hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/output -mapper 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --mapper' -reducer 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --reducer'
with environment: [('HOME', '/home/hadoop'), ('LANG', 'C'), ('LOGNAME', 'hadoop'), ('LS_COLORS', 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:'), ('MAIL', '/var/mail/hadoop'), ('OLDPWD', '/home/hadoop/hadoop/share/hadoop/tools/lib'), ('PATH', '/home/hadoop/hadoop/bin:/home/hadoop/hadoop/sbin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games'), ('PWD', '/home/hadoop'), ('SHELL', '/bin/bash'), ('SHLVL', '1'), ('SSH_CLIENT', '192.168.1.188 40594 22'), ('SSH_CONNECTION', '192.168.1.188 40594 192.168.1.150 22'), ('SSH_TTY', '/dev/pts/2'), ('TERM', 'xterm-256color'), ('USER', 'hadoop'), ('XDG_RUNTIME_DIR', '/run/user/1000'), ('XDG_SESSION_ID', '18543'), ('_', '/usr/bin/python')]
Invoking Hadoop via PTY
loaded properties from hadoop-metrics2.properties
Scheduled Metric snapshot period at 10 second(s).
JobTracker metrics system started
JobTracker metrics system already initialized!
Cleaning up the staging area file:/tmp/hadoop/mapred/staging/hadoop108797154/.staging/job_local108797154_0001
Error launching job , bad input path : File does not exist: /tmp/hadoop/mapred/staging/hadoop108797154/.staging/job_local108797154_0001/files/articleCounter.py#articleCounter.py
Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['/home/hadoop/hadoop/bin/hadoop', 'jar', '/home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar', '-files', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py#articleCounter.py,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip#mrjob.zip,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/hadoop/dblp/dblp.xml', '-output', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/output', '-mapper', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --mapper', '-reducer', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --reducer']' returned non-zero exit status 512
The program itself runs perfectly with the inline runner on a test dataset, but running it on Hadoop fails. I believe the problem is the bad input path reported when the job is launched, but I don't know how to fix it. Any help would be greatly appreciated, and I'm happy to post any configuration files or logs that would help track this down!
Thanks!