mrjob hangs forever when running on Hadoop

Date: 2015-05-08 19:56:52

Tags: python mrjob

I ran the tutorial from the docs. The word count example works on a local file, but when I try

    python mr.py -r hadoop 1.txt

it just hangs.

When I interrupt it with Ctrl+C, the log is:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /var/folders/zv/1hqhxh0n6m374cwzysmdn6zc0000gn/T/mr.yd006t.20150508.194506.047719
writing wrapper script to /var/folders/zv/1hqhxh0n6m374cwzysmdn6zc0000gn/T/mr.yd006t.20150508.194506.047719/setup-wrapper.sh
Using Hadoop version 2.7.0
Copying local files into hdfs:///user/yd006t/tmp/mrjob/mr.yd006t.20150508.194506.047719/files/
^CTraceback (most recent call last):
  File "mr.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "/Library/Python/2.7/site-packages/mrjob/job.py", line 461, in run
    mr_job.execute()
  File "/Library/Python/2.7/site-packages/mrjob/job.py", line 479, in execute
    super(MRJob, self).execute()
  File "/Library/Python/2.7/site-packages/mrjob/launch.py", line 151, in execute
    self.run_job()
  File "/Library/Python/2.7/site-packages/mrjob/launch.py", line 214, in run_job
    runner.run()
  File "/Library/Python/2.7/site-packages/mrjob/runner.py", line 464, in run
    self._run()
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 237, in _run
    self._run_job_in_hadoop()
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 339, in _run_job_in_hadoop
    self._process_stderr_from_streaming(master)
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 388, in _process_stderr_from_streaming
    for line in treat_eio_as_eof(stderr):
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 381, in treat_eio_as_eof
    yield iter.next()  # okay for StopIteration to bubble up
KeyboardInterrupt
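The traceback shows where mrjob was waiting when it was interrupted: after printing `Copying local files into hdfs:///...`, it was inside `_process_stderr_from_streaming`, reading stderr from the Hadoop process it had launched. One way to narrow down such a hang is to run simple `hadoop` commands yourself with a timeout. The sketch below assumes that the `hadoop` binary on your `PATH` is the same one mrjob uses; the helper name `check_command` is hypothetical, and the snippet targets Python 3.5+ (for `subprocess.run`), not the Python 2.7 shown in the traceback.

```python
import subprocess
import sys


def check_command(cmd, timeout=30):
    """Run cmd and classify the result as 'ok', 'failed', 'timeout' or 'missing'.

    Hypothetical diagnostic helper, not part of mrjob.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # subprocess.run kills the child if this expires
        )
    except FileNotFoundError:
        # The executable is not on PATH at all
        return "missing"
    except subprocess.TimeoutExpired:
        # The command did not finish in time, i.e. it "hangs"
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


if __name__ == "__main__":
    # Assumption: these plain Hadoop CLI calls exercise the same installation
    # that mrjob's Hadoop runner invokes.
    for cmd in (["hadoop", "version"], ["hadoop", "fs", "-ls", "/"]):
        print(" ".join(cmd), "->", check_command(cmd))
```

If `hadoop version` returns quickly but `hadoop fs -ls /` times out, that would suggest HDFS itself is not responding, rather than a problem in the job code.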

This is what's in mr.py:

from mrjob.job import MRJob


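class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        # Emit one (key, value) pair per statistic for each input line
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        # Sum all values that share the same key
        yield key, sum(values)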


if __name__ == '__main__':
    MRWordFrequencyCount.run()
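Since the same script works locally, it can help to confirm the job logic independently of any runner. The following is a minimal pure-Python sketch, not mrjob's API, that mimics the map, shuffle-by-key and reduce steps of `MRWordFrequencyCount`; `mapper`, `reducer` and `run_locally` are hypothetical stand-ins, and it assumes each line reaches the mapper without its trailing newline.

```python
from collections import defaultdict


def mapper(line):
    # Same (key, value) pairs as MRWordFrequencyCount.mapper
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def reducer(key, values):
    # Same aggregation as MRWordFrequencyCount.reducer
    yield key, sum(values)


def run_locally(lines):
    """Simulate map -> group by key -> reduce over an iterable of lines."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # "shuffle": collect values per key
    result = {}
    for key in sorted(grouped):  # reducers see keys in sorted order
        for out_key, total in reducer(key, grouped[key]):
            result[out_key] = total
    return result


if __name__ == "__main__":
    print(run_locally(["hello world", "mrjob on hadoop"]))
    # prints {'chars': 26, 'lines': 2, 'words': 5}
```

If this produces the expected counts, the hang is unlikely to come from the mapper or reducer themselves.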
