Python - How to get the file name in a wordcount MapReduce job

Asked: 2016-07-29 18:52:10

Tags: python mapreduce

My task is to compute a word count over txt files using a MapReduce job. However, when I try to print the file name along with each word, I get a KeyError. Please help.

#!/usr/bin/env python

import sys
import os
import re
# input comes from STDIN (standard input)
for line in sys.stdin:
    stopwords = ['a','able','about','across','after','all','almost','also','am','among','an','and','any','are','as','at','be','because','been','but','by','can','cannot','could','dear','did','do','does','either','else','ever','every','for','from','get','got','had','has','have','he','her','hers','him','his','how','however','i','if','in','into','is','it','its','just','least','let','like','likely','may','me','might','most','must','my','neither','no','nor','not','of','off','often','on','only','or','other','our','own','rather','said','say','says','she','should','since','so','some','than','that','the','their','them','then','there','these','they','this','tis','to','too','twas','us','wants','was','we','were','what','when','where','which','while','who','whom','why','will','with','would','yet','you','your']
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    fname = os.environ['map_input_file']
    words = re.findall(r"[A-Za-z]+", line)
    words = line.split()
    words = [word for word in words if word not in stopwords]
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print '%s\t%s' % (word + ' ' + fname, 1)

I need to pass (word and filename, 1) to the reducer. With the code above I get a KeyError:

  File "/home/s/ss/ssa8455/mapper.py", line 12, in ?
    fname = os.environ['map_input_file']
  File "/usr/lib64/python2.4/UserDict.py", line 17, in __getitem__
    def __getitem__(self, key): return self.data[key]
KeyError: 'map_input_file'
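
The traceback shows that the environment variable `map_input_file` was not set when the mapper ran. Hadoop Streaming exports job configuration properties as environment variables with dots replaced by underscores, so a missing key usually points to one of two things. Either the script was run outside Hadoop (for example, piping a file into it locally), or the Hadoop release uses the newer property name `mapreduce.map.input.file`. Neither cause is confirmed by the post. The sketch below assumes those two variable names and falls back to a placeholder instead of raising. The helpers `get_input_file` and `emit` and the `"unknown"` default are hypothetical names used for illustration only.

```python
import os
import sys

# Assumed variable names: Hadoop Streaming exposes job configuration
# properties with dots replaced by underscores, and the property name
# differs between Hadoop versions. Adjust for the cluster in use.
CANDIDATE_KEYS = ("mapreduce_map_input_file", "map_input_file")


def get_input_file(environ=None, default="unknown"):
    """Return the current input file name, or `default` if no key is set."""
    if environ is None:
        environ = os.environ
    for key in CANDIDATE_KEYS:
        value = environ.get(key)  # .get() returns None instead of raising KeyError
        if value:
            return value
    return default


def emit(line, fname, out=None):
    """Write one 'word filename<TAB>1' record per whitespace-separated word."""
    if out is None:
        out = sys.stdout
    for word in line.split():
        out.write("%s %s\t1\n" % (word, fname))
```

In a mapper, `get_input_file()` would be called once before the loop over `sys.stdin`, and `emit(line, fname)` once per input line. Using `environ.get` rather than indexing lets the same script run in a local test, where the variable is absent.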

0 answers:

No answers