I am running the following Python code in MapReduce:
from mrjob.job import MRJob
import collections

bigram = collections.defaultdict(float)
unigram = collections.defaultdict(float)

class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        # Now we loop over lines in the system input
        line = line.strip().split()
        # go through each word in sentence
        i = 0
        for word in line:
            if i > 0:
                hist = word
            else:
                hist = ''
            word = CleanWord(word)  # Get the new word
            # If CleanWord didn't return a string, move on
            if word == None: continue
            i += 1
            yield word.lower(), hist.lower(), 1.0

if __name__ == '__main__':
    MRWordFreqCount.run()
I get the error: ValueError: too many values to unpack (expected 2), but I can't figure out why. Any suggestions?
The command line I am running is:
python myjob.py Test.txt --mapper
Answer 0 (score: 2)
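In a MapReduce job, each yield from a mapper must emit exactly one key-value pair. Your yield produces three items, so unpacking the result into a key and a value fails with the ValueError you see. To keep both the word and its history, group them into a tuple and use that tuple as the key: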
yield (word.lower(), hist.lower()), 1.0
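
To see why the original yield fails, here is a minimal sketch in plain Python that needs no mrjob install. The `collect` helper is a hypothetical stand-in for whatever consumes the mapper output and unpacks it as a key and a value; it is not mrjob's actual code. The history is taken to be the previous word, which is an assumption about what the question intends.

```python
def bad_mapper(line):
    # Yields three separate items per word, like the mapper in the question.
    prev = ''
    for word in line.split():
        yield word.lower(), prev.lower(), 1.0
        prev = word


def good_mapper(line):
    # Yields one (key, value) pair; the key is itself a (word, history) tuple.
    prev = ''
    for word in line.split():
        yield (word.lower(), prev.lower()), 1.0
        prev = word


def collect(mapper, line):
    # Hypothetical stand-in for the framework: unpack each output as key, value.
    return [(key, value) for key, value in mapper(line)]


try:
    collect(bad_mapper, "the cat sat")
    error_message = None
except ValueError as exc:
    # Three items cannot be unpacked into two names.
    error_message = str(exc)

pairs = collect(good_mapper, "the cat sat")
```

With the tuple key, each output unpacks cleanly into two parts, and a later reducer can sum the 1.0 values per (word, history) key.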