Counting bigrams with mrjob: a TypeError occurs

Time: 2017-07-19 07:14:35

Tags: python dictionary reduce word mrjob

I am new to writing map-reduce programs with mrjob. I need to count bigrams (pairs of adjacent words) using mrjob.

Here is my code:

import mrjob
from mrjob.job import MRJob
import re
from itertools import islice, izip
import itertools

WORD_RE = re.compile(r'[a-zA-Z]+')

class BigramCount(MRJob):
  OUTPUT_PROTOCOL = mrjob.protocol.RawProtocol

  def mapper(self, _, line):
    words = WORD_RE.findall(line)

    for i in izip(words, islice(words, 1, None)):
      bigram=str(i[0]+"-" +i[1])
      yield (bigram, 1)

  def combiner(self, bigram, counts):
    yield (bigram.encode('utf-8'), sum(counts))

  def reducer(self, bigram, counts):
    yield (bigram.encode('utf-8'), sum(counts))

if __name__ == '__main__':
  BigramCount.run()
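
For reference, the pairing step the mapper relies on can be checked on its own, outside mrjob. This is only a minimal sketch: it assumes Python 3 (where the built-in zip takes the place of izip), and both the helper name bigrams and the sample line are made up for illustration.

```python
import re
from itertools import islice

WORD_RE = re.compile(r'[a-zA-Z]+')


def bigrams(line):
    """Return 'first-second' strings for each pair of adjacent words in line."""
    words = WORD_RE.findall(line)
    # Pair every word with its successor, as the mapper above does.
    return [a + '-' + b for a, b in zip(words, islice(words, 1, None))]


sample = 'the quick brown fox'  # hypothetical input line
pairs = bigrams(sample)
print(pairs)
```

If the pairs look right here, the mapper logic is probably not where the error comes from.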

Running it then fails with this error:

return b'\t'.join(x for x in (key, value) if x is not None)

TypeError: sequence item 1: expected string, int found

Can anyone tell me what is wrong with my code, and how I can debug it?
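
A likely cause, offered as a sketch rather than a confirmed answer: the line in the traceback joins key and value with a tab, which only works when both are byte strings, while the reducer yields sum(counts), an int. As far as I know, OUTPUT_PROTOCOL only governs what the final step writes, so the combiner can keep yielding integers. The snippet below does not use mrjob at all; raw_write is a hypothetical stand-in that copies the join from the traceback, and it is written for Python 3 (under Python 2, which the job above targets, str(total) would be enough).

```python
def raw_write(key, value):
    """Hypothetical stand-in for the failing line: tab-join key and value."""
    return b'\t'.join(x for x in (key, value) if x is not None)


bigram = 'hello-world'.encode('utf-8')
total = sum([1, 1, 1])  # an int, like sum(counts) in the reducer

try:
    raw_write(bigram, total)
    failed = False
except TypeError as exc:
    # Joining bytes with an int is rejected, as in the question's traceback.
    failed = True
    print('TypeError:', exc)

# Converting the count to a byte string first lets the join succeed.
fixed = raw_write(bigram, str(total).encode('utf-8'))
print(fixed)
```

Applied to the job, that would mean yielding str(sum(counts)) from the reducer, or dropping the OUTPUT_PROTOCOL line so that the default JSON output protocol serializes the integer. For debugging, running the script on a small local file should show the full traceback without needing a cluster.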
