
I'm new to writing map-reduce programs with mrjob. I need to count bigrams using mrjob.

Here is my code:

import mrjob
from mrjob.job import MRJob
import re
from itertools import islice, izip
import itertools

WORD_RE = re.compile(r'[a-zA-Z]+')

class BigramCount(MRJob):
  OUTPUT_PROTOCOL = mrjob.protocol.RawProtocol

  def mapper(self, _, line):
    words = WORD_RE.findall(line)

    for i in izip(words, islice(words, 1, None)):
      bigram=str(i[0]+"-" +i[1])
      yield (bigram, 1)

  def combiner(self, bigram, counts):
    yield (bigram.encode('utf-8'), sum(counts))

  def reducer(self, bigram, counts):
    yield (bigram.encode('utf-8'), sum(counts))

if __name__ == '__main__':
  BigramCount.run()

Running it then fails with this error:

return b'\t'.join(x for x in (key, value) if x is not None)

TypeError: sequence item 1: expected string, int found

Can anyone tell me what is wrong with my code, and how I can debug it?
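One likely reading of the traceback: the failing line joins the key and the value with a tab, so both must already be byte strings, yet the reducer yields sum(counts), which is an int ("sequence item 1" is the value). The sketch below reproduces that join outside of mrjob to make the failure visible. It is written for Python 3, and write_raw and count_bigrams are hypothetical helper names used only for illustration, not part of mrjob.

```python
import re
from collections import Counter

WORD_RE = re.compile(r'[a-zA-Z]+')


def write_raw(key, value):
    # Hypothetical stand-in: the same join as the line shown in the traceback.
    return b'\t'.join(x for x in (key, value) if x is not None)


def count_bigrams(lines):
    # Plain-Python equivalent of the mapper plus reducer, for local checking.
    counts = Counter()
    for line in lines:
        words = WORD_RE.findall(line)
        for first, second in zip(words, words[1:]):
            counts[first + "-" + second] += 1
    return counts


counts = count_bigrams(["a b a b", "b a"])
key = "a-b".encode("utf-8")

# Passing the int count straight to the join fails, as in the question.
try:
    write_raw(key, counts["a-b"])
    failed = False
except TypeError:
    failed = True

# Converting the count to a byte string first lets the join succeed.
fixed_line = write_raw(key, str(counts["a-b"]).encode("utf-8"))
```

If this reading is right, the analogous change in the job would be to yield str(sum(counts)) from the reducer, or to drop the OUTPUT_PROTOCOL override so that the int is serialized by the default output protocol instead. For debugging, running the script locally on a small input file and testing the mapper logic as an ordinary function, as above, narrows down which step produces the wrong type.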

Counting bigrams with mrjob: a TypeError occurs

0 answers: