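Can anyone help with MRJob and Pymorphy2? I'm new to Python and Hadoop. I more or less understand how to tokenize text, but I can't figure out how to run Pymorphy2 morphological analysis on the resulting tokens. Maybe I'm making an obvious mistake, but I can't see it. Here is my code: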
from mrjob.job import MRJob
import re, pymorphy2

morph = pymorphy2.MorphAnalyzer()
WORD_RE = re.compile(r"[\w']+")

class MRMorphWord(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, _, word):
        for i in word:
            p = morph.parse(word)[0]
            yield p

if __name__ == '__main__':
    MRMorphWord.run()
Here is the error message:
parse
word_lower = word.lower()
AttributeError: 'generator' object has no attribute 'lower'
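
The traceback can be reproduced without Hadoop, which may help narrow it down. In mrjob a reducer is called with a key and a lazy iterator over that key's values, so in the reducer posted above `_` holds the word and `word` holds a generator of the `1`s emitted by the mapper. The sketch below is only an illustration under that reading: `fake_parse` and `counts` are hypothetical stand-ins, not part of pymorphy2 or mrjob, and `fake_parse` merely imitates the `word.lower()` call visible in the traceback.

```python
# Hypothetical stand-in for morph.parse: like the call in the traceback,
# it lowercases its argument before doing anything else.
def fake_parse(word):
    return word.lower()


# Hypothetical stand-in for the values mrjob passes to a reducer:
# a generator yielding one count per occurrence of the word.
def counts():
    yield 1
    yield 1


key, values = "Слова", counts()

# What the posted reducer effectively does: parse the generator of counts.
try:
    fake_parse(values)
    error = None
except AttributeError as exc:
    error = str(exc)

# What was probably intended: parse the key (the word) and sum the counts.
result = (fake_parse(key), sum(values))

print(error)
print(result)
```

If this reading is right, the real reducer would call `morph.parse` on its key argument rather than on the values. Note also that mrjob's default output protocol is JSON, so the job would likely need to yield something serializable, such as the parse result's `normal_form` string, instead of the parse object itself.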