AttributeError: 'str' object has no attribute 'lemma'

Asked: 2015-12-29 18:20:43

Tags: python

I'm trying to split some text, convert it to lowercase, and lemmatize each word. I'm running into this error:

AttributeError: 'str' object has no attribute 'lemma'

Here is my function:

def lower_case_lemma(message):
    words = (i for i in message.split())
    lemma= (word.lemma for word in words)
    return [i.lower() for i in lemma]

train.apply(lower_case_lemma)

train is my text file.

I also tried converting to lowercase first and then applying the lemma step, and I got the same error.

Here is the traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-e5673a0acb1b> in <module>()
----> 1 train1.question.head().apply(lower_case)

C:\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2058             values = lib.map_infer(values, lib.Timestamp)
   2059 
-> 2060         mapped = lib.map_infer(values, f, convert=convert_dtype)
   2061         if len(mapped) and isinstance(mapped[0], Series):
   2062             from pandas.core.frame import DataFrame

pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:58464)()

<ipython-input-7-d01b8d7c0c12> in lower_case(message)
      2     words = (i for i in message.split())
      3     lemma= (word.lemma for word in words)
----> 4     return [i.lower() for i in lemma]

<ipython-input-7-d01b8d7c0c12> in <listcomp>(.0)
      2     words = (i for i in message.split())
      3     lemma= (word.lemma for word in words)
----> 4     return [i.lower() for i in lemma]

<ipython-input-7-d01b8d7c0c12> in <genexpr>(.0)
      1 def lower_case(message):
      2     words = (i for i in message.split())
----> 3     lemma= (word.lemma for word in words)
      4     return [i.lower() for i in lemma]

AttributeError: 'str' object has no attribute 'lemma'
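
The traceback points at line 4 even though the faulty expression is on line 3, because generator expressions are evaluated lazily. A minimal sketch reproducing the error without pandas follows; the sample sentence and the names sample_word and error_message are illustrative, not taken from the original data:

```python
# Plain Python strings carry no `lemma` attribute.
sample_word = "athletes"
print(hasattr(sample_word, "lemma"))  # False


def lower_case_lemma(message):
    words = (i for i in message.split())
    # Building this generator evaluates nothing yet, so no error is raised here.
    lemma = (word.lemma for word in words)
    # The list comprehension consumes the generator; the AttributeError
    # surfaces at this point, on the first word.
    return [i.lower() for i in lemma]


error_message = None
try:
    lower_case_lemma("when athletes begin to exercise")
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

So the strings need to be passed to a lemmatizer function; they do not expose the lemma themselves.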

Here is a sample of the lemmatisation I want:

input:
[when, athletes, begin, to, exercise, their, h...
[when, two, nuclei, are, combined, into, one, ...

output:
[when, athlete, begin, to, exercise, their, he...
[when, two, nucleus, are, combined, into, one,...

Note that athletes and nuclei are converted to athlete and nucleus.

1 answer:

Answer 0 (score: 1)

I'm not sure exactly what you're asking, but here you go:

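from nltk.stem import WordNetLemmatizer

# Needs the WordNet corpus, e.g. via nltk.download('wordnet')
wordnet_lemmatizer = WordNetLemmatizer()

def lower_case_lemma(message):
    # Lowercase the text, split on whitespace, and lemmatize each word lazily.
    for word in message.lower().split():
        yield wordnet_lemmatizer.lemmatize(word)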
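
Because the function above is a generator, calling it through apply would likely store a generator object in each row instead of a list of words. Below is a sketch of a list-returning variant. To keep it runnable without the WordNet corpus, the lemmatizer is passed in as a parameter, and toy_lemmatize with its TOY_LEMMAS table is a hypothetical stand-in, not part of NLTK:

```python
from typing import Callable, List


def lower_case_lemma(message: str, lemmatize: Callable[[str], str]) -> List[str]:
    """Lowercase the message, split on whitespace, and lemmatize every word."""
    return [lemmatize(word) for word in message.lower().split()]


# Hypothetical stand-in for a real lemmatizer, covering only the sample words.
TOY_LEMMAS = {"athletes": "athlete", "nuclei": "nucleus"}


def toy_lemmatize(word: str) -> str:
    # Return the known lemma, or the word unchanged if it is not in the table.
    return TOY_LEMMAS.get(word, word)


result = lower_case_lemma("When athletes begin to exercise", toy_lemmatize)
print(result)  # ['when', 'athlete', 'begin', 'to', 'exercise']
```

With NLTK installed, passing wordnet_lemmatizer.lemmatize in place of toy_lemmatize should give the real behaviour, for example train1.question.apply(lambda m: lower_case_lemma(m, wordnet_lemmatizer.lemmatize)), assuming train1.question is a pandas Series of strings as the traceback suggests.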