I'm trying to split some text, convert it to lowercase, and apply a lemmatization function to the result. I'm getting this error:
AttributeError: 'str' object has no attribute 'lemma'
Here is my function:
def lower_case_lemma(message):
    words = (i for i in message.split())
    lemma = (word.lemma for word in words)
    return [i.lower() for i in lemma]

train.apply(lower_case_lemma)
Here, train is my text file.
I also tried lowercasing first and then applying the lemma function, and got the same error.
Here is the traceback:
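Both orders fail for the same reason: str.split() returns a list of plain str objects, and str.lower() also returns a str, so neither step produces tokens that carry a lemma attribute. Lemmatization has to be a function call that takes the string, not an attribute lookup on it. A quick check (the sample sentence is just for illustration):

```python
message = "When athletes begin to exercise"

# split() yields ordinary Python strings
tokens = message.split()
print([type(t).__name__ for t in tokens])

# Neither the original nor the lowercased token has a 'lemma' attribute
print(hasattr(tokens[1], "lemma"), hasattr(tokens[1].lower(), "lemma"))
```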
AttributeError Traceback (most recent call last)
<ipython-input-8-e5673a0acb1b> in <module>()
----> 1 train1.question.head().apply(lower_case)
C:\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2058 values = lib.map_infer(values, lib.Timestamp)
2059
-> 2060 mapped = lib.map_infer(values, f, convert=convert_dtype)
2061 if len(mapped) and isinstance(mapped[0], Series):
2062 from pandas.core.frame import DataFrame
pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:58464)()
<ipython-input-7-d01b8d7c0c12> in lower_case(message)
2 words = (i for i in message.split())
3 lemma= (word.lemma for word in words)
----> 4 return [i.lower() for i in lemma]
<ipython-input-7-d01b8d7c0c12> in <listcomp>(.0)
2 words = (i for i in message.split())
3 lemma= (word.lemma for word in words)
----> 4 return [i.lower() for i in lemma]
<ipython-input-7-d01b8d7c0c12> in <genexpr>(.0)
1 def lower_case(message):
2 words = (i for i in message.split())
----> 3 lemma= (word.lemma for word in words)
4 return [i.lower() for i in lemma]
AttributeError: 'str' object has no attribute 'lemma'
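The traceback reports the error first in the <listcomp> on line 4 and only then in the <genexpr> on line 3 because generator expressions are lazy: (word.lemma for word in words) does not look up .lemma on any word until something iterates over it, and the list comprehension in the return statement is the first consumer. A small illustration:

```python
words = (w for w in "when athletes begin".split())

# Creating the generator does not evaluate word.lemma yet, so no error here
lemma = (word.lemma for word in words)

error_message = None
try:
    # Iterating forces evaluation; the AttributeError surfaces at this point
    result = [i.lower() for i in lemma]
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # same message as in the traceback above
```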
Here is a sample of the expected lemmatization:
input:
[when, athletes, begin, to, exercise, their, h...
[when, two, nuclei, are, combined, into, one, ...
output:
[when, athlete, begin, to, exercise, their, he...
[when, two, nucleus, are, combined, into, one,...
Note that athletes and nuclei are converted to athlete and nucleus.
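For reference, here is a minimal sketch of the transformation the sample describes (lowercase, split, map each token to its lemma), producing one token list per row. It uses a small hypothetical lookup table, TOY_LEMMAS, purely as a stand-in for a real lemmatizer, so it only knows the two plurals shown in the sample, and the input rows stop before the truncated part of the sample:

```python
# Hypothetical stand-in for a real lemmatizer: covers only the sample's plurals
TOY_LEMMAS = {"athletes": "athlete", "nuclei": "nucleus"}

def toy_lemmatize(word):
    # Return the known lemma, or the word unchanged if it is not in the table
    return TOY_LEMMAS.get(word, word)

def lower_case_lemma(message):
    # Lowercase, split on whitespace, then lemmatize each plain-string token
    return [toy_lemmatize(word) for word in message.lower().split()]

rows = [
    "When athletes begin to exercise",
    "When two nuclei are combined into one",
]
for row in rows:
    print(lower_case_lemma(row))
```

The key change from the question's code is that the lemma comes from calling a function on each string rather than reading an attribute from it.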
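Answer 0 (score: 1)
I'm not sure what you mean by .lemma (plain Python strings have no such attribute), but here is a version that uses NLTK's WordNetLemmatizer: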
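from nltk.stem import WordNetLemmatizer

# Requires the WordNet corpus, e.g. nltk.download('wordnet')
wordnet_lemmatizer = WordNetLemmatizer()

def lower_case_lemma(message):
    # Lowercase, split on whitespace, and lemmatize each token;
    # return a list so that apply() stores the tokens, not a generator object
    return [wordnet_lemmatizer.lemmatize(word) for word in message.lower().split()]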
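One practical point when passing this function to Series.apply: if it is written as a generator (using yield), each call returns a lazy generator object rather than the lemmas, so the resulting column holds generator objects. Returning a list (or wrapping the call in list(...)) gives concrete token lists. The sketch below shows the difference using a crude hypothetical fake_lemmatize stand-in, so it runs without NLTK or the WordNet data:

```python
import types

def fake_lemmatize(word):
    # Hypothetical stand-in for wordnet_lemmatizer.lemmatize: naively strips a plural "s"
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def lemma_gen(message):
    # Generator version: each call returns a lazy generator object
    for word in message.lower().split():
        yield fake_lemmatize(word)

def lemma_list(message):
    # List version: each call returns the actual tokens
    return [fake_lemmatize(word) for word in message.lower().split()]

message = "When athletes begin"
print(isinstance(lemma_gen(message), types.GeneratorType))  # True
print(lemma_list(message))  # ['when', 'athlete', 'begin']
```

With the list version, something like train.apply(lemma_list) stores one token list per row.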