Error when applying lemmatization

Time: 2018-01-04 17:09:05

Tags: python-3.x machine-learning lemmatization

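Why am I getting this error? Please help. I am new to machine learning. This is my code, in which I apply lemmatization to the 20 newsgroups dataset. The code is meant to return the 500 words with the highest counts after filtering.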

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import fetch_20newsgroups
from nltk.corpus import names
from nltk.stem import WordNetLemmatizer

def letters_only(astr):
    return astr.isalpha()


cv = CountVectorizer(stop_words="english", max_features=500)
groups = fetch_20newsgroups()
cleaned = []
all_names = set(names.words())
lemmatizer = WordNetLemmatizer()

for post in groups.data:
    cleaned.append(' '.join([lemmatizer.lemmatize(word.lower()
    for word in post.split()
    if letters_only(word) and word not in all_names)]))


transformed = cv.fit_transform(cleaned)
print(cv.get_feature_names())

Error:

Traceback (most recent call last):

  File "<ipython-input-91-7158a74bae71>", line 18, in <module>
    for word in post.split()

  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)

  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1712, in _morphy
    forms = apply_rules([form])

  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1692, in apply_rules
    for form in forms

  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1694, in <listcomp>
    if form.endswith(old)]

AttributeError: 'generator' object has no attribute 'endswith'

1 Answer:

Answer 0: (score: 0)

I'm not sure why, but turning the one-line for loop into a regular for loop fixes the problem:

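for post in groups.data:
    words = []
    for word in post.split():
        if letters_only(word) and word not in all_names:
            # lemmatize one word (a string) at a time
            words.append(lemmatizer.lemmatize(word.lower()))
    # keep one cleaned string per post, as the question's code intended
    cleaned.append(' '.join(words))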
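Judging from the traceback, a likely cause is the position of one parenthesis in the question's one-liner: the call to lemmatizer.lemmatize( is only closed after the whole "for ... if ..." clause, so the lemmatizer receives a generator object instead of a string, and NLTK then fails when it calls endswith on it. The sketch below reproduces that behaviour. Note that toy_lemmatize is a hypothetical stand-in written for this illustration (it is not NLTK's API), used so the snippet runs without the WordNet data.

```python
def toy_lemmatize(word):
    # Hypothetical stand-in for WordNetLemmatizer.lemmatize (an assumption
    # for this sketch): it expects a string and strips a trailing "s".
    return word[:-1] if word.endswith("s") else word


post = "Cats chase dogs"

# Misplaced parenthesis: the call is closed after the "for" clause, so the
# whole generator expression is passed as the single argument.
try:
    broken = ' '.join([toy_lemmatize(word.lower() for word in post.split())])
except AttributeError as exc:
    error_message = str(exc)

# Call closed right after word.lower(): one call per word, each on a string.
fixed = ' '.join(toy_lemmatize(word.lower()) for word in post.split())

print(error_message)
print(fixed)
```

Applied to the question's code, the same change would be to close the call right after word.lower(), that is lemmatizer.lemmatize(word.lower()), and leave the for/if clauses outside the call. This also explains why the explicit loop works: it passes one string at a time.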