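Why am I getting this error? Please help. I'm new to machine learning. Here is my code, in which I apply lemmatization to the 20 Newsgroups dataset. The code is meant to return the 500 words with the highest counts after filtering.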
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import fetch_20newsgroups
from nltk.corpus import names
from nltk.stem import WordNetLemmatizer
def letters_only(astr):
    return astr.isalpha()
cv = CountVectorizer(stop_words="english", max_features=500)
groups = fetch_20newsgroups()
cleaned = []
all_names = set(names.words())
lemmatizer = WordNetLemmatizer()
for post in groups.data:
    cleaned.append(' '.join([lemmatizer.lemmatize(word.lower()
                             for word in post.split()
                             if letters_only(word) and word not in all_names)]))
transformed = cv.fit_transform(cleaned)
print(cv.get_feature_names())
Error:
Traceback (most recent call last):
  File "<ipython-input-91-7158a74bae71>", line 18, in <module>
    for word in post.split()
  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1712, in _morphy
    forms = apply_rules([form])
  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1692, in apply_rules
    for form in forms
  File "C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1694, in <listcomp>
    if form.endswith(old)]
AttributeError: 'generator' object has no attribute 'endswith'
Answer 0 (score: 0)
I'm not sure why, but turning the one-line for loop into a regular for loop fixes the problem:
for post in groups.data:
    for word in post.split():
        if letters_only(word) and word not in all_names:
            cleaned.append(' '.join([lemmatizer.lemmatize(word.lower())]))
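As for the likely cause, judging from the code and the traceback: in the original one-liner the closing parenthesis of `lemmatizer.lemmatize(` comes after the `if` clause, so the whole `word.lower() for word in ...` generator expression is passed as the single argument. The lemmatizer then calls `.endswith` on that generator, which matches the `AttributeError`. Note also that the loop above appends one entry per word, whereas the original code was building one string per post. Below is a minimal sketch that keeps the one-string-per-post shape. `toy_lemmatize` and `clean_post` are hypothetical names, and `toy_lemmatize` is only a stand-in so the snippet runs without downloading NLTK data; in the real code you would pass `lemmatizer.lemmatize` and `all_names` instead.

```python
def toy_lemmatize(word):
    # Hypothetical stand-in for WordNetLemmatizer().lemmatize.
    # Like the real method, it expects a single string, not a generator.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def clean_post(post, lemmatize, excluded_names):
    # The call to lemmatize() is closed before the `for`, so each word is
    # lemmatized on its own and the results are joined into one string.
    return " ".join(
        lemmatize(word.lower())
        for word in post.split()
        if word.isalpha() and word not in excluded_names
    )


# Made-up sample posts, for illustration only.
posts = ["Alice likes the two cats", "Dogs chase 3 balls"]
cleaned = [clean_post(p, toy_lemmatize, {"Alice"}) for p in posts]
print(cleaned)  # ['like the two cat', 'dog chase ball']
```

With the actual objects from the question, the loop body would be something like `cleaned.append(clean_post(post, lemmatizer.lemmatize, all_names))`, and `cleaned` can then go to `cv.fit_transform` as before.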