I have the following code from this tutorial (http://blog.christianperone.com/2011/09/machine-learning-text-feature-extraction-tf-idf-part-i/):
train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
            "We can see the shining sun, the bright sun.")
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
print(vectorizer)
#CountVectorizer(analyzer__min_n=1, analyzer__stop_words=set(['all']))
vectorizer.fit_transform(train_set)
print(vectorizer.vocabulary_)  # fitted vocabulary; the tutorial's older release exposed it as .vocabulary
smatrix = vectorizer.transform(test_set)
print(smatrix.todense())
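This gives me a matrix of the words used in the different sentences. This works fine, but I would like to get rid of some stop words.
So I tried: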
CountVectorizer(analyzer__min_n=1, analyzer__stop_words=set(['is', 'the']))
However, this gives me the following error:
Traceback (most recent call last):
  File "C:/Users/Marc/PycharmProjects/clustering/testing.py", line 16, in <module>
    CountVectorizer(analyzer__min_n=1, analyzer__stop_words=set(['is', 'the']))
TypeError: __init__() got an unexpected keyword argument 'analyzer__min_n'
Any idea what is going wrong?
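
For comparison, here is a minimal sketch of the same experiment, assuming a recent scikit-learn release in which `CountVectorizer` takes stop words through a plain `stop_words` argument rather than the `analyzer__*` keywords printed in the tutorial. Which release is installed here is not known, so treat this as one possible direction rather than a confirmed fix. The names `vocabulary` and `counts` are only illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
            "We can see the shining sun, the bright sun.")

# Assumption: a recent scikit-learn release, where stop words are given
# directly as a list via `stop_words` (tokens are lowercased by default).
vectorizer = CountVectorizer(stop_words=["is", "the"])
vectorizer.fit(train_set)

# Fitted vocabulary: maps each remaining token to its column index.
vocabulary = vectorizer.vocabulary_
print(sorted(vocabulary))

# Count matrix for the test sentences, one row per sentence.
counts = vectorizer.transform(test_set).toarray()
print(counts)
```

With the stop words removed, only "blue", "bright", "sky" and "sun" should remain as columns, and words in the test sentences that were never seen during fitting are ignored.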