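I have a training set and a test set of equal size. I have built a bag-of-words model and am trying to run k-nearest neighbors on it, but I'm not sure how to do the fit.
Bag-of-words model: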
from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer(max_features=100, stop_words='english')
bow = bow_vectorizer.fit(TrainData)
print(bow_vectorizer.vocabulary_)
bowTrain = bow_vectorizer.fit_transform(TrainData)
bowTest = bow_vectorizer.fit_transform(TestData)
Attempting KNN on the bag-of-words model, but I'm not sure what to pass as the second argument to knn.fit:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(bowTrain, ???? )
predict = knn.predict(bowTest[0:5000])
Answer 0 (score: 0)
from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer(max_features=100, stop_words='english')
X_train = TrainData
#y_train = your array of labels goes here
bowVect = bow_vectorizer.fit(X_train)
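You should probably use the same fitted vectorizer for both sets, because calling fit again on the test data can change the vocabulary, so the test columns would no longer line up with the training columns.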
bowTrain = bowVect.transform(X_train)
bowTest = bowVect.transform(TestData)
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(bowTrain, y_train )
predict = knn.predict(bowTest[0:5000])
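To see the whole flow end to end, here is a minimal sketch on made-up data. The texts and the "spam"/"ham" labels are hypothetical stand-ins for TrainData, TestData and y_train, which the question does not show. The second argument to knn.fit is simply one label per training document, in the same order as the rows of the training matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data standing in for TrainData, y_train and TestData.
train_texts = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]  # one label per training document
test_texts = ["buy cheap pills", "project meeting on monday"]

vectorizer = CountVectorizer(max_features=100, stop_words="english")

# Learn the vocabulary from the training texts only, then reuse it for the
# test texts so both matrices have the same columns.
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, train_labels)  # features first, labels second
predictions = knn.predict(X_test)
print(predictions)
```

Because transform (not fit_transform) is used on the test texts, X_train and X_test share the same number of columns, which is what knn.predict needs.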