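I am trying to build a sentiment analyzer using scikit-learn's LinearSVC classifier. The problem is that the classifier labels every sentence as positive. Another question: why does the predict() function return a list of class labels for each text? I thought it should return just a single text/number, which is the class label. Here is a sample of the code.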
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

vectorizer = TfidfVectorizer(input='content', decode_error='ignore')
vect_train_x = vectorizer.fit_transform(training_data)  # training_data is actually a list of sentences
scaler = StandardScaler(with_mean=False)  # I don't know why it should be False
X_train = scaler.fit_transform(vect_train_x)  # compute mean, std and transform training data as well
vect_test_x = vectorizer.transform(test)  # the sentences that need to be classified
X_test = scaler.transform(vect_test_x)
clf = LinearSVC()
clf.fit(X_train, labels)
print(vect_test_x)
print(clf.predict(X_test))  # returns a list of Positive => ['Positive' 'Positive' 'Positive' 'Positive' 'Positive' 'Positive']
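I would be very grateful if you could explain what exactly I am misunderstanding. I tried reading the documentation, but I can't understand it without examples. My training data consists of 100,000 positive and 100,000 negative sentences.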
Answer 0 (score: 0)
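I came across this question because I had the same problem. What solved it for me was to first convert X_test to a list, then convert that list to an np.array, and pass the array to the 'predict' function.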
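import numpy as np

new_array = []
new_array.append(Input)  # Input is a string if reading from a file or from input()
X_test = np.array(new_array)  # one element per text to classify
# Note: this assumes clf accepts raw text (e.g. a pipeline that includes the vectorizer)
print(clf.predict(X_test))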
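As a rough illustration of why predict() hands back a list, here is a minimal sketch, not a definitive fix for the setup in the question. It assumes a made-up four-sentence corpus and hypothetical names (train_sentences, train_labels, model), and it swaps the separate vectorizer/classifier steps for a scikit-learn pipeline so that raw strings can be passed straight to predict(). The point it shows is that predict() works on a collection of samples and returns one label per sample, so a single sentence has to be wrapped in a list and its label read back with an index.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up corpus, for illustration only (not the asker's data).
train_sentences = [
    "I love this movie",
    "what a great wonderful day",
    "this is awful",
    "I hate this terrible film",
]
train_labels = ["Positive", "Positive", "Negative", "Negative"]

# The pipeline runs the fitted vectorizer before the classifier,
# so predict() can be given raw strings directly.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_sentences, train_labels)

# predict() expects a collection of samples, so one sentence goes in a list.
single = model.predict(["I love this great day"])
batch = model.predict(["I love this great day", "this terrible film is awful"])

print(single.shape)  # one entry, because one sentence was passed
print(batch.shape)   # two entries, because two sentences were passed

label = single[0]    # pull out the label for the single sentence
print(label)
```

If every prediction comes out as the same class on real data, it may be worth checking that the labels list is in the same order as the training sentences and that the test text goes through exactly the same fitted transformers as the training text.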