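I am doing sentiment analysis on an imbalanced dataset. The problem I ran into is that a plain SVM classifier (no resampling) gives a better ROC-AUC score than SVM combined with oversampling. Here are the plain SVM results (the first value in brackets is the ROC-AUC score, the second is the G-mean score, and the third is the F1 measure):
Here are the oversampling + SVM results:
And this is my SVM code: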
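from sklearn import metrics
from sklearn.svm import SVC
from sklearn.metrics import (roc_auc_score, classification_report,
                             confusion_matrix, f1_score)
from imblearn.metrics import geometric_mean_score

# Linear SVM; probability=True is required for predict_proba
clf = SVC(kernel='linear', C=1, probability=True)
clf.fit(tf_idf_train3, polarity_train)
# Probability of the positive class (column 1), used for the ROC curve and AUC
probs = clf.predict_proba(tf_idf_test3)
preds = probs[:, 1]
fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
# Hard class labels, used for the threshold-dependent metrics
pred = clf.predict(tf_idf_test3)
roc_auc = roc_auc_score(polarity_test, preds, average='macro')
print(classification_report(polarity_test, pred))
print(confusion_matrix(polarity_test, pred))
gmean = geometric_mean_score(polarity_test, pred, average='macro')
f1 = f1_score(polarity_test, pred, average='macro')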
And this is my SVM + oversampling code:
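from sklearn import metrics
from sklearn.svm import SVC
from sklearn.metrics import (roc_auc_score, classification_report,
                             confusion_matrix, f1_score)
from imblearn.metrics import geometric_mean_score
from imblearn.over_sampling import RandomOverSampler

# Assumption: `ros` is a RandomOverSampler; its definition is not shown in the post
ros = RandomOverSampler()
clf = SVC(kernel='linear', C=1, probability=True)
# Oversample the training set only; the test set is left untouched
X_resampled, y_resampled = ros.fit_resample(tf_idf_train3, polarity_train)
clf.fit(X_resampled, y_resampled)
# Probability of the positive class (column 1), used for the ROC curve and AUC
probs = clf.predict_proba(tf_idf_test3)
preds = probs[:, 1]
fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
# Hard class labels, used for the threshold-dependent metrics
pred = clf.predict(tf_idf_test3)
roc_auc = roc_auc_score(polarity_test, preds, average='macro')
print(classification_report(polarity_test, pred))
print(confusion_matrix(polarity_test, pred))
gmean = geometric_mean_score(polarity_test, pred, average='macro')
f1 = f1_score(polarity_test, pred, average='macro')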