Why is the ROC-AUC score of a naive SVM higher than that of SVM + oversampling?

Time: 2019-11-23 07:14:48

Tags: svm roc imbalanced-data

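I am doing sentiment analysis on an imbalanced dataset. The problem I am facing is that a naive SVM classifier gives a better ROC-AUC score than SVM + oversampling. Here are the naive SVM results (in each bracket, the first value is the ROC-AUC score, the second is the G-mean score, and the third is the F1 measure):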

[Image: naive SVM results]

Here are the oversampling + SVM results:

[Image: oversampling + SVM results]
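
For reference, the three reported metrics do not react to resampling in the same way: ROC-AUC depends only on how the scores rank the test samples, while G-mean and F1 are computed from hard predictions and so depend on the decision threshold. The following is a minimal pure-Python sketch of that difference for a binary problem. The helpers `roc_auc`, `g_mean` and `macro_f1` and the toy labels and scores are made up for illustration; they are not the scikit-learn or imbalanced-learn implementations, and `geometric_mean_score(..., average='macro')` may be computed differently from the plain binary G-mean used here.

```python
from math import sqrt

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall_for(y_true, y_pred, label):
    """Recall of one class: correctly predicted members / all members."""
    members = [p for y, p in zip(y_true, y_pred) if y == label]
    return sum(1 for p in members if p == label) / len(members)

def g_mean(y_true, y_pred):
    """Plain binary G-mean: sqrt(sensitivity * specificity)."""
    return sqrt(recall_for(y_true, y_pred, 1) * recall_for(y_true, y_pred, 0))

def f1_for(y_true, y_pred, label):
    """F1 of one class, treating `label` as the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == label and p == label)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y != label and p == label)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (binary case)."""
    return (f1_for(y_true, y_pred, 0) + f1_for(y_true, y_pred, 1)) / 2

# Toy imbalanced test set (illustrative values only): 6 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.45, 0.35, 0.60]

# ROC-AUC is computed from the scores alone, with no threshold involved.
auc = roc_auc(y_true, scores)

# The same scores turned into hard labels at two different thresholds.
pred_at_050 = [1 if s >= 0.5 else 0 for s in scores]
pred_at_030 = [1 if s >= 0.3 else 0 for s in scores]

print("ROC-AUC:", auc)
print("threshold 0.5:", g_mean(y_true, pred_at_050), macro_f1(y_true, pred_at_050))
print("threshold 0.3:", g_mean(y_true, pred_at_030), macro_f1(y_true, pred_at_030))
```

In this toy example the ROC-AUC is the same for both thresholds, because the ranking of the scores never changes, while G-mean and macro F1 move with the threshold. Oversampling mostly shifts the decision boundary toward the majority class, so it may well raise G-mean and F1 without improving the ROC-AUC, or even while lowering it slightly.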

Here is my SVM code:

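    from sklearn import metrics
    from sklearn.svm import SVC
    from sklearn.metrics import (classification_report, confusion_matrix,
                                 f1_score, roc_auc_score)
    from imblearn.metrics import geometric_mean_score

    # tf_idf_train3 / tf_idf_test3 (features) and polarity_train / polarity_test
    # (labels) are built earlier and are not shown in the post.
    clf = SVC(kernel='linear', C=1, probability=True)
    clf.fit(tf_idf_train3, polarity_train)
    probs = clf.predict_proba(tf_idf_test3)
    preds = probs[:, 1]  # probability of the second class in clf.classes_
    fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
    pred = clf.predict(tf_idf_test3)  # hard labels for the threshold-based metrics
    roc_auc = roc_auc_score(polarity_test, preds, average='macro')
    print(classification_report(polarity_test, pred))
    print(confusion_matrix(polarity_test, pred))
    gmean = geometric_mean_score(polarity_test, pred, average='macro')
    f1 = f1_score(polarity_test, pred, average='macro')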

Here is my SVM + oversampling code:

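    from sklearn import metrics
    from sklearn.svm import SVC
    from sklearn.metrics import (classification_report, confusion_matrix,
                                 f1_score, roc_auc_score)
    from imblearn.metrics import geometric_mean_score
    from imblearn.over_sampling import RandomOverSampler

    # Assumption: the post never shows how `ros` is created; a RandomOverSampler is assumed.
    ros = RandomOverSampler()
    clf = SVC(kernel='linear', C=1, probability=True)
    # Oversample the training set only; the test set is left untouched.
    X_resample, y_resampled = ros.fit_resample(tf_idf_train3, polarity_train)
    clf.fit(X_resample, y_resampled)
    probs = clf.predict_proba(tf_idf_test3)
    preds = probs[:, 1]  # probability of the second class in clf.classes_
    fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
    pred = clf.predict(tf_idf_test3)  # hard labels for the threshold-based metrics
    roc_auc = roc_auc_score(polarity_test, preds, average='macro')
    print(classification_report(polarity_test, pred))
    print(confusion_matrix(polarity_test, pred))
    gmean = geometric_mean_score(polarity_test, pred, average='macro')
    f1 = f1_score(polarity_test, pred, average='macro')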
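
The post does not show how `ros` is constructed. Assuming it is a random oversampler, the `fit_resample` step amounts roughly to duplicating randomly chosen minority-class rows until every class is as large as the majority class. The sketch below illustrates that idea in plain Python; `random_oversample` is a hypothetical helper written for this illustration and is not the imbalanced-learn implementation.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original rows belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Append copies of existing rows; no new information is created.
        for _ in range(target - count):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out

# Toy training set (illustrative values only): 4 negatives, 1 positive.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]

X_res, y_res = random_oversample(X, y)
balanced_counts = Counter(y_res)
print(balanced_counts)
```

Because the added rows are exact copies, oversampling of this kind behaves much like giving the minority class a larger weight during training. It changes where the decision boundary sits, which affects thresholded metrics such as G-mean and F1, but it does not necessarily improve how well the classifier ranks test samples, which is what ROC-AUC measures.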

0 Answers:

No answers