How to calculate weighted accuracy for multiclass classification?

Posted: 2017-10-19 15:54:33

Tags: python scikit-learn classification

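I'm doing multiclass classification on imbalanced classes. I'm using SGDClassifier(), GradientBoostingClassifier(), RandomForestClassifier(), and LogisticRegression() with class_weight='balanced' and comparing the results, so I need to compute the accuracy. I tried the following to compute a weighted accuracy: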

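import numpy as np
from sklearn.metrics import accuracy_score
n_samples = len(y_train)
# Per-class weights n_samples / (n_classes * class count); [1:] skips the count for label 0
weights_cof = float(n_samples)/(n_classes*np.bincount(data[target_label].as_matrix().astype(int))[1:])
# Tiled into a 2-D array of shape (n_samples, n_classes)
sample_weights = np.ones((n_samples,n_classes)) * weights_cof
print accuracy_score(y_test, y_pred, sample_weight=sample_weights)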
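For reference, here is a minimal sketch (Python 3, with hypothetical labels rather than the asker's data) of the formula that class_weight='balanced' uses, n_samples / (n_classes * np.bincount(y)). It yields one weight per class; indexing that vector with the label array turns it into one weight per sample, which is the 1-D shape that sample_weight in accuracy_score expects.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical integer labels for 3 imbalanced classes (not the asker's data).
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])

n_samples = len(y)
n_classes = len(np.unique(y))

# 'balanced' formula: one weight per class, inversely proportional to its frequency.
class_weights = n_samples / (n_classes * np.bincount(y))

# scikit-learn's helper applies the same formula.
sk_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y), y=y)
print(np.allclose(class_weights, sk_weights))  # True

# Look up each sample's class weight: shape (n_samples,), not (n_samples, n_classes).
per_sample = class_weights[y]
print(class_weights.shape, per_sample.shape)
```

The key point is the shape: the class-weight vector has length n_classes, while per-sample weights have length n_samples.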

y_train is a binary array, so sample_weights has the same shape as y_train, (n_samples, n_classes). When I run the script, I get the following error:

Update:

 Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 424, in <module>
    predict_country(featuresDF, score, featuresLabel, country_sample_size, 'gbc')
  File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 313, in predict_country
    print accuracy_score(y_test, y_pred, sample_weight=sample_weights)
  File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 183, in accuracy_score
    return _weighted_sum(score, sample_weight, normalize)
  File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 108, in _weighted_sum
    return np.average(sample_score, weights=sample_weight)
  File "C:\ProgramData\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 1124, in average
    "Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.

1 answer:

Answer 0 (score: 0)

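The error suggests that your sample_weights array does not have the same shape as y_test / y_pred. Internally, the method builds a boolean array from y_test == y_pred and passes it, together with sample_weights, to np.average. One of the first checks np.average makes is that the input array and the weights have the same shape, and here they clearly do not.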
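A minimal sketch of that mechanism, using hypothetical arrays: a 1-D per-sample correctness array averaged with 2-D weights reproduces the TypeError from the traceback, while 1-D weights of length n_samples work.

```python
import numpy as np

# Hypothetical 1-D labels for 4 samples.
y_test = np.array([0, 1, 2, 1])
y_pred = np.array([0, 2, 2, 1])

# Per-sample correctness, as accuracy_score builds it for 1-D labels: shape (4,).
sample_score = y_test == y_pred

# 2-D weights like the ones in the question: shape (n_samples, n_classes) = (4, 3).
weights_2d = np.ones((4, 3))

error_message = None
try:
    np.average(sample_score, weights=weights_2d)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # Axis must be specified when shapes of a and weights differ.

# One weight per sample: shapes match, so the weighted average is defined.
weights_1d = np.ones(4)
weighted = np.average(sample_score, weights=weights_1d)
print(weighted)  # 0.75
```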

Update

您的评论&#34; sample_weights,y_test和y_pred具有相同的形状(n_samples,n_classes)&#34;暴露了这个问题。根据{{​​3}}的文档,y_predy_true(在您的情况下为y_testy_pred)应为1维。您是否正在使用一个热编码标签?如果是这样,您应该将它们转换为单值标签,然后再次尝试准确度分数。