Bad value error in multiclass classification

Time: 2019-01-09 13:14:07

Tags: python multiclass-classification

I am doing multiclass classification on data of shape (299, 6), with labels of shape (299, 5). Here is a sample of my data:

[[0.004873972,0.069813839,-0.470500136,2.285885634,0.5335,0.052915143],
[0.001698812,0.041216647,-0.01333925,2.507806584,0.2332,0.123463255],
[0.005954432,0.077164967,4.749752766,26.45721079,0.1663,0.186452725],
[0.001792197,0.042334345,-0.176201652,1.9656153,0.4001,0.087055596],
[0.001966929,0.044350068,0.182059972,1.610369693,0.55,0.29675874]]

The following are the labels for the data stored in this CSV file: [[1,0,0,0,0],[0,0,0,1,0],[0,0,0,1,0],[0,0,1,0,0],[0,1,0,0,0]]

I tried SVM and logistic regression, but both gave me ValueError: bad input shape (299, 5). The error is in the labels, but how do I solve it?

[sample dataset][1]
  [1]: https://i.stack.imgur.com/Wncqy.png

1 answer:

Answer 0 (score: 0)

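You can run this as a standard classification task: convert the one-hot encoding into class labels and train an SVM classifier. See the example code: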

import numpy as np
from sklearn.svm import SVC

data = np.array([[0.004873972,0.069813839,-0.470500136,2.285885634,0.5335,0.052915143],
                 [0.001698812,0.041216647,-0.01333925,2.507806584,0.2332,0.123463255],
                 [0.005954432,0.077164967,4.749752766,26.45721079,0.1663,0.186452725],
                 [0.001792197,0.042334345,-0.176201652,1.9656153,0.4001,0.087055596],
                 [0.001966929,0.044350068,0.182059972,1.610369693,0.55,0.29675874]])
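# One-hot encoded targets, one row per sample
outputs = np.array([[1,0,0,0,0],[0,0,0,1,0],[0,0,0,1,0],[0,0,1,0,0],[0,1,0,0,0]])
# argmax along axis=1 returns each row's class index: 1-D labels of shape (n_samples,)
labels = np.argmax(outputs, axis=1)

clf = SVC()
clf.fit(data, labels)
# Accuracy on the training data itself
print(clf.score(data, labels))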
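
To see why the one-hot matrix is rejected and how the conversion behaves at the full data size, here is a minimal sketch. The data is random and only mimics the shapes from the question ((299, 6) features and (299, 5) one-hot labels); all variable names are illustrative assumptions. The class index must be taken across each row (axis=1), so the result has one label per sample. The exact wording of the ValueError may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption): 299 samples, 6 features, 5 classes.
n_samples, n_features, n_classes = 299, 6, 5
X = rng.normal(size=(n_samples, n_features))
class_ids = rng.integers(0, n_classes, size=n_samples)
class_ids[:n_classes] = np.arange(n_classes)         # make sure every class occurs
Y_onehot = np.eye(n_classes, dtype=int)[class_ids]   # shape (299, 5)

# Passing the 2-D one-hot matrix as the target raises a ValueError.
raised = False
try:
    SVC().fit(X, Y_onehot)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# argmax along axis=1 finds the position of the 1 in each row -> shape (299,)
y = np.argmax(Y_onehot, axis=1)
print(y.shape)

# Both classifiers from the question accept the 1-D label vector.
for model in (SVC(), LogisticRegression(max_iter=1000)):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))
```

Because the features here are random noise, the printed scores say nothing about how well either model would do on the real data; the point is only that fitting succeeds once the labels are 1-D.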

For parameter tuning, see "Hyperparameter Tuning the Random Forest in Python" and "Comparing randomized search and grid search for hyperparameter estimation".
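
As a rough illustration of the grid search approach mentioned above, here is a minimal sketch using scikit-learn's GridSearchCV with an SVC. The dataset is generated with make_classification purely as a placeholder with the same shape as in the question, and the parameter grid values are arbitrary assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data (assumption): 299 samples, 6 features, 5 classes.
X, y = make_classification(
    n_samples=299,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Illustrative grid; the values to search should be chosen for the real data.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.1],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

RandomizedSearchCV in the same module takes a similar form and samples a fixed number of combinations instead of trying all of them, which can be cheaper when the grid is large.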