I am doing multi-class classification on data of shape (299, 6), with labels of shape (299, 5). Here is a sample of my data:
[[0.004873972,0.069813839,-0.470500136,2.285885634,0.5335,0.052915143],
[0.001698812,0.041216647,-0.01333925,2.507806584,0.2332,0.123463255],
[0.005954432,0.077164967,4.749752766,26.45721079,0.1663,0.186452725],
[0.001792197,0.042334345,-0.176201652,1.9656153,0.4001,0.087055596],
[0.001966929,0.044350068,0.182059972,1.610369693,0.55,0.29675874]]
The labels for this data, stored in the same CSV file, are one-hot encoded, for example:
[[1,0,0,0,0],[0,0,0,1,0],[0,0,0,1,0],[0,0,1,0,0],[0,1,0,0,0]]
I tried SVM and logistic regression, but both raise `ValueError: bad input shape (299, 5)`. The error is about the labels, but how do I fix it?
[sample dataset][1]
[1]: https://i.stack.imgur.com/Wncqy.png
Answer 0 (score: 0)
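You can run this as a standard classification task: convert the one-hot vectors to integer class labels, then train an SVM classifier. The error occurs because these estimators expect a 1-D label array of shape (n_samples,), not a 2-D one-hot matrix. See the example code: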
import numpy as np
from sklearn.svm import SVC
data = np.array([[0.004873972,0.069813839,-0.470500136,2.285885634,0.5335,0.052915143],
[0.001698812,0.041216647,-0.01333925,2.507806584,0.2332,0.123463255],
[0.005954432,0.077164967,4.749752766,26.45721079,0.1663,0.186452725],
[0.001792197,0.042334345,-0.176201652,1.9656153,0.4001,0.087055596],
[0.001966929,0.044350068,0.182059972,1.610369693,0.55,0.29675874]])
outputs = np.array([[1,0,0,0,0],[0,0,0,1,0],[0,0,0,1,0],[0,0,1,0,0],[0,1,0,0,0]])
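# argmax along axis=1 picks the hot column in each row, giving one label per sample
labels = np.argmax(outputs, axis=1)  # [0, 3, 3, 2, 1]
clf = SVC()
clf.fit(data, labels)
# Accuracy on the five training rows only, not an estimate of generalization
print(clf.score(data, labels))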
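The question also mentions logistic regression, and the same label conversion applies there. Below is a minimal sketch, assuming synthetic random data of shape (299, 6) as a stand-in for the real CSV (the names `X`, `class_ids`, and `one_hot` are illustrative, not from the original post). Note that `axis=1` reduces each row to one index, so the result has shape (299,), whereas `axis=0` would reduce each column and return only 5 values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real dataset (assumption: 299 samples, 6 features, 5 classes)
rng = np.random.default_rng(0)
X = rng.normal(size=(299, 6))
class_ids = rng.integers(0, 5, size=299)
one_hot = np.eye(5, dtype=int)[class_ids]  # shape (299, 5), like the labels in the question

# Collapse each one-hot row to its class index
labels = np.argmax(one_hot, axis=1)  # shape (299,)

# A 1-D label array is accepted without the shape error
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)
predictions = clf.predict(X)
print(labels.shape, predictions.shape)
```

Because the features here are random noise, the accuracy itself is meaningless; the point is only that the fit succeeds once the labels are 1-D.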
For parameter tuning, see "Hyperparameter Tuning the Random Forest in Python" and "Comparing randomized search and grid search for hyperparameter estimation".
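As a rough illustration of what such tuning could look like for the SVM used here, the following sketch runs a small grid search with scikit-learn's `GridSearchCV`. The data is synthetic, and the grid values for `C` and `gamma` are arbitrary assumptions chosen only to keep the example short; they are not recommendations from the answer.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: 299 samples, 6 features, 5 balanced classes)
rng = np.random.default_rng(0)
X = rng.normal(size=(299, 6))
y = np.arange(299) % 5  # integer class labels 0..4

# Hypothetical search grid; adjust the ranges to the real data
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}

# 3-fold cross-validated search over all grid combinations
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X, y)

best_params = search.best_params_
print(best_params, search.best_score_)
```

Unlike `clf.score(data, labels)` on the training rows, `best_score_` is a cross-validated accuracy, so it gives a more honest picture of how the chosen parameters generalize.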