I'm trying to build a classifier and score its performance. Here is my code:
def svc(train_data, train_labels, test_data, test_labels):
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score
    svc = SVC(kernel='linear')
    svc.fit(train_data, train_labels)
    predicted = svc.predict(test_data)
    actual = test_labels
    score = svc.score(test_data, test_labels)
    print ('svc score')
    print (score)
    print ('svc accuracy')
    print (accuracy_score(predicted, actual))
Now I run the function with:
svc(X,x,Y,y)
where:
X.shape = (1000, 150)
x.shape = (1000, )
Y.shape = (200, 150)
y.shape = (200, )
I get the error:
6 predicted = svc.predict(test_classed_data)
7 actual = test_classed_labels
----> 8 score = svc.score(test_classed_data, test_classed_labels)
9 print ('svc score')
10 print (score)
local/lib/python3.4/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
289 """
290 from .metrics import accuracy_score
--> 291 return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
292
293
124 if (y_type not in ["binary", "multiclass", "multilabel-indicator",
125 "multilabel-sequences"]):
--> 126 raise ValueError("{0} is not supported".format(y_type))
127
128 if y_type in ["binary", "multiclass"]:
ValueError: continuous is not supported
My test_labels (that is, y) look like this:
[ 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 20.5
20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 25.5 25.5
25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 30.5 30.5 30.5
30.5 30.5 30.5 30.5 30.5 30.5 30.5 30.5 35.5 35.5 35.5 35.5
35.5 35.5 35.5 35.5 35.5 35.5 35.5... ]
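I'm really confused about why SVC doesn't recognize these as discrete labels, when every example I have seen uses a format similar to mine and works. Please help.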
Answer 0 (score: 5)
The y passed to both the fit and score functions should contain integers or strings that represent the class labels.
E.g. if you have two classes, "foo" and 1, you can train an SVM like this:
>>> import numpy as np
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> X = np.random.randn(10, 4)
>>> y = ["foo"] * 5 + [1] * 5
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
and then test its accuracy with:
>>> X_test = np.random.randn(6, 4)
>>> y_test = ["foo", 1] * 3
>>> clf.score(X_test, y_test)
0.5
Apparently fit still accepts floating-point values, but it shouldn't, because class labels should not be real-valued.
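The y_type in the traceback comes from scikit-learn's check of what kind of target it was given. As a rough sketch, assuming a scikit-learn version that exposes type_of_target in sklearn.utils.multiclass, you can run that check on your labels directly. The label values below are made up for illustration and are not the asker's data.

```python
from sklearn.utils.multiclass import type_of_target

# Illustrative label lists (assumed values, not the asker's actual data).
float_labels = [15.5, 20.5, 25.5]    # non-integer floats
int_labels = [15, 20, 25]            # integers, three distinct classes
str_labels = ["foo", "bar", "foo"]   # strings, two distinct classes

# Non-integer floats are treated as a regression target,
# which the accuracy metric refuses to score.
print(type_of_target(float_labels))  # continuous
print(type_of_target(int_labels))    # multiclass
print(type_of_target(str_labels))    # binary
```

If the first call reports continuous for your labels, the scoring step will reject them, which matches the ValueError in the question.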
Answer 1 (score: 1)
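From the scikit-learn documentation for SVMs at http://scikit-learn.org/stable/modules/svm.html#classification:
"As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array Y of integer values"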
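Convert the label array to int, or, if that is too crude (e.g. 1.6 and 1.8 would be converted to the same value), assign an integer class label to each unique float value.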
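One possible way to do the second option is sketched below with np.unique, which is a swapped-in technique rather than something the answer prescribes. The label values are hypothetical, shaped like the ones in the question.

```python
import numpy as np

# Hypothetical float-valued labels, similar in shape to those in the question.
y = np.array([15.5, 15.5, 20.5, 20.5, 25.5, 30.5, 30.5, 35.5])

# np.unique returns the sorted distinct values; with return_inverse=True it
# also returns, for each element of y, the index of its value in that array.
# Those indices serve as integer class labels.
classes, y_int = np.unique(y, return_inverse=True)

print(classes)  # [15.5 20.5 25.5 30.5 35.5]
print(y_int)    # [0 0 1 1 2 3 3 4]

# Integer labels (or integer predictions) map back to the original values.
recovered = classes[y_int]
```

You would then pass y_int instead of y to fit and score, and index classes with the predictions if you need the original float values back.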
I'm not sure why the fit and predict methods don't throw an error.