Question

I am trying to build a classifier and score its performance. Here is my code:

def svc(train_data, train_labels, test_data, test_labels):
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score
    svc = SVC(kernel='linear')
    svc.fit(train_data, train_labels)
    predicted = svc.predict(test_data)
    actual = test_labels
    score = svc.score(test_data, test_labels)
    print ('svc score')
    print (score)
    print ('svc accuracy')
    print (accuracy_score(predicted, actual))

Now I run the function with:

svc(X, x, Y, y)

X.shape = (1000, 150)    
x.shape = (1000, )   
Y.shape = (200, 150)   
y.shape = (200, )

I get the error:

      6     predicted = svc.predict(test_classed_data)
      7     actual = test_classed_labels
----> 8     score = svc.score(test_classed_data, test_classed_labels)
      9     print ('svc score')
     10     print (score)

local/lib/python3.4/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
    289         """
    290         from .metrics import accuracy_score
--> 291         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    292 
    293 

    124     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
    125                        "multilabel-sequences"]):
--> 126         raise ValueError("{0} is not supported".format(y_type))
    127 
    128     if y_type in ["binary", "multiclass"]:

ValueError: continuous is not supported

My test_labels, i.e. y, look like this:

[ 15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  20.5
  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  25.5  25.5
  25.5  25.5  25.5  25.5  25.5  25.5  25.5  25.5  25.5  30.5  30.5  30.5
  30.5  30.5  30.5  30.5  30.5  30.5  30.5  30.5  35.5  35.5  35.5  35.5
  35.5  35.5  35.5  35.5  35.5  35.5  35.5... ]

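I'm really confused as to why SVC doesn't recognize these as discrete labels, when all the examples I've seen use a format similar to mine and work. Please help.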

Answer 1

The y passed to both the fit and score functions should consist of integers or strings representing class labels.

E.g. if you have two classes, "foo" and 1, you can train an SVM like this:

>>> import numpy as np
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> X = np.random.randn(10, 4)
>>> y = ["foo"] * 5 + [1] * 5
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

And then test its accuracy with:

>>> X_test = np.random.randn(6, 4)
>>> y_test = ["foo", 1] * 3
>>> clf.score(X_test, y_test)
0.5

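The fit function apparently still accepts floating-point values, but it shouldn't, because class labels are not supposed to be real-valued.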
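
To see where the "continuous" in the error message comes from, it can help to ask scikit-learn how it classifies a label array. The sketch below assumes a scikit-learn version that provides sklearn.utils.multiclass.type_of_target; the small arrays are made up for illustration and are not the asker's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative labels shaped like the ones in the question:
# floats with a fractional part.
y_float = np.array([15.5, 15.5, 20.5, 20.5, 25.5, 25.5])
float_kind = type_of_target(y_float)
print(float_kind)  # continuous

# The same three classes written as integer codes.
y_int = np.array([0, 0, 1, 1, 2, 2])
int_kind = type_of_target(y_int)
print(int_kind)  # multiclass
```

Float labels with a non-integer part are treated as a regression target, which is why the accuracy computation inside score rejects them.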

Answer 2

From the scikit-learn documentation on SVMs at http://scikit-learn.org/stable/modules/svm.html#classification:

"As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array Y of integer values"

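Convert your label array to int, or, if that is too simplistic (e.g. 1.6 and 1.8 would be converted to the same value), assign an integer class label to each unique float value.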
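
One possible way to assign an integer label to each unique float value is sketched below with np.unique. The random data and the small label arrays are hypothetical stand-ins for the asker's X, x, Y and y, and the sketch assumes that every test label also occurs among the training labels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Hypothetical stand-ins for the question's data: float-valued class labels.
train_labels = np.repeat([15.5, 20.5, 25.5], 20)
test_labels = np.repeat([15.5, 20.5, 25.5], 5)
train_data = rng.randn(train_labels.size, 4)
test_data = rng.randn(test_labels.size, 4)

# classes holds the sorted unique float values; train_codes holds, for each
# training label, its integer index into classes.
classes, train_codes = np.unique(train_labels, return_inverse=True)

# Reuse the same mapping for the test labels (assumes each test label
# is present in classes).
test_codes = np.searchsorted(classes, test_labels)

clf = SVC(kernel='linear')
clf.fit(train_data, train_codes)
score = clf.score(test_data, test_codes)
print(score)

# Map integer predictions back to the original float values if needed.
predicted_values = classes[clf.predict(test_data)]
```

Unlike a plain cast to int, this keeps values such as 1.6 and 1.8 in separate classes, and the classes array makes the mapping reversible.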

Not sure why the fit and predict methods don't throw an error.

What kind of input does sklearn's SVC score method expect?
