ValueError: Unknown label type: array - sklearn load_boston

Time: 2016-08-06 09:09:58

Tags: python scikit-learn

I am using the following code to test SGDClassifier:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import SGDClassifier
from sklearn.cross_validation import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split

data = load_boston()
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target)

x_scalar = StandardScaler()
y_scalar = StandardScaler()

x_train = x_scalar.fit_transform(x_train)
y_train = y_scalar.fit_transform(y_train)
x_test = x_scalar.transform(x_test)
y_test = y_scalar.transform(y_test)

regressor = SGDClassifier(loss='squared_loss')
scores = cross_val_score(regressor, x_train, y_train, cv=5)
print  'cross validation r scores ', scores
print 'average score ', np.mean(scores)
regressor.fit_transform(x_train, y_train)
print 'test set r score ', regressor.score(x_test,y_test)

However, when I run it, I get a deprecation warning about reshaping and the following ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-55-4d64d112f5db> in <module>()
     18 
     19 regressor = SGDClassifier(loss='squared_loss')
---> 20 scores = cross_val_score(regressor, x_train, y_train, cv=5)

ValueError: Unknown label type: (array([ -1.89568750e+00,  -1.75715217e+00,  -1.68255622e+00,
        -1.66124309e+00,  -1.62927339e+00,  -1.54402088e+00,
        -1.49073806e+00,  -1.41614211e+00,  -1.40548554e+00,
        -1.34154616e+00,  -1.32023303e+00,  -1.30957647e+00,
        -1.27760677e+00,  -1.26695021e+00,  -1.25629365e+00,
        -1.20301082e+00,  -1.17104113e+00,  -1.16038457e+00,....]),)

What could be wrong with the code?

1 answer:

Answer 0 (score: 3)

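In a classification task, the dependent variable (or target) is categorical. For example, we might try to predict whether an insurance claim is fraudulent or not. In regression, on the other hand, the dependent variable is numerical: it is a measured quantity.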
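Many scikit-learn classifiers, including SGDClassifier, validate the target before fitting with helpers from `sklearn.utils.multiclass`: `type_of_target` reports a float target with non-integer values as `'continuous'`, and `check_classification_targets` rejects that with the "Unknown label type" error seen in the question. A minimal sketch, assuming a reasonably recent scikit-learn, that runs these checks on a few standardized values copied from the traceback and on integer class labels:

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets, type_of_target

# A few standardized Boston targets, copied from the traceback in the question
y_continuous = np.array([-1.89568750, -1.75715217, -1.68255622, -1.66124309])

# Integer class labels, like the iris target (0, 1, 2)
y_classes = np.array([0, 1, 2, 1, 0])

print(type_of_target(y_continuous))  # continuous
print(type_of_target(y_classes))     # multiclass

# The same validation a classifier runs in fit(); continuous targets are rejected
try:
    check_classification_targets(y_continuous)
    message = None
except ValueError as err:
    message = str(err)

print(message)  # begins with "Unknown label type"
```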

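In the Boston Housing dataset, the dependent variable is "Median value of owner-occupied homes in $1000's" (you can see the description by running print(data.DESCR)). It is a continuous variable, so it cannot be predicted with a classifier.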
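Since the estimator in the question is already named `regressor` and its printouts refer to r scores, the direct fix for a continuous target is a regressor such as `SGDRegressor` instead of a classifier. A minimal sketch, assuming scikit-learn 0.20 or later: `load_boston` was removed in scikit-learn 1.2, so the bundled diabetes dataset (`load_diabetes`, which also has a continuous target) stands in for it here, and `TransformedTargetRegressor` scales the target the way the question's `y_scalar` tried to.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for load_boston (removed in scikit-learn 1.2): another continuous target
data = load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

# Scaling the features inside the pipeline fits the scaler on each CV training fold only
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))

# Also scale the 1D target; the wrapper reshapes it internally and
# inverse-transforms predictions back to the original units
regressor = TransformedTargetRegressor(regressor=model, transformer=StandardScaler())

# For regressors, cross_val_score defaults to the R^2 score
scores = cross_val_score(regressor, x_train, y_train, cv=5)
print('cross validation r scores', scores)
print('average score', np.mean(scores))

regressor.fit(x_train, y_train)
test_score = regressor.score(x_test, y_test)
print('test set r score', test_score)
```

Putting the feature scaler in a pipeline, rather than calling `fit_transform` on the whole training set up front, keeps each validation fold from influencing the scaling used to evaluate it.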

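If you want to test a classifier, use a dataset with a categorical target instead, for example by changing load_boston() to load_iris(). Note that you also need to remove the transformation of the target variable: StandardScaler is meant for numerical variables, while the iris target holds class labels. With these modifications, the code runs: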

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection instead
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target)

# Scale only the features; the target holds class labels (0, 1, 2) and is left as-is
x_scalar = StandardScaler()

x_train = x_scalar.fit_transform(x_train)
x_test = x_scalar.transform(x_test)

# loss='squared_loss' was renamed 'squared_error' in scikit-learn 1.0 (the old name was removed in 1.2)
classifier = SGDClassifier(loss='squared_error')
scores = cross_val_score(classifier, x_train, y_train, cv=5)


scores
Out: array([ 0.33333333,  0.2173913 ,  0.31818182,  0.        ,  0.19047619])
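The deprecation warning in the question comes from `y_scalar.fit_transform(y_train)`: `StandardScaler` expects a 2D array, and passing a 1D array was deprecated at the time (current scikit-learn versions raise a `ValueError` instead). If a target does need to be scaled by hand, a minimal sketch is to reshape it to a single column and flatten the result back; the arrays below are made-up values for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical 1D regression targets (made-up values)
y_train = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_test = np.array([25.0, 35.0])

y_scaler = StandardScaler()

# reshape(-1, 1) turns the 1D target into a single-column 2D array;
# ravel() flattens the scaled result back to 1D for the estimator
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

# Map scaled values (e.g. predictions) back to the original units
y_train_restored = y_scaler.inverse_transform(y_train_scaled.reshape(-1, 1)).ravel()
```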