I am using the following code to try out SGDClassifier:
import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import SGDClassifier
from sklearn.cross_validation import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split
data = load_boston()
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target)
x_scalar = StandardScaler()
y_scalar = StandardScaler()
x_train = x_scalar.fit_transform(x_train)
y_train = y_scalar.fit_transform(y_train)
x_test = x_scalar.transform(x_test)
y_test = y_scalar.transform(y_test)
regressor = SGDClassifier(loss='squared_loss')
scores = cross_val_score(regressor, x_train, y_train, cv=5)
print 'cross validation r scores ', scores
print 'average score ', np.mean(scores)
regressor.fit_transform(x_train, y_train)
print 'test set r score ', regressor.score(x_test,y_test)
However, when I run it I get a deprecation warning about reshaping, followed by this ValueError:
ValueError Traceback (most recent call last)
<ipython-input-55-4d64d112f5db> in <module>()
18
19 regressor = SGDClassifier(loss='squared_loss')
---> 20 scores = cross_val_score(regressor, x_train, y_train, cv=5)
ValueError: Unknown label type: (array([ -1.89568750e+00, -1.75715217e+00, -1.68255622e+00,
-1.66124309e+00, -1.62927339e+00, -1.54402088e+00,
-1.49073806e+00, -1.41614211e+00, -1.40548554e+00,
-1.34154616e+00, -1.32023303e+00, -1.30957647e+00,
-1.27760677e+00, -1.26695021e+00, -1.25629365e+00,
-1.20301082e+00, -1.17104113e+00, -1.16038457e+00,....]),)
What could be wrong with the code?
Answer 0 (score: 3)
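In a classification task, the dependent variable (the target) is categorical. For example, we might try to predict whether a claim is fraudulent or not. In regression, on the other hand, the dependent variable is numeric: it is a quantity that can be measured.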
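To make the distinction concrete, here is a minimal sketch using scikit-learn's `type_of_target` helper, which reports how a target array is interpreted. The two small arrays are made-up values for illustration only, not taken from any dataset in this thread.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets, invented for this illustration
continuous_target = np.array([24.0, 21.6, 34.7, 33.4])  # measured values (regression)
class_target = np.array([0, 1, 2, 1])                   # class labels (classification)

kind_continuous = type_of_target(continuous_target)
kind_classes = type_of_target(class_target)

print(kind_continuous)
print(kind_classes)
```

A classifier expects a target of the second kind; a float-valued target of the first kind is the sort of input that leads to an "Unknown label type" error.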
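In the Boston housing dataset, the dependent variable is "Median value of owner-occupied homes in $1000's" (you can read the description by running print(data.DESCR)). It is a continuous variable, so it cannot be predicted with a classifier.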
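If the goal was regression all along, as the variable name regressor in the question suggests, one option is to switch to SGDRegressor. The sketch below is only an illustration under a few assumptions: it uses synthetic data from `make_regression` as a stand-in for a continuous-target dataset, imports from `sklearn.model_selection` (which replaced `sklearn.cross_validation` in later scikit-learn releases), and reshapes the 1-D target before scaling, which is what the reshape deprecation warning was pointing at. The names x_scaler and y_scaler and the random_state values are choices made for this sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a dataset with a continuous target (assumption)
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

x_scaler = StandardScaler()
y_scaler = StandardScaler()
x_train = x_scaler.fit_transform(x_train)
x_test = x_scaler.transform(x_test)

# StandardScaler expects 2-D input: reshape the 1-D target, then flatten it back
y_train = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

regressor = SGDRegressor(random_state=0)
scores = cross_val_score(regressor, x_train, y_train, cv=5)
print('cross validation r2 scores', scores)
print('average score', np.mean(scores))

# Estimators are trained with fit(); fit_transform() is meant for transformers
regressor.fit(x_train, y_train)
test_score = regressor.score(x_test, y_test)
print('test set r2 score', test_score)
```

For a regressor, both `cross_val_score` and `score` report R² by default, which matches the "r score" labels printed in the question.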
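If you want to test a classifier, you can use a different dataset. For example, change load_boston() to load_iris(). Note that you also need to remove the scaling of the target variable, since that only makes sense for numeric targets. With these modifications it should work: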
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.cross_validation import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split
data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target)
x_scalar = StandardScaler()
x_train = x_scalar.fit_transform(x_train)
x_test = x_scalar.transform(x_test)
classifier = SGDClassifier(loss='squared_loss')
scores = cross_val_score(classifier, x_train, y_train, cv=5)
scores
Out: array([ 0.33333333, 0.2173913 , 0.31818182, 0. , 0.19047619])
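For readers on a recent scikit-learn release, where `sklearn.cross_validation` has been replaced by `sklearn.model_selection`, the same experiment might look like the sketch below. It assumes the default hinge loss instead of loss='squared_loss' and fixes random_state for repeatability. Both are choices made for this sketch, not part of the answer above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

# Scale the features only; the class labels are left untouched
x_scaler = StandardScaler()
x_train = x_scaler.fit_transform(x_train)
x_test = x_scaler.transform(x_test)

# Default loss is 'hinge' (a linear SVM); an assumption made for this sketch
classifier = SGDClassifier(random_state=0)
scores = cross_val_score(classifier, x_train, y_train, cv=5)
print('cross validation accuracy', scores)
print('average accuracy', np.mean(scores))

classifier.fit(x_train, y_train)
test_accuracy = classifier.score(x_test, y_test)
print('test set accuracy', test_accuracy)
```

For a classifier, the default score is accuracy rather than R², so the numbers are not directly comparable to a regressor's scores.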