sklearn ValueError when spot-checking for the best algorithm

Date: 2017-11-07 14:38:35

Tags: python-3.x machine-learning

I am basically learning machine learning from this website. My goal is to predict the last column (ef) from the first column (momd1).

The dataset I am using looks like this:

 momd1,momd2,momo1,momc1,momc5,e,ef
3.246  ,3.246, 0.103 , -0.146 ,-0.146, -53.620459,-0.24790180
-3.243 ,3.243, -0.004, 0.003  ,-0.003, -54.49432 ,-0.33294370
2.836  ,2.836, -0.093, 0.007  ,-0.013, -50.264634,0.38720018
-3.223 ,3.223, -0.008, 0.007  ,-0.007, -53.701018,0.24682663
3.193  ,3.193, 0.022 , -0.103 ,-0.103, -52.908454,3.99073253
-3.202 ,3.202, -0.006, 0.016  ,-0.016, -52.904975,3.64920681
3.296  ,3.296, 0.027 , -0.124 ,-0.124, -56.155827,1.16850860
-3.232 ,3.232, -0.0  , -0.001 ,0.001 , -56.132135,0.88270867
3.256  ,3.256, 0.035 , -0.121 ,-0.121, -52.992831,1.34394255
-3.188 ,3.188, -0.003, 0.004  ,-0.004, -52.92053 ,1.10366443
3.207  ,3.207, 0.029 , -0.123 ,-0.123, -48.313344,4.88599858
-3.201 ,3.201, -0.005, 0.017  ,-0.017, -48.257162,4.86692039
3.22   ,3.22 , 0.058 , -0.115 ,-0.115, -53.440177,1.25961953
-3.13  ,3.13 , -0.001, -0.001 ,0.001 , -53.41947 ,0.81931871
3.221  ,3.221, 0.065 , -0.123 ,-0.123, -50.464766,1.28359085
-3.151 ,3.151, -0.002, 0.001  ,-0.001, -50.371872,1.06968415
3.196  ,3.196, 0.039 , -0.128 ,-0.127, -45.900099,4.87386305
-3.187 ,3.187, -0.007, 0.02   ,-0.02 , -45.843975,4.87962399
2.912  ,2.912, 0.292 , -0.141 ,-0.141, -53.366253,1.43430369
-2.835 ,2.835, -0.01 , -0.001 ,0.001 , -53.036974,1.48409671

I am trying to spot-check for the best algorithm:

#!/usr/bin/env python3
import pandas
from pandas.tools.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Load data
dataset = pandas.read_csv("Results.dat")
print(dataset.head(20))
# descriptions
print(dataset.describe())
# histograms
dataset.hist()
plt.show()

# Split-out validation dataset
array = dataset.values
X = array[:,0:4]
Y = array[:,4]
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
# Test options and evaluation metric
seed = 7
scoring = 'accuracy'

# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
# evaluate each model in turn
results = []
names = []
for name, model in models:
  kfold = model_selection.KFold(n_splits=10, random_state=seed)
  cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
  results.append(cv_results)
  names.append(name)
  msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
  print(msg)

# Compare Algorithms
fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()

This gives me the error:

Traceback (most recent call last):
  File "ml.py", line 49, in <module>
    cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
  File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_validation.py", line 342, in cross_val_score
    pre_dispatch=pre_dispatch)
  File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_validation.py", line 206, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_validation.py", line 458, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib64/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1217, in fit
    check_classification_targets(y)
  File "/usr/local/lib64/python3.6/site-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

This may be similar to this question, but I could not solve it.

Since I am still learning, please help me solve this.

Edit: With content like this:

momd1,momd2,momo1,momc1,momc5,e,ef,class
3.246  ,3.246, 0.103 , -0.146 ,-0.146, -53.620459,-0.2479018,1
-3.243 ,3.243, -0.004, 0.003  ,-0.003, -54.49432 ,-0.3329437,2
2.836  ,2.836, -0.093, 0.007  ,-0.013, -50.264634,0.38720018,1
-3.223 ,3.223, -0.008, 0.007  ,-0.007, -53.701018,0.24682663,2

I still get the same error.

1 Answer:

Answer 0: (score: -1)

It says that your label type is continuous.

That means your labels are real numbers (0.5, 1.3333, 8.001, ...) rather than values drawn from a fixed set of classes.
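As a quick check, here is a minimal sketch of how scikit-learn classifies a target column. It uses `type_of_target` from `sklearn.utils.multiclass` (the module that appears in the traceback) on the four rows from the question's edit. Note that `Y = array[:,4]` in the question's code selects the fifth column (momc5), not ef or the added class column, which may be why adding a class column did not change the error. The name `array` just mirrors the question's variable.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# The four rows from the question's edit, including the added class column.
# Column order: momd1, momd2, momo1, momc1, momc5, e, ef, class
array = np.array([
    [3.246, 3.246, 0.103, -0.146, -0.146, -53.620459, -0.2479018, 1],
    [-3.243, 3.243, -0.004, 0.003, -0.003, -54.49432, -0.3329437, 2],
    [2.836, 2.836, -0.093, 0.007, -0.013, -50.264634, 0.38720018, 1],
    [-3.223, 3.223, -0.008, 0.007, -0.007, -53.701018, 0.24682663, 2],
])

momc5_type = type_of_target(array[:, 4])  # what Y = array[:,4] selects
ef_type = type_of_target(array[:, 6])     # the column the question wants to predict
class_type = type_of_target(array[:, 7])  # the class column added in the edit

print(momc5_type)  # continuous
print(ef_type)     # continuous
print(class_type)  # binary
```

A classifier such as `LogisticRegression` rejects the first two columns as targets and accepts the third.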

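You need to either convert your labels into a defined number of classes, such as (1, 2, 3, 4) and only those, or use regression instead of classification. Regression produces a continuous number rather than a class label.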
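The two options can be sketched as follows, using the first six rows of the question's data with momd1 as the only feature and ef as the target. The threshold of 1.0 used to split ef into two classes is an arbitrary assumption for illustration, not something taken from the question, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# First six rows of the question's data: momd1 (feature) and ef (target).
X = np.array([[3.246], [-3.243], [2.836], [-3.223], [3.193], [-3.202]])
y = np.array([-0.24790180, -0.33294370, 0.38720018,
              0.24682663, 3.99073253, 3.64920681])

# Option 1: turn the continuous target into discrete classes, then classify.
# The cut-off 1.0 is an assumed, illustrative threshold.
y_class = (y > 1.0).astype(int)
clf = LogisticRegression().fit(X, y_class)
class_pred = clf.predict(X)  # integer class labels (0 or 1)

# Option 2: keep the continuous target and use a regressor instead.
reg = LinearRegression().fit(X, y)
value_pred = reg.predict(X)  # continuous predictions

print(y_class)
print(class_pred)
print(value_pred)
```

Which option fits depends on whether ef is meant as a quantity to estimate (regression) or as a category (classification); the question's stated goal of predicting ef suggests regression.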

Edit: I have not run your code, so there may be something else going on, but this is what the error message is telling you.
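If regression is the route taken, the spot-check loop from the question could be adapted roughly as below. This is only a sketch under assumptions: the choice of regressors, `n_splits=5`, `n_neighbors=3`, and the `neg_mean_squared_error` scoring are illustrative picks, and the data is typed in from the momd1 and ef columns of the question rather than read from Results.dat. `shuffle=True` is set on `KFold` because recent scikit-learn versions reject a `random_state` when shuffling is off.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# momd1 (feature) and ef (target) for the 20 rows shown in the question.
X = np.array([3.246, -3.243, 2.836, -3.223, 3.193, -3.202, 3.296, -3.232,
              3.256, -3.188, 3.207, -3.201, 3.22, -3.13, 3.221, -3.151,
              3.196, -3.187, 2.912, -2.835]).reshape(-1, 1)
y = np.array([-0.24790180, -0.33294370, 0.38720018, 0.24682663, 3.99073253,
              3.64920681, 1.16850860, 0.88270867, 1.34394255, 1.10366443,
              4.88599858, 4.86692039, 1.25961953, 0.81931871, 1.28359085,
              1.06968415, 4.87386305, 4.87962399, 1.43430369, 1.48409671])

seed = 7
# Regressors in place of the classifiers used in the question.
models = [
    ("LR", LinearRegression()),
    ("KNN", KNeighborsRegressor(n_neighbors=3)),
    ("CART", DecisionTreeRegressor(random_state=seed)),
]

# 'accuracy' only applies to classification; use a regression metric instead.
scoring = "neg_mean_squared_error"
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)

results = {}
for name, model in models:
    cv_results = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
    results[name] = cv_results
    print("%s: %f (%f)" % (name, cv_results.mean(), cv_results.std()))
```

Scores are negated mean squared errors, so values closer to zero are better.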