I am basically learning machine learning from this website. My aim is to predict the last column (ef) from the first column (momd1).
The dataset I am using looks like this:
momd1,momd2,momo1,momc1,momc5,e,ef
3.246 ,3.246, 0.103 , -0.146 ,-0.146, -53.620459,-0.24790180
-3.243 ,3.243, -0.004, 0.003 ,-0.003, -54.49432 ,-0.33294370
2.836 ,2.836, -0.093, 0.007 ,-0.013, -50.264634,0.38720018
-3.223 ,3.223, -0.008, 0.007 ,-0.007, -53.701018,0.24682663
3.193 ,3.193, 0.022 , -0.103 ,-0.103, -52.908454,3.99073253
-3.202 ,3.202, -0.006, 0.016 ,-0.016, -52.904975,3.64920681
3.296 ,3.296, 0.027 , -0.124 ,-0.124, -56.155827,1.16850860
-3.232 ,3.232, -0.0 , -0.001 ,0.001 , -56.132135,0.88270867
3.256 ,3.256, 0.035 , -0.121 ,-0.121, -52.992831,1.34394255
-3.188 ,3.188, -0.003, 0.004 ,-0.004, -52.92053 ,1.10366443
3.207 ,3.207, 0.029 , -0.123 ,-0.123, -48.313344,4.88599858
-3.201 ,3.201, -0.005, 0.017 ,-0.017, -48.257162,4.86692039
3.22 ,3.22 , 0.058 , -0.115 ,-0.115, -53.440177,1.25961953
-3.13 ,3.13 , -0.001, -0.001 ,0.001 , -53.41947 ,0.81931871
3.221 ,3.221, 0.065 , -0.123 ,-0.123, -50.464766,1.28359085
-3.151 ,3.151, -0.002, 0.001 ,-0.001, -50.371872,1.06968415
3.196 ,3.196, 0.039 , -0.128 ,-0.127, -45.900099,4.87386305
-3.187 ,3.187, -0.007, 0.02 ,-0.02 , -45.843975,4.87962399
2.912 ,2.912, 0.292 , -0.141 ,-0.141, -53.366253,1.43430369
-2.835 ,2.835, -0.01 , -0.001 ,0.001 , -53.036974,1.48409671
I am trying to check which algorithm works best:
#!/usr/bin/env python3
import pandas
from pandas.tools.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
# Load data
dataset = pandas.read_csv("Results.dat")
print(dataset.head(20))
# descriptions
print(dataset.describe())
# histograms
dataset.hist()
plt.show()
# Split-out validation dataset
array = dataset.values
X = array[:,0:4]
Y = array[:,4]
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
# Test options and evaluation metric
seed = 7
scoring = 'accuracy'
# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
# evaluate each model in turn
results = []
names = []
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
# Compare Algorithms
fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()
This gives me the following error:
Traceback (most recent call last):
File "ml.py", line 49, in <module>
cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_validation.py", line 342, in cross_val_score
pre_dispatch=pre_dispatch)
File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_validation.py", line 206, in cross_validate
for train, test in cv.split(X, y, groups))
File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
while self.dispatch_one_batch(iterator):
File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
result = ImmediateResult(func)
File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
self.results = batch()
File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/lib64/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_validation.py", line 458, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/usr/local/lib64/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1217, in fit
check_classification_targets(y)
File "/usr/local/lib64/python3.6/site-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
This is probably similar to this, but I could not solve it.
Since I am still learning, kindly help me fix this issue.
Edit: Adding a class column like
momd1,momd2,momo1,momc1,momc5,e,ef,class
3.246 ,3.246, 0.103 , -0.146 ,-0.146, -53.620459,-0.2479018,1
-3.243 ,3.243, -0.004, 0.003 ,-0.003, -54.49432 ,-0.3329437,2
2.836 ,2.836, -0.093, 0.007 ,-0.013, -50.264634,0.38720018,1
-3.223 ,3.223, -0.008, 0.007 ,-0.007, -53.701018,0.24682663,2
still gives the same error.
Answer 0 (score: -1)
It says that your label type is continuous.
This means your labels are real numbers (0.5, 1.3333, 8.001, ...) rather than a defined number of classes.
You either need to convert your labels into a defined number of classes, such as (1, 2, 3, 4) and only those classes, or use regression instead of classification. Regression predicts continuous numbers rather than classes.
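As an illustration of the first option, here is a minimal sketch that bins the continuous ef column into a fixed number of classes. It assumes ef is the intended target; the bin count of 4 and the ef_class column name are arbitrary choices of mine, not something from your question:

# Sketch: discretise the continuous ef column into a fixed number of classes
import pandas

dataset = pandas.read_csv("Results.dat", skipinitialspace=True)
# pandas.cut splits ef into 4 equal-width bins labelled 0..3;
# any fixed, finite set of classes would satisfy the classifiers
dataset['ef_class'] = pandas.cut(dataset['ef'], bins=4, labels=False)
X = dataset[['momd1', 'momd2', 'momo1', 'momc1', 'momc5', 'e']].values
Y = dataset['ef_class'].values  # integer labels, so check_classification_targets passes

With integer labels built like this, your original spot-check loop over the classifiers should no longer raise the "Unknown label type" error, although whether equal-width bins are meaningful for your data is up to you.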
Edit: I did not run your code, so there may be something else going on as well, but this is what the error message is telling you.
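And here is a minimal sketch of the regression option, again assuming ef (the last column) is the target and the remaining columns are the features; the particular regressors and the neg_mean_squared_error scoring are just illustrative choices:

# Sketch: spot-check regressors instead of classifiers on the continuous ef target
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

dataset = pandas.read_csv("Results.dat", skipinitialspace=True)
array = dataset.values
X = array[:, 0:6]  # momd1 ... e as features
Y = array[:, 6]    # ef, kept continuous

models = [('LR', LinearRegression()),
          ('KNN', KNeighborsRegressor()),
          ('CART', DecisionTreeRegressor()),
          ('SVR', SVR())]
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=7)
    # 'accuracy' is undefined for regression, so score with an error metric instead
    cv_results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring='neg_mean_squared_error')
    print("%s: %f (%f)" % (name, cv_results.mean(), cv_results.std()))

Note that the reported scores are negative because scikit-learn negates the mean squared error so that higher always means better.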