Running model_selection.cross_val_score

Time: 2017-05-09 08:28:42

Tags: python scikit-learn

I have this test section and it works fine:

import pandas as pd
from sklearn import model_selection, tree, utils

data = pd.read_csv('/home/noodle/B_train.csv')
print(data.head())
# .values replaces DataFrame.as_matrix(), which newer pandas no longer provides
features = data.iloc[:, :-1].values
targets = data.iloc[:, -1:].values
targets = targets.reshape(-1)  # flatten the single target column to 1-D
print(targets.shape, utils.multiclass.type_of_target(targets))
clf = tree.DecisionTreeClassifier(max_depth=5)
scores = model_selection.cross_val_score(clf, features, targets)
print(scores)

The shape of targets is (115,) and 'type_of_target' is binary. Here is the head of the data:

   no  x   y  z  m     k  l  t
0  17  1   4  1  1  1020  1  1
1  17  1  10  2  1  1037  2  1
2  18  1   5  1  1  1512  3  1
3  18  1   2  0  1  1440  1  1
4  15  1   4  1  1   465  1  1

Here is the problem: when I run my other code, it raises an error:

File "/home/noodle/PycharmProjects/qh/dc_tree.py", line 61, in find_common
    scores = model_selection.cross_val_score(clf, features, labels, cv=5)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_validation.py", line 130, in cross_val_score
    cv = check_cv(cv, y, classifier=is_classifier(estimator))
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_split.py", line 1584, in check_cv
    (type_of_target(y) in ('binary', 'multiclass'))):
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/utils/multiclass.py", line 237, in type_of_target
    if is_multilabel(y):
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/utils/multiclass.py", line 153, in is_multilabel
    labels = np.unique(y)
  File "/usr/local/python34/lib/python3.4/site-packages/numpy/lib/arraysetops.py", line 214, in unique
    ar.sort()
TypeError: unorderable types: str() > float()

Here is the code and the head of the data:

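from sklearn import model_selection, utils
from sklearn.ensemble import RandomForestClassifier

data = data.values  # replaces DataFrame.as_matrix()
labels = data[:, 0]
features = data[:, 1:]
print(labels.shape, utils.multiclass.type_of_target(labels))
# i is set by surrounding code that is not shown here
clf = RandomForestClassifier(n_estimators=i, max_depth=None,
                             min_samples_split=2, random_state=0)
scores = model_selection.cross_val_score(clf, features, labels, cv=5)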

Head of the working data:

    flag UserInfo_1   UserInfo_2   UserInfo_3 UserInfo_4 ProductInfo_1  
0     0    missing  5226.590000     0.000000        0.0           0.0   
1     0    missing     0.000000     0.000000        0.0           0.0   
2     0    missing  5272.206555  2412.077228    missing       missing   
3     0    missing  5272.206555  2412.077228    missing       missing   
4     0    missing  5272.206555  2412.077228    missing       missing   
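For reference, the first traceback ends inside np.unique(y), which has to sort the label values. A minimal sketch of that kind of failure, assuming a label array of object dtype that mixes floats with strings (the values below are hypothetical, not taken from the working data, and can_be_sorted and python_types are illustrative helpers):

```python
import numpy as np

# Hypothetical label column: object dtype holding floats and a string,
# similar in spirit to a column that mixes numbers with 'missing'.
mixed = np.array([0.0, 1.0, "missing"], dtype=object)


def can_be_sorted(values):
    """Return True if np.unique can sort the values, False on a TypeError."""
    try:
        np.unique(values)
        return True
    except TypeError:
        return False


def python_types(values):
    """Return the sorted names of the Python types present in the values."""
    return sorted({type(v).__name__ for v in values})


print(can_be_sorted(mixed))
print(python_types(mixed))
```

Running python_types on the labels actually passed at line 61 of dc_tree.py would show whether more than one Python type is present there. This is only a diagnostic idea; it does not confirm what the working data contains.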

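The shape of labels is (4000,) and 'type_of_target' is binary. There seems to be no difference between labels and targets (in the test section) except the size of the first dimension. So I thought the cause might be the str features. I did not want to change my working data first, so I tried changing the test data to: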

   no  x   y  z  m     k  l  t
0  17  g   4  1  1  1020  1  1
1  17  g  10  2  1  1037  2  1
2  18  g   5  1  1  1512  3  1
3  18  g   2  0  1  1440  1  1
4  15  g   4  1  1   465  1  1

and ran it to reproduce the error, but it raised a different error:

Traceback (most recent call last):
  File "/home/noodle/PycharmProjects/bigtest/tensortest.py", line 71, in <module>
    scores = model_selection.cross_val_score(clf, features, targets1)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_validation.py", line 140, in cross_val_score
    for train, test in cv_iter)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_validation.py", line 238, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/tree/tree.py", line 739, in fit
    X_idx_sorted=X_idx_sorted)
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/tree/tree.py", line 122, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/usr/local/python34/lib/python3.4/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'n' 
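The last frames of this traceback show check_array converting the features to float, which cannot work for a column of strings. A minimal sketch of one way to locate and encode such a column, assuming a small frame built from a subset of the modified test data above (the column subset and the choice of pd.factorize are assumptions for illustration):

```python
import pandas as pd

# Subset of the modified test data shown above; column 'x' holds strings.
test = pd.DataFrame({
    "no": [17, 17, 18],
    "x": ["g", "g", "g"],
    "y": [4, 10, 5],
    "t": [1, 1, 1],
})

# Columns that are not numeric would fail the float conversion.
non_numeric = test.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Replace each distinct string with an integer code.
for column in non_numeric:
    codes, uniques = pd.factorize(test[column])
    test[column] = codes

features = test.iloc[:, :-1].values.astype(float)
print(features.dtype)
```

Integer codes imply an ordering between categories, which a tree-based classifier can usually tolerate; one-hot encoding is an alternative when that ordering is unwanted.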

So the error raised by the working part is not caused by the str values in the working data... right? How can I fix it?
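One approach worth trying, as a minimal sketch and not a confirmed fix: convert every feature column to numbers and cast the labels to int before calling cross_val_score. The frame below is a hypothetical stand-in for the working data, and filling missing values with -1.0 is an assumption chosen only to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the working data: a 0/1 'flag' column plus
# feature columns that mix numbers with the string 'missing'.
rng = np.random.RandomState(0)
frame = pd.DataFrame({
    "flag": [0, 1] * 20,
    "UserInfo_1": ["missing"] * 40,
    "UserInfo_2": rng.rand(40) * 5000,
    "UserInfo_4": ["missing" if i % 3 == 0 else 0.0 for i in range(40)],
})

# Labels as a plain integer array, so sorting them cannot mix types.
labels = frame["flag"].astype(int).values

# Turn every feature column into numbers; 'missing' becomes NaN.
features = frame.drop(columns="flag").apply(pd.to_numeric, errors="coerce")
features = features.fillna(-1.0)  # assumption: -1.0 marks a missing value
X = features.values.astype(float)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
scores = cross_val_score(clf, X, labels, cv=5)
print(scores)
```

If the labels in the real data cannot be cast to int, that would itself point to mixed types in the label column.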

0 answers:

No answers yet