I have this test section, and it works fine:
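import pandas as pd
import sklearn.utils.multiclass  # makes utils.multiclass available below
from sklearn import model_selection, tree, utils

data = pd.read_csv('/home/noodle/B_train.csv')
print(data.head())
# .values gives the same NumPy array as the deprecated DataFrame.as_matrix()
features = data.iloc[:, :-1].values
targets = data.iloc[:, -1:].values
targets = targets.reshape(-1)  # flatten the (n, 1) column to shape (n,)
print(targets.shape, utils.multiclass.type_of_target(targets))
clf = tree.DecisionTreeClassifier(max_depth=5)
scores = model_selection.cross_val_score(clf, features, targets)
print(scores)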
The shape of targets is (115,) and 'type_of_target' is binary. Here is the head of the data:
no x y z m k l t
0 17 1 4 1 1 1020 1 1
1 17 1 10 2 1 1037 2 1
2 18 1 5 1 1 1512 3 1
3 18 1 2 0 1 1440 1 1
4 15 1 4 1 1 465 1 1
Here is the problem: when I run my other (working) code, it raises an error:
File "/home/noodle/PycharmProjects/qh/dc_tree.py", line 61, in find_common
scores = model_selection.cross_val_score(clf, features, labels, cv=5)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_validation.py", line 130, in cross_val_score
cv = check_cv(cv, y, classifier=is_classifier(estimator))
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_split.py", line 1584, in check_cv
(type_of_target(y) in ('binary', 'multiclass'))):
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/utils/multiclass.py", line 237, in type_of_target
if is_multilabel(y):
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/utils/multiclass.py", line 153, in is_multilabel
labels = np.unique(y)
File "/usr/local/python34/lib/python3.4/site-packages/numpy/lib/arraysetops.py", line 214, in unique
ar.sort()
TypeError: unorderable types: str() > float()
Here is the code and the head of the data:
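from sklearn.ensemble import RandomForestClassifier

# .values gives the same NumPy array as the deprecated DataFrame.as_matrix()
data = data.values
labels = data[:, 0]     # first column (flag) is the label
features = data[:, 1:]  # remaining columns are the features
print(labels.shape, utils.multiclass.type_of_target(labels))
# i is defined elsewhere in the script (not shown here)
clf = RandomForestClassifier(n_estimators=i, max_depth=None,
                             min_samples_split=2, random_state=0)
scores = model_selection.cross_val_score(clf, features, labels, cv=5)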
Head of the working data:
flag UserInfo_1 UserInfo_2 UserInfo_3 UserInfo_4 ProductInfo_1
0 0 missing 5226.590000 0.000000 0.0 0.0
1 0 missing 0.000000 0.000000 0.0 0.0
2 0 missing 5272.206555 2412.077228 missing missing
3 0 missing 5272.206555 2412.077228 missing missing
4 0 missing 5272.206555 2412.077228 missing missing
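For reference, the TypeError in the traceback above is raised while np.unique sorts the array passed as y. A minimal sketch of how that can happen, assuming (hypothetically) that the array has object dtype and holds both strings and floats; the values below are made up, and the data head does not show which values would actually be involved:

```python
import numpy as np

# Hypothetical label column: object dtype mixing floats with a string,
# similar to what a mixed-type DataFrame yields when converted to an array.
mixed_labels = np.array([0.0, 1.0, "missing", 0.0], dtype=object)
numeric_labels = np.array([0.0, 1.0, 0.0])


def can_be_sorted(values):
    """Return True if np.unique can sort the values, False on a TypeError."""
    try:
        np.unique(values)
        return True
    except TypeError:
        return False


mixed_ok = can_be_sorted(mixed_labels)      # str and float cannot be ordered
numeric_ok = can_be_sorted(numeric_labels)  # plain floats sort without trouble
print(mixed_ok, numeric_ok)
```

On Python 3 the wording of the message depends on the version ("unorderable types" on 3.4, "'<' not supported" on later releases), but the cause is the same comparison between a str and a float.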
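The shape of labels is (4000,) and 'type_of_target' is binary. There seems to be no difference between labels and targets (in the test section) apart from the size of the first dimension. So I thought the error might be caused by the string-valued features. I did not want to experiment on my working data first, so I tried changing the test data to: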
no x y z m k l t
0 17 g 4 1 1 1020 1 1
1 17 g 10 2 1 1037 2 1
2 18 g 5 1 1 1512 3 1
3 18 g 2 0 1 1440 1 1
4 15 g 4 1 1 465 1 1
and ran it to reproduce the error, but it raised a different one:
Traceback (most recent call last):
File "/home/noodle/PycharmProjects/bigtest/tensortest.py", line 71, in <module>
scores = model_selection.cross_val_score(clf, features, targets1)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_validation.py", line 140, in cross_val_score
for train, test in cv_iter)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
while self.dispatch_one_batch(iterator):
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
result = ImmediateResult(func)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
self.results = batch()
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/model_selection/_validation.py", line 238, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/tree/tree.py", line 739, in fit
X_idx_sorted=X_idx_sorted)
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/tree/tree.py", line 122, in fit
X = check_array(X, dtype=DTYPE, accept_sparse="csc")
File "/usr/local/python34/lib/python3.4/site-packages/sklearn/utils/validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'n'
So the error raised by the working code is not caused by the strings in the working data... right? How can I fix it?
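One thing the two tracebacks do show is that they fail on different arrays: the first inside np.unique(y), that is, on the labels, and the second when the tree converts X to float, that is, on the features. A rough sketch of how both could be checked and cleaned with pandas before calling cross_val_score follows. The small DataFrame is a hypothetical stand-in that only borrows the column names and the "missing" marker from the data head above, and the median fill is just one possible choice, not a recommendation for this dataset:

```python
import pandas as pd

# Hypothetical stand-in for the working data.
df = pd.DataFrame({
    "flag": [0, 0, 1, 0],
    "UserInfo_1": ["missing", "missing", "missing", "missing"],
    "UserInfo_2": [5226.59, 0.0, 5272.206555, 5272.206555],
    "UserInfo_4": [0.0, 0.0, "missing", "missing"],
})

# Parse every column as numbers; cells that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Count, per column, the cells that held a non-numeric value such as "missing".
bad_counts = (numeric.isna() & df.notna()).sum()
print(bad_counts[bad_counts > 0])

# One possible cleanup (an assumption): drop columns with no usable values,
# then fill the remaining gaps with each column's median.
clean = numeric.dropna(axis=1, how="all")
clean = clean.fillna(clean.median())

labels = clean["flag"].to_numpy()
features = clean.drop(columns="flag").to_numpy(dtype=float)
print(features.shape, features.dtype, labels.dtype)
```

After a step like this, both arrays are purely numeric, so neither the sort inside np.unique nor the float conversion of the features has a string to trip over.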