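I've been stuck on this bug for a while now and could use some hive-mind help to (hopefully) catch what I'm missing.
tl;dr version
Pandas raises a TypeError when I train a scikit-learn GridSearchCV on ndarrays of floats.
Full version
I'm using an sklearn GridSearchCV object to fit a 1D list of float target values (scaled between 0.0 and 1.0) to a 2D list of float feature (input) variables, also scaled between 0 and 1. Both lists have the same number of samples.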
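The setup described above implies a few invariants that can be checked before `fit` is ever called: X is 2D, y is 1D, both have the same number of samples, both have a floating-point dtype, and all values lie in [0, 1]. A minimal sketch, assuming NumPy; `check_training_arrays` is a hypothetical helper, not part of scikit-learn:

```python
import numpy as np


def check_training_arrays(X, y):
    """Hypothetical helper: assert the shape/dtype/range invariants of X and y."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Features: 2D array of shape (N_samples, N_features)
    assert X.ndim == 2, "X should be 2D (N_samples, N_features)"
    # Target: 1D array with one value per sample
    assert y.ndim == 1, "y should be 1D (N_samples,)"
    assert X.shape[0] == y.shape[0], "X and y need the same number of samples"
    # An object dtype here would mean non-numeric entries slipped in
    assert np.issubdtype(X.dtype, np.floating), "X dtype is %s, expected float" % X.dtype
    assert np.issubdtype(y.dtype, np.floating), "y dtype is %s, expected float" % y.dtype
    # Both arrays were scaled into [0, 1]
    assert X.min() >= 0.0 and X.max() <= 1.0, "X values outside [0, 1]"
    assert y.min() >= 0.0 and y.max() <= 1.0, "y values outside [0, 1]"
    return X.shape
```

Plain lists work too, since the helper converts them with `np.asarray`; a list that contains anything other than numbers ends up with an `object` dtype and fails the float check.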
Here is the abbreviated code where I pass the training data to GridSearchCV.fit():
import numpy as np
from sklearn import grid_search
# the training data are attributes of a model class defined elsewhere
model.X_scaled # ndarray of feature data of shape (N_samples, N_features)
model.Y_scaled # ndarray target data of length N_samples
# setup the GridSearchCV instance
search = grid_search.GridSearchCV(estimator = svr,
param_grid = params, # C and epsilon parameters are set elsewhere in a dict
n_jobs = self.parallel_processes,
scoring = self.scoring_metric,
cv = np.shape(self.Y)[0], # sets the fold size for cross-val. cv = # of samples is essentially LOO CV.
verbose = 0)
# print out some info to get a better idea of the training data
print "model.X_scaled"
print type(model.X_scaled[0][0]) # this is getting the type of one of the elements of X_scaled
print np.shape(model.X_scaled)
print model.X_scaled.tolist()
print "model.Y_scaled"
print type(model.Y_scaled) # this is getting the type of the entire array structure
print np.shape(model.Y_scaled)
print model.Y_scaled
# fit GridSearchCV on the training data
search.fit(model.X_scaled, model.Y_scaled)
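The comment on the `cv` argument says that using as many folds as there are samples is essentially leave-one-out CV. A plain-Python sketch of unshuffled, contiguous K-fold splitting shows why; it illustrates the idea and is not scikit-learn's implementation, and `kfold_test_indices` is a hypothetical helper:

```python
def kfold_test_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous, unshuffled test folds."""
    # Every fold gets n_samples // n_folds samples; the first
    # n_samples % n_folds folds get one extra sample.
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


n = 81  # number of samples in this post
folds = kfold_test_indices(n, n)
# With n_folds == n_samples every test fold holds exactly one sample,
# i.e. leave-one-out: each model is trained on the remaining n - 1 samples.
print(all(len(f) == 1 for f in folds))
```

With the 81 samples here, each parameter combination in the grid is therefore fitted 81 times, each time on 80 samples.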
When I run this code, I get the following output:
model.X_scaled
<type 'numpy.float64'>
(81, 16)
model.Y_scaled
<type 'numpy.ndarray'>
(81,)
Traceback (most recent call last):
File "SurrogateModel.py", line 416, in <module>
self_test()
File "SurrogateModel.py", line 409, in self_test
model.train()
File "SurrogateModel.py", line 350, in train
search.fit(model.X_scaled, model.Y_scaled)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/grid_search.py", line 804, in fit
return self._fit(X, y, ParameterGrid(self.param_grid))
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/grid_search.py", line 553, in _fit
for parameters in parameter_iterable
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 800, in __call__
while self.dispatch_one_batch(iterator):
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 658, in dispatch_one_batch
self._dispatch(tasks)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 566, in _dispatch
job = ImmediateComputeBatch(batch)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 180, in __init__
self.results = batch()
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/svm/base.py", line 193, in fit
fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/svm/base.py", line 251, in _dense_fit
max_iter=self.max_iter, random_seed=random_seed)
File "sklearn/svm/libsvm.pyx", line 59, in sklearn.svm.libsvm.fit (sklearn/svm/libsvm.c:1576)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/pandas/core/series.py", line 78, in wrapper
"{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
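The stack trace is a bit confusing to me. sklearn makes it all the way to the "fit" call into libsvm, but then the Pandas package raises a TypeError because it can't convert a Series object to a float? I looked at the pandas series.py module and found some more context for where the TypeError is raised: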
# in pandas/core/series.py
def _coerce_method(converter):
""" install the scalar coercion methods """
def wrapper(self):
if len(self) == 1:
return converter(self.iloc[0])
raise TypeError("cannot convert the series to "
"{0}".format(str(converter)))
return wrapper
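To see what this excerpt implies, note that `float(obj)` calls `obj.__float__()`, and pandas installs `_coerce_method(float)` as that method. The wrapper only succeeds for a Series with exactly one element; any other length raises this TypeError. The sketch below re-creates that mechanism without pandas, using a hypothetical `FakeSeries` stand-in (not a pandas class) so it runs on its own:

```python
def _coerce_method(converter):
    """Install a scalar coercion method (same logic as the pandas excerpt above)."""
    def wrapper(self):
        if len(self) == 1:
            return converter(self.iloc[0])
        raise TypeError("cannot convert the series to "
                        "{0}".format(str(converter)))
    return wrapper


class _ILoc(object):
    """Minimal positional indexer standing in for Series.iloc."""
    def __init__(self, values):
        self._values = values

    def __getitem__(self, i):
        return self._values[i]


class FakeSeries(object):
    """Hypothetical stand-in for pandas.Series, for illustration only."""
    def __init__(self, values):
        self._values = list(values)
        self.iloc = _ILoc(self._values)

    def __len__(self):
        return len(self._values)

    # float(obj) dispatches to obj.__float__(), which is where pandas hooks in
    __float__ = _coerce_method(float)


print(float(FakeSeries([0.5])))  # one element: converts fine
try:
    float(FakeSeries([0.1, 0.2]))  # two elements: TypeError
except TypeError as exc:
    print(exc)
```

So the error does not depend on how the data were passed in; it only says that something called `float()` on a Series whose length was not one, and the traceback places that call inside libsvm's `fit`.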
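I don't understand how this Pandas function ends up being called when none of the data structures I pass to scikit-learn are pandas objects. In the code above they are all ndarrays, and I've also tried passing them as plain lists, which gives the same TypeError. Pandas is used earlier in the code to read data from a CSV into a DataFrame, but the data are converted to ndarrays before being put into X_scaled and Y_scaled.
The annoying part is that almost exactly the same code ran perfectly well in the script that preceded this one. The version where I hit this problem is essentially a refactor of that script, but this function, which trains the grid search object on the training data, stayed basically unchanged.
Any suggestions about what might be going on here are greatly appreciated. Thanks!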
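One way to test the assumption that no pandas objects reach scikit-learn is to scan everything passed into the search, not only X and y, since the `params` dict is also built elsewhere. A minimal sketch, assuming the inputs are made of dicts, lists, tuples and ndarrays; `find_pandas_objects` is a hypothetical helper, and it detects pandas types by module name so it runs without importing pandas:

```python
def find_pandas_objects(obj, path="root"):
    """Hypothetical helper: list (path, type name) for pandas objects nested in obj."""
    found = []
    # Identify pandas types by their module name (e.g. "pandas.core.series")
    if type(obj).__module__.split(".")[0] == "pandas":
        found.append((path, type(obj).__name__))
        return found
    if isinstance(obj, dict):
        for key, value in obj.items():
            found.extend(find_pandas_objects(value, "%s[%r]" % (path, key)))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            found.extend(find_pandas_objects(value, "%s[%d]" % (path, i)))
    elif getattr(obj, "dtype", None) == object:
        # An object-dtype ndarray can hold arbitrary Python objects, Series included
        for i, value in enumerate(obj.ravel()):
            found.extend(find_pandas_objects(value, "%s.ravel()[%d]" % (path, i)))
    return found


# Example call with the names from the post:
# print(find_pandas_objects({"X": model.X_scaled, "y": model.Y_scaled, "params": params}))
```

An empty list means no pandas object is nested anywhere in what was scanned; float ndarrays are skipped because their elements cannot be pandas objects.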