TypeError raised by the Pandas module while fitting Scikit-Learn GridSearchCV

Asked: 2016-09-06 19:35:43

Tags: python pandas scikit-learn anaconda typeerror

I've been stumped by this bug for a while now and could use some hive-mind help to (hopefully) catch what I'm missing.

tl;dr version

Pandas raises a TypeError when I pass ndarrays of floats to scikit-learn's GridSearchCV for training.

Full version

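I'm using a sklearn GridSearchCV object to fit a 1D list of float target values (scaled between 0.0 and 1.0) to a 2D list of float feature values (the input variables), also scaled between 0 and 1. Both lists have the same number of samples.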

The abbreviated code where I pass the training data to GridSearchCV.fit() is as follows:

# the training data are attributes of a model class defined elsewhere
model.X_scaled # ndarray of feature data of shape (N_samples, N_features)
model.Y_scaled # ndarray target data of length N_samples

# setup the GridSearchCV instance
search = grid_search.GridSearchCV(estimator = svr,
                                  param_grid = params, # C and epsilon parameters are set elsewhere in a dict
                                  n_jobs = self.parallel_processes,
                                  scoring = self.scoring_metric,
                                  cv = np.shape(self.Y)[0], # sets the fold size for cross-val. cv = # of samples is essentially LOO CV.
                                  verbose = 0)

# print out some info to get a better idea of the training data
print "model.X_scaled"
print type(model.X_scaled[0][0]) # this is getting the type of one of the elements of X_scaled
print np.shape(model.X_scaled)
print model.X_scaled.tolist()

print "model.Y_scaled"
print type(model.Y_scaled) # this is getting the type of the entire array structure
print np.shape(model.Y_scaled)
print model.Y_scaled

# fit GridSearchCV on the training data
search.fit(model.X_scaled, model.Y_scaled)

When I run this code, I get the following output:

model.X_scaled
<type 'numpy.float64'>
(81, 16)
model.Y_scaled
<type 'numpy.ndarray'>
(81,)
Traceback (most recent call last):
  File "SurrogateModel.py", line 416, in <module>
    self_test()
  File "SurrogateModel.py", line 409, in self_test
    model.train()
  File "SurrogateModel.py", line 350, in train
    search.fit(model.X_scaled, model.Y_scaled)
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/grid_search.py", line 553, in _fit
    for parameters in parameter_iterable
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 180, in __init__
    self.results = batch()
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/svm/base.py", line 193, in fit
    fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/svm/base.py", line 251, in _dense_fit
    max_iter=self.max_iter, random_seed=random_seed)
  File "sklearn/svm/libsvm.pyx", line 59, in sklearn.svm.libsvm.fit (sklearn/svm/libsvm.c:1576)
  File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/pandas/core/series.py", line 78, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>

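The stack trace is a bit confusing to me. sklearn makes it all the way to the "fit" call into libsvm, but then the Pandas package raises a TypeError because it can't convert a Series object to a float? I looked at the pandas series.py module and found some additional context for where the TypeError is raised: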

# in pandas/core/series.py
def _coerce_method(converter):
    """ install the scalar coercion methods """

    def wrapper(self):
        if len(self) == 1:
            return converter(self.iloc[0])
        raise TypeError("cannot convert the series to "
                        "{0}".format(str(converter)))

    return wrapper
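
For context, here is a minimal sketch of how this wrapper can be reached without any explicit call into pandas. In series.py the helper is installed as `Series.__float__`, so anything that calls `float()` on a Series lands in `wrapper`. A compiled routine that coerces an argument to a C double does the same, which would be consistent with the traceback jumping from libsvm.pyx straight into series.py. `FakeSeries` and `try_float` are hypothetical names used only for illustration; this is a stand-in, not the real pandas class.

```python
def _coerce_method(converter):
    """Simplified copy of the pandas helper quoted above."""
    def wrapper(self):
        if len(self) == 1:
            return converter(self.values[0])
        raise TypeError("cannot convert the series to "
                        "{0}".format(str(converter)))
    return wrapper


class FakeSeries(object):
    """Hypothetical stand-in for pandas.Series (not the real class)."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    # Installing the wrapper as __float__ makes float(obj) call it.
    __float__ = _coerce_method(float)


def try_float(obj):
    """Return float(obj), or the TypeError message if conversion fails."""
    try:
        return float(obj)
    except TypeError as exc:
        return str(exc)


single = try_float(FakeSeries([0.5]))        # one element: converts
several = try_float(FakeSeries([0.1, 1.0]))  # several elements: error message
print(single)
print(several)
```

The one-element case converts cleanly, while the multi-element case produces the same "cannot convert the series to ..." message seen in the traceback.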

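I don't understand how this Pandas function is being called when none of the data structures I pass to scikit-learn are pandas objects. In the code I've provided they are both ndarrays, but I also tried passing them as plain lists and got the same TypeError. Pandas is used earlier in the code to read the data from a CSV into a DataFrame, but the data are converted to ndarrays before being placed in X_scaled and Y_scaled.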
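
Since the two data arrays are confirmed to be ndarrays, one thing that may be worth ruling out (an assumption, not a confirmed diagnosis) is every other value that reaches the estimator, such as the entries of `params` or the SVR constructor arguments, because values pulled out of a DataFrame can remain Series objects. The helpers below are hypothetical debugging aids, not part of scikit-learn or pandas; they only flag values that are not plain scalars.

```python
import numbers


def find_non_scalars(named_values):
    """Return {name: type name} for each value that is not a plain scalar."""
    suspects = {}
    for name, value in named_values.items():
        if not isinstance(value, (numbers.Number, str, type(None))):
            suspects[name] = type(value).__name__
    return suspects


def check_param_grid(param_grid):
    """Check every candidate in a {param name: [candidates]} grid."""
    suspects = {}
    for name, candidates in param_grid.items():
        for i, candidate in enumerate(candidates):
            label = "%s[%d]" % (name, i)
            suspects.update(find_non_scalars({label: candidate}))
    return suspects


# Hypothetical grid: the nested list stands in for a Series that slipped in.
example_grid = {"C": [1.0, 10.0], "epsilon": [0.01, [0.1, 0.2]]}
suspects = check_param_grid(example_grid)
print(suspects)
```

In the script from the question, this could be run as `check_param_grid(params)` and `find_non_scalars(svr.get_params())` just before `search.fit(...)`; anything reported with a type such as `Series` would be a candidate for the failed float conversion.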

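Annoyingly, nearly identical code ran just fine in an earlier script. The version of the code where I hit this problem is basically a refactoring of that script, but the function that trains the grid search object on the training data is essentially unchanged.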

Any suggestions as to what might be going on here are greatly appreciated. Thanks!

0 Answers:

No answers yet