MemoryError when using scikit-learn

Time: 2016-03-27 08:06:05

Tags: python pandas machine-learning scikit-learn

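My dataset has more than 200,000 rows and 337 columns. I am using five of those columns to predict a sixth. The memory error appears when I try to fit/train my algorithm with the following: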

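import numpy as np
from sklearn import cross_validation  # legacy module; later releases use sklearn.model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# `data` is the full DataFrame (200,000+ rows, 337 columns), loaded elsewhere.
predictors = [
    'RO5', 'RO3', 'RO6',
    'CS5', 'income'
]

# Score each predictor against the target column 'ED2'.
selector = SelectKBest(f_classif, k=5)
selector.fit(data[predictors], data['ED2'])

scores = -np.log10(selector.pvalues_)

alg = RandomForestClassifier(
    random_state=1,
    n_estimators=20,
    min_samples_split=8,
    min_samples_leaf=3
)
# Note: this reassigns `scores`, replacing the feature scores computed above.
scores = cross_validation.cross_val_score(alg, data[predictors], data["ED2"], cv=3)
print(scores.mean())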

Both RandomForestClassifier and SelectKBest fail with a memory error. How can I resolve this so that the error goes away and I can train my algorithm?

COMPLETE TRACEBACK

Traceback (most recent call last):
  File "F:\major\solution-1.py", line 234, in <module>
    prep_data()
  File "F:\major\solution-1.py", line 160, in prep_data
    selector.fit(data[predictors], data['ED2'])
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1963, in __getitem__
    return self._getitem_array(key)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2008, in _getitem_array
    return self.take(indexer, axis=1, convert=True)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1368, in take
    self._consolidate_inplace()
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2411, in _consolidate_inplace
    self._protect_consolidate(f)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2402, in _protect_consolidate
    result = f()
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2410, in f
    self._data = self._data.consolidate()
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3194, in consolidate
    bm._consolidate_inplace()
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3199, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 4189, in _consolidate
    _can_consolidate=_can_consolidate)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 4212, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
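
The traceback fails inside `data[predictors]`, not inside scikit-learn: selecting the columns makes pandas consolidate its internal blocks, and the `new_values[argsort]` step copies data from the full 337-column frame. One possible workaround, sketched here, is to load only the six needed columns from the start. This assumes the data comes from a CSV file, which the question does not state. The in-memory CSV text and the `unused1`/`unused2` columns are hypothetical stand-ins for the real file, and the float64 size estimate is also an assumption.

```python
import io

import numpy as np
import pandas as pd

# Column names taken from the question.
predictors = ['RO5', 'RO3', 'RO6', 'CS5', 'income']
target = 'ED2'

# Rough footprint of the full frame, assuming 200,000 rows of 8-byte floats.
full_bytes = 200000 * 337 * 8
# Footprint of only the six needed columns stored as 4-byte floats.
small_bytes = 200000 * 6 * 4
print("full frame: about %.0f MB" % (full_bytes / 1e6))
print("six columns as float32: about %.1f MB" % (small_bytes / 1e6))

# Hypothetical stand-in for the real file: a tiny CSV with extra columns.
csv_text = (
    "RO5,RO3,RO6,CS5,income,ED2,unused1,unused2\n"
    "1,2,3,4,50000,0,9,9\n"
    "2,3,4,5,60000,1,9,9\n"
    "3,4,5,6,70000,0,9,9\n"
)

# Read only the columns that are needed, with a narrower dtype for predictors.
data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=predictors + [target],
    dtype={name: np.float32 for name in predictors},
)

# Plain NumPy arrays can be passed to selector.fit and cross_val_score.
X = data[predictors].values
y = data[target].values
print(X.shape, X.dtype)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file path. If the frame has to be loaded in full for other reasons, a 64-bit Python installation may also be worth checking, since a 32-bit process has a much smaller address space for the temporary copy.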

0 answers:

No answers yet.