IndexError when fitting Stacked Generalization

Date: 2016-09-06 09:30:16

Tags: python anaconda

I get an error when fitting a stacked (blended) model: IndexError: indices are out-of-bounds. Any guidance on this would be helpful. Thanks.

I read in the datasets:

import pandas as pd
import numpy as np
from stacked_generalizer import StackedGeneralizer
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

#Load cleaned data :
train = pd.read_csv('train1.csv')
test = pd.read_csv('test1.csv')

Then I selected the variables. They are a subset of all the variables in the training data.

target = 'Y1'
ID = 'ID'
predictors1 = ['Marks_SA', 'Marks_PA',
               'Marks_CA', 'Feat2', 'Experience', 'Feat6', 'Feat1',
               'Feat5', 'Feat4']

Now the blending model:

# define base models
base_models = [RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='gini'),
               RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='entropy'),
               ExtraTreesClassifier(n_estimators=100, n_jobs=-1, criterion='gini')]

# define blending model
blending_model = LogisticRegression()
VERBOSE = True
N_FOLDS = 5

# initialize multi-stage model
sg = StackedGeneralizer(base_models, blending_model,
                        n_folds=N_FOLDS, verbose=VERBOSE)

# fit model
sg.fit(train[predictors1], train[target])

I get the following error:

Fitting Base Models...
Fitting model 01: RandomForestClassifier(bootstrap=True, class_weight=None,     criterion='gini',
        max_depth=None, max_features='auto', max_leaf_nodes=None,
        min_samples_leaf=1, min_samples_split=2,
        min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=-1,
        oob_score=False, random_state=None, verbose=0,
        warm_start=False)

Fold 1

 IndexError                                Traceback (most recent call last)
 <ipython-input-47-dd6152e11339> in <module>()
  1 # fit model
  2 #sg.fit(X[:n_train],y[:n_train])
   ----> 3 sg.fit(train[columns],train[target])

 c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit(self, X, y)
211 
212         def fit(self, X, y):
--> 213                 X_blend = self.fit_transform_base_models(X, y)
214                 self.fit_blending_model(X_blend, y)
 215 

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_transform_base_models(self, X, y)
159 
160         def fit_transform_base_models(self, X, y):
--> 161                 self.fit_base_models(X, y)
162                 return self.transform_base_models(X)
163 

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_base_models(self, X, y)
129                                         print('Fold %d' % (j + 1))
130 
--> 131                                 X_train = X[train_idx]
132                                 y_train = y[train_idx]
133 

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
 1984         if isinstance(key, (Series, np.ndarray, Index, list)):
1985             # either boolean or fancy integer index
 -> 1986             return self._getitem_array(key)
 1987         elif isinstance(key, DataFrame):
 1988             return self._getitem_frame(key)

  C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
  2029         else:
  2030             indexer = self.ix._convert_to_indexer(key, axis=1)
  -> 2031             return self.take(indexer, axis=1, convert=True)
  2032 
  2033     def _getitem_multilevel(self, key):

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\generic.pyc in take(self, indices, axis, convert, is_copy)
   1626         new_data = self._data.take(indices,
   1627                                     axis=self._get_block_manager_axis(axis),
 -> 1628                                    convert=True, verify=True)
 1629         result = self._constructor(new_data).__finalize__(self)
  1630 

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\internals.pyc in take(self, indexer, axis, verify, convert)
  3635         n = self.shape[axis]
 3636         if convert:
 -> 3637             indexer = maybe_convert_indices(indexer, n)
 3638 
 3639         if verify:

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\indexing.pyc in maybe_convert_indices(indices, n)
  1808     mask = (indices >= n) | (indices < 0)
  1809     if mask.any():
 -> 1810         raise IndexError("indices are out-of-bounds")
 1811     return indices
 1812 

 IndexError: indices are out-of-bounds

1 answer:

Answer 0 (score: 1)

Just change this line:

sg.fit(train[predictors1],train[target])

to this:

sg.fit(train[predictors1].values,train[target].values)

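The stacked_generalizer fit function expects ndarrays as input. Internally it runs X[train_idx] with the row indices of each fold. On a pandas DataFrame that expression looks up columns rather than rows (the traceback shows take(indexer, axis=1)), so the row indices fall outside the range of column positions and pandas raises the IndexError.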
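
To see the difference in isolation, here is a minimal sketch that does not use stacked_generalizer at all. The toy frame X_df and the index array train_idx are made-up stand-ins for train[predictors1] and one fold's training indices, so treat them as assumptions. The exception type depends on the pandas version: the old release in the traceback raises IndexError, while newer releases raise KeyError for the same lookup.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train[predictors1]: 6 rows, 2 columns.
X_df = pd.DataFrame({"Feat1": [1, 2, 3, 4, 5, 6],
                     "Feat2": [10, 20, 30, 40, 50, 60]})

# Row positions of the kind a cross-validation fold would produce.
train_idx = np.array([0, 1, 2, 3])

# On a DataFrame, X[train_idx] is treated as a column lookup, so it fails here.
# Older pandas raises IndexError; newer releases raise KeyError.
try:
    X_df[train_idx]
    frame_lookup_failed = False
except (IndexError, KeyError):
    frame_lookup_failed = True

# On an ndarray, the same expression selects rows by position.
X_arr = X_df.values
X_train = X_arr[train_idx]

# To stay in pandas, .iloc is the positional row selector.
X_train_iloc = X_df.iloc[train_idx]

print(frame_lookup_failed, X_train.shape, X_train_iloc.shape)
```

Passing .values hands the library the array form it indexes correctly. The .iloc line only shows the pandas equivalent of that row selection; it is not something you can pass to sg.fit in place of .values.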