I get an error when fitting a stacked (blended) model: IndexError: indices are out-of-bounds. Any guidance would be appreciated. Thanks...
I read in the dataset:
import pandas as pd
import numpy as np
from stacked_generalizer import StackedGeneralizer
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
# Load cleaned data
train = pd.read_csv('train1.csv')
test = pd.read_csv('test1.csv')
Then I selected the variables. They are a subset of all the variables in the training data.
target='Y1'
ID = 'ID'
predictors1= ['Marks_SA','Marks_PA',
'Marks_CA','Feat2','Experience', 'Feat6','Feat1',
'Feat5','Feat4']
Now the blending model:
base_models = [RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='gini'),
RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='entropy'),
ExtraTreesClassifier(n_estimators=100, n_jobs=-1, criterion='gini')]
# define blending model
blending_model = LogisticRegression()
VERBOSE = True
N_FOLDS = 5
# initialize multi-stage model
sg = StackedGeneralizer(base_models, blending_model,
n_folds=N_FOLDS, verbose=VERBOSE)
# fit model
sg.fit(train[predictors1],train[target])
I get the following error:
Fitting Base Models...
Fitting model 01: RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
Fold 1
IndexError Traceback (most recent call last)
<ipython-input-47-dd6152e11339> in <module>()
1 # fit model
2 #sg.fit(X[:n_train],y[:n_train])
----> 3 sg.fit(train[columns],train[target])
c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit(self, X, y)
211
212 def fit(self, X, y):
--> 213 X_blend = self.fit_transform_base_models(X, y)
214 self.fit_blending_model(X_blend, y)
215
c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_transform_base_models(self, X, y)
159
160 def fit_transform_base_models(self, X, y):
--> 161 self.fit_base_models(X, y)
162 return self.transform_base_models(X)
163
c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_base_models(self, X, y)
129 print('Fold %d' % (j + 1))
130
--> 131 X_train = X[train_idx]
132 y_train = y[train_idx]
133
C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1984 if isinstance(key, (Series, np.ndarray, Index, list)):
1985 # either boolean or fancy integer index
-> 1986 return self._getitem_array(key)
1987 elif isinstance(key, DataFrame):
1988 return self._getitem_frame(key)
C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
2029 else:
2030 indexer = self.ix._convert_to_indexer(key, axis=1)
-> 2031 return self.take(indexer, axis=1, convert=True)
2032
2033 def _getitem_multilevel(self, key):
C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\generic.pyc in take(self, indices, axis, convert, is_copy)
1626 new_data = self._data.take(indices,
1627 axis=self._get_block_manager_axis(axis),
-> 1628 convert=True, verify=True)
1629 result = self._constructor(new_data).__finalize__(self)
1630
C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\internals.pyc in take(self, indexer, axis, verify, convert)
3635 n = self.shape[axis]
3636 if convert:
-> 3637 indexer = maybe_convert_indices(indexer, n)
3638
3639 if verify:
C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\indexing.pyc in maybe_convert_indices(indices, n)
1808 mask = (indices >= n) | (indices < 0)
1809 if mask.any():
-> 1810 raise IndexError("indices are out-of-bounds")
1811 return indices
1812
IndexError: indices are out-of-bounds
Answer 0 (score: 1)
Just change this line:
sg.fit(train[predictors1],train[target])
to this, and it works:
sg.fit(train[predictors1].values,train[target].values)
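The stacked_generalizer fit function expects NumPy ndarrays as input. When it is given a DataFrame, X[train_idx] (line 131 in the traceback) is treated as column selection, as the axis=1 calls in the pandas frames show, so the fold's row indices fall outside the range of the nine predictor columns.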
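To see why passing .values helps, here is a minimal sketch on toy data. The frame, series, and index array below are hypothetical stand-ins for train[predictors1], train[target], and the fold indices produced inside fit_base_models; they are not taken from the question. The exception type raised by the DataFrame lookup depends on the pandas version (IndexError in the traceback above, KeyError in more recent releases), so the sketch catches both.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train[predictors1] and train[target].
X_df = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) * 2.0})
y_s = pd.Series(np.arange(10) % 2)

# Row positions, similar to what a K-fold splitter would yield for one fold.
train_idx = np.array([0, 1, 2, 5, 7, 9])

# DataFrame[...] with an integer array selects columns, not rows,
# so row positions are not valid keys here.
try:
    X_df[train_idx]
    frame_indexing_failed = False
except (IndexError, KeyError):
    frame_indexing_failed = True

# On the underlying ndarrays the same expression selects rows.
X_train = X_df.values[train_idx]
y_train = y_s.values[train_idx]

# If you control the indexing code, .iloc is the positional
# row selector for DataFrames and gives the same rows.
X_train_iloc = X_df.iloc[train_idx]

print(frame_indexing_failed, X_train.shape, y_train.shape)
```

Converting with .values at the call site, as in the answer, leaves the stacked_generalizer source untouched; using .iloc would instead require editing the library's fold loop.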