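After tuning hyperparameters and getting the best parameters for my classifier, I am trying to get the feature importances from my data. I have also fitted a model with the best parameters to the training set and am now trying to extract the important features, but I keep getting an error, and I have tried every solution I could find online.
See my code below: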
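# Imports the original snippet omitted (imblearn's Pipeline is assumed,
# since a sampler step such as RandomOverSampler requires it)
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold
# Define the model and the hyperparameter search
from sklearn.experimental import enable_halving_search_cv  # must precede the HalvingGridSearchCV import
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search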
param_grid = {
    'bootstrap': [True],
    'max_features': ['auto', 'sqrt'],
    'n_estimators': [100, 1000]
}
# Create a base model
rf = RandomForestClassifier()
# Instantiate the halving grid search model
grid_search = HalvingGridSearchCV(estimator=rf, param_grid=param_grid,
                                  cv=3, n_jobs=-1, verbose=2)
cv = StratifiedKFold(n_splits=10, shuffle = True, random_state=42)
steps_3 = [('over', RandomOverSampler()), ('chi_square', SelectKBest(chi2, k=7000)), ('estimator', grid_search)]
pipeline_3 = Pipeline(steps=steps_3)
#fit the model
rf_hyperparameter = pipeline_3.fit(X_train, y_train)
print(rf_hyperparameter)
# print('Best parameter set: %s' % grid_search.best_params_)
print("Best Score:" + str(grid_search.best_score_))
print("Best Parameters: " + str(grid_search.best_params_))
best_parameters = grid_search.best_params_
#fit the best parameters to the training data
rf_best = RandomForestClassifier(bootstrap = True, max_features= 'auto', n_estimators = 1000)
rf_best.fit(X_train, y_train)
feature_importances = pd.DataFrame(rf_best.feature_importances_,
                                   index=X_train.columns, columns=['importance']).sort_values('importance', ascending=False)
feature_importances
After running the code above, this is the error I get:
AttributeError Traceback (most recent call last)
<ipython-input-159-563c7c3e7fc5> in <module>
1 feature_importances = pd.DataFrame(rf_best.feature_importances_,
----> 2 index=X_train.columns,columns=['importance']).sort_values('importance',ascending = False)
3 feature_importances
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
I would greatly appreciate any input I can get. Thanks!
Answer 0 (score: 0)
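The part of the code that performs the train_test_split is missing from the question. The traceback shows that X_train is a numpy array rather than a pandas DataFrame, so X_train.columns fails. Note that train_test_split returns the same container type it is given, so the array most likely comes from the data being converted to numpy (for example with .values, or by an earlier transformer) before the split. Taking df.columns from the original pandas DataFrame as a list and passing that as index should work.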
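A minimal sketch of that fix, under stated assumptions: the DataFrame `df`, its column names, and the toy labels below are hypothetical stand-ins for the asker's data, and the forest is kept small so the example runs quickly. The data is converted to NumPy before the split to reproduce the situation in the traceback, and the labels are then taken from the original DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the asker's DataFrame `df`.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((60, 4)), columns=["f0", "f1", "f2", "f3"])
y = (df["f0"] > 0.5).astype(int)

# Converting to NumPy before the split drops the column labels,
# so X_train is a plain ndarray with no .columns attribute.
X_train, X_test, y_train, y_test = train_test_split(
    df.to_numpy(), y, test_size=0.25, random_state=42, stratify=y
)

rf_best = RandomForestClassifier(n_estimators=50, random_state=42)
rf_best.fit(X_train, y_train)

# Take the labels from the original DataFrame instead of X_train.columns.
feature_importances = pd.DataFrame(
    rf_best.feature_importances_,
    index=list(df.columns),
    columns=["importance"],
).sort_values("importance", ascending=False)
print(feature_importances)
```

This only works when the model was fitted on all columns of `df` in their original order, as `rf_best` is in the question. If the DataFrame itself is passed to train_test_split, X_train stays a DataFrame and X_train.columns can be used directly.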