'numpy.ndarray' object has no attribute 'columns'

Time: 2019-06-21 12:26:06

Tags: python pandas scikit-learn random-forest

I am trying to find the feature importances for a random forest classification task, but it gives me the following error:

  

'numpy.ndarray' object has no attribute 'columns'

Here is part of my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# importing dataset

dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values

# splitting dataset into test set and train set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)  
regressor.fit(X_train, y_train) 

#feature importance

feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)


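I expected this to give me a feature importance score for each column of my dataset. (Note: the original data is stored in CSV format.)

3 Answers:

Answer 0: (score: 0)

The X_train that comes out of train_test_split is actually a numpy array, so it will never have a columns attribute. That in turn is because you ask for .values when you build X from dataset, which returns a numpy.ndarray rather than a DataFrame.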

You need to change the line

feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)

to

columns_ = dataset.iloc[:1, 3:12].columns

feature_importances = pd.DataFrame(rf.feature_importances_,index = columns_,columns=['importance']).sort_values('importance',ascending=False)
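Another option is to drop .values so that X stays a DataFrame; train_test_split then hands back DataFrames, and X_train.columns works as written in the question. The following is only a minimal sketch: the small random frame and its column names ("a", "b", "c", "target") are made up to stand in for Churn_Modelling.csv, and random_state is added just to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV; these column names are hypothetical.
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.random((200, 4)), columns=["a", "b", "c", "target"])

# No .values here, so X stays a DataFrame and keeps its column labels.
X = dataset.iloc[:, 0:3]
Y = dataset.iloc[:, 3]

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.20, random_state=0
)

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)

# X_train is still a DataFrame, so .columns is available.
feature_importances = pd.DataFrame(
    regressor.feature_importances_,
    index=X_train.columns,
    columns=["importance"],
).sort_values("importance", ascending=False)
print(feature_importances)
```

With the real data, the same idea would mean writing X = dataset.iloc[:,3:12] without the trailing .values.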

Answer 1: (score: 0)

Use this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# importing dataset

dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values

# splitting dataset into test set and train set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)  
regressor.fit(X_train, y_train) 

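#feature importance
# the index needs one label per feature, so take the same 3:12 slice that was used to build X

feature_importances = pd.DataFrame(regressor.feature_importances_,index = dataset.columns[3:12],columns=['importance']).sort_values('importance',ascending=False)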


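Answer 2: (score: 0)

The iloc and loc indexers can only be used on pandas DataFrames. You are applying them to an array. Solution: convert the array to a DataFrame, then apply iloc or loc.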
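A rough illustration of that conversion, assuming you still have the feature names available separately. The array contents and the names "f0" and "f1" are hypothetical placeholders, not values from the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical feature array and names, standing in for X_train and the CSV header.
X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
feature_names = ["f0", "f1"]

# A bare ndarray carries no column labels.
print(hasattr(X_train, "columns"))

# Wrapping it in a DataFrame attaches the labels; .columns, .iloc and .loc then work.
X_train_df = pd.DataFrame(X_train, columns=feature_names)
print(list(X_train_df.columns))
print(X_train_df.iloc[0, 1])
```

The list passed as columns has to contain exactly one name per column of the array, otherwise pandas raises an error.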