我是数据科学领域的新手。我正在尝试找出我的数据集的特征重要性排名。我已经应用了随机森林并获得了输出。
这是我的代码:
Again:
mov ax, [si]
cmp al, 13 ; If 1st byte is 13, then next byte is just garbage!
je CR1 ; ... so no further interpretation needed
and al, 0Fh
inc si
cmp ah, 13 ; If 2nd byte is 13, then result is based on 1st byte
je CR2 ; ... and that kind-of zero-replacement
and ah, 0F0h
inc si
or al, ah
...
jmp Again
CR1:
xor al, al
CR2:
...
在重要性部分,我几乎复制了以下示例: https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
代码如下:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# importing dataset
dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values
#encoding catagorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
#country
labelencoder_X_1= LabelEncoder()
X[:,1]=labelencoder_X_1.fit_transform(X[:,1])
#gender
labelencoder_X_2= LabelEncoder()
X[:,2]=labelencoder_X_2.fit_transform(X[:,2])
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
#spliting dataset into test set and train set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
我期望文档中显示输出。有人可以帮我吗?预先感谢。
答案 0 :(得分:3)
您有很多功能,无法在一个图中看到。 只需绘制其中一些。
在这里,我列出了最重要的前20个:
# Plot the feature importances of the forest
plt.figure(figsize=(18,9))
plt.title("Feature importances")
n=20
_ = plt.bar(range(n), importances[indices][:n], color="r", yerr=std[indices][:n])
plt.xticks(range(n), indices)
plt.xlim([-1, n])
plt.show()
我的代码,以备不时之需:https://filebin.net/be4h27swglqf3ci3
输出: