Cross-validating a random forest to select important features

Time: 2020-05-18 01:17:09

Tags: python machine-learning scikit-learn random-forest

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Load iris into a DataFrame and split it into features and target.
data = load_iris()
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['target'] = data['target']
X = df.drop(columns=['target'])
y = df['target']
clf = RandomForestClassifier(n_estimators=50, max_depth=4)

scores = []
num_features = len(X.columns)
print(num_features)
for i in range(num_features):
    col = X.columns[i]
    # Mean 10-fold CV accuracy of a forest trained on this single feature.
    score = np.mean(cross_val_score(clf, X[col].values.reshape(-1, 1), y, cv=10))
    scores.append((int(score * 100), col))

# Features ranked by their single-feature accuracy (in percent), best first.
print(sorted(scores, reverse=True))

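I want to run 10-fold cross-validation to select the most important features, but I am unsure about my approach. It does not look right! Also, how can I plot the most important features? Thanks for any suggestions!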
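
For reference, here is a minimal sketch of one common alternative, not necessarily what the question is after: fit the forest on all features together in each of the 10 folds, average the impurity-based `feature_importances_` across the fitted forests, and draw them as a horizontal bar chart. The `random_state=0` setting, the `Agg` backend, and the output file name `feature_importances.png` are assumptions added for illustration and are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

data = load_iris()
X, y = data["data"], data["target"]
feature_names = np.array(data["feature_names"])

# random_state is an assumption, added only to make the sketch reproducible.
clf = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)

# Train on all features at once and keep the fitted forest from each fold.
cv_results = cross_validate(clf, X, y, cv=10, return_estimator=True)

# One row of impurity-based importances per fold: shape (10, n_features).
importances = np.array(
    [est.feature_importances_ for est in cv_results["estimator"]]
)
mean_imp = importances.mean(axis=0)
std_imp = importances.std(axis=0)

# Sort ascending so the most important feature ends up at the top of the chart.
order = np.argsort(mean_imp)
fig, ax = plt.subplots()
ax.barh(feature_names[order], mean_imp[order], xerr=std_imp[order])
ax.set_xlabel("Mean importance across 10 folds")
fig.tight_layout()
fig.savefig("feature_importances.png")  # hypothetical output file name
```

The error bars show the spread of each importance across folds, which gives a rough idea of how stable the ranking is. Impurity-based importances are only one possible measure; the single-feature accuracy scores from the question could be plotted the same way by passing them to `ax.barh`.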

0 answers:

No answers yet.