Random forest classification using a reduced feature set

Time: 2020-06-27 17:59:43

Tags: classification random-forest

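I am using a random forest for feature reduction. After selecting the reduced feature subset, I want to classify the data with any classifier. I need to use only the reduced feature set, with the unselected features excluded.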

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Feat_Emd.csv') 
X = dataset.iloc[:,:-1].values # 90 columns of features
y = dataset.iloc[:,-1].values

from sklearn.ensemble import RandomForestClassifier  # the private sklearn.ensemble.forest path is gone in recent releases
from sklearn.feature_selection import SelectFromModel

# Splitting the dataset into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, train_size = 0.8, random_state = 0)

# Fitting a random-forest-based feature selector
sel = SelectFromModel(RandomForestClassifier(n_estimators = 100))
sel.fit(X_train, y_train)

sel.get_support()

# Reduced feature subset
selected_feat = dataset.iloc[:, :-1].columns[sel.get_support()]
len(selected_feat)

Now I want to keep only the columns of the selected features in selected_feat (about 27 of them) to build X_train and X_test. How can I automate this?
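One possible approach, offered as a sketch rather than a definitive answer: the fitted `SelectFromModel` object has a `transform` method that drops the unselected columns, so it can be applied to both `X_train` and `X_test`. If the features are kept as a DataFrame instead of `.values`, indexing with `selected_feat` gives the same columns by name. Since `Feat_Emd.csv` is not available here, the example below uses synthetic data from `make_classification` as a stand-in; the column names `f0`...`f89` and `label` are hypothetical, and `random_state=0` is added only for reproducibility.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Feat_Emd.csv (assumption): 90 feature columns + 1 label column.
X_arr, y_arr = make_classification(
    n_samples=300, n_features=90, n_informative=10, random_state=0
)
dataset = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(90)])
dataset["label"] = y_arr

# Keep X as a DataFrame so column names survive the split.
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the selector on the training data only.
sel = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
sel.fit(X_train, y_train)

mask = sel.get_support()               # boolean mask over the 90 columns
selected_feat = X_train.columns[mask]  # names of the kept columns

# Option 1: let the selector drop the unselected columns (returns NumPy arrays).
X_train_sel = sel.transform(X_train)
X_test_sel = sel.transform(X_test)

# Option 2: index the DataFrames by the selected column names.
X_train_df = X_train[selected_feat]
X_test_df = X_test[selected_feat]

# Train any classifier on the reduced feature set.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train_sel, y_train)
accuracy = clf.score(X_test_sel, y_test)
print(len(selected_feat), "features kept; test accuracy:", accuracy)
```

With the original NumPy arrays from `.values`, `sel.transform(X_train)` works the same way, and `X_train[:, sel.get_support()]` is an equivalent manual form. The selector and the classifier could also be chained in a `sklearn.pipeline.Pipeline` so that the reduction is applied automatically at predict time.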

0 answers:

No answers yet