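After initial modelling, the scores showed that Logistic Regression had the best accuracy. I tried to improve the result by applying RFE(), but the accuracy went down instead. Can that happen?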
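import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# To get interim modelling score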
names = []
scores = []
for name, model in models:
    model.fit(x_train, y_train.values.ravel())
    y_pred = model.predict(x_test)
    scores.append(accuracy_score(y_test, y_pred))
    names.append(name)
# Comparison
models_comparison = pd.DataFrame({'Name':names, 'Score': scores})
models_comparison.sort_values(by='Score', ascending=False, inplace=True)
models_comparison
LR_ModelSelected = LogisticRegression()
LR_ModelSelected.fit(x,y.values.ravel())
# RFE
rfe = RFE(LR_ModelSelected, n_features_to_select=15)
rfe = rfe.fit(x_test, y_test.values.ravel())
print(rfe.support_)
print(rfe.ranking_)
# Select columns recommended by RFE above
selcols = ['DailyRate', 'EnvironmentSatisfaction', 'Gender', 'JobSatisfaction', 'MaritalStatus',
           'NumCompaniesWorked', 'OverTime', 'RelationshipSatisfaction', 'StockOptionLevel',
           'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsSinceLastPromotion',
           'YearsWithCurrManager']
x = x[selcols]
y = y
# Fit new data with reduced features
LR_ModelSelected.fit(x,y.values.ravel())
# Show Result
y_pred = LR_ModelSelected.predict(x_test)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(LR_ModelSelected.score(x_test, y_test)))
print('Accuracy of logistic regression classifier on train set: {:.2f}'.format(LR_ModelSelected.score(x_train, y_train)))
I expected the accuracy to improve, but it dropped from 0.9 to 0.89.
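For comparison, here is a minimal sketch of one way the same experiment could be set up so that RFE only ever sees the training split and the with/without-RFE scores are compared by cross-validation rather than on a single split. It is not the code from this question: the data is synthetic (`make_classification` stands in for the real dataset), and the column names, the `build` helper, the scaling step and `max_iter=1000` are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption: 30 numeric features, binary target).
X_arr, y_arr = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(30)])
y = pd.Series(y_arr)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def build(n_features=None):
    """Hypothetical helper: scaled logistic regression, optionally preceded by RFE."""
    steps = [StandardScaler()]
    if n_features is not None:
        steps.append(
            RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features)
        )
    steps.append(LogisticRegression(max_iter=1000))
    return make_pipeline(*steps)


baseline = build()    # all features
reduced = build(15)   # RFE keeps 15 features

# Because RFE lives inside the pipeline, it is refitted on each training fold only,
# so the held-out fold never influences which features are selected.
cv_base = cross_val_score(baseline, x_train, y_train, cv=5, scoring="accuracy")
cv_red = cross_val_score(reduced, x_train, y_train, cv=5, scoring="accuracy")
print("all features: {:.3f} +/- {:.3f}".format(cv_base.mean(), cv_base.std()))
print("RFE (15):     {:.3f} +/- {:.3f}".format(cv_red.mean(), cv_red.std()))

# Fit once on the full training split, then read the selected column names from
# the boolean mask instead of typing them by hand.
reduced.fit(x_train, y_train)
selected = x_train.columns[reduced.named_steps["rfe"].support_].tolist()
test_acc = reduced.score(x_test, y_test)
print(selected)
print("test accuracy with RFE: {:.3f}".format(test_acc))
```

The fold-to-fold spread printed next to each mean gives a rough sense of whether a 0.01 difference on a single split is larger than ordinary sampling noise. Nothing in RFE guarantees that accuracy goes up, since it only ranks features by the estimator's coefficients and drops the weakest ones.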