Pandas sort_values does not sort the data correctly

Time: 2019-08-07 16:18:02

Tags: python-3.x pandas sklearn-pandas

I am trying to sort the feature_importances_ results of sklearn.ensemble.RandomForestRegressor.

I have the following function:

def get_feature_importances(cols, importances):
    feats = {}
    for feature, importance in zip(cols, importances):
        feats[feature] = importance

    importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
    importances.sort_values(by='Gini-importance')

    return importances

I use it like this:

importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)

I get the following results:

| PART | 0.035034 |
| MONTH1 | 0.02507 |
| YEAR1 | 0.020075 |
| MONTH2 | 0.02321 |
| YEAR2 | 0.017861 |
| MONTH3 | 0.042606 |
| YEAR3 | 0.028508 |
| DAYS | 0.047603 |
| MEDIANDIFF | 0.037696 |
| F2 | 0.008783 |
| F1 | 0.015764 |
| F6 | 0.017933 |
| F4 | 0.017511 |
| F5 | 0.017799 |
| SS22 | 0.010521 |
| SS21 | 0.003896 |
| SS19 | 0.003894 |
| SS23 | 0.005249 |
| SS20 | 0.005127 |
| RR | 0.021626 |
| HI_HOURS | 0.067584 |
| OI_HOURS | 0.054369 |
| MI_HOURS | 0.062121 |
| PERFORMANCE_FACTOR | 0.033572 |
| PERFORMANCE_INDEX | 0.073884 |
| NUMPA | 0.022445 |
| BUMPA | 0.024192 |
| ELOH | 0.04386 |
| FFX1 | 0.128367 |
| FFX2 | 0.083839 |

I thought the importances.sort_values(by='Gini-importance') line would sort them, but it does not. Why is this not working correctly?

1 Answer:

Answer 0 (score: 2)

importances.sort_values(by='Gini-importance') returns a new, sorted DataFrame and leaves importances itself unchanged. Your function discards that return value.
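A minimal sketch of that behaviour, using a small made-up DataFrame (the index labels and values are illustrative and not taken from the question):

```python
import pandas as pd

# Hypothetical toy data, not the asker's feature importances.
df = pd.DataFrame({"Gini-importance": [0.3, 0.1, 0.2]}, index=["a", "b", "c"])

# sort_values returns a new DataFrame sorted in ascending order ...
sorted_df = df.sort_values(by="Gini-importance")

# ... while the original DataFrame keeps its row order.
print(list(df.index))         # ['a', 'b', 'c']
print(list(sorted_df.index))  # ['b', 'c', 'a']
```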

You want return importances.sort_values(by='Gini-importance')

Or you can have sort_values sort in place:

importances.sort_values(by='Gini-importance', inplace=True)

return importances
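For completeness, the whole function could be reduced to something like the sketch below. Building the frame from a pd.Series and passing ascending=False (largest importance first) are my own assumptions, not part of the answer above. The inputs in the usage lines are placeholders standing in for X_test.columns and rf.feature_importances_.

```python
import pandas as pd

def get_feature_importances(cols, importances):
    # Pair each column name with its importance score in a one-column frame.
    frame = pd.Series(importances, index=cols, name="Gini-importance").to_frame()
    # Return the sorted copy instead of discarding it.
    # ascending=False is an assumption: most important feature first.
    return frame.sort_values(by="Gini-importance", ascending=False)

# Placeholder inputs, not real model output.
result = get_feature_importances(["F1", "F2", "F3"], [0.2, 0.5, 0.3])
print(result)
```

Drop ascending=False if you want the ascending order that sort_values uses by default.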