sklearn pipeline index mismatch

Posted: 2021-06-24 13:56:53

Tags: python scikit-learn pipeline

I built a preprocessing pipeline with an sklearn Pipeline, as follows:

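from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, RFECV
# pre_process_categorical is defined elsewhere; step arguments are omitted here (RFECV requires an estimator)
model_pipeline = Pipeline(steps=[('pre processing categorical', pre_process_categorical),
                                  ('standardizing scale', StandardScaler()),
                                  ('K feature selector', SelectKBest()),
                                  ('forward feature selection', RFECV())])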

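I want to see which column names are kept after the transformation. I looked at the selected indices at each stage and got the following: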

model_pipeline.steps[-2][1].get_support(indices = True)

Out[50]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20, 21, 22, 27, 28, 31, 32, 33, 34, 36, 37, 40, 43, 44, 46, 47, 49, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 68, 69, 70, 71, 74, 76, 80, 81, 82, 83, 84, 85, 89, 90, 91, 92, 98, 102, 103, 109, 119, 121, 123, 125, 130, 136, 138, 146, 151, 152, 153, 157, 158, 162, 163, 164, 165, 167, 170, 171, 172, 173, 176, 183, 185, 186, 192, 194, 195, 199, 203, 206, 208, 215, 216, 220, 223, 232, 249, 252, 253, 254, 255, 257, 258, 259, 260, 261, 262, 263, 264, 265, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 282, 283, 285, 286, 288, 290, 292, 294, 295, 297, 298, 299, 300, 304, 306, 308, 312, 315])

model_pipeline.steps[-1][1].get_support(indices = True)

Out[51]: array([ 2, 5, 18, 22, 33, 36, 38, 43, 114, 122, 125, 127, 137, 142, 143, 144, 145, 146, 148, 149])

I can't understand how some indices (for example, 38) appear in the last step when they are missing from the second-to-last step.
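A likely explanation, inferred from the two outputs above rather than confirmed anywhere in this thread: `get_support(indices=True)` reports positions in the array a selector *received*, not positions in the original data. The `SelectKBest` index list has 150 entries, and the largest index `RFECV` reports is 149, so `RFECV`'s 38 would mean "the 39th column that `SelectKBest` passed on". The NumPy sketch below composes the two printed index arrays to map the final selection back to column positions in the `StandardScaler` output; the variable names are illustrative only.

```python
import numpy as np

# Indices printed for step -2 (SelectKBest): positions in the StandardScaler output.
kbest_idx = np.array([
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    17, 18, 20, 21, 22, 27, 28, 31, 32, 33, 34, 36, 37, 40, 43, 44,
    46, 47, 49, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
    68, 69, 70, 71, 74, 76, 80, 81, 82, 83, 84, 85, 89, 90, 91, 92,
    98, 102, 103, 109, 119, 121, 123, 125, 130, 136, 138, 146, 151, 152, 153, 157,
    158, 162, 163, 164, 165, 167, 170, 171, 172, 173, 176, 183, 185, 186, 192, 194,
    195, 199, 203, 206, 208, 215, 216, 220, 223, 232, 249, 252, 253, 254, 255, 257,
    258, 259, 260, 261, 262, 263, 264, 265, 268, 269, 270, 271, 272, 273, 274, 275,
    276, 277, 278, 279, 282, 283, 285, 286, 288, 290, 292, 294, 295, 297, 298, 299,
    300, 304, 306, 308, 312, 315,
])

# Indices printed for step -1 (RFECV): positions in the SelectKBest output.
rfecv_idx = np.array([2, 5, 18, 22, 33, 36, 38, 43, 114, 122, 125, 127,
                      137, 142, 143, 144, 145, 146, 148, 149])

# RFECV's column i is SelectKBest's column kbest_idx[i], so index one array with the other.
original_idx = kbest_idx[rfecv_idx]
print(original_idx.tolist())
# → [2, 5, 20, 28, 47, 53, 56, 61, 260, 270, 273, 275, 290, 298, 299, 300, 304, 306, 312, 315]
```

Under this reading, 38 maps back to column 56, which `SelectKBest` did keep. Note that these positions refer to the output of `pre_process_categorical`, which may have more columns than the raw data if that step one-hot encodes, so the names have to come from that step's output rather than from the original DataFrame.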

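So, two questions:

  1. Are indices preserved when passing from one stage to the next?
  2. What is a simpler way to get the names of the columns kept at the end of the pipeline?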
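For the second question, one workable approach (a sketch, not the only one) is to carry an array of column names through the pipeline and apply each selector's boolean `get_support()` mask in step order. Everything below is an assumption made so the example runs: synthetic data from `make_classification`, made-up names `col_0`, `col_1`, and so on, `SelectKBest(k=15)`, and `RFECV` with a `LogisticRegression` estimator. The question's `pre_process_categorical` is not shown, so it is left out, and the helper `selected_names` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 30 numeric columns with made-up names.
X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
feature_names = np.array([f"col_{i}" for i in range(X.shape[1])])

# Same structure as the question's pipeline, minus the (unshown) categorical step.
pipe = Pipeline(steps=[
    ("standardizing scale", StandardScaler()),
    ("K feature selector", SelectKBest(k=15)),
    ("forward feature selection", RFECV(LogisticRegression(max_iter=1000), cv=3)),
])
pipe.fit(X, y)


def selected_names(pipeline, names):
    """Hypothetical helper: apply each fitted selector's mask to the names, in step order."""
    names = np.asarray(names)
    for _, step in pipeline.steps:
        if hasattr(step, "get_support"):
            # The boolean mask covers the columns this step received, i.e. the current names.
            names = names[step.get_support()]
    return names


result = selected_names(pipe, feature_names)
print(result)
```

With the real pipeline, the starting names must be those of the columns leaving `pre_process_categorical`; the helper only accounts for steps that expose `get_support()`, not for steps that change the column count in other ways. Newer scikit-learn releases also offer `get_feature_names_out()` on selectors (and, in later versions, on pipelines), which may make a helper like this unnecessary; check the documentation for the installed version.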

0 answers
