Question

I built a preprocessing pipeline with an sklearn Pipeline, as follows:

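from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, RFECV
from sklearn.linear_model import LogisticRegression  # assumed estimator: RFECV() cannot be built without one
model_pipeline = Pipeline(steps=[('pre processing categorical', pre_process_categorical),
                                 ('standardizing scale', StandardScaler()),
                                 ('K feature selector', SelectKBest()),
                                 ('forward feature selection', RFECV(LogisticRegression()))])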

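I want to see which column names are kept after the transformations, so I looked at the indices returned by each stage and got the following: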

model_pipeline.steps[-2][1].get_support(indices = True)

Out[50]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20, 21, 22, 27, 28, 31, 32, 33, 34, 36, 37, 40, 43, 44, 46, 47, 49, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 68, 69, 70, 71, 74, 76, 80, 81, 82, 83, 84, 85, 89, 90, 91, 92, 98, 102, 103, 109, 119, 121, 123, 125, 130, 136, 138, 146, 151, 152, 153, 157, 158, 162, 163, 164, 165, 167, 170, 171, 172, 173, 176, 183, 185, 186, 192, 194, 195, 199, 203, 206, 208, 215, 216, 220, 223, 232, 249, 252, 253, 254, 255, 257, 258, 259, 260, 261, 262, 263, 264, 265, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 282, 283, 285, 286, 288, 290, 292, 294, 295, 297, 298, 299, 300, 304, 306, 308, 312, 315])

model_pipeline.steps[-1][1].get_support(indices = True)

Out[51]: array([ 2, 5, 18, 22, 33, 36, 38, 43, 114, 122, 125, 127, 137, 142, 143, 144, 145, 146, 148, 149])

I don't understand how some indices (for example 38) can appear in the last step when they are missing from the second-to-last one.
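For context: the indices returned by get_support(indices=True) are positions in that step's own input, not in the original feature matrix. RFECV only ever sees the columns that SelectKBest passed on, so an index such as 38 in its output means "the 39th column kept by SelectKBest", not column 38 of the data. A minimal sketch of composing the two selections, assuming synthetic data from make_classification, a LogisticRegression estimator for RFECV, and hypothetical step names scale, kbest and rfecv (none of these come from the question):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the already-encoded features (assumption).
X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)

pipe = Pipeline(steps=[
    ("scale", StandardScaler()),                                  # keeps all 30 columns
    ("kbest", SelectKBest(k=15)),                                 # keeps 15 of those 30
    ("rfecv", RFECV(LogisticRegression(max_iter=1000), cv=3)),    # picks from those 15
])
pipe.fit(X, y)

# Positions among the 30 scaled columns.
kbest_idx = pipe.named_steps["kbest"].get_support(indices=True)
# Positions among the 15 columns SelectKBest kept (always < 15).
rfecv_idx = pipe.named_steps["rfecv"].get_support(indices=True)

# Index the first selection with the second to get positions in the original columns.
original_idx = kbest_idx[rfecv_idx]
print(rfecv_idx)
print(original_idx)
```

Applied to the pipeline in the question, the same idea would be model_pipeline.steps[-2][1].get_support(indices=True)[model_pipeline.steps[-1][1].get_support(indices=True)], which should give positions in the output of the 'pre processing categorical' step (StandardScaler does not drop columns).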

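So, two questions:

Are the indices preserved when data passes from one stage to the next?
What is a simpler way to get the names of the columns that are kept after the pipeline?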

sklearn pipeline index mismatch
