Is it possible to put an sklearn estimator inside sklearn.compose.ColumnTransformer?

Time: 2019-06-19 15:44:15

Tags: python-3.x scikit-learn pmml

I want to create a PMML pipeline such as:

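from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.preprocessing import ExpressionTransformer

# clVars and idVars are lists of column names (clustering inputs and ID columns)
PMMLPipeline([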
    ("clt", ColumnTransformer([
                ("cl", Pipeline([
                           ("std",     StandardScaler()),
                           ("pca",     PCA(4)          ),
                           ("kmeans",  KMeans(5)       )
                       ]),            clVars),
                ("id", "passthrough", idVars)
            ])
    ),
    ("et",  ExpressionTransformer("X[0]+X[1]")
    ),
    ("lr",  LinearRegression()
    )
])

Does anyone know whether it is actually possible to use an estimator inside sklearn.compose.ColumnTransformer?

With the following setup:

System:
    python: 3.7.1 (default, Dec 14 2018, 19:28:38)  [GCC 7.3.0]
executable: /opt/anaconda3/envs/python_3.7.1_eb/bin/python
   machine: Linux-4.14.114-83.126.amzn1.x86_64-x86_64-with-glibc2.10

BLAS:
    macros: HAVE_CBLAS=None, NO_ATLAS_INFO=-1
  lib_dirs: /usr/lib64/atlas
cblas_libs: cblas

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.21.2
     numpy: 1.16.4
     scipy: 1.3.0
    Cython: None
    pandas: 0.24.2

I fitted each step (clt, et and lr) separately.

Then, converting the PMMLPipeline to PMML fails with:

java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.cluster.k_means_.KMeans)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:612)
    at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
    at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:68)
    at sklearn2pmml.decoration.Alias.encodeFeatures(Alias.java:56)
    at sklearn.compose.ColumnTransformer.encodeFeatures(ColumnTransformer.java:63)
    at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:196)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)

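If I apply the fitted ColumnTransformer to a pandas DataFrame, predict raises AttributeError: 'ColumnTransformer' object has no attribute 'predict', and transform returns an array with 5 columns that are neither the clVars nor the KMeans results, plus 3 columns matching idVars. So I have my doubts about this :s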
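The 5 extra columns can likely be explained by how ColumnTransformer works: it only ever calls fit_transform()/transform() on its children, so a Pipeline ending in KMeans contributes KMeans.transform(), i.e. the distance from each row to each of the n_clusters=5 centroids, rather than cluster labels. A minimal sketch with plain scikit-learn on made-up random data (the column names, data, n_init and random_state are hypothetical choices for reproducibility) reproduces the shapes described above:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 6 clustering columns and 3 ID columns
rng = np.random.RandomState(0)
clVars = ["c%d" % i for i in range(6)]
idVars = ["id0", "id1", "id2"]
df = pd.DataFrame(rng.rand(50, 9), columns=clVars + idVars)

clt = ColumnTransformer([
    ("cl", Pipeline([
        ("std", StandardScaler()),
        ("pca", PCA(4)),
        ("kmeans", KMeans(5, n_init=10, random_state=0)),
    ]), clVars),
    ("id", "passthrough", idVars),
])

out = clt.fit_transform(df)
print(out.shape)  # (50, 8): 5 centroid distances + 3 passthrough columns

# ColumnTransformer is a transformer only; it has no predict()
print(hasattr(clt, "predict"))  # False

# The first 5 columns are KMeans.transform() output: distances to each centroid
cl_pipe = clt.named_transformers_["cl"]
dist = cl_pipe.transform(df[clVars])
print(np.allclose(out[:, :5], dist))  # True
```

In other words, the estimator is accepted by scikit-learn here, but only its transform() side is used.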

1 Answer:

Answer 0 (score: 0)

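This exception means that the SkLearn2PMML converter found an estimator object (KMeans) in a place where only transformer objects (subclasses of TransformerMixin) are allowed.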

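According to Scikit-Learn conventions, a pipeline should contain only one estimator object, and it should be the final step. Right now you have two estimator objects (KMeans and LinearRegression).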
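To see which steps count as estimators in the question's layout, one could walk the nested Pipeline and ColumnTransformer objects. The helper below (find_estimators is hypothetical, not part of scikit-learn or sklearn2pmml) approximates "estimator object" as anything mixing in ClassifierMixin, ClusterMixin or RegressorMixin; the ExpressionTransformer step is left out so the sketch only needs scikit-learn, and the column names are placeholders:

```python
from sklearn.base import ClassifierMixin, ClusterMixin, RegressorMixin
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical approximation of "estimator object" used by this sketch
PREDICTOR_MIXINS = (ClassifierMixin, ClusterMixin, RegressorMixin)


def find_estimators(step, path="root"):
    """Return the paths of all predictor-like objects nested in `step`."""
    if isinstance(step, Pipeline):
        found = []
        for name, sub in step.steps:
            found += find_estimators(sub, path + "/" + name)
        return found
    if isinstance(step, ColumnTransformer):
        found = []
        for name, sub, _cols in step.transformers:
            found += find_estimators(sub, path + "/" + name)
        return found
    # Strings such as "passthrough" fall through to the empty list
    return [path] if isinstance(step, PREDICTOR_MIXINS) else []


clVars = ["x1", "x2", "x3", "x4", "x5"]  # placeholder column names
idVars = ["id"]

pipeline = Pipeline([
    ("clt", ColumnTransformer([
        ("cl", Pipeline([
            ("std", StandardScaler()),
            ("pca", PCA(4)),
            ("kmeans", KMeans(5)),
        ]), clVars),
        ("id", "passthrough", idVars),
    ])),
    ("lr", LinearRegression()),
])

print(find_estimators(pipeline))  # ['root/clt/cl/kmeans', 'root/lr']
```

Only root/lr is in the final position; root/clt/cl/kmeans is the one the converter rejects.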

If you follow this convention, the conversion should succeed.
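A sketch of a layout that follows the convention, assuming the simplest fix of dropping KMeans from the ColumnTransformer so that LinearRegression is the only estimator and the last step. It uses plain scikit-learn on hypothetical random data and a made-up target; with sklearn2pmml installed, the outer Pipeline would presumably be swapped for PMMLPipeline (a Pipeline subclass) before conversion, which is not shown here. Note that this variant no longer feeds cluster information to the regression:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data and target
rng = np.random.RandomState(0)
clVars = ["c%d" % i for i in range(6)]
idVars = ["id0", "id1", "id2"]
df = pd.DataFrame(rng.rand(50, 9), columns=clVars + idVars)
y = df["c0"] + 2 * df["id0"]

# Every step before the last one is a pure transformer;
# LinearRegression is the only estimator and it is the final step.
pipeline = Pipeline([
    ("clt", ColumnTransformer([
        ("cl", Pipeline([
            ("std", StandardScaler()),
            ("pca", PCA(4)),
        ]), clVars),
        ("id", "passthrough", idVars),
    ])),
    ("lr", LinearRegression()),
])

pipeline.fit(df, y)
pred = pipeline.predict(df)
print(pred.shape)  # (50,)
```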