I would like to create a PMML pipeline such as:
PMMLPipeline([
    ("clt", ColumnTransformer([
        ("cl", Pipeline([
            ("std", StandardScaler()),
            ("pca", PCA(4)),
            ("kmeans", KMeans(5))
        ]), clVars),
        ("id", "passthrough", idVars)
    ])),
    ("et", ExpressionTransformer("X[0]+X[1]")),
    ("lr", LinearRegression())
])
Does anyone know whether it is actually possible to use an estimator inside sklearn.compose.ColumnTransformer?
I am using the following setup:
System:
python: 3.7.1 (default, Dec 14 2018, 19:28:38) [GCC 7.3.0]
executable: /opt/anaconda3/envs/python_3.7.1_eb/bin/python
machine: Linux-4.14.114-83.126.amzn1.x86_64-x86_64-with-glibc2.10
BLAS:
macros: HAVE_CBLAS=None, NO_ATLAS_INFO=-1
lib_dirs: /usr/lib64/atlas
cblas_libs: cblas
Python deps:
pip: 19.1.1
setuptools: 41.0.1
sklearn: 0.21.2
numpy: 1.16.4
scipy: 1.3.0
Cython: None
pandas: 0.24.2
I fitted each step (clt, et and lr) separately.
Creating the PMMLPipeline then fails with:
java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.cluster.k_means_.KMeans)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:612)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:68)
at sklearn2pmml.decoration.Alias.encodeFeatures(Alias.java:56)
at sklearn.compose.ColumnTransformer.encodeFeatures(ColumnTransformer.java:63)
at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:196)
at org.jpmml.sklearn.Main.run(Main.java:145)
at org.jpmml.sklearn.Main.main(Main.java:94)
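If I try to apply the fitted ColumnTransformer to a pandas DataFrame, I get either
AttributeError: 'ColumnTransformer' object has no attribute 'predict'
when calling predict, or, when calling transform, an array with 5 columns that contain neither the clVars values nor the kmeans result, plus 3 columns matching idVars. So I have my doubts about this :s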
Answer 0 (score: 0)
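This exception means that the SkLearn2PMML converter found an estimator object (KMeans) at a position where it expects to find only transformer objects (subclasses of TransformerMixin).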
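For comparison, here is a minimal sketch of how plain scikit-learn treats the same nested layout. The random data and the integer column lists cl_vars and id_vars are hypothetical stand-ins for clVars and idVars. In scikit-learn itself, KMeans exposes a transform method that returns the distance of each sample to each cluster centre, which would account for the 5 extra columns reported in the question, and a ColumnTransformer never has a predict method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 6 columns used for clustering, 3 identifier columns.
rng = np.random.RandomState(0)
X = rng.rand(60, 9)
cl_vars = [0, 1, 2, 3, 4, 5]  # stand-in for clVars
id_vars = [6, 7, 8]           # stand-in for idVars

clt = ColumnTransformer([
    ("cl", Pipeline([
        ("std", StandardScaler()),
        ("pca", PCA(n_components=4)),
        ("kmeans", KMeans(n_clusters=5, n_init=10, random_state=0)),
    ]), cl_vars),
    ("id", "passthrough", id_vars),
])

# fit_transform on the nested pipeline ends in KMeans.fit_transform,
# which yields one distance column per cluster centre.
Xt = clt.fit_transform(X)

print(Xt.shape)                 # 5 distance columns + 3 passthrough columns
print(hasattr(clt, "predict"))  # ColumnTransformer only transforms
```

So the layout fits and transforms on the scikit-learn side; the failure in the question is raised later, during the PMML conversion.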
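By Scikit-Learn convention, a pipeline should contain only one estimator object, and it must be the final step. Right now you have two estimator objects (KMeans and LinearRegression).
If you follow this convention, the conversion should succeed.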
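One way to follow the convention is sketched below, under loud assumptions: the KMeans step is simply dropped, which changes the model and may not be what the question needs, and the ExpressionTransformer step is left out so that the example runs with scikit-learn alone. A plain sklearn Pipeline stands in for PMMLPipeline (which extends it), and the data and column lists are hypothetical. Whether this exact layout converts to PMML has not been verified here.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 6 feature columns, 3 identifier columns, one target.
rng = np.random.RandomState(0)
X = rng.rand(60, 9)
y = rng.rand(60)
cl_vars = [0, 1, 2, 3, 4, 5]  # stand-in for clVars
id_vars = [6, 7, 8]           # stand-in for idVars

pipeline = Pipeline([
    # Only transformers inside the ColumnTransformer.
    ("clt", ColumnTransformer([
        ("cl", Pipeline([
            ("std", StandardScaler()),
            ("pca", PCA(n_components=4)),
        ]), cl_vars),
        ("id", "passthrough", id_vars),
    ])),
    # The single estimator, as the final step.
    ("lr", LinearRegression()),
])

pipeline.fit(X, y)
pred = pipeline.predict(X)

# Number of features reaching the regression: 4 PCA components + 3 id columns.
n_features = pipeline.named_steps["clt"].transform(X).shape[1]
print(pred.shape, n_features)
```

If the cluster information is needed as an input to the regression, it would have to be supplied in a form the converter accepts as a transformer; that is beyond what the answer above states.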