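While looking for a way to convert scikit-learn models to PMML, I recently came across sklearn2pmml and jpmml-sklearn. However, when I try the basic usage example, I get errors that I can't figure out.
Running the sklearn2pmml example, I keep hitting the following problem casting a Long to an Integer: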
Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
at numpy.core.NDArrayUtil.getShape(NDArrayUtil.java:66)
at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:92)
at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:76)
at sklearn.linear_model.BaseLinearClassifier.getCoefShape(BaseLinearClassifier.java:144)
at sklearn.linear_model.BaseLinearClassifier.getNumberOfFeatures(BaseLinearClassifier.java:56)
at sklearn.Classifier.createSchema(Classifier.java:50)
at org.jpmml.sklearn.Main.run(Main.java:104)
at org.jpmml.sklearn.Main.main(Main.java:87)
Traceback (most recent call last):
File "C:\Users\user\workspace\sklearn_pmml\test.py", line 40, in <module>
sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
File "C:\Python27\lib\site-packages\sklearn2pmml\__init__.py", line 49, in sklearn2pmml
os.remove(dump)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\user\\appdata\\local\\temp\\tmpmxyp2y.pkl'
Any suggestions as to what is going on here?
Code used:
#
# Step 1: feature engineering
#
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import pandas
import sklearn_pandas
iris = load_iris()
iris_df = pandas.concat((
    pandas.DataFrame(iris.data[:, :], columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]),
    pandas.DataFrame(iris.target, columns = ["Species"])
), axis = 1)
iris_mapper = sklearn_pandas.DataFrameMapper([
(["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], PCA(n_components = 3)),
("Species", None)
])
iris = iris_mapper.fit_transform(iris_df)
#
# Step 2: training a logistic regression model
#
from sklearn.linear_model import LogisticRegressionCV
iris_X = iris[:, 0:3]
iris_y = iris[:, 3]
iris_classifier = LogisticRegressionCV()
iris_classifier.fit(iris_X, iris_y)
#
# Step 3: conversion to PMML
#
from sklearn2pmml import sklearn2pmml
sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
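The first stack trace fails while JPMML-SkLearn reads the shape of the classifier's coef_ array (getCoefShape). A quick way to see which Python integer type those shape entries have before exporting is sketched below. It is only a diagnostic, and it uses a zero array as a hypothetical stand-in for iris_classifier.coef_ (3 classes x 3 PCA components). On Python 3 the entries are plain int. On 64-bit Windows builds of Python 2 they are typically long, which is a plausible source of the i8 values discussed in the answer below.

```python
import numpy as np

def shape_element_types(arr):
    """Return the Python type name of each entry in arr.shape."""
    return [type(dim).__name__ for dim in arr.shape]

# Hypothetical stand-in for iris_classifier.coef_ (3 classes x 3 features).
coef = np.zeros((3, 3))
print(shape_element_types(coef))  # ['int', 'int'] on Python 3
```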
Edit 12/6: After the latest update, the same problem shows up further along:
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Updating 1 target field and 3 active field(s)
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Mapping target field y to Species
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Mapping active field(s) [x1, x2, x3] to [Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
Traceback (most recent call last):
File "C:\Users\user\workspace\sklearn_pmml\test.py", line 40, in <module>
sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
File "C:\Python27\lib\site-packages\sklearn2pmml\__init__.py", line 49, in sklearn2pmml
os.remove(dump)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\user\\appdata\\local\\temp\\tmpqeblat.pkl'
Answer:
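JPMML-SkLearn expects ndarray.shape to be a tuple of i4 values (which the Pyrolite library maps to java.lang.Integer). In this case, however, it is a tuple of i8 values (mapped to java.lang.Long), hence the cast exception.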
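The i4/i8 distinction is visible in the pickle stream itself, because pickle writes integers with different opcodes. The sketch below lists the integer opcodes emitted when a shape tuple is pickled with protocol 2. It assumes, following the answer, that the Java side turns the fixed-width BININT* opcodes into Integer and the variable-length LONG1 opcode into Long; the exact mapping is Pyrolite's choice. On Python 3 the opcode depends only on the magnitude of the value. On Python 2, a value of type long (e.g. 3L) is written with LONG1 at protocol 2 regardless of its size, so a shape like (150L, 3L) would produce LONG1 even though the dimensions are small.

```python
import pickle
import pickletools

# Integer opcodes defined by the pickle protocol (text and binary variants).
INT_OPCODES = {"INT", "BININT", "BININT1", "BININT2", "LONG", "LONG1", "LONG4"}

def int_opcodes(values, protocol=2):
    """List the opcode pickle uses for each integer in a pickled tuple."""
    data = pickle.dumps(tuple(values), protocol=protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(data)
            if op.name in INT_OPCODES]

# Dimensions that fit in 32 bits are written with fixed-width BININT* opcodes.
print(int_opcodes((150, 3)))        # ['BININT1', 'BININT1']
# A dimension beyond the 32-bit range falls back to the variable-length LONG1 opcode.
print(int_opcodes((150, 2 ** 40)))  # ['BININT1', 'LONG1']
```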
This issue has been fixed in JPMML-SkLearn commit f7c16ac2fb.
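The contents of that commit are not reproduced here. As a hypothetical illustration of the general idea, a shape reader can be made indifferent to integer width by accepting any integral value and only checking that it fits the 32-bit range its consumer needs. The helper name normalize_shape is made up for this sketch, and Python stands in for the Java side where the real fix lives.

```python
import operator

INT32_MAX = 2 ** 31 - 1

def normalize_shape(shape):
    """Hypothetical helper: coerce each dimension to a plain int, rejecting
    non-integral values and values a signed 32-bit consumer cannot hold."""
    dims = []
    for dim in shape:
        # operator.index accepts int, numpy integer scalars (and long on
        # Python 2) but raises TypeError for floats instead of truncating.
        value = operator.index(dim)
        if value < 0 or value > INT32_MAX:
            raise ValueError("dimension %d does not fit in a signed 32-bit int" % value)
        dims.append(value)
    return tuple(dims)

print(normalize_shape((150, 3)))  # (150, 3)
```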
If you run into another exception (data conversion between platforms can be tricky), please open a JPMML-SkLearn issue about it as well.