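I've recently been looking for a way to move machine learning models across platforms between Python and Java: train the model in Python first, then use Java for online prediction. Fortunately, I found sklearn2pmml. However, when I tried the basic usage example, I ran into a Java error, and the only result was an empty output file. The code is as follows: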
from sklearn_pandas import DataFrameMapper
import pandas as pd
import numpy as np
from sklearn2pmml import sklearn2pmml,PMMLPipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler,LabelBinarizer,FunctionTransformer
heart_data = pd.read_csv("heart.csv")
# Define the feature engineering with a DataFrameMapper
mapper = DataFrameMapper([
(['sbp'], MinMaxScaler()),
(['tobacco'], MinMaxScaler()),
('ldl', None),
('adiposity', None),
(['famhist'], LabelBinarizer()),
('typea', None),
('obesity', None),
('alcohol', None),
(['age'], FunctionTransformer(np.log)),
])
# Use a pipeline to define the model, the feature engineering, etc.
pipeline = PMMLPipeline([
('mapper', mapper),
("classifier", LinearRegression())
])
pipeline.fit(heart_data[heart_data.columns.difference(["chd"])], heart_data["chd"]) # To exclude certain columns, use df.columns.difference(['column name'])
# Export the model file
sklearn2pmml(pipeline, "lrHeart.xml", with_repr = True)
The error is as follows:
Standard output is empty
Standard error:
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 42 ms.
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Converting..
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Converted in 119 ms.
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Marshalling PMML..
Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/bind/JAXBContext
at org.jpmml.model.JAXBUtil.getContext(JAXBUtil.java:126)
at org.jpmml.model.MetroJAXBUtil.marshalPMML(MetroJAXBUtil.java:25)
at org.jpmml.sklearn.Main.run(Main.java:159)
at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.JAXBContext
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
... 4 more
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-8e79360b3ddd> in <module>()
20 pipeline.fit(heart_data[heart_data.columns.difference(["chd"])], heart_data["chd"]) # To exclude certain columns, use df.columns.difference(['column name'])
21 # Export the model file
---> 22 sklearn2pmml(pipeline, "lrHeart.xml", with_repr = True)
~/anaconda3/lib/python3.6/site-packages/sklearn2pmml/__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, debug)
304 print("Standard error is empty")
305 if retcode:
--> 306 raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
307 finally:
308 if debug:
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams
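Because the failed run still leaves lrHeart.xml behind as an empty file, it can help to check the export on the Python side before handing it to Java. The following is a minimal sketch using only the standard library: it treats a file as usable only if it is non-empty and its root element is named PMML. The helper name `looks_like_pmml` is hypothetical (not part of sklearn2pmml), and this is a sanity check, not a PMML schema validation.

```python
import os
import xml.etree.ElementTree as ET


def looks_like_pmml(path):
    """Return True if `path` exists, is non-empty and has a <PMML> root element.

    Hypothetical helper: a quick sanity check, not full PMML schema validation.
    """
    # A failed conversion can leave a zero-byte file behind.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    # PMML documents are usually namespaced, e.g.
    # "{http://www.dmg.org/PMML-4_3}PMML", so compare only the local name.
    return root.tag.rsplit("}", 1)[-1] == "PMML"
```

For example, calling `looks_like_pmml("lrHeart.xml")` right after `sklearn2pmml(...)` makes an empty export fail loudly in Python instead of surfacing later when the Java side tries to load the model.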
Answer:
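Are you using Java (java.exe) version 9? If so, you should downgrade to Java 8 (at least for now), because the sklearn2pmml package has not yet been properly set up for the newer version. The `java.lang.NoClassDefFoundError: javax/xml/bind/JAXBContext` in your log is the telltale sign: starting with Java 9, the JAXB API (the `java.xml.bind` module) is no longer on the class path by default, and Java 11 removed it from the JDK entirely. See https://github.com/jpmml/sklearn2pmml/issues/80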
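Before downgrading, it is worth confirming which Java the conversion will actually pick up, since sklearn2pmml launches a `java` executable (typically the one found on the PATH). A minimal sketch, assuming `java` is on the PATH: it runs `java -version` (which prints its banner to standard error) and parses the major version, mapping legacy `1.x` strings such as `1.8.0_161` to 8. The function names `parse_java_major`, `installed_java_major` and `jaxb_warning` are hypothetical; the version thresholds follow the JAXB changes described in the answer above.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    "1.8.0_161" -> 8 (legacy 1.x scheme), "9.0.4" -> 9, "11.0.2" -> 11, "17" -> 17.
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major(java="java"):
    """Run `java -version` and return the major version, or None if java is missing."""
    try:
        result = subprocess.run(
            [java, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # works on Python 3.6, unlike text=True
        )
    except FileNotFoundError:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    return parse_java_major(result.stderr or result.stdout)


def jaxb_warning(major):
    """Explain whether the JAXB NoClassDefFoundError is expected for this Java version."""
    if major is None:
        return "java not found on PATH"
    if major <= 8:
        return "Java %d: JAXB is bundled; this error is not expected" % major
    if major <= 10:
        return "Java %d: JAXB is not resolved by default; downgrade to Java 8" % major
    return "Java %d: JAXB was removed from the JDK; downgrade to Java 8" % major
```

For example, `print(jaxb_warning(installed_java_major()))` gives a one-line diagnosis before calling `sklearn2pmml(...)`.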