sklearn2pmml error when saving a machine learning model

Time: 2018-02-24 08:36:45

Tags: python

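I have recently been looking for a way to move machine learning models between Python and Java. In other words, I want to train the model in Python first and then use Java for online prediction. Fortunately, I found sklearn2pmml. However, when I tried the basic usage example I ran into a Java error, and the failed export leaves an empty output file. The code is as follows: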

from sklearn_pandas import DataFrameMapper    
import pandas as pd
import numpy as np
from sklearn2pmml import sklearn2pmml,PMMLPipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler,LabelBinarizer,FunctionTransformer
heart_data = pd.read_csv("heart.csv")
# Define the feature engineering with a DataFrameMapper
mapper = DataFrameMapper([
    (['sbp'], MinMaxScaler()),
    (['tobacco'], MinMaxScaler()),
    ('ldl', None),
    ('adiposity', None),
    (['famhist'], LabelBinarizer()),
    ('typea', None),
    ('obesity', None),
    ('alcohol', None),
    (['age'], FunctionTransformer(np.log)),
]) 
# Define the feature engineering and the model in a single pipeline
pipeline = PMMLPipeline([
   ('mapper', mapper),
   ("classifier", LinearRegression())
])

pipeline.fit(heart_data[heart_data.columns.difference(["chd"])], heart_data["chd"])   # exclude columns with df.columns.difference(['column_name'])
# Export the model file
sklearn2pmml(pipeline, "lrHeart.xml", with_repr = True)

The error is as follows:

Standard output is empty
Standard error:
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 42 ms.
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Converting..
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Converted in 119 ms.
Feb 24, 2018 3:55:48 PM org.jpmml.sklearn.Main run
INFO: Marshalling PMML..
Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/bind/JAXBContext
    at org.jpmml.model.JAXBUtil.getContext(JAXBUtil.java:126)
    at org.jpmml.model.MetroJAXBUtil.marshalPMML(MetroJAXBUtil.java:25)
    at org.jpmml.sklearn.Main.run(Main.java:159)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.JAXBContext
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
    ... 4 more

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-8e79360b3ddd> in <module>()
     20 pipeline.fit(heart_data[heart_data.columns.difference(["chd"])], heart_data["chd"])   # exclude columns with df.columns.difference(['column_name'])
     21 # Export the model file
---> 22 sklearn2pmml(pipeline, "lrHeart.xml", with_repr = True)

~/anaconda3/lib/python3.6/site-packages/sklearn2pmml/__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, debug)
    304                                 print("Standard error is empty")
    305                 if retcode:
--> 306                         raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
    307         finally:
    308                 if debug:

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

1 answer:

Answer 0 (score: 0)

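Are you using Java (java.exe) version 9? If so, you should downgrade to Java 8, at least temporarily, because the sklearn2pmml package has not yet been properly configured for the newer version. See https://github.com/jpmml/sklearn2pmml/issues/80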
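The stack trace fits this diagnosis: javax.xml.bind.JAXBContext lives in the java.xml.bind module, which Java 9 no longer resolves by default, so the class cannot be found at the "Marshalling PMML" step. To confirm which Java the Python process actually picks up before calling sklearn2pmml, a small check along these lines may help. This is only a sketch: the helper names parse_java_major and java_major_version are made up for illustration and are not part of sklearn2pmml, and it assumes the java executable found on PATH is the same one sklearn2pmml launches.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as "1.8.0_161", "1.7.0_80", ...
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def java_major_version(executable="java"):
    """Run `<executable> -version` and return the major version, or None."""
    try:
        result = subprocess.run(
            [executable, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # The executable is missing or could not be started.
        return None
    # `java -version` normally writes to standard error; check both streams.
    return parse_java_major(result.stderr + result.stdout)
```

Calling java_major_version() just before the export and stopping early when it returns 9 or higher (or None) gives a clearer message than the generic RuntimeError. If it reports 8 and the export still fails, the cause is probably something else.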