Exporting the best KNN estimator from GridSearchCV to PMML

Time: 2018-04-19 05:27:48

Tags: python scikit-learn knn grid-search pmml

I am trying to save a KNN model to PMML in Anaconda, but it does not work.

My script:

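import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

#### load iris dataset
iris_dt = pd.read_csv('iris.csv', header=0)
#### Create development and evaluation samples
# .iloc replaces the deprecated .ix for positional column selection
X_train_dev, X_test, y_train_dev, y_test = train_test_split(
    iris_dt.iloc[:, 0:4],
    iris_dt['Species'],
    test_size=0.05,
    random_state=36851235,
    stratify=iris_dt['Species'])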
#### Train KNNClassifier
# cross-validation scheme (random_state only has an effect when shuffle=True)
crossv = StratifiedKFold(n_splits=10, random_state=36851234)
# candidate values for the randomized search
param_grid = {'n_neighbors': np.arange(1, 30)}

knn = KNeighborsClassifier()
knn_randomcv = RandomizedSearchCV(knn,
                                  param_grid,
                                  n_iter=15,
                                  scoring='f1_weighted',
                                  cv=crossv,
                                  random_state=36851232)
knn_randomcv = knn_randomcv.fit(X_train_dev, y_train_dev)

# choose best estimator
knn_best_random = knn_randomcv.best_estimator_

#### Save best estimator as PMML
pipeline = PMMLPipeline([("knn_best_estimator", knn_randomcv.best_estimator_)])

# the estimator is already fitted, so the field names are set by hand
pipeline.active_fields = X_train_dev.columns.values
pipeline.target_field = y_train_dev.name

sklearn2pmml(pipeline, "KNNFit_py.pmml", debug=True)

My debug log:

  • python:2.7.14
  • sklearn:0.19.1
  • sklearn.externals.joblib:0.11
  • pandas:0.20.3
  • sklearn_pandas:1.6.0
  • sklearn2pmml:0.35.0

When I run the Java converter directly, I get a more detailed error:

SEVERE: Failed to convert
java.lang.ClassCastException: numpy.core.Scalar cannot be cast to java.lang.Number
    at sklearn.neighbors.KNeighborsClassifier.getNumberOfNeighbors(KNeighborsClassifier.java:70)
    at sklearn.neighbors.KNeighborsUtil.encodeNeighbors(KNeighborsUtil.java:130)
    at sklearn.neighbors.KNeighborsClassifier.encodeModel(KNeighborsClassifier.java:57)
    at sklearn.neighbors.KNeighborsClassifier.encodeModel(KNeighborsClassifier.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:161)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)

Exception in thread "main" java.lang.ClassCastException: numpy.core.Scalar cannot be cast to java.lang.Number
    at sklearn.neighbors.KNeighborsClassifier.getNumberOfNeighbors(KNeighborsClassifier.java:70)
    at sklearn.neighbors.KNeighborsUtil.encodeNeighbors(KNeighborsUtil.java:130)
    at sklearn.neighbors.KNeighborsClassifier.encodeModel(KNeighborsClassifier.java:57)
    at sklearn.neighbors.KNeighborsClassifier.encodeModel(KNeighborsClassifier.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:161)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)

Please help.

1 answer:

Answer 0 (score: 1)

According to the documentation:

n_neighbors : int, optional (default = 5)

    Number of neighbors to use by default for kneighbors queries.

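So n_neighbors should be a plain int.

np.arange(1, 30) returns a NumPy array whose elements are NumPy integer scalars (typically numpy.int64), not Python's built-in int. The search passes the sampled element to the estimator unchanged, so best_estimator_.n_neighbors ends up as a NumPy scalar. The sklearn2pmml Java converter apparently cannot handle numpy.int64 in place of int, which I think is what causes the error: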

numpy.core.Scalar cannot be cast to java.lang.Number
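
To see the type difference for yourself, here is a small check. It is only a sketch: the variable names are mine, and the exact integer width that NumPy picks depends on the platform and NumPy version.

```python
import numpy as np

# Values as produced by the grid in the question
numpy_grid = np.arange(1, 30)
first = numpy_grid[0]

# The element is a NumPy integer scalar, not a built-in int
print(type(first))
is_numpy_scalar = isinstance(first, np.integer)

# tolist() converts every element to a built-in Python int
python_grid = numpy_grid.tolist()
is_builtin_int = type(python_grid[0]) is int

print(is_numpy_scalar, is_builtin_int)
```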

Change it to:

param_grid = {'n_neighbors': range(1, 30)}

and the error will go away.
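
For reference, a self-contained sketch of the corrected search. Note the assumptions, which are mine and not the asker's setup: it loads scikit-learn's bundled iris data via load_iris instead of iris.csv, it sets shuffle=True so that random_state actually takes effect, and the seeds are arbitrary. It only checks the type of the chosen parameter and does not run the PMML export.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled iris data stands in for the question's iris.csv
X, y = load_iris(return_X_y=True)

crossv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Built-in ints instead of NumPy scalars
param_grid = {'n_neighbors': list(range(1, 30))}

search = RandomizedSearchCV(KNeighborsClassifier(),
                            param_grid,
                            n_iter=15,
                            scoring='f1_weighted',
                            cv=crossv,
                            random_state=0)
search.fit(X, y)

# The winning value keeps the type it had in the grid
best_k = search.best_estimator_.n_neighbors
print(type(best_k))
```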

Edit: I posted a GitHub issue about the problem.
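
If the search has already been fitted with NumPy values, another option that should work is to convert the parameter on the best estimator before wrapping it in the PMMLPipeline. I have not verified this against sklearn2pmml 0.35.0, so treat it as a sketch; cast_numpy_params is a hypothetical helper name, and the toy data is made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def cast_numpy_params(estimator):
    """Replace NumPy scalar hyperparameters with built-in Python values.

    Hypothetical helper: it only touches top-level parameters.
    """
    params = estimator.get_params(deep=False)
    converted = {name: value.item()
                 for name, value in params.items()
                 if isinstance(value, np.generic)}
    if converted:
        estimator.set_params(**converted)
    return estimator


# Toy stand-in for knn_randomcv.best_estimator_ from the question
knn = KNeighborsClassifier(n_neighbors=np.int64(3))
knn.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

cast_numpy_params(knn)
print(type(knn.n_neighbors))
```

In the question's script this would be called on knn_randomcv.best_estimator_ right before building the PMMLPipeline.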