How do I get the best hyperparameters from a pipeline model in pyspark?

Asked: 2018-10-11 11:30:41

Tags: pyspark apache-spark-ml

I am using pyspark to run a regression problem. Here is my code:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import (BinaryClassificationEvaluator, MulticlassClassificationEvaluator)
# ParamGridBuilder and CrossValidator live in pyspark.ml.tuning
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

wine_lr = LogisticRegression(labelCol='label')
pipeline = Pipeline(stages=[wine_lr])

# Names of the hyperparameters to search over
x1 = 'elasticNetParam'
x2 = 'regParam'
paramGrid = ParamGridBuilder() \
    .addGrid(getattr(wine_lr, x1), [0.1, 0.01, 0.3]) \
    .addGrid(getattr(wine_lr, x2), [0.1, 0.001, 0.2]) \
    .build()

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=MulticlassClassificationEvaluator(),
                          numFolds=2)

# train_data is the training DataFrame (not shown here)
cvModel = crossval.fit(train_data)
best_model = cvModel.bestModel

After the grid search I have best_model, and I want to find out which hyperparameters it was trained with. I tried using _java_obj, but it raises an error.

best_reg_param = best_model._java_obj.getRegParam()
AttributeError: 'PipelineModel' object has no attribute '_java_obj'

Can anyone tell me how to get the best hyperparameters from the pipeline model?
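One possible approach, offered as a sketch rather than a confirmed answer: the error arises because best_model is a PipelineModel, which wraps its fitted stages (reachable through best_model.stages) instead of exposing the logistic regression directly. A way to recover the winning settings without touching the model internals is to pair the cross-validator's parameter grid with its average metrics and pick the best entry. The helper name best_param_map below is hypothetical, not part of pyspark; it only assumes that each grid entry is a dict and that the metrics list is in the same order as the grid.

```python
def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Return (params, metric) for the best-scoring point of a parameter grid.

    param_maps  -- list of dicts, one per grid point (e.g. the grid given to CrossValidator)
    avg_metrics -- list of average metric values, in the same order as param_maps
    """
    if len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must have the same length")
    if not avg_metrics:
        raise ValueError("the parameter grid is empty")

    # Choose the index of the highest (or lowest) average metric.
    pick = max if larger_is_better else min
    best_index = pick(range(len(avg_metrics)), key=lambda i: avg_metrics[i])

    # pyspark Param keys expose a .name attribute; plain keys fall back to str().
    named = {getattr(p, "name", str(p)): v
             for p, v in param_maps[best_index].items()}
    return named, avg_metrics[best_index]


# Usage with the objects from the question (assumes cvModel has been fitted):
# params, score = best_param_map(cvModel.getEstimatorParamMaps(),
#                                cvModel.avgMetrics,
#                                cvModel.getEvaluator().isLargerBetter())
# print(params, score)
```

Passing isLargerBetter() matters because some evaluator metrics (such as error measures) are better when smaller. This reads the values from the grid itself, so it should not depend on which getters a particular Spark version exposes on the fitted model.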

0 answers:

No answers yet