I am using PySpark to run a (logistic) regression problem. Here is my code:
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import (BinaryClassificationEvaluator, MulticlassClassificationEvaluator)
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Logistic regression wrapped in a single-stage pipeline
wine_lr = LogisticRegression(labelCol='label')
pipeline = Pipeline(stages=[wine_lr])

# Hyperparameters to tune, looked up by name on the estimator
x1 = 'elasticNetParam'
x2 = 'regParam'
paramGrid = ParamGridBuilder() \
    .addGrid(getattr(wine_lr, x1), [0.1, 0.01, 0.3]) \
    .addGrid(getattr(wine_lr, x2), [0.1, 0.001, 0.2]) \
    .build()

# 2-fold cross-validation over the 3 x 3 grid
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=MulticlassClassificationEvaluator(),
                          numFolds=2)
cvModel = crossval.fit(train_data)
best_model = cvModel.bestModel
After the grid search I get best_model. I want to find out the best hyperparameters from best_model. I tried using _java_obj, but it raises an error:
best_reg_param = best_model._java_obj.getRegParam()
AttributeError: 'PipelineModel' object has no attribute '_java_obj'
Can anyone tell me how to get the best hyperparameters from a pipeline model?
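
One possible direction, sketched under some assumptions rather than offered as a definitive answer: a fitted CrossValidatorModel exposes avgMetrics in the same order as the param maps returned by getEstimatorParamMaps(), so the winning combination should be the map at the index of the best average metric. The helper name best_param_map and its larger_is_better flag are made up for illustration and are not part of PySpark.

```python
def best_param_map(cv_model, larger_is_better=True):
    """Return {param name: value} for the best grid point of a fitted cross-validator.

    Assumes cv_model.avgMetrics is ordered like cv_model.getEstimatorParamMaps(),
    and that each param map is a dict keyed by Param objects with a .name attribute.
    """
    metrics = list(cv_model.avgMetrics)
    param_maps = cv_model.getEstimatorParamMaps()

    # Index of the best average metric across folds
    pick = max if larger_is_better else min
    best_index = pick(range(len(metrics)), key=metrics.__getitem__)

    # Convert Param objects to plain names for readability
    return {param.name: value for param, value in param_maps[best_index].items()}
```

With the code above this would be called as best_param_map(cvModel); the default metric of MulticlassClassificationEvaluator is f1, where larger is better. As for the AttributeError, PipelineModel is a Python-side wrapper around its stages, so the fitted logistic regression should be reachable as best_model.stages[-1], and calling _java_obj.getRegParam() on that stage instead of on the pipeline may work, depending on the Spark version.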