I want to run an sklearn-style cross-validation in PySpark without using ParamGridBuilder.
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import LinearRegression

# Hyperparameters are set directly on the estimator.
lr = LinearRegression(regParam=0.1, elasticNetParam=0.2, maxIter=100)
# Note: no estimatorParamMaps is passed here.
crossval = CrossValidator(estimator=lr,
                          evaluator=RegressionEvaluator(),
                          numFolds=2)
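Is it possible to run cross-validation this way, without ParamGridBuilder? My use case is that I want to pass the hyperparameters as arguments to the LinearRegression class rather than as a paramGrid object.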
Answer 0 (score: 0)
A simple solution is to put only the values you want to use into the ParamGrid, one value per parameter:
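from pyspark.ml.evaluation import RegressionEvaluator

# One value per parameter, so the grid contains a single combination.
paramGrid = ParamGridBuilder() \
    .addGrid(lr.regParam, [0.1]) \
    .addGrid(lr.elasticNetParam, [0.2]) \
    .addGrid(lr.maxIter, [100]) \
    .build()

crossval = CrossValidator(estimator=lr,
                          estimatorParamMaps=paramGrid,
                          # a regression metric, since lr is a LinearRegression
                          evaluator=RegressionEvaluator(),
                          numFolds=2)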
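Since the question is about passing hyperparameters as plain arguments, here is a minimal sketch of a helper that turns keyword arguments into a single-combination list of param maps. The name `single_point_param_maps` is hypothetical and not part of PySpark. The sketch assumes that `estimatorParamMaps` accepts a plain list of dicts keyed by `Param` objects (which is what `ParamGridBuilder.build()` returns) and that the estimator exposes `getParam(name)`.

```python
def single_point_param_maps(estimator, **hyperparams):
    """Return a one-element list of param maps for `estimator`.

    Hypothetical helper, not part of pyspark. `estimator` is assumed to
    expose `getParam(name)`, as PySpark estimators do.
    """
    # Look up each Param object by name and pair it with its single value.
    param_map = {estimator.getParam(name): value
                 for name, value in hyperparams.items()}
    # A list with one map means exactly one combination is evaluated.
    return [param_map]
```

It could then be used as `estimatorParamMaps=single_point_param_maps(lr, regParam=0.1, elasticNetParam=0.2, maxIter=100)` when constructing the CrossValidator.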
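You can always write your own K-fold version: split the dataset into K parts, then hold each part out in turn for evaluation and train on the rest:
from functools import reduce
from pyspark.ml.evaluation import RegressionEvaluator

# Split df into K roughly equal folds (K = 2 here); the seed is arbitrary.
folds = df.randomSplit([0.5, 0.5], seed=42)
evaluator = RegressionEvaluator()
res = []
for i, test in enumerate(folds):
    # Train on every fold except the held-out one.
    train = reduce(lambda a, b: a.union(b), folds[:i] + folds[i + 1:])
    model = lr.fit(train)
    res.append(evaluator.evaluate(model.transform(test)))
do_what_you_want(res)  # placeholder, e.g. average the per-fold metrics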
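The manual K-fold idea can also be wrapped in a reusable function, which lets the hyperparameters stay on the estimator's constructor. This is only a sketch: `cross_validate` and its parameters `k` and `seed` are hypothetical names, and the function relies only on `DataFrame.randomSplit`, `DataFrame.union`, `Estimator.fit`, `Model.transform` and `Evaluator.evaluate`.

```python
from functools import reduce


def cross_validate(df, estimator, evaluator, k=3, seed=42):
    """Return one evaluation metric per fold for `estimator` on `df`.

    Hypothetical helper, not part of pyspark.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    # Equal weights give k folds of roughly (not exactly) equal size.
    folds = df.randomSplit([1.0] * k, seed=seed)
    metrics = []
    for i, test in enumerate(folds):
        # Train on the union of all folds except the held-out one.
        others = folds[:i] + folds[i + 1:]
        train = reduce(lambda a, b: a.union(b), others)
        model = estimator.fit(train)
        metrics.append(evaluator.evaluate(model.transform(test)))
    return metrics
```

For example, `cross_validate(df, LinearRegression(regParam=0.1, elasticNetParam=0.2, maxIter=100), RegressionEvaluator(), k=2)` would return two metrics, one per held-out fold. Caching `df` beforehand may help, since each fold is reused several times.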