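I am trying to run a simple pipeline in Spark, but it throws an awaitResult error when I use CrossValidator(). This does not happen with TrainValidationSplit().
I have tried placing CrossValidator() both inside and outside the pipeline, and both return the same error.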
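import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{QuantileDiscretizer, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Pipeline components
// Bucket the continuous "rmse" column into 7 quantile bins; the bin index is the label
val discretizer = new QuantileDiscretizer()
  .setNumBuckets(7)
  .setInputCol("rmse")
  .setOutputCol("actual_error_bucket")
// Combine the feature columns into a single vector column
val assmbleFeatures: VectorAssembler = new VectorAssembler()
  .setInputCols(featureColumns)
  .setOutputCol("features")
val randomForest = new RandomForestClassifier()
  .setLabelCol("actual_error_bucket")
  .setFeaturesCol("features")
  .setImpurity("entropy")
  .setSubsamplingRate(0.8)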
// Pipeline
val pipeline = new Pipeline()
  .setStages(Array(discretizer, assmbleFeatures, randomForest))
// Hyperparameter grid searched during model selection
val paramGrid = new ParamGridBuilder()
  .addGrid(randomForest.maxDepth, (10 to 20 by 10).toArray)
  .addGrid(randomForest.numTrees, (100 to 130 by 10).toArray)
  .addGrid(randomForest.maxBins, (12 to 22 by 10).toArray)
  .build()
// throws error
val crossValidator = new CrossValidator()
  .setEstimator(pipeline)
  .setEstimatorParamMaps(paramGrid)
  .setEvaluator(new MulticlassClassificationEvaluator)
  .setNumFolds(2)
val Array(training, test) = df.randomSplit(Array(0.75, 0.25), seed = 12345)
val pipeLineModel = crossValidator.fit(training)
I expected to get a trained CrossValidatorModel, just as I would when this is done outside the pipeline.
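As far as I can tell, the awaitResult message is only an outer wrapper and the underlying failure sits further down the cause chain. Below is a minimal sketch of how the innermost exception could be pulled out. The `RootCause` object and its `rootCause` helper are hypothetical names of my own, not part of Spark, and the wrapped exception in `main` is a stand-in for whatever `crossValidator.fit(training)` actually throws.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RootCause {
  // Follow getCause until the innermost exception is reached.
  @tailrec
  def rootCause(t: Throwable): Throwable = {
    val cause = t.getCause
    if (cause == null || (cause eq t)) t else rootCause(cause)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the real failure: an outer wrapper around a made-up inner exception.
    // In the real job, the Try would wrap crossValidator.fit(training) instead.
    val inner = new IllegalArgumentException("hypothetical inner failure")
    val wrapped = new RuntimeException("Exception thrown in awaitResult", inner)

    Try(throw wrapped) match {
      case Failure(e) => println(s"root cause: ${rootCause(e)}")
      case Success(_) => println("no error")
    }
  }
}
```

Wrapping the `fit` call the same way should show whether the real problem is in the data, the parameter grid, or the cluster itself.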