CrossValidator() fails in a pipeline: "Exception thrown in awaitResult"

Asked: 2019-04-29 10:19:00

Tags: scala apache-spark cross-validation

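I am trying to run a simple pipeline in Spark, but using CrossValidator() raises an awaitResult error. This does not happen with TrainValidationSplit().

I have tried placing CrossValidator() both inside and outside the pipeline; both return the same error.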
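"Exception thrown in awaitResult" is typically only a wrapper message, and the underlying failure is usually attached as the exception's cause. The following is a minimal sketch in plain Scala (no Spark dependency) for walking a cause chain. `causeChain` and `rootCause` are hypothetical helper names, and the wrapped exception is simulated here; in the real job it would be whatever is caught around the call to fit().

```scala
import scala.annotation.tailrec

object CauseChain {
  // Hypothetical helper: collect every exception in the cause chain, outermost first.
  // The `eq` check stops the walk if a chain ever refers back to itself.
  def causeChain(t: Throwable): List[Throwable] = {
    @tailrec
    def loop(cur: Throwable, acc: List[Throwable]): List[Throwable] =
      if (cur == null || acc.exists(_ eq cur)) acc.reverse
      else loop(cur.getCause, cur :: acc)
    loop(t, Nil)
  }

  // Hypothetical helper: the innermost exception, which usually names the real failure.
  def rootCause(t: Throwable): Throwable = causeChain(t).last

  def main(args: Array[String]): Unit = {
    // Simulated wrapper exception, standing in for the one thrown by the job.
    val inner = new IllegalArgumentException("simulated underlying failure")
    val wrapped = new RuntimeException("Exception thrown in awaitResult: ", inner)
    causeChain(wrapped).foreach(e => println(s"${e.getClass.getName}: ${e.getMessage}"))
  }
}
```

Posting the innermost exception and its stack trace alongside the question would make the failure much easier to diagnose than the wrapper message alone.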

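import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{QuantileDiscretizer, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Pipeline components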
val discretizer = new QuantileDiscretizer()
  .setNumBuckets(7) 
  .setInputCol("rmse")
  .setOutputCol("actual_error_bucket")

val assmbleFeatures: VectorAssembler = new VectorAssembler()
  .setInputCols(featureColumns)
  .setOutputCol("features")

val randomForest = new RandomForestClassifier()
  .setLabelCol("actual_error_bucket")
  .setFeaturesCol("features")
  .setImpurity("entropy")
  .setSubsamplingRate(0.8)

// Pipeline
val pipeline = new Pipeline()
  .setStages(Array(discretizer, assmbleFeatures, randomForest)) 


val paramGrid = new ParamGridBuilder()
  .addGrid(randomForest.maxDepth, (10 to 20 by 10).toArray) 
  .addGrid(randomForest.numTrees, (100 to 130 by 10).toArray)
  .addGrid(randomForest.maxBins, (12 to 22 by 10).toArray)
  .build()

// throws error
val crossValidator = new CrossValidator()
  .setEstimator(pipeline)
  .setEstimatorParamMaps(paramGrid)
  .setEvaluator(new MulticlassClassificationEvaluator)
  .setNumFolds(2)

val Array(training, test) = df.randomSplit(Array(0.75, 0.25), seed = 12345)
val pipeLineModel = crossValidator.fit(training)
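For a sense of how much work this search involves, the grid ranges above can be expanded in plain Scala (no Spark dependency). This is only a sketch: the object and value names are hypothetical, and it assumes the usual k-fold behaviour in which every parameter combination is fitted once per fold.

```scala
object GridSize {
  // The same ranges used in the ParamGridBuilder calls above.
  val maxDepth: Array[Int] = (10 to 20 by 10).toArray   // 10, 20
  val numTrees: Array[Int] = (100 to 130 by 10).toArray // 100, 110, 120, 130
  val maxBins: Array[Int]  = (12 to 22 by 10).toArray   // 12, 22

  // Matches setNumFolds(2) above.
  val numFolds: Int = 2

  // Every combination of the three parameters is one entry in the grid.
  val combinations: Int = maxDepth.length * numTrees.length * maxBins.length

  // Assumption: each combination is fitted once per fold.
  val foldFits: Int = combinations * numFolds

  def main(args: Array[String]): Unit = {
    println(s"parameter combinations: $combinations") // 16
    println(s"pipeline fits across folds: $foldFits") // 32
  }
}
```

Each of those fits trains a forest of at least 100 trees, so shrinking the grid while debugging is a cheap way to reproduce the error faster.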

I expect to get a trained cvModel, just as when this is done outside a pipeline.

0 answers:

No answers