Cross-validation fails in Spark ML

Time: 2019-01-23 16:21:56

Tags: multithreading scala apache-spark cross-validation

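I ran a Spark ML job that trains a decision tree with internal cross-validation.

During cross-validation it fails, for a reason I cannot identify, with this stack trace: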

  

    org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    org.apache.spark.ml.tuning.CrossValidator$$anonfun$4$$anonfun$6.apply(CrossValidator.scala:164)
    org.apache.spark.ml.tuning.CrossValidator$$anonfun$4$$anonfun$6.apply(CrossValidator.scala:164)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    org.apache.spark.ml.tuning.CrossValidator$$anonfun$4.apply(CrossValidator.scala:164)
    org.apache.spark.ml.tuning.CrossValidator$$anonfun$4.apply(CrossValidator.scala:144)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:144)
    DecisionTree.DecisionTreeDisplay.process(DecisionTreeDisplay.scala:151)

It is followed by several stack traces from other threads, such as:

  

    2019-01-23 16:26:21 ERROR TaskSchedulerImpl:91 - Exception in statusUpdate
    java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$3@764726a7 rejected from java.util.concurrent.ThreadPoolExecutor@783b07b9[Shutting down, pool size = 2, active threads = 2, queued tasks = 0, completed tasks = 4914]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at org.apache.spark.scheduler.TaskResultGetter.enqueueSuccessfulTask(TaskResultGetter.scala:61)
        at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:413)
        at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:394)
        at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:67)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
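For context on what this log entry means: a `java.util.concurrent.ThreadPoolExecutor` that has been shut down rejects every new task, and with the default `AbortPolicy` that rejection is a `RejectedExecutionException`. The sketch below reproduces that behavior in plain Scala, with no Spark involved; the object and method names are made up for illustration. In Spark this pool belongs to the `TaskResultGetter`, which is normally stopped together with the scheduler, so the message is probably a follow-on symptom of the job already shutting down and not the original failure. That reading is an assumption, since the post does not show what stopped the scheduler.

```scala
import java.util.concurrent.{Executors, RejectedExecutionException}

object RejectedAfterShutdown {
  /** Returns true if a task submitted after shutdown() is rejected. */
  def rejectedAfterShutdown(): Boolean = {
    // newFixedThreadPool uses the default AbortPolicy rejection handler.
    val pool = Executors.newFixedThreadPool(2)
    // Accepted: the pool is still running.
    pool.execute(new Runnable { def run(): Unit = () })
    // From here on the pool accepts no new tasks.
    pool.shutdown()
    try {
      pool.execute(new Runnable { def run(): Unit = () })
      false
    } catch {
      case _: RejectedExecutionException => true
    }
  }

  def main(args: Array[String]): Unit =
    println(s"rejected after shutdown: ${rejectedAfterShutdown()}")
}
```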

My cross-validation code is:

    import org.apache.spark.ml.tuning.CrossValidator

    // Define cross-validation (pipeline, evaluator, paramGrid and seed are not shown here)
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(evaluator)
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(3)
      .setSeed(seed)
      .setCollectSubModels(true) // requires Spark >= 2.3.0
      .setParallelism(8) // requires Spark >= 2.3.0

    val cvModel = cv.fit(trainInfile) // fails here
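To make the role of `setParallelism(8)` concrete: it sets how many candidate models may be fitted at the same time. Roughly speaking, each parameter combination is submitted as a future to a thread pool of that size, and the results are then awaited one by one. The sketch below imitates that shape in plain Scala without Spark. It is a simplification and not the library's actual code; `evaluateAll`, `fitAndScore` and the integer "candidates" are hypothetical stand-ins.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelFitSketch {
  /** Scores every candidate on a pool of `parallelism` threads; results keep the input order. */
  def evaluateAll(candidates: Seq[Int], parallelism: Int)(fitAndScore: Int => Double): Seq[Double] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Submit one future per candidate; at most `parallelism` run concurrently.
      val futures = candidates.map(c => Future(fitAndScore(c)))
      // Block until each result is available.
      futures.map(f => Await.result(f, Duration.Inf))
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical scoring function: pretend shallower trees score higher.
    val metrics = evaluateAll(Seq(2, 4, 8), parallelism = 8)(depth => 1.0 / depth)
    println(metrics)
  }
}
```

One practical consequence: temporarily setting the parallelism to 1 makes the fits run one after another, which can make the first real error easier to spot in the logs.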

Inside the ML library, it appears to fail at the following line:

      val foldMetrics = foldMetricFutures.map(ThreadUtils.awaitResult(_, Duration.Inf))
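Worth noting about that line: an await call only shows where a failure was observed, not where it happened. When a future fails on a worker thread, the exception is rethrown at the await site, so the `awaitResult` frame at the top of the trace is the messenger. The underlying error is usually in the "Caused by:" part of the full stack trace, which is not included above. The sketch below shows the mechanism with a plain `scala.concurrent.Future`; `rootCause` is a hypothetical helper and the exception messages are invented. As far as I know, Spark's `ThreadUtils.awaitResult` additionally wraps the failure in a `SparkException`, which is one more reason to walk the cause chain.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.util.control.NonFatal

object AwaitFailureSketch {
  /** Walks the cause chain down to the innermost exception. */
  @annotation.tailrec
  def rootCause(t: Throwable): Throwable =
    if (t.getCause == null || (t.getCause eq t)) t else rootCause(t.getCause)

  /** Awaits a future that fails on another thread and returns the root-cause message. */
  def awaitFailing(): String = {
    val failing = Future[Double] {
      // Stand-in for a model fit that fails on a worker thread (invented error).
      throw new RuntimeException("wrapper", new IllegalArgumentException("bad input column"))
    }
    try {
      // The failure surfaces here, far from where it was thrown.
      Await.result(failing, Duration.Inf)
      "no failure"
    } catch {
      case NonFatal(e) => rootCause(e).getMessage
    }
  }

  def main(args: Array[String]): Unit =
    println(awaitFailing())
}
```

The same helper could be applied around the failing call, for example by catching the exception thrown by `cv.fit(trainInfile)` and printing `rootCause(e)`.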

Any ideas?

0 answers:

No answers yet