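I ran Spark ML with a decision tree and built-in cross-validation.
During cross-validation it fails, for a reason I cannot identify, with this stack trace: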
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
org.apache.spark.ml.tuning.CrossValidator$$anonfun$4$$anonfun$6.apply(CrossValidator.scala:164)
org.apache.spark.ml.tuning.CrossValidator$$anonfun$4$$anonfun$6.apply(CrossValidator.scala:164)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
org.apache.spark.ml.tuning.CrossValidator$$anonfun$4.apply(CrossValidator.scala:164)
org.apache.spark.ml.tuning.CrossValidator$$anonfun$4.apply(CrossValidator.scala:144)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:144)
DecisionTree.DecisionTreeDisplay.process(DecisionTreeDisplay.scala:151)
It is followed by several thread stack traces like this one:
2019-01-23 16:26:21 ERROR TaskSchedulerImpl:91 - Exception in statusUpdate
java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$3@764726a7 rejected from java.util.concurrent.ThreadPoolExecutor@783b07b9[Shutting down, pool size = 2, active threads = 2, queued tasks = 0, completed tasks = 4914]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
    at org.apache.spark.scheduler.TaskResultGetter.enqueueSuccessfulTask(TaskResultGetter.scala:61)
    at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:413)
    at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:394)
    at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:67)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
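For context, a RejectedExecutionException like the one in this log is what a java.util.concurrent.ThreadPoolExecutor throws (under its default AbortPolicy) when a task is submitted after the pool has been shut down. That suggests these traces may be a side effect of the scheduler already shutting down, not the original failure, though that is only a guess. A minimal sketch without Spark, where the object and method names are illustrative and not part of my job:

```scala
import java.util.concurrent.{Executors, RejectedExecutionException}

object RejectedAfterShutdown {
  // Returns true if submitting to an already shut-down pool is rejected.
  def rejectedAfterShutdown(): Boolean = {
    // Pool size 2 only mirrors the log above; any size behaves the same way.
    val pool = Executors.newFixedThreadPool(2)
    pool.shutdown() // from here on the pool accepts no new tasks
    try {
      pool.execute(new Runnable { def run(): Unit = () })
      false // not reached with the default AbortPolicy
    } catch {
      case _: RejectedExecutionException => true
    }
  }

  def main(args: Array[String]): Unit =
    println(s"rejected after shutdown: ${rejectedAfterShutdown()}")
}
```

If that reading is right, the interesting exception is whatever caused the shutdown in the first place.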
My cross-validation code is:
import org.apache.spark.ml.tuning.CrossValidator

// Define cross-validation
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)
  .setSeed(seed)
  .setCollectSubModels(true) // requires Spark >= 2.3.0
  .setParallelism(8)         // requires Spark >= 2.3.0

val cvModel = cv.fit(trainInfile) // fails here
Inside the ML library, it seems to fail at this line:
val foldMetrics = foldMetricFutures.map(ThreadUtils.awaitResult(_, Duration.Inf))
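That line only waits on the per-fold futures, so the frames I pasted show where the failure surfaced, not what caused it. As far as I understand, the exception rethrown at the await site carries the real failure as its cause. A small sketch of how the innermost cause could be printed; the helper name `rootCause` and the stand-in exceptions are hypothetical, and in the real job the `Try` would wrap `cv.fit(trainInfile)` instead:

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RootCause {
  // Follows getCause until the innermost exception is reached.
  @tailrec
  def rootCause(t: Throwable): Throwable =
    if (t.getCause == null || (t.getCause eq t)) t else rootCause(t.getCause)

  def main(args: Array[String]): Unit = {
    // Stand-in for cv.fit(trainInfile): a wrapper around a made-up inner failure.
    val attempt = Try[Unit] {
      throw new RuntimeException(
        "wrapper thrown at the await site",
        new IllegalArgumentException("hypothetical underlying failure"))
    }
    attempt match {
      case Failure(e) => println(s"root cause: ${rootCause(e)}")
      case Success(_) => println("fit succeeded")
    }
  }
}
```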
Any ideas?