Question

What is the difference between foreachAsync and foreach? Does foreachAsync work in parallel? My code example in Java:
rdds.foreach(x -> { /* some actions */ });      // it works
rdds.foreachAsync(x -> { /* some actions */ }); // it fails
Error log
17/11/07 16:42:38 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
17/11/07 16:42:40 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
17/11/07 16:42:43 WARN LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobStart(0,1510044163831,WrappedArray(org.apache.spark.scheduler.StageInfo@122f2c22, org.apache.spark.scheduler.StageInfo@c15154c),{spark.rdd.scope.noOverride=true, spark.rdd.scope={"id":"3","name":"foreachAsync"}})
17/11/07 16:42:43 WARN LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageSubmitted(org.apache.spark.scheduler.StageInfo@4b276b68,{spark.rdd.scope.noOverride=true, spark.rdd.scope={"id":"3","name":"foreachAsync"}})
17/11/07 16:42:43 WARN LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@4b276b68)
17/11/07 16:42:43 WARN LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1510044163921,JobFailed(org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:105)
org.apache.spark.SparkContext.broadcast(SparkContext.scala:1347)
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:873)
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:774)
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:777)
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:776)
scala.collection.immutable.List.foreach(List.scala:318)
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:776)
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:759)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1508)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1500)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1487)
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:72)
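For context, and as one likely reading of the log above: `foreach` is a blocking action, while `foreachAsync` submits the job and returns a `JavaFutureAction<Void>` immediately. If the driver stops the SparkContext before that job runs, it fails with exactly the "Cannot call methods on a stopped SparkContext" error shown. The blocking-vs-non-blocking contrast can be sketched with plain Java futures; this is an analogy, not the Spark API, and the class and method names are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class AsyncVsSync {
    // Runs a synchronous pass and then an asynchronous pass over 5 elements;
    // returns 10 once both have finished.
    static int runSyncThenAsync() throws Exception {
        AtomicInteger counter = new AtomicInteger();

        // Synchronous, like rdd.foreach: the caller blocks until
        // every element has been processed.
        IntStream.range(0, 5).forEach(i -> counter.incrementAndGet());

        // Asynchronous, like rdd.foreachAsync: returns a future immediately;
        // the work may still be pending when this line completes.
        CompletableFuture<Void> future = CompletableFuture.runAsync(
                () -> IntStream.range(0, 5).forEach(i -> counter.incrementAndGet()));

        // If shared resources were torn down at this point (as the stopped
        // SparkContext is in the log above), the async work could fail.
        // Waiting on the future before any shutdown avoids that.
        future.get();
        return counter.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("total after both passes: " + runSyncThenAsync());
    }
}
```

Under this reading, a plausible fix in the Spark code is to keep the `JavaFutureAction<Void>` returned by `foreachAsync` and call `get()` on it before the SparkContext is stopped.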