Spark shuts down when using async functions to parallelize RDD processing

Time: 2017-06-19 04:45:55

Tags: apache-spark asynchronous

Hi, here is a simple piece of Spark code. In it I am trying to run the processing of two RDDs in parallel using async functions. But when I run the code with the async functions, Spark shuts down with the error "Cannot call methods on a stopped SparkContext". My system is dual-core.

Here is my code:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("spark_auth").setMaster("local[*]").set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)
val rdd1 = sc.parallelize(List(32, 34, 2, 3, 4, 54, 3))
rdd1.foreachAsync { x => println("Items in the list:" + x) }
val rdd2 = sc.parallelize(List("1"))
rdd2.foreachAsync { y => println("y is:" + y) }
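
For reference, foreachAsync returns a FutureAction (which extends Scala's Future) immediately. A minimal sketch, assuming the failure happens because main() reaches its end and the context is stopped before the async jobs complete; it keeps the returned handles and blocks until both jobs finish (the object name AsyncAwaitSketch is hypothetical):

import scala.concurrent.Await
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}

object AsyncAwaitSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("spark_auth").setMaster("local[*]").set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Each foreachAsync call returns a FutureAction immediately; keep the handles
    val f1 = sc.parallelize(List(32, 34, 2, 3, 4, 54, 3)).foreachAsync(x => println("Items in the list:" + x))
    val f2 = sc.parallelize(List("1")).foreachAsync(y => println("y is:" + y))

    // Block until both asynchronous jobs finish before the context goes away
    Await.result(f1, Duration.Inf)
    Await.result(f2, Duration.Inf)
    sc.stop()
  }
}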

build.sbt:

name := "asyncTest"

version := "2.0.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.11" % "2.1.1"
)

Error log:

17/06/19 04:29:41 INFO TaskSchedulerImpl: Cancelling stage 0
17/06/19 04:29:41 INFO DAGScheduler: ResultStage 0 (foreachAsync at test.scala:13) failed in Unknown s due to Job aborted due to stage failure: Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:

org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
test.test$.main(test.scala:10)
test.test.main(test.scala)

The currently active SparkContext was created at:

org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
test.test$.main(test.scala:10)
test.test.main(test.scala)

java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:

org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
test.test$.main(test.scala:10)
test.test.main(test.scala)

The currently active SparkContext was created at:

org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
test.test$.main(test.scala:10)
test.test.main(test.scala)

        at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:100)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1407)
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:996)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:918)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:862)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1613)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

17/06/19 04:29:41 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@41103bce)
17/06/19 04:29:41 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1497846581465,JobFailed(org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.

For background on running jobs in parallel in Spark, see https://blog.knoldus.com/2015/10/21/demystifying-asynchronous-actions-in-spark/. Can anyone guide me on how to resolve this error? Or is there a better way to parallelize RDD processing in Spark? Thanks in advance.
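
For comparison, the job-level parallelism that the blog post discusses can also be sketched with plain Scala Futures wrapped around blocking actions. This is an assumed variant reusing rdd1, rdd2, and sc from the snippet above, not code from the post:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Submit both blocking foreach jobs concurrently from the driver
val job1 = Future { rdd1.foreach(x => println("Items in the list:" + x)) }
val job2 = Future { rdd2.foreach(y => println("y is:" + y)) }

// Wait for both jobs to finish before the SparkContext is stopped
Await.result(Future.sequence(Seq(job1, job2)), Duration.Inf)
sc.stop()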

0 answers:

No answers yet