I wrote a Spark Streaming job, "RecoverableStatefulNetworkWordCount", based on StatefulNetworkWordCount.scala and RecoverableNetworkWordCount.scala.
When the driver restarts, it reports this exception: "(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758."
If I remove .initialState(initialRDD), the job runs and restarts normally.
import org.apache.spark._
import org.apache.spark.streaming._

object SimpleApp {
  def main(args: Array[String]) {
    val checkpointDir = if (args.length >= 1) "hdfs://10.2.35.117:9000/spark-cp" else "./spark-cp"

    def functionToCreateContext(): StreamingContext = {
      val conf = new SparkConf()
        .setMaster(if (args.length >= 1) args(0) else "local[2]")
        .setAppName("NetworkWordCount")
      val ssc = new StreamingContext(conf, Seconds(2)) // new context
      ssc.checkpoint(checkpointDir)

      val lines = ssc.socketTextStream("10.2.35.117", 9999) // create DStreams
      val words = lines.flatMap(_.split(" "))
      val pairs = words.map((_, 1))

      val initialRDD = ssc.sparkContext.parallelize(List(("hello", 1), ("world", 1)))
      val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
        val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
        val output = (word, sum)
        state.update(sum)
        output
      }
      // This is the line that triggers SPARK-13758 on recovery from checkpoint:
      val stateCounter = pairs.mapWithState(
        StateSpec.function(mappingFunc).initialState(initialRDD))
      stateCounter.print()
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, functionToCreateContext _)
    val sc = ssc.sparkContext
    sc.setLogLevel("WARN")
    ssc.start()
    ssc.awaitTermination()
  }
}
./spark-submit --class SimpleApp /data/scala-test/hw/target/scala-2.11/hello_2.11-0.0.1-SNAPSHOT.jar
My question is: what is the correct way to initialize the state so that the job can still recover from its checkpoint?
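Not a definitive answer, but one workaround for SPARK-13758 is to drop .initialState(initialRDD) entirely and fold the seed counts into the mapping function itself, so the checkpointed DStream graph never references an externally created RDD. The names SeededState, initialCounts, and update below are my own invention for illustration; the merge logic mirrors mappingFunc and can be exercised without a cluster:

```scala
object SeededState {
  // Hypothetical seed map replacing initialRDD; small enough to ship in the closure.
  val initialCounts = Map("hello" -> 1, "world" -> 1)

  // Same shape as the mapWithState function, but with the seed folded in:
  // on the first call for a key (no prior state), start from the seed value.
  // Returns (new state, emitted record).
  def update(word: String, one: Option[Int], prior: Option[Int]): (Int, (String, Int)) = {
    val base = prior.getOrElse(initialCounts.getOrElse(word, 0))
    val sum = one.getOrElse(0) + base
    (sum, (word, sum))
  }

  def main(args: Array[String]): Unit = {
    // First batch: "hello" arrives once; the seed of 1 makes the count 2.
    val (s1, out1) = update("hello", Some(1), None)
    println(out1) // (hello,2)
    // Second batch: prior state is used, so the seed no longer applies.
    val (s2, out2) = update("hello", Some(1), Some(s1))
    println(out2) // (hello,3)
  }
}
```

Inside the streaming job this would become `state.update(sum)` with `base` computed the same way, and the `.initialState(initialRDD)` call would be removed, so the recovered DStream graph is self-contained.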