I developed a Spark Streaming application that watches a file stream. I need to stop my streaming application on any driver exception. My code is as follows:
val fileStream = ..
// initiate the checkpointing
fileStream.checkpoint(Duration(batchIntervalSeconds * 1000 * 5))

fileStream.foreachRDD(r => {
  try {
    r.count()
  } catch {
    case ex: Exception =>
      ssc.stop(true, true)
  }
})
However, I get the following exception from the code above:
yarn.ApplicationMaster: User class threw exception:
java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
org.apache.spark.streaming.StreamingContext
Serialization stack:
- object not serializable (class: org.apache.spark.streaming.StreamingContext, value: org.apache.spark.streaming.StreamingContext@45ae9d8b)
- field (class: UnionStream$$anonfun$creatingFunc$3, name: ssc$1, type: class org.apache.spark.streaming.StreamingContext)
- object (class UnionStream$$anonfun$creatingFunc$3, <function1>)
- field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, name: cleanedF$1, type: interface scala.Function1)
- object (class org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, <function2>)
- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
- object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@12481647)
- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData,
I am running my code in yarn-cluster mode.
Answer 0 (score: 1)
To stop the Spark Streaming application when an exception occurs inside the foreachRDD call, do not try to catch the exception within foreachRDD. Instead, wrap the ssc.awaitTermination call in a try/catch block and call ssc.stop from there:
val ssc = createStreamingContext()
ssc.start()

try {
  ssc.awaitTermination()
} catch {
  case e: Exception =>
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    throw e // to exit with error condition
}
Answer 1 (score: 0)
Have you tried taking the try/catch out of the foreachRDD body and instead wrapping the foreachRDD call itself inside a try/catch, something like this:
try {
  // initiate the checkpointing
  fileStream.foreachRDD(r => {
    r.count()
  })
} catch {
  case ex: Exception =>
    ssc.stop(true, true)
}
Judging from the exception, it looks like everything inside the foreachRDD block, including the exception handler, references the StreamingContext, so Spark tries to serialize the closure in order to ship it to the nodes that will process each RDD. Since StreamingContext is not serializable, serialization blows up.
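The failure mode above can be reproduced outside Spark with plain Java serialization, which is essentially what DStream checkpointing does to the functions you pass to foreachRDD. The sketch below uses a hypothetical `FakeStreamingContext` class as a stand-in for the non-serializable StreamingContext; a closure that captures it fails to serialize, while one that does not capture it serializes fine:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for StreamingContext: a class that is NOT Serializable.
class FakeStreamingContext {
  def stop(): Unit = ()
}

object ClosureCaptureDemo {
  // Attempts Java serialization of an object, mirroring what Spark does
  // with the functions registered on a checkpointed DStream.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val oos = new ObjectOutputStream(new ByteArrayOutputStream())
      oos.writeObject(obj)
      oos.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val ssc = new FakeStreamingContext

    // Captures ssc -- analogous to calling ssc.stop inside foreachRDD.
    val capturing: Int => Unit = n => if (n < 0) ssc.stop()

    // References nothing outside its own parameter.
    val clean: Int => Long = n => n.toLong

    println(s"capturing closure serializable: ${canSerialize(capturing)}")
    println(s"clean closure serializable: ${canSerialize(clean)}")
  }
}
```

This is why the accepted approach keeps ssc out of the foreachRDD closure entirely: the exception is allowed to propagate to the driver, where awaitTermination rethrows it and ssc.stop can be called safely.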