I am hitting the following exception: Exception in thread "main" java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
I enable checkpointing outside this class and use this class to process the stream, and the exception says this class is not serializable:
class EventhubsStateTransformComponent(inStream: DStream[EventhubsEvent]) extends PipelineComponent with Logging {
  def process() = {
    inStream.foreachRDD(rdd => {
      if (rdd.isEmpty()) {
        logInfo("Extract outstream is empty...")
      } else {
        logInfo("Extract outstream is not empty...")
      }
    })
    // TODO eventhubsId is hardcoded
    val eventhubsId = "1"
    val statePairStream = inStream.map(eventhubsEvent => ((eventhubsId, eventhubsEvent.partitionId), eventhubsEvent.eventOffset))
    val eventhubsEventStateStream = statePairStream.mapWithState(StateSpec.function(EventhubsStreamState.updateStateFunc _))
    val snapshotStateStream = eventhubsEventStateStream.stateSnapshots()
    val out = snapshotStateStream.map(state => {
      (state._1._1, state._1._2, state._2, System.currentTimeMillis() / 1000)
    })
    outStream = out
  }
}
P.S. EventhubsEvent is a case class.
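For reference, a case class like EventhubsEvent is serializable out of the box, so the events themselves are not what the exception is complaining about. A minimal sketch (the field names below are assumptions inferred from the usage in process(); the real class may differ):

```scala
// Hypothetical shape of EventhubsEvent; the real fields may differ.
// Scala case classes mix in Serializable automatically, so instances
// can cross the driver/executor boundary without extra work.
case class EventhubsEvent(partitionId: String, eventOffset: Long)
```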
======================================================
New edit: after I made this class extend Serializable, the exception disappeared. But I would like to know in which situations we need to make our own classes extend Serializable. Does this mean that if a class contains a foreachRDD operation, checkpointing will trigger validation of the code and require the whole object containing the foreachRDD operation to be serializable? Because as I remember, in some cases only the objects referenced inside the foreachRDD closure need to be serializable.
Serialization stack:
- object not serializable (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent, value: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent@2a92a7fd)
- field (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, name: $outer, type: class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent)
- object (class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, <function1>)
- field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, name: cleanedF$1, type: interface scala.Function1)
- object (class org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, <function2>)
- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
- object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@3e1cb83b)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 16)
- field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.ArrayBuffer, ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@3e1cb83b, org.apache.spark.streaming.dstream.ForEachDStream@46034134))
- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData, [
0 checkpoint files])
- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
- object (class org.apache.spark.streaming.dstream.PluggableInputDStream, org.apache.spark.streaming.dstream.PluggableInputDStream@5066ad14)
- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData
//....
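The `$outer` entry in the stack above can be reproduced without Spark. Below is a minimal, self-contained sketch (all names are made up for illustration): a lambda that reads a field of a non-serializable class captures the enclosing instance as `$outer`, while copying the field into a local val first keeps the instance out of the closure.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical non-serializable component, mirroring the class in the question.
class Component {
  val id = "1"

  // This lambda reads the field `id`, i.e. `this.id`, so it captures
  // the whole Component instance via the hidden $outer reference.
  def capturingFn: String => String = s => id + ":" + s

  // Copying the field into a local val first means the closure only
  // captures a String, and `this` stays out of the serialization graph.
  def cleanFn: String => String = {
    val localId = id
    s => localId + ":" + s
  }
}

object ClosureDemo {
  // Returns true if the object survives Java serialization.
  def serializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val c = new Component
    println(serializable(c.capturingFn)) // fails: closure drags in Component
    println(serializable(c.cleanFn))     // succeeds: only the String is captured
  }
}
```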
Answer 0 (score: 0)
From the serialization stack: the
name
field shows which object was not serialized, so $outer
is the field whose usages you should check.
Some objects are not serializable; you can use such an object inside the driver or inside an executor, but you cannot pass it from the driver into functions that execute on the executors.
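On the question in the edit: foreachRDD functions run on the driver, but once checkpointing is enabled Spark serializes the entire DStream graph, including each registered foreach function, so any closure that references a member of the enclosing class (here logInfo from the Logging trait) drags the whole component in via $outer. Rather than extending Serializable, the closure can be rewritten so it never touches `this`. A sketch of one such pattern, untested and assuming log4j is on the classpath as it usually is with Spark (SafeLogging is a hypothetical helper, not a Spark API):

```scala
// A serializable logging holder: the logger field itself is @transient,
// so it is skipped during serialization and re-created lazily afterwards.
object SafeLogging extends Serializable {
  @transient lazy val log =
    org.apache.log4j.Logger.getLogger("EventhubsStateTransform")
}

class EventhubsStateTransformComponent(inStream: DStream[EventhubsEvent])
    extends PipelineComponent with Logging {
  def process() = {
    inStream.foreachRDD { rdd =>
      // References only the SafeLogging object, never `this`, so the
      // component no longer appears in the serialized DStream graph.
      if (rdd.isEmpty()) SafeLogging.log.info("Extract outstream is empty...")
      else SafeLogging.log.info("Extract outstream is not empty...")
    }
    // ... rest of process() unchanged ...
  }
}
```

Making the component extend Serializable also works, as the edit found, but it silently serializes every field of the component with each checkpoint, which is fragile if a non-serializable field is added later.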