NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable

Asked: 2016-08-25 08:48:55

Tags: serialization apache-spark spark-streaming

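I get the following exception: Exception in thread "main" java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable

I enabled checkpointing outside this class and use the class below to do the processing. The exception says that this class is not serializable: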

class EventhubsStateTransformComponent(inStream: DStream[EventhubsEvent]) extends PipelineComponent with Logging{
    def process() = {
        inStream.foreachRDD(rdd => {
            if (rdd.isEmpty()) {
                logInfo("Extract outstream is empty...")
            } else {
                logInfo("Extract outstream is not empty...")
            }
        })
        // TODO: eventhubsId is hardcoded
        val eventhubsId = "1"
        val statePairStream = inStream.map(eventhubsEvent => ((eventhubsId, eventhubsEvent.partitionId), eventhubsEvent.eventOffset))
        val eventhubsEventStateStream = statePairStream.mapWithState(StateSpec.function(EventhubsStreamState.updateStateFunc _))
        val snapshotStateStream = eventhubsEventStateStream.stateSnapshots()
        val out = snapshotStateStream.map(state =>  {
            (state._1._1, state._1._2, state._2, System.currentTimeMillis() / 1000)
        })
        outStream = out
    }
}

P.S. EventhubsEvent is a case class.

=====================================================

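New edit: After I made this class extend Serializable, the exception disappeared. But I would like to know in which cases we need to make our own classes extend Serializable. Does this mean that if a class contains a foreachRDD operation, checkpointing validates the code and requires the whole object that contains the foreachRDD operation to be serializable? As far as I remember, in some cases only the objects used inside the foreachRDD scope need to be serializable.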

Serialization stack:
    - object not serializable (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent, value: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent@2a92a7fd)
    - field (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, name: $outer, type: class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent)
    - object (class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, <function1>)
    - field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, name: cleanedF$1, type: interface scala.Function1)
    - object (class org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, <function2>)
    - writeObject data (class: org.apache.spark.streaming.dstream.DStream)
    - object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@3e1cb83b)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 16)
    - field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
    - object (class scala.collection.mutable.ArrayBuffer, ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@3e1cb83b, org.apache.spark.streaming.dstream.ForEachDStream@46034134))
    - writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
    - object (class org.apache.spark.streaming.dstream.DStreamCheckpointData, [
0 checkpoint files])
    - writeObject data (class: org.apache.spark.streaming.dstream.DStream)
    - object (class org.apache.spark.streaming.dstream.PluggableInputDStream, org.apache.spark.streaming.dstream.PluggableInputDStream@5066ad14)
    - writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
    - object (class org.apache.spark.streaming.dstream.DStreamCheckpointData

    //....

1 Answer:

Answer 0 (score: 0)

From the serialization stack:

  • object not serializable (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent, value: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent@2a92a7fd)
  • field (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, name: $outer, type: class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent)
  • object (class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, <function1>)

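The name entry shows which field holds the object that could not be serialized, so $outer is the field whose usage you should check. $outer is the reference from the anonymous function passed to foreachRDD back to the enclosing EventhubsStateTransformComponent instance: the function calls logInfo, which is a method of that instance, so it captures the whole object. With checkpointing enabled, the DStreams are serialized together with their functions, so everything those functions reference must be serializable. A non-serializable object can still be used locally on the driver or created inside a function on an executor, but it cannot be passed from the driver into a function that gets serialized.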
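The capture of $outer can be reproduced without Spark. The following is only a minimal sketch using plain Java serialization, assuming Scala 2.12 or later; the names ClosureCaptureSketch, Component, SerializableComponent, capturing, standalone and isSerializable are hypothetical and do not come from the question or from Spark.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureSketch {

  // Returns true if plain Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // Hypothetical stand-in for the component in the question: NOT Serializable.
  class Component {
    def log(msg: String): Unit = println(msg)

    // Calls the instance method `log`, so the function holds a reference
    // to `this` (the $outer field in the serialization stack).
    def capturing: String => Unit = msg => log(msg)

    // Uses nothing from the enclosing instance, so `this` is not captured.
    def standalone: String => Unit = msg => println(msg)
  }

  // Same shape, but the enclosing class is Serializable
  // (the fix the asker applied).
  class SerializableComponent extends Serializable {
    def log(msg: String): Unit = println(msg)
    def capturing: String => Unit = msg => log(msg)
  }

  def main(args: Array[String]): Unit = {
    println(isSerializable(new Component().capturing))
    println(isSerializable(new Component().standalone))
    println(isSerializable(new SerializableComponent().capturing))
  }
}
```

In terms of the question, this suggests two options: keep the function passed to foreachRDD free of references to the enclosing instance (for example by logging through a separate object instead of the instance's logInfo), or make the enclosing class Serializable, which is what the asker did.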