Flink throws java.io.NotSerializableException

Time: 2017-12-21 10:28:18

Tags: scala apache-kafka deserialization apache-flink

I wrote a custom KeyedDeserializationSchema to deserialize Kafka messages, and I use it like this:

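// Imports assume Flink 1.4 with the Kafka 0.11 connector.
import java.util.Properties

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema

object Job {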
  case class KafkaMsg[K, V](
    key: K, value: V, topic: String, partition: Int, offset: Long)

  trait Deser[A] {
    def deser(a: Array[Byte]): A
  }

  object Deser {

    def apply[A](implicit sh: Deser[A]): Deser[A] = sh
    def deser[A: Deser](a: Array[Byte]) = Deser[A].deser(a)

    implicit val stringDeser: Deser[String] =
      new Deser[String] {
        def deser(a: Array[Byte]): String = ""
      }

    implicit val longDeser: Deser[Long] =
      new Deser[Long] {
        def deser(a: Array[Byte]): Long = 0
      }
  }

  class TypedKeyedDeserializationSchema[
    K: Deser: TypeInformation,
    V: Deser: TypeInformation
  ] extends KeyedDeserializationSchema[KafkaMsg[K, V]] {

    def deserialize(key:   Array[Byte],
                    value: Array[Byte],
                    topic: String,
                    partition: Int,
                    offset:    Long
    ): KafkaMsg[K, V] =
      KafkaMsg(Deser[K].deser(key),
               Deser[V].deser(value),
               topic,
               partition,
               offset
      )

    def isEndOfStream(e: KafkaMsg[K, V]): Boolean = false

    def getProducedType(): TypeInformation[KafkaMsg[K, V]] =
      createTypeInformation
  }

  def main(args: Array[String]): Unit = {
    val properties = new Properties
    properties.setProperty("bootstrap.servers", "localhost:9092")
    properties.setProperty("group.id", "flink-test")

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val stream = env
        .addSource(new FlinkKafkaConsumer011(
                     "topic",
                     new TypedKeyedDeserializationSchema[String, Long],
                     properties
                   ))
        .print

    env.execute("Flink Scala API Skeleton")
  }
}

This gives me:

[error] Caused by: java.io.NotSerializableException: l7.Job$Deser$$anon$7
[error]         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
[error]         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
[error]         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
[error]         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
[error]         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
[error]         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
[error]         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
[error]         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
[error]         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
[error]         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
[error]         at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:315)
[error]         at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:170)
[error]         at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:164)
[error]         at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scalaClean(StreamExecutionEnvironment.scala:670)
[error]         at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.scala:600)
[error]         at l7.Job$.main(Job.scala:89)
[error]         at l7.Job.main(Job.scala)

The problem is clearly in my Deser type class implementation, but I don't understand what exactly causes this error or how to fix it.

1 answer:

Answer 0 (score: 1):

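Yes, the cause of this error is that your Deser, unlike TypeInformation, does not extend/implement Serializable. To see why this happens, start by asking yourself a question: why do I need to declare implicit val stringDeser and implicit val longDeser at all?

The answer lies in what the Scala compiler does when it sees a context bound of the form K: Deser: TypeInformation. It rewrites the bound as implicit evidence parameters. So your code is transformed into something like this: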

class TypedKeyedDeserializationSchema[K, V](
    implicit val kDeserEv: Deser[K],
    val kTypeInfoEv: TypeInformation[K],
    val vDeserEv: Deser[V],
    val vTypeInfoEv: TypeInformation[V]
) extends KeyedDeserializationSchema[KafkaMsg[K, V]] {

  def deserialize(key: Array[Byte],
                  value: Array[Byte],
                  topic: String,
                  partition: Int,
                  offset: Long
                 ): KafkaMsg[K, V] =
    KafkaMsg(kDeserEv.deser(key),
      vDeserEv.deser(value),
      topic,
      partition,
      offset
    )

  def isEndOfStream(e: KafkaMsg[K, V]): Boolean = false

  def getProducedType(): TypeInformation[KafkaMsg[K, V]] = createTypeInformation
}
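
To see the rewriting in isolation, here is a minimal sketch without Flink. The names ContextBoundDemo, SchemaSugar and SchemaExplicit are made up for illustration, and Deser is a simplified copy of the trait from the question:

```scala
object ContextBoundDemo {
  // Simplified stand-in for the Deser type class from the question.
  trait Deser[A] { def deser(a: Array[Byte]): A }

  implicit val stringDeser: Deser[String] = new Deser[String] {
    def deser(a: Array[Byte]): String = new String(a, "UTF-8")
  }

  // Context-bound form: the compiler adds a hidden implicit constructor parameter.
  class SchemaSugar[K: Deser] {
    def read(bytes: Array[Byte]): K = implicitly[Deser[K]].deser(bytes)
  }

  // Hand-written equivalent: the evidence is a visible constructor field.
  class SchemaExplicit[K](implicit val ev: Deser[K]) {
    def read(bytes: Array[Byte]): K = ev.deser(bytes)
  }

  def main(args: Array[String]): Unit = {
    val bytes = "abc".getBytes("UTF-8")
    val sugar = new SchemaSugar[String]
    val explicit = new SchemaExplicit[String]
    println(sugar.read(bytes))    // prints "abc"
    println(explicit.read(bytes)) // prints "abc"
    // The stored evidence is the very instance declared as the implicit val.
    println(explicit.ev eq stringDeser) // prints "true"
  }
}
```

Both classes behave the same. The explicit version just makes it visible that every instance carries a reference to the implicit val.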

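Now it is clear that an object of type TypedKeyedDeserializationSchema[String, Long] actually contains two fields of types Deser[String] and Deser[Long], whose values come from the implicit vals you declared above. So when Flink checks that the function you passed to it is Serializable, the check fails on those fields.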
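
The failure can be reproduced with plain Java serialization, which is what the ObjectOutputStream frames in the stack trace are doing. This is only a sketch: Schema and isJavaSerializable are hypothetical names, and Schema stands in for the real Flink schema class, which is itself Serializable:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object NotSerializableDemo {
  // Not Serializable, like the original Deser trait.
  trait Deser[A] { def deser(a: Array[Byte]): A }

  implicit val stringDeser: Deser[String] = new Deser[String] {
    def deser(a: Array[Byte]): String = new String(a, "UTF-8")
  }

  // Serializable itself, but it keeps the Deser evidence as a field.
  class Schema[K](implicit val ev: Deser[K]) extends Serializable

  // Returns true if Java serialization of `obj` succeeds, false if some
  // object reachable from it is not Serializable.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val schema = new Schema[String]
    println(isJavaSerializable(schema)) // prints "false"
  }
}
```

Marking the outer class Serializable is not enough, because Java serialization walks every non-transient field and rejects the first one that is not Serializable.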

Now the solution is obvious: make your trait Deser[A] extend Serializable:

trait Deser[A] extends Serializable {
  def deser(a: Array[Byte]): A
}
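
A quick way to check the fix outside Flink is a serialization round trip. Again this is a sketch under assumptions: Schema and roundTrip are hypothetical helpers, not Flink APIs, and Schema only mimics the shape of the real schema class:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object SerializableFixDemo {
  // The fixed trait: every instance, including anonymous ones, is Serializable.
  trait Deser[A] extends Serializable { def deser(a: Array[Byte]): A }

  implicit val stringDeser: Deser[String] = new Deser[String] {
    def deser(a: Array[Byte]): String = new String(a, "UTF-8")
  }

  // Same shape as before: a Serializable class holding the evidence as a field.
  class Schema[K](implicit val ev: Deser[K]) extends Serializable {
    def read(bytes: Array[Byte]): K = ev.deser(bytes)
  }

  // Serializes `obj` to a byte array and reads it back as a fresh copy.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = new Schema[String]
    val copy = roundTrip(original)
    println(copy.read("hello".getBytes("UTF-8"))) // prints "hello"
  }
}
```

The copy that comes back still deserializes correctly, which is what has to hold when the schema is shipped to the task managers.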