java.io.NotSerializableException in Spark

Date: 2018-10-30 19:48:01

Tags: apache-spark serialization spark-streaming

Below is part of my Spark job:

def parse(evt: Event): String = {
  try {
    val config = new java.util.HashMap[java.lang.String, AnyRef] // Line1
    config.put("key", "value") // Line2
    val decoder = new DeserializerHelper(config, classOf[GenericRecord]) // Line3
    val payload = decoder.deserializeData(evt.getId, evt.toBytes)
    val record = payload.get("data")
    record.toString
  } catch {
    case e: Exception => "exception:" + e.toString
  }
}

try {
  val inputStream = KafkaUtils.createDirectStream(
    ssc,
    PreferConsistent,
    Subscribe[String, String](Array(inputTopic), kafkaParams)
  )
  val processedStream = inputStream.map(record => parse(record.value()))
  processedStream.print()
} finally {
}

If I move Line1-Line3 in the code above out of the parse() function, I get:

Caused by: java.io.NotSerializableException: SchemaDeserializerHelper
Serialization stack:
    - object not serializable (class: SchemaDeserializerHelper, value: SchemaDeserializerHelper@2e23c180)
    - field (class: App$$anonfun$1, name: decoder$1, type: class SchemaDeserializerHelper)
    - object (class App$$anonfun$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:342)
    ... 22 more
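
Roughly, the failing variant looks like the sketch below (reconstructed from the stack trace, which calls the helper class SchemaDeserializerHelper): once Line1-Line3 live outside parse(), the decoder becomes a free variable of the map closure, so Spark's ClosureCleaner tries to Java-serialize it when shipping the task to the executors and fails because the class is not Serializable.

// Sketch of the failing variant: Line1-Line3 hoisted into driver-side scope.
val config = new java.util.HashMap[java.lang.String, AnyRef]          // Line1
config.put("key", "value")                                            // Line2
val decoder = new DeserializerHelper(config, classOf[GenericRecord])  // Line3

def parse(evt: Event): String = {
  // `decoder` is now captured from the enclosing scope (decoder$1 in the
  // serialization stack), so every task that calls parse() drags it along.
  val payload = decoder.deserializeData(evt.getId, evt.toBytes)
  payload.get("data").toString
}

// The map closure references parse() and therefore `decoder`; serializing the
// closure then requires serializing the non-serializable helper.
val processedStream = inputStream.map(record => parse(record.value()))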

Why does this happen? I don't like keeping Line1-Line3 inside the parse() function, since they run on every call; how can I optimize this?
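
One common way around this (a sketch only, reusing the Event, DeserializerHelper and GenericRecord types from the question; the DecoderProvider object is made up for illustration) is to build the decoder lazily inside a singleton object. Scala objects are not serialized into closures, and the lazy val is initialized once per executor JVM on first use, so the helper never has to cross the wire and is not rebuilt on every parse() call:

object DecoderProvider {
  // Initialized at most once per JVM (driver or executor) on first access.
  lazy val decoder: DeserializerHelper = {
    val config = new java.util.HashMap[java.lang.String, AnyRef]
    config.put("key", "value")
    new DeserializerHelper(config, classOf[GenericRecord])
  }
}

def parse(evt: Event): String =
  try {
    val payload = DecoderProvider.decoder.deserializeData(evt.getId, evt.toBytes)
    payload.get("data").toString
  } catch {
    case e: Exception => "exception:" + e.toString
  }

An alternative with the same effect is DStream.mapPartitions, creating the decoder once per partition inside the partition function so that it is only ever instantiated on the executors.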

Thanks

0 Answers:

There are no answers yet.