zkClient is not Serializable when writing Kafka offsets to ZooKeeper from Spark Streaming

Date: 2017-05-27 06:23:27

Tags: scala apache-kafka apache-zookeeper

My project uses ZooKeeper, Kafka, and Spark Streaming. When I try to write Kafka offsets to ZooKeeper from Spark Streaming, the problem is that zkClient is not serializable. I have looked at several GitHub projects, for example: https://github.com/ippontech/spark-kafka-source

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.kafka.HasOffsetRanges
import kafka.utils.ZkUtils

// Save the offsets after every batch
kafkaStream.foreachRDD(rdd => offsetsStore.saveOffsets(topic, rdd))

def saveOffsets(topic: String, rdd: RDD[_]): Unit = {

    logger.info("Saving offsets to ZooKeeper")
    val stopwatch = new Stopwatch()

    val offsetsRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    offsetsRanges.foreach(offsetRange => logger.debug(s"Using ${offsetRange}"))

    val offsetsRangesStr = offsetsRanges.map(offsetRange => s"${offsetRange.partition}:${offsetRange.fromOffset}").mkString(",")
    logger.debug(s"Writing offsets to ZooKeeper: ${offsetsRangesStr}")
    ZkUtils.updatePersistentPath(zkClient, zkPath, offsetsRangesStr)

    logger.info("Done updating offsets in ZooKeeper. Took " + stopwatch)

}

The call kafkaStream.foreachRDD(rdd => offsetsStore.saveOffsets(topic, rdd)) is executed on the driver, and the offsetsStore object holds private val zkClient = new ZkClient(zkHosts, 30000, 30000, ZKStringSerializer). Since zkClient is not serializable, how does this work?
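For illustration, a minimal, hypothetical sketch (not from the linked project; logSomething is a made-up method) of where Spark actually needs to serialize the store:

kafkaStream.foreachRDD { rdd =>
  // Runs on the driver: the function passed to foreachRDD executes on the
  // driver, so offsetsStore and its zkClient never leave the driver JVM.
  offsetsStore.saveOffsets(topic, rdd)

  // A closure shipped to the executors, by contrast, is serialized together
  // with everything it captures. Executor-side code like the hypothetical
  // line below would make Spark serialize offsetsStore, and its plain
  // non-serializable zkClient field would fail with "Task not serializable":
  // rdd.foreachPartition(records => offsetsStore.logSomething(records))
}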

1 answer:

Answer 0 (score: 0)

You can define zkClient as a @transient lazy val. That means it will not be serialized between the driver and the executors (that is the @transient part); instead, it will be re-initialized in each of them, once per instance of the class containing the code above (that is the lazy part).
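A minimal sketch of the pattern, assuming the same ZkClient constructor as in the question (the class name and constructor parameters are illustrative):

import org.I0Itec.zkclient.ZkClient
import kafka.utils.ZKStringSerializer

class ZooKeeperOffsetsStore(zkHosts: String, zkPath: String) extends Serializable {

  // @transient: the field is skipped when the instance is serialized, so
  // Spark never tries to ship a live ZooKeeper connection to the executors.
  // lazy: the field is re-created on first access, so every JVM that
  // deserializes this instance (driver or executor) opens its own connection.
  @transient private lazy val zkClient =
    new ZkClient(zkHosts, 30000, 30000, ZKStringSerializer)
}

Because the val is lazy, each side only opens a connection when it first touches zkClient, which is what makes dropping the field during serialization safe.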

You can read more about this pattern here: http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/