Question

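I am trying to use a Spark Direct Stream to obtain and store the offsets of specific messages in Kafka. The Spark documentation makes it easy to get the offset range for each partition, but what I need is to store the starting offset of every message in a topic after a full scan of the queue.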

Answer 1

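Yes. You can use the MessageAndMetadata version of createDirectStream, which gives you access to each message's metadata:

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // sparkConf, kafkaBroker, kafkaTopic, kafkaTopic1, inputOffset and inputOffset1
    // are assumed to be defined elsewhere.
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    val kafkaParams = Map[String, String]("metadata.broker.list" -> kafkaBroker)

    // Starting offset for partition 0 of each topic.
    var fromOffsets = Map[TopicAndPartition, Long]()
    val topicAndPartition: TopicAndPartition = new TopicAndPartition(kafkaTopic.trim, 0)
    val topicAndPartition1: TopicAndPartition = new TopicAndPartition(kafkaTopic1.trim, 0)
    fromOffsets += (topicAndPartition -> inputOffset)
    fromOffsets += (topicAndPartition1 -> inputOffset1)

    // The message handler maps every record to (topic, offset, message).
    val messagesDStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, Tuple3[String, Long, String]](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.topic, mmd.offset, mmd.message().toString)
    )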

The example above returns a DStream of Tuple3 values: tuple3._1 holds the topic, tuple3._2 holds the offset, and tuple3._3 holds the message.
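To connect this back to the question (recording the offsets of specific messages), here is a minimal sketch in plain Scala, with no Spark or Kafka dependency, that takes records shaped like the (topic, offset, message) tuples above and collects the offsets of the messages matching a predicate. The names OffsetLookup, Record, matchingOffsets, and isWanted, as well as the sample data, are hypothetical and only serve the illustration. Note that Kafka offsets are per partition, so for topics with more than one partition you would probably also want to carry the partition (MessageAndMetadata exposes it as mmd.partition) alongside the offset.

```scala
// Hypothetical helper: record shape mirrors the Tuple3 produced by the
// message handler, i.e. (topic, offset, message).
object OffsetLookup {
  type Record = (String, Long, String)

  // For each topic, collect the offsets of messages that satisfy `isWanted`.
  def matchingOffsets(records: Seq[Record])(isWanted: String => Boolean): Map[String, Seq[Long]] =
    records
      .filter { case (_, _, message) => isWanted(message) }
      .groupBy { case (topic, _, _) => topic }
      .map { case (topic, rs) => topic -> rs.map(_._2).sorted }

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for one batch of the stream.
    val sample: Seq[Record] = Seq(
      ("orders", 0L, "created:1"),
      ("orders", 1L, "cancelled:1"),
      ("payments", 0L, "cancelled:9"),
      ("orders", 2L, "cancelled:2")
    )
    val found = matchingOffsets(sample)(_.startsWith("cancelled"))
    println(found)
  }
}
```

On the stream itself, the same predicate could be applied with messagesDStream.filter { case (_, _, message) => ... }, and the surviving (topic, offset) pairs written to your store inside foreachRDD.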

Hope this helps!

Is it possible to obtain a specific message offset in Kafka + Spark Streaming?
