Question

I am using Spark Streaming with Kafka (Scala API), and I want to use Spark Streaming to read messages from a set of Kafka topics.

The following method:

val kafkaParams = Map("metadata.broker.list" -> configuration.getKafkaBrokersList(), "auto.offset.reset" -> "smallest")
KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

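reads from Kafka starting at the latest available offset, but it does not give me the metadata I need: since I read from a set of topics, I need to know which topic every message I read came from. The other method,

KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, Tuple2[String, String]](ssc, kafkaParams, currentOffsets, messageHandler)

explicitly wants a starting offset, which I don't have.

I know there is this shell command that gives the last offset:

kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <broker>:<port> --topic <topic-name> --time -1 --offsets 1

and there is also an API meant for developers, which used to be public, that gives exactly what I want.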
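For reference, here is a minimal sketch of the second call, assuming the Spark 1.x Kafka 0.8 integration (spark-streaming-kafka) is on the classpath. The object name TopicAwareStream and its create method are hypothetical. The messageHandler keeps the topic name next to each message value, which is the metadata the first call drops; the fromOffsets map still has to be obtained somehow, which is the actual question.

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical helper: a direct stream whose elements are (topic, value) pairs.
object TopicAwareStream {
  // Keeps the source topic name next to each decoded message value.
  val messageHandler: MessageAndMetadata[String, String] => (String, String) =
    mmd => (mmd.topic, mmd.message())

  // fromOffsets: starting offset for every topic-partition to read.
  def create(ssc: StreamingContext,
             kafkaParams: Map[String, String],
             fromOffsets: Map[TopicAndPartition, Long]): InputDStream[(String, String)] =
    KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)
}
```

Returning a tuple keeps the handler trivial; a small case class would work just as well if more metadata (partition, offset) is needed downstream.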

Any hints?

Answer 1

You can use the code from GetOffsetShell.scala (see the Kafka API documentation):

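import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer

val consumer = new SimpleConsumer(leader.host, leader.port, 10000, 100000, clientId) // leader: the partition's leader broker
val topicAndPartition = TopicAndPartition(topic, partitionId)
// time: OffsetRequest.EarliestTime (-2) for the first offset, OffsetRequest.LatestTime (-1) for the latest
val request = OffsetRequest(Map(topicAndPartition -> PartitionOffsetRequestInfo(time, nOffsets)))
val offsets = consumer.getOffsetsBefore(request).partitionErrorAndOffsets(topicAndPartition).offsets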
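Wrapped into a reusable function, the same Kafka 0.8 calls might look like the sketch below. The object name OffsetLookup, the default clientId and the function signature are hypothetical, and the caller is assumed to already know the leader broker of each partition (GetOffsetShell looks the leader up first with a metadata request; asking a non-leader broker returns an error instead of offsets).

```scala
import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer

// Hypothetical helper around the GetOffsetShell calls.
object OffsetLookup {
  // time = OffsetRequest.EarliestTime (-2L) for the first available offset,
  //        OffsetRequest.LatestTime (-1L) for the next offset to be written.
  // leaderHost/leaderPort must be the partition's current leader.
  def offsetFor(leaderHost: String, leaderPort: Int, topic: String, partition: Int,
                time: Long = OffsetRequest.EarliestTime,
                clientId: String = "offset-lookup"): Long = {
    val consumer = new SimpleConsumer(leaderHost, leaderPort, 10000, 100000, clientId)
    try {
      val tp = TopicAndPartition(topic, partition)
      val request = OffsetRequest(Map(tp -> PartitionOffsetRequestInfo(time, 1)))
      val response = consumer.getOffsetsBefore(request).partitionErrorAndOffsets(tp)
      // offsets is empty when the broker answers with an error code
      response.offsets.headOption.getOrElse(
        sys.error(s"no offset for $tp, error code ${response.error}"))
    } finally consumer.close()
  }
}
```

For example, OffsetLookup.offsetFor("broker1", 9092, "events", 0) (hypothetical broker and topic) would return the first offset of partition 0; collecting such results into a Map[TopicAndPartition, Long] gives the fromOffsets argument that createDirectStream expects.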

Or you can create a new consumer with a unique groupId and use it to get the first offset:

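import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.collection.JavaConverters._

val consumer = new KafkaConsumer[String, String](createConsumerConfig(config.brokerList))
consumer.partitionsFor(config.topic).asScala.foreach(pi => {
      val topicPartition = new TopicPartition(pi.topic(), pi.partition())

      // assign() needs a Java collection; seekToBeginning() takes one on clients 0.10+ (0.9 took varargs)
      consumer.assign(List(topicPartition).asJava)
      consumer.seekToBeginning(List(topicPartition).asJava)
      val firstOffset = consumer.position(topicPartition)
 ...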
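On newer clients the same result can be had without assign/seekToBeginning/position. The sketch below swaps in KafkaConsumer.beginningOffsets, which as far as I know is available from Kafka clients 0.10.1 onward; the object name FirstOffsets and the group-id prefix are hypothetical. Note that the keys are org.apache.kafka.common.TopicPartition, not the TopicAndPartition type used by the Kafka 0.8 Spark integration, so they would need converting before being passed to createDirectStream.

```scala
import java.util.{Properties, UUID}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import scala.collection.JavaConverters._

// Hypothetical helper: first available offset of every partition of a topic.
object FirstOffsets {
  def firstOffsets(brokerList: String, topic: String): Map[TopicPartition, Long] = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)
    // unique group id, mirroring the answer's suggestion
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "offset-lookup-" + UUID.randomUUID())
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

    val consumer = new KafkaConsumer[String, String](props)
    try {
      val partitions = consumer.partitionsFor(topic).asScala
        .map(pi => new TopicPartition(pi.topic(), pi.partition()))
      // beginningOffsets queries the brokers without moving the consumer's position
      consumer.beginningOffsets(partitions.asJava).asScala
        .map { case (tp, offset) => tp -> offset.longValue() }
        .toMap
    } finally consumer.close()
  }
}
```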

kafka and Spark: Get the first offset of a topic via API
