Question

Versions:

Spark 2.2
Kafka 0.11

According to the documentation, to commit offsets to Kafka I should use:

stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)

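As a result, the offsets are only committed when the next batch starts, which causes a "constant" lag.

Is there any workaround to commit offsets at the end of the current batch (while still using the same Kafka group for offsets)?

[Figure: example of lag monitoring]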

Answer 1

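"Is there any workaround to commit offsets at the end of the current batch"

Not via the commitAsync API. All the method call does is queue up the offsets to be committed; the actual asynchronous commit then happens during DirectKafkaInputDStream.compute: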

override def compute(validTime: Time): Option[KafkaRDD[K, V]] = {
  val untilOffsets = clamp(latestOffsets())

  // Create KafkaRDD and other irrelevant code

  currentOffsets = untilOffsets
  commitAll()
  Some(rdd)
}

commitAll simply polls the queue that commitAsync fills:

protected def commitAll(): Unit = {
  val m = new ju.HashMap[TopicPartition, OffsetAndMetadata]()
  var osr = commitQueue.poll()
  while (null != osr) {
    val tp = osr.topicPartition
    val x = m.get(tp)
    val offset = if (null == x) { osr.untilOffset } else { Math.max(x.offset, osr.untilOffset) }
    m.put(tp, new OffsetAndMetadata(offset))
    osr = commitQueue.poll()
  }
  if (!m.isEmpty) {
    consumer.commitAsync(m, commitCallback.get)
  }
}
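
To make the timing concrete, here is a minimal toy model of the two excerpts above. It is not Spark code: ToyStream, QueuedRange and CommitTimingDemo are hypothetical names, and the committed map merely stands in for the offsets Kafka records for the consumer group. It only aims to show that an offset queued at the end of one batch becomes visible when the next batch calls compute.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.mutable

// Hypothetical stand-in for Spark's OffsetRange, reduced to what the sketch needs.
final case class QueuedRange(topic: String, partition: Int, untilOffset: Long)

// Toy model of the stream: commitAsync only enqueues, compute drains the queue.
final class ToyStream {
  private val commitQueue = new ConcurrentLinkedQueue[QueuedRange]()

  // Stands in for the offsets recorded in Kafka for the consumer group.
  val committed = mutable.Map.empty[(String, Int), Long]

  // Like CanCommitOffsets.commitAsync: nothing is sent anywhere yet.
  def commitAsync(ranges: Seq[QueuedRange]): Unit =
    ranges.foreach(r => commitQueue.add(r))

  // Called once at the start of every batch, like DirectKafkaInputDStream.compute.
  def compute(): Unit = {
    var r = commitQueue.poll()
    while (r != null) {
      val key = (r.topic, r.partition)
      // Keep the highest offset seen for each partition.
      committed(key) = math.max(committed.getOrElse(key, Long.MinValue), r.untilOffset)
      r = commitQueue.poll()
    }
  }
}

object CommitTimingDemo {
  // Returns the committed offset as seen right after batch 1 and right after batch 2 starts.
  def run(): (Option[Long], Option[Long]) = {
    val stream = new ToyStream
    stream.compute()                                        // batch 1 starts
    stream.commitAsync(Seq(QueuedRange("events", 0, 100L))) // batch 1 finishes its output
    val afterBatch1 = stream.committed.get(("events", 0))
    stream.compute()                                        // batch 2 starts
    val afterBatch2Start = stream.committed.get(("events", 0))
    (afterBatch1, afterBatch2Start)
  }
}
```

In this model the first element of the result is empty and the second holds offset 100, which mirrors the one-batch delay described in the question.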

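So, unfortunately, if you want to commit offsets transactionally, you will have to store them separately in your own store rather than rely on Kafka's built-in offset commit tracking.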
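
As a rough illustration of the "own store" idea, the sketch below keeps a batch's output and its end offsets together, so that either both are recorded or neither is. OffsetStore, saveBatch, loadOffsets and loadResults are hypothetical names, and the in-memory maps with synchronized only stand in for a real transactional store such as a database; this is not a fault-tolerant implementation.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for an external transactional store.
final class OffsetStore {
  private val results = mutable.Buffer.empty[String]
  private val offsets = mutable.Map.empty[(String, Int), Long]

  // Records a batch's output and its end offsets in a single step.
  // In a real store this would be one database transaction.
  def saveBatch(output: Seq[String], untilOffsets: Map[(String, Int), Long]): Unit =
    synchronized {
      results ++= output
      offsets ++= untilOffsets
    }

  // Offsets to resume from after a restart, keyed by (topic, partition).
  def loadOffsets(): Map[(String, Int), Long] = synchronized(offsets.toMap)

  // Everything written so far, in batch order.
  def loadResults(): Seq[String] = synchronized(results.toList)
}
```

In an actual job, saveBatch would presumably be called at the end of each foreachRDD with the ranges obtained from HasOffsetRanges, and on startup the stored offsets would be passed as the starting offsets of the consumer strategy (ConsumerStrategies.Assign and Subscribe both accept an offsets map).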
