Question

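I'm currently running a Spark job on Dataproc and am getting errors when it tries to rejoin a consumer group and read data from a Kafka topic. I've done some digging and am not sure what the issue is. I have auto.offset.reset set to earliest, so it should be reading from the earliest available uncommitted offset, and initially my Spark logs look like this: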

19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-11 to offset 5553330.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-2 to offset 5555553.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-3 to offset 5555484.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-4 to offset 5555586.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-5 to offset 5555502.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-6 to offset 5555561.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-7 to offset 5555542.

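But on the very next line I get an error for trying to read from an offset that doesn't exist on the server. The offset for the partition differs from the one listed above, so I have no idea why it would be attempting to read from that offset. Here is the error on the next line: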

org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {demo.topic-11=4544296}

Any ideas why my Spark job keeps going back to this offset (4544296) and not the one it originally printed (5553330)?

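It seems to contradict itself in two ways: a) the offset it says it is on differs from the one it attempts to read, and b) it says there is no configured reset policy.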

Answer 1

This answer is a year late, but hopefully it helps others facing a similar issue.

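Typically, this behavior shows up when the consumer tries to read an offset that no longer exists in the Kafka topic. The offset is usually gone because the Kafka cleaner has removed it (e.g. because of retention or compaction policies). However, the consumer group is still known to Kafka, and Kafka has kept the information on the latest consumed message of the group "demo-group" for the topic "demo.topic" and all of its partitions.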

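Therefore, the auto.offset.reset configuration does not have any impact, because there is no need for a reset. Instead, Kafka already knows the consumer group and its committed offsets.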
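To make the distinction concrete, here is a toy model in plain Python (not the Kafka client, and all names are hypothetical). It assumes the usual consumer rule that a committed offset takes precedence and the reset policy is only consulted when there is no usable position. The wording "no configured reset policy" in the question suggests the consumer that hit the error was effectively running with policy none, which is the last case below.

```python
from typing import Optional


class OffsetOutOfRange(Exception):
    """Stand-in for Kafka's OffsetOutOfRangeException (illustrative only)."""


def resolve_start(committed: Optional[int], earliest: int, latest: int,
                  policy: str = "none") -> int:
    """Return the offset a consumer would start fetching from (toy model)."""

    def reset() -> int:
        # The reset policy is only consulted when there is no usable position.
        if policy == "earliest":
            return earliest
        if policy == "latest":
            return latest
        raise OffsetOutOfRange(
            f"no valid position in [{earliest}, {latest}] and no reset policy"
        )

    if committed is None:
        return reset()      # unknown group: the policy decides
    if earliest <= committed <= latest:
        return committed    # known group, offset still on the broker
    return reset()          # known group, but the offset was cleaned up


# Illustrative numbers, not taken from the logs above.
print(resolve_start(None, earliest=100, latest=250, policy="earliest"))  # 100
print(resolve_start(180, earliest=100, latest=250, policy="earliest"))   # 180
```

Calling `resolve_start(40, earliest=100, latest=250, policy="none")` raises `OffsetOutOfRange`, which mirrors the situation in the question: a stored offset that points below what the broker still retains.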

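In addition, the Fetcher only tells you the latest available offset within each partition of the topic. That does not automatically mean that it actually polls all messages up to this offset. Spark decides how many messages it actually consumes and processes for each partition (based on, e.g., the maxRatePerPartition configuration).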
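As a rough illustration of that last point, the sketch below shows how a per-partition rate limit can cap the offset range of one batch. It is a simplification in plain Python, not Spark's actual code: the function name is hypothetical, backpressure is ignored, and treating a non-positive rate as "no limit" is an assumption of this sketch.

```python
from typing import Tuple


def batch_range(current: int, latest: int,
                max_rate_per_partition: float,
                batch_seconds: float) -> Tuple[int, int]:
    """Return the (from, until) offsets one batch would cover for a partition."""
    if max_rate_per_partition <= 0:
        # Assumption: no rate limit configured, so read up to the latest offset.
        return current, latest
    # At most rate * batch interval messages per partition in one batch.
    limit = int(max_rate_per_partition * batch_seconds)
    return current, min(current + limit, latest)


# Illustrative numbers: 100 msg/s per partition, 10 s batches.
print(batch_range(1000, 5000, 100, 10))  # (1000, 2000)
print(batch_range(1000, 1500, 100, 10))  # (1000, 1500)
```

Either way, the starting point of the range is the stored position for the group, not the offset printed by the Fetcher.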

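To solve this issue, you could either change the consumer group (which is probably not what you want in this particular situation) or manually reset the offsets of the consumer group "demo-group" with

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group demo-group --reset-offsets --topic demo.topic:11 --to-latest --execute

Partitions are selected with the topic:partition-list form (here, partition 11 of demo.topic). Without --execute the tool only prints the offsets it would set, and the consumer group has to be inactive while its offsets are reset. Depending on your requirements, you can use the tool to reset the offsets of every partition of the topic. The tool's help output and the documentation explain all available options.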
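If you need to reset several partitions at once, a small helper that assembles the command line can save some typing. This is only a sketch: `reset_command` is a hypothetical helper, the script path and bootstrap address are placeholders, and it only builds the argument list (it does not run anything). It leaves out --execute by default so the result is a preview.

```python
import shlex
from typing import Iterable, List


def reset_command(group: str, topic: str, partitions: Iterable[int],
                  bootstrap: str = "localhost:9092",
                  execute: bool = False) -> List[str]:
    """Build a kafka-consumer-groups.sh call that resets offsets to latest."""
    parts = sorted(set(partitions))
    # "topic:0,1,2" limits the reset to those partitions; a bare topic name
    # covers every partition of the topic.
    scope = f"{topic}:{','.join(str(p) for p in parts)}" if parts else topic
    argv = [
        "bin/kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap,
        "--group", group,
        "--reset-offsets",
        "--topic", scope,
        "--to-latest",
    ]
    if execute:
        argv.append("--execute")  # without this the tool only previews
    return argv


# Preview command for partitions 2 and 11 of the topic from the question.
print(shlex.join(reset_command("demo-group", "demo.topic", [11, 2])))
```

Run the printed command first as a preview, check the offsets it reports, and only then build it again with `execute=True`.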

Spark set to read from earliest offset - throws error on attempting to consume an offset no longer available on Kafka
