I am using spark-sql-2.4.1 with Kafka 0.10.
When I try to consume the data, even with "auto.offset.reset" set to "latest", it fails with the error below:
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {COMPANY_INBOUND-16=168}
at org.apache.kafka.clients.consumer.internals.Fetcher.throwIfOffsetOutOfRange(Fetcher.java:348)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:396)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchRecord(KafkaDataConsumer.scala:361)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234)
at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234)
Where is the problem? Why doesn't the setting take effect? How should it be fixed?
Part 2:
.readStream()
.format("kafka")
.option("startingOffsets", "latest")
.option("enable.auto.commit", false)
.option("maxOffsetsPerTrigger", 1000)
.option("auto.offset.reset", "latest")
.option("failOnDataLoss", false)
.load();
Answer 0 (score: 2)
Do not set auto.offset.reset; use the startingOffsets option instead.
auto.offset.reset: Set the source option startingOffsets instead to specify where to start. Structured Streaming manages which offsets are consumed internally, rather than relying on the Kafka consumer to do it. This ensures that no data is missed when new topics/partitions are dynamically subscribed. Note that startingOffsets only applies when a new streaming query is started; a resumed query always picks up from where it left off.
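A minimal sketch of the corrected reader based on the snippet in Part 2, with the consumer properties auto.offset.reset and enable.auto.commit dropped, since the Kafka source manages offsets itself. The broker list and subscribed topic are placeholders (the question does not show them); the topic name is taken from the error message.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("kafka-structured-streaming")
        .getOrCreate();

Dataset<Row> df = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092") // placeholder, not from the question
        .option("subscribe", "COMPANY_INBOUND")            // topic taken from the error message
        .option("startingOffsets", "latest")               // use this instead of auto.offset.reset
        .option("maxOffsetsPerTrigger", 1000)
        .option("failOnDataLoss", false)
        .load();

startingOffsets also accepts a JSON string with explicit per-partition offsets, e.g. {"COMPANY_INBOUND":{"16":168}}, but as noted above it only applies to a brand-new query; a restarted query always resumes from its checkpoint.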