How to subscribe to a specific partition and read from a custom offset in Spark Structured Streaming?

Time: 2018-12-12 00:38:44

Tags: spark-structured-streaming

I have a use case where multiple tables are published to the same topic, but on different partitions. I want to read from one specific partition, starting at a custom offset.

    val data = sql.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "servers")
      // "assign" takes a JSON string mapping topic names to partition lists
      .option("assign", """{"TEST1":[0]}""")
      .option("startingOffsets", """{"TEST1":{"0":172260244}}""")
      .option("endingOffsets", """{"TEST1":{"0":-1}}""")
      .load()
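
As an aside, the Kafka source's endingOffsets option only applies to batch queries; a streaming read has no defined end. If the intent is to consume one bounded offset range, a batch read may be a better fit. Below is a minimal sketch under that assumption, with placeholder broker addresses and an existing SparkSession named spark:

    // Sketch: bounded batch read of partition 0 of TEST1 (placeholder brokers, assumed SparkSession `spark`)
    val batchData = spark.read.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")            // placeholder
      .option("assign", """{"TEST1":[0]}""")                        // only partition 0
      .option("startingOffsets", """{"TEST1":{"0":172260244}}""")   // custom start offset
      .option("endingOffsets", """{"TEST1":{"0":-1}}""")            // -1 = latest
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")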

When I subscribe this way, I get the error below: the topic name is automatically converted to lowercase.

WARN org.apache.spark.sql.kafka010.KafkaSource  - Error in attempt 1 getting Kafka offsets: 
java.lang.AssertionError: assertion failed: If startingOffsets contains specific offsets, you must specify all TopicPartitions.
Use -1 for latest, -2 for earliest, if you don't care.
Specified: Set(test1-0) Assigned: Set(TEST1-0)

1 Answer:

Answer 0 (score: 1)

Figured out the problem. It was a bug in older Spark versions that lowercased the topic name when specific startingOffsets were given; upgrading the Spark library to a release containing the fix resolved the issue.

https://issues.apache.org/jira/browse/SPARK-19853
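
For reference, on a Spark version that includes the SPARK-19853 fix, the assign/startingOffsets combination with an uppercase topic name should resolve to the correct TopicPartition. A minimal sketch of such a streaming read, with placeholder brokers, an assumed SparkSession named spark, and a console sink added only to make it runnable:

    // Sketch: streaming read from partition 0 of TEST1 at a custom offset (placeholders throughout)
    val stream = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")            // placeholder
      .option("assign", """{"TEST1":[0]}""")                        // uppercase topic preserved after the fix
      .option("startingOffsets", """{"TEST1":{"0":172260244}}""")   // start at a custom offset
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    stream.writeStream
      .format("console")                                            // illustration sink
      .option("checkpointLocation", "/tmp/test1-checkpoint")        // placeholder path
      .start()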