It seems that with the new Spark Structured Streaming, we can no longer pass a consumer group ID as an option when reading from Kafka:
Kafka option 'group.id' is not supported as user-specified consumer groups are not used to track offsets.
Is there a way to force Structured Streaming to use a given Kafka group ID?
Code:
val df = spark
.readStream
.format("kafka")
.option("subscribe", "topic")
.option("startingOffsets", "earliest")
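// the option below is what triggers the IllegalArgumentException shown further down: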
.option("kafka.group.id", "idThatShouldBeUsed")
.option("kafka.bootstrap.servers", "server")
.option("kafka.security.protocol", "SASL_SSL")
.option("kafka.sasl.mechanism", "PLAIN")
.option("kafka.ssl.truststore.location", "/location")
.option("kafka.ssl.truststore.password", "pass")
.option("kafka.sasl.jaas.config", """jaasToUse""")
.load()
.writeStream
.outputMode("append")
.format("console")
.start().awaitTermination()
This produces:
java.lang.IllegalArgumentException: Kafka option 'group.id' is not supported as user-specified consumer groups are not used to track offsets.
at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateGeneralOptions(KafkaSourceProvider.scala:347)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:402)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:70)
at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:208)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:94)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:94)
at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:33)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:172)
... 57 elided
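Side note for anyone reproducing this: the log level can be raised from the shell with, e.g.:
spark.sparkContext.setLogLevel("INFO")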
If we set the Spark log level to INFO, we can see that the group ID actually being used is entirely different:
INFO consumer.ConsumerConfig: ConsumerConfig values:
group.id = spark-kafka-source-625f97d6-59ed-4a72-90f6-c4add9c3a2a7--849027099-driver-0
Any ideas on how to make it use the correct group?
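For what it's worth, the closest thing I have found in the Kafka integration guide is the groupIdPrefix source option, which only controls the prefix of the auto-generated group ID rather than the full value; the Spark 3.x guide additionally lists kafka.group.id itself as a supported option. A minimal sketch of both, assuming Spark 3.x ("myPrefix" is a made-up value, and the security options from above are omitted for brevity):
val df = spark
.readStream
.format("kafka")
.option("subscribe", "topic")
.option("kafka.bootstrap.servers", "server")
// only changes the prefix of the generated group ID,
// i.e. myPrefix-<uuid>-...-driver-0 instead of spark-kafka-source-<uuid>-...:
.option("groupIdPrefix", "myPrefix")
// Spark 3.x only: pin the exact group ID; note that offsets are still
// tracked via Spark's own checkpoints, not by this consumer group:
//.option("kafka.group.id", "idThatShouldBeUsed")
.load()
Per the integration guide, groupIdPrefix is ignored when kafka.group.id is set.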