Spark Structured Streaming writes its results to Kafka, but when I consume the topic with the Kafka console consumer I cannot find any data in it.
Versions: kafka_2.11-0.10.2.0, spark-2.4.1-bin-with-hadoop, scala-2.11.12
import org.apache.spark.sql.SparkSession

object kafkatest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("test").getOrCreate()
    import spark.implicits._

    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
    val words = lines.as[String].flatMap(_.split(" "))
    val wordcounts = words.groupBy("value").count()

    val query = wordcounts
      .select(wordcounts("value").as("value"))
      .writeStream
      .outputMode("complete").format("kafka")
      .option("kafka.bootstrap.servers", "192.168.234.10:9092")
      .option("topic", "hello")
      // the Kafka sink requires a checkpoint location; this path is just an example
      .option("checkpointLocation", "/tmp/kafkatest-checkpoint")
      .start()

    query.awaitTermination()
  }
}
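To feed the socket source I run a plain text server on port 9999 (the usual netcat approach from the Spark quick-start) and type some words into it:

nc -lk 9999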
Then I try to consume the messages with the Kafka console consumer:
./kafka-console-consumer.sh --zookeeper localhost:2181 --topic hello
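For what it's worth, with kafka 0.10.2 the console consumer can also read straight from the broker (using the same address as in the Spark config) instead of going through ZooKeeper:

./kafka-console-consumer.sh --bootstrap-server 192.168.234.10:9092 --topic hello --from-beginning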
But no messages show up in the console consumer. The structured streaming query runs and computes the result.
The query progress log looks like this:
19/04/25 09:14:50 INFO MicroBatchExecution: Streaming query made progress: {
  "id" : "49806357-3bb4-4cc8-b726-933118e4f9f1",
  "runId" : "7ff3b925-a1f0-47f0-b5eb-aaed96910cd4",
  "name" : null,
  "timestamp" : "2019-04-25T01:14:50.009Z",
  "batchId" : 1,
  "numInputRows" : 0,
  "inputRowsPerSecond" : 0.0,
  "processedRowsPerSecond" : 0.0,
  "durationMs" : {
    "getEndOffset" : 0,
    "setOffsetRange" : 3,
    "triggerExecution" : 5
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "KafkaV2[Subscribe[hello]]",
    "startOffset" : {
      "hello" : {
        "0" : 8
      }
    },
    "endOffset" : {
      "hello" : {
        "0" : 8
      }
    },
    "numInputRows" : 0,
    "inputRowsPerSecond" : 0.0,
    "processedRowsPerSecond" : 0.0
  } ],
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@318cfc08"
  }
}
Why is the sink ConsoleSinkProvider rather than Kafka, even though I set format("kafka")? (Oddly, the source in the log is also KafkaV2[Subscribe[hello]], not the socket source from my code.)

"sink" : {
  "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@318cfc08"
}
I submitted the Spark application with spark-submit.
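The command is along these lines (the jar path is a placeholder; the spark-sql-kafka-0-10 package provides the Kafka source/sink for Spark 2.4.1 / Scala 2.11):

./spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1 \
  --class kafkatest \
  /path/to/kafkatest.jar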