I want to read structured streaming data from Kafka as a dstream, and for each record I want to apply many processing functions, so I tried to cache the dstream:
lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")
lines.cache()
...
...
lines.cache()
However, I got the following error:
Queries with streaming sources must be executed with writeStream.start();;
Any help?
Answer 0 (score: 0)
It has nothing to do with the cache call. You need to start the query at the end with a writeStream ... .start() call.
Please refer to the example code before asking: https://github.com/apache/spark/blob/master/examples/src/main/python/sql/streaming/structured_kafka_wordcount.py
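As a minimal sketch (assuming bootstrapServers, subscribeType and topics are defined as in the question, and process_a / process_b are hypothetical stand-ins for your own processing functions): since lines is a streaming DataFrame, cache() cannot be materialized on it directly; one way to reuse each batch across several functions is foreachBatch, where every micro-batch arrives as a normal DataFrame that can be persisted.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("KafkaMultiProcess").getOrCreate()  # app name is arbitrary

# Same source as in the question.
lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")

# process_a / process_b are placeholders for the "many functions" in the question.
def handle_batch(batch_df, batch_id):
    batch_df.persist()        # batch_df is a static DataFrame here, so caching works
    process_a(batch_df)
    process_b(batch_df)
    batch_df.unpersist()

query = lines.writeStream\
    .foreachBatch(handle_batch)\
    .start()

query.awaitTermination()

The linked word-count example instead writes to a console sink via writeStream.format("console").start(); either way, a start() followed by awaitTermination() is what the error message is asking for.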