How to ... in Spark

Posted: 2018-01-11 10:48:39

Tags: apache-spark spark-dataframe spark-streaming

I want to read structured streaming data from Kafka as a dstream, and for each record I want to process it with several functions, so I tried to cache the dstream:

lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")
lines.cache()
...
...
lines.cache()

However, I get the following error:

Queries with streaming sources must be executed with writeStream.start();;

Any help?

1 answer:

Answer 0 (score: 0):

This is not about the caching itself; the streaming query simply has never been started. Add lines.writeStream...start() at the end.

Please refer to the sample code before asking: https://github.com/apache/spark/blob/master/examples/src/main/python/sql/streaming/structured_kafka_wordcount.py
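
For reference, here is a minimal sketch of what a complete query could look like, loosely following the structure of the linked word-count example. The variables bootstrapServers, subscribeType and topics are placeholders carried over from the question, and the application name, the console sink and the "append" output mode are only illustrative choices.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StructuredKafkaExample").getOrCreate()

# Read from Kafka as a streaming DataFrame and keep only the message value.
lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")

# ... apply your transformations to `lines` here ...

# A streaming DataFrame is only executed once a sink is started with
# writeStream.start(), which is exactly what the error message asks for.
query = lines\
    .writeStream\
    .outputMode("append")\
    .format("console")\
    .start()

# start() returns a StreamingQuery; awaitTermination() blocks until it stops.
query.awaitTermination()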