How to ... in Spark

Posted: 2018-01-11 10:48:39

Tags: apache-spark spark-dataframe spark-streaming

I want to read structured streaming data from Kafka as a dstream, and for each record I want to process it with several functions, so I tried to cache the dstream:

lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")
lines.cache()
...
...
lines.cache()

However, I get the following error:

Queries with streaming sources must be executed with writeStream.start();;

Any help?

1 answer:

Answer 0 (score: 0):

This is not about the caching itself; the streaming query simply has never been started. Add lines.writeStream...start() at the end.

Please refer to the sample code before asking: https://github.com/apache/spark/blob/master/examples/src/main/python/sql/streaming/structured_kafka_wordcount.py
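
For reference, here is a minimal sketch of what a complete query could look like, loosely following the structure of the linked word-count example. The variables bootstrapServers, subscribeType and topics are placeholders carried over from the question, and the application name, the console sink and the "append" output mode are only illustrative choices.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StructuredKafkaExample").getOrCreate()

# Read from Kafka as a streaming DataFrame and keep only the message value.
lines = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", bootstrapServers)\
    .option(subscribeType, topics)\
    .load()\
    .selectExpr("CAST(value AS STRING)")

# ... apply your transformations to `lines` here ...

# A streaming DataFrame is only executed once a sink is started with
# writeStream.start(), which is exactly what the error message asks for.
query = lines\
    .writeStream\
    .outputMode("append")\
    .format("console")\
    .start()

# start() returns a StreamingQuery; awaitTermination() blocks until it stops.
query.awaitTermination()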