是否可以在一个应用程序中拥有两个独立的ReadStream?我正在尝试听两个单独的Kafka主题,并根据两个DataFrame进行计算。
答案 0 :(得分:1)
您可以简单地订阅多个主题:
// Subscribe to multiple topics
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.option("subscribe", "topic1,topic2")
.load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
.as[(String, String)]
或者,如果您特别想在一个应用中使用两个独立的readStream
定义:
// read stream A
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.option("subscribe", "topic1")
.load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
.as[(String, String)]
// read stream B
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.option("subscribe", "topic2")
.load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
.as[(String, String)]
答案 1 :(得分:0)
您应该可以通过在Spark 2.3.0中使用join()
来实现此目的:
val stream1 = spark.readStream. ...
val stream2 = spark.readStream. ...
val joinedDf = stream1.join(stream2, "join_column_id")