Following the post How to update a Static Dataframe with Streaming Dataframe in Spark structured streaming, I have a similar use case. You can take the same data from that post as an example.
static_df = spark.read.schema(schemaName).json(fileName)
streaming_df = spark.readStream(....)
new_reference_data = update_reference_df(streaming_df,static_df)
def update_reference_df(df, static_df):
    query: StreamingQuery = df \
        .writeStream \
        .outputMode("append") \
        .foreachBatch(lambda batch_df, batchId: update_static_df(batch_df, static_df)) \
        .start()
    return query
def update_static_df(batch_df, static_df):
    # left_anti keeps only the batch rows whose SITE is not already in static_df
    df1: DataFrame = static_df.union(batch_df.join(static_df,
                                                   batch_df.SITE == static_df.SITE,
                                                   "left_anti"))
    return df1
I'd like to know how to refresh static_df with the new values from the data processed through foreachBatch. As far as I know, foreachBatch does not return anything (void). I need to use the updated static_df in further processing. Thanks for your help.
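For context on the pattern involved: since the foreachBatch callback cannot return a value, a common workaround is to have it mutate shared state (a holder object or module-level variable) as a side effect. Below is a minimal plain-Python sketch of that idea with hypothetical names (ReferenceHolder, make_batch_handler); it stands in for the Spark version, where the callback body would be holder.data = update_static_df(batch_df, holder.data).

```python
class ReferenceHolder:
    """Holds the current 'static' dataset; a foreachBatch-style
    callback reassigns .data instead of returning a new value."""
    def __init__(self, data):
        self.data = data

def make_batch_handler(holder):
    # Returns a callback with the (batch, batch_id) signature that
    # foreachBatch expects; it updates the holder as a side effect.
    def handle(batch, batch_id):
        # Stand-in for the left_anti join + union: append only rows
        # not already present in the held reference data.
        holder.data = holder.data + [row for row in batch if row not in holder.data]
    return handle

holder = ReferenceHolder(["SITE_A", "SITE_B"])
handler = make_batch_handler(holder)
handler(["SITE_B", "SITE_C"], batch_id=0)
print(holder.data)  # ['SITE_A', 'SITE_B', 'SITE_C']
```

Downstream code then reads holder.data to see the refreshed reference, rather than expecting a return value from the streaming query.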