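I need to update a broadcast variable over time (for example, at a defined interval) in Spark Structured Streaming using Python. I know that similar questions have already been answered (here), but those answers are all in Scala or Java. I need to know how to write the Broadcast Wrapper class in Python. I tried converting the Java version, but the broadcast variable still does not get updated.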
from datetime import datetime

class BroadcastWrapper:
    broadcastVar = None
    lastUpdatedAt = datetime.now()

    @staticmethod
    def updateAndGetRules(spark):
        currentDate = datetime.now()
        diffSec = (currentDate - BroadcastWrapper.lastUpdatedAt).total_seconds()  # Difference in seconds
        # Refresh on first use, or when the cached copy is older than 120 seconds
        if BroadcastWrapper.broadcastVar is None or diffSec > 120:
            if BroadcastWrapper.broadcastVar is not None:
                BroadcastWrapper.broadcastVar.unpersist()
            rulesDF = 'Read data from source here'  # placeholder: replace with the actual DataFrame read
            # I'm collecting it because I need to iterate through rules (filter) and apply that on streaming data
            BroadcastWrapper.broadcastVar = spark.sparkContext.broadcast(rulesDF.collect())
            BroadcastWrapper.lastUpdatedAt = datetime.now()
        return BroadcastWrapper.broadcastVar
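
To check the time-based refresh on its own, the same logic can be sketched without Spark. This is only a minimal sketch under stated assumptions: `RefreshingHolder`, `load_rules`, and the injectable `clock` are hypothetical names introduced for illustration, and the loader stands in for whatever reads the rules from the source.

```python
import time


class RefreshingHolder:
    """Caches a value and reloads it once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds=120, clock=time.monotonic):
        self._loader = loader      # hypothetical callable returning fresh rules
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the refresh can be tested
        self._value = None
        self._loaded_at = None     # None means "never loaded"

    def get(self):
        now = self._clock()
        stale = self._loaded_at is None or (now - self._loaded_at) > self._ttl
        if stale:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Usage with a fake clock and a loader that counts its own calls.
fake_now = [0.0]
calls = []


def load_rules():
    calls.append(1)
    return ["rule-v%d" % len(calls)]


holder = RefreshingHolder(load_rules, ttl_seconds=120, clock=lambda: fake_now[0])
first = holder.get()       # first call always loads
fake_now[0] = 60.0
second = holder.get()      # 60 s old, still cached
fake_now[0] = 181.0
third = holder.get()       # 181 s old, reloaded
```

If this part behaves as expected, the remaining question is where and how often the equivalent of `get()` is called inside the streaming job.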
I am accessing the broadcast variable as follows:
for rule in BroadcastWrapper.updateAndGetRules(spark).value:
If you know of another workaround, please let me know.
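
One direction that may be worth trying, offered as a sketch rather than a confirmed fix: `DataStreamWriter.foreachBatch` (available since Spark 2.4) calls a Python function once per micro-batch, so a refresh check placed inside that function would be re-evaluated for every batch instead of only once when the query is defined. In the sketch below, `make_batch_handler`, `get_rules`, `sink`, and `FakeBatch` are hypothetical names, and it is an assumption that each rule is a SQL expression string that can be passed to `DataFrame.filter`.

```python
def make_batch_handler(get_rules, sink):
    """Build a function suitable for DataStreamWriter.foreachBatch.

    get_rules: hypothetical zero-argument callable returning the current rules,
               e.g. lambda: BroadcastWrapper.updateAndGetRules(spark).value
    sink:      hypothetical callable that writes the filtered batch somewhere
    """
    def handle_batch(batch_df, batch_id):
        # Called once per micro-batch, so get_rules() is consulted each time.
        filtered = batch_df
        for rule in get_rules():
            # Assumption: each rule is a SQL expression string such as "a > 1".
            filtered = filtered.filter(rule)
        sink(filtered, batch_id)
    return handle_batch


# In a real job (not run here), the handler would be attached like this:
#   streamingDF.writeStream.foreachBatch(make_batch_handler(get_rules, sink)).start()


class FakeBatch:
    """Stand-in for a DataFrame so the sketch runs without a cluster."""

    def __init__(self, applied=()):
        self.applied = tuple(applied)

    def filter(self, condition):
        # Record the condition instead of filtering real data.
        return FakeBatch(self.applied + (condition,))


# Two rule versions: the second one is served after the first batch is written.
versions = [["a > 1"], ["a > 1", "b = 'x'"]]
written = []
handler = make_batch_handler(
    lambda: versions[min(len(written), 1)],
    lambda df, batch_id: written.append((batch_id, df.applied)),
)
handler(FakeBatch(), 0)
handler(FakeBatch(), 1)
```

Because the rules are applied by the function that `foreachBatch` calls, a change in what `get_rules()` returns is picked up by the next micro-batch. Whether this matches the rule format returned by `collect()` in the code above is not known from the question.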