How do I update a broadcast variable in Python Spark Streaming?

Asked: 2019-02-02 04:38:01

Tags: python apache-spark pyspark spark-streaming broadcast

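I need to update a broadcast variable over time (for example, at a defined interval) in Spark Structured Streaming using Python. I know that similar questions have already been answered (here), but the answers are all in Scala or Java. I need to know how to write the Broadcast Wrapper class in Python. I tried converting the Java version, but the variable still does not update.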

from datetime import datetime


class BroadcastWrapper:
  broadcastVar = None
  lastUpdatedAt = datetime.now()

  @staticmethod  # called on the class itself, so there is no self argument
  def updateAndGetRules(spark):
    currentDate = datetime.now()
    diffSec = (currentDate - BroadcastWrapper.lastUpdatedAt).total_seconds()  # Difference in seconds

    # Refresh on first use, or once the current value is older than 120 seconds
    if BroadcastWrapper.broadcastVar is None or diffSec > 120:
      if BroadcastWrapper.broadcastVar is not None:
        BroadcastWrapper.broadcastVar.unpersist()

      # Placeholder from the question: this must be a DataFrame read from the rules source
      rulesDF = 'Read data from source here'

      # Collected because the rules are iterated over (as filters) and applied to the streaming data
      BroadcastWrapper.broadcastVar = spark.sparkContext.broadcast(rulesDF.collect())
      BroadcastWrapper.lastUpdatedAt = datetime.now()

    return BroadcastWrapper.broadcastVar

I am accessing the broadcast variable as follows:

for rule in BroadcastWrapper.updateAndGetRules(spark).value:
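The time-based refresh described above can also be sketched as an instance-based holder. This is only a minimal sketch under stated assumptions, not a confirmed fix: `RefreshingBroadcast`, `load`, `broadcast`, `ttl_seconds` and `clock` are hypothetical names that do not come from PySpark or from the question. `broadcast` is assumed to be something like `spark.sparkContext.broadcast`, and `load` a zero-argument callable that returns the collected rules.

```python
import time


class RefreshingBroadcast:
    """Driver-side holder that re-broadcasts a value once it is older than a TTL.

    Hypothetical helper for illustration; it is not part of PySpark.
    """

    def __init__(self, load, broadcast, ttl_seconds=120.0, clock=time.monotonic):
        self._load = load            # zero-argument callable returning the fresh value
        self._broadcast = broadcast  # assumed to be e.g. spark.sparkContext.broadcast
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the refresh logic can be tested
        self._var = None
        self._loaded_at = None

    def get(self):
        """Return the current broadcast handle, refreshing it first if it is stale."""
        now = self._clock()
        stale = self._var is None or (now - self._loaded_at) > self._ttl
        if stale:
            fresh = self._load()  # load first, so a failed load keeps the old value
            if self._var is not None:
                self._var.unpersist()  # release executor copies of the old value
            self._var = self._broadcast(fresh)
            self._loaded_at = now
        return self._var
```

Where `get()` is called probably matters as much as the wrapper itself. In Structured Streaming, one candidate is the function passed to `DataStreamWriter.foreachBatch` (PySpark 2.4 and later), which runs on the driver once per micro-batch. A call made only once while the query is being defined would keep using the first broadcast value.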

If you know of any other workaround, please let me know.

0 answers:

No answers