apache-spark - Spark Streaming: stateful processing of two streams

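I'm using Spark Streaming. I have two streams: value-stream and user-threshold-updates. I need to filter value-stream based on the per-user thresholds specified in user-threshold-updates. Threshold updates arrive every 200-1000 ms, which is surprising :)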

I don't want to introduce extra complexity by keeping the state in external storage (e.g. Redis). I'd prefer to use the Spark Streaming stateful operation mapWithState.
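For context on mapWithState: it keeps one piece of state per key across micro-batches and calls a user function for each incoming record, and that function can read, update, or remove the key's state and return an output record. The framework-free Python sketch below models only that contract; it is not Spark's API, and the names `State`, `map_with_state`, and `track_threshold` are hypothetical. It keeps the latest threshold per user across two simulated batches.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class State:
    """Hypothetical stand-in for Spark's per-key State object."""

    def __init__(self, store: Dict[Any, Any], key: Any) -> None:
        self._store = store
        self._key = key

    def exists(self) -> bool:
        return self._key in self._store

    def get(self) -> Any:
        return self._store[self._key]

    def update(self, value: Any) -> None:
        self._store[self._key] = value

    def remove(self) -> None:
        self._store.pop(self._key, None)


def map_with_state(
    batch: List[Tuple[Any, Any]],
    store: Dict[Any, Any],
    func: Callable[[Any, Any, State], Optional[Any]],
) -> List[Any]:
    """Apply func to each (key, value) record of one micro-batch.

    `store` persists between calls, playing the role of Spark's state.
    None results are dropped, mimicking the common pattern of returning
    an Option and flattening the output afterwards.
    """
    out = []
    for key, value in batch:
        result = func(key, value, State(store, key))
        if result is not None:
            out.append(result)
    return out


def track_threshold(user: str, threshold: int, state: State):
    """Remember the latest threshold per user; emit (user, old, new)."""
    old = state.get() if state.exists() else None
    state.update(threshold)
    return (user, old, threshold)


store: Dict[Any, Any] = {}
batch1 = map_with_state([("alice", 10), ("bob", 5)], store, track_threshold)
batch2 = map_with_state([("alice", 12)], store, track_threshold)
```

After the second batch, `store` still holds bob's threshold even though bob did not appear in that batch; this persistence is what makes the operation stateful.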

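The problem is that mapWithState operates on a single stream, and I have two. I can't union value-stream and user-threshold-updates because they have different schemas. I also don't want to take stateSnapshots on every micro-batch, because that doesn't work well: value-stream only contains users whose values have changed, while a snapshot of the user-threshold-updates state would contain the thresholds of all users.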
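One possible workaround for the schema mismatch, offered as an assumption and not taken from the post: wrap every record of both streams in a common tagged shape keyed by user ID (in Scala this would typically be an `Either` or a sealed trait), union the two keyed streams, and let a single stateful function branch on the tag. Threshold updates overwrite the stored threshold and emit nothing; values are compared with the stored threshold and emitted only if they exceed it. Only users present in the current batch are touched, so no full snapshot is needed. The plain-Python sketch below simulates this per micro-batch; the record shapes, the function names, and the "value greater than threshold" rule are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyed record after tagging: (user, (tag, number)),
# where tag is "threshold" or "value".
Record = Tuple[str, Tuple[str, float]]


def union_tagged(
    threshold_updates: List[Tuple[str, float]],
    values: List[Tuple[str, float]],
) -> List[Record]:
    """Give both streams one shape so they can be unioned.

    This sketch puts threshold updates first; in Spark the per-key
    order of records within a batch after a union is not guaranteed.
    """
    tagged = [(user, ("threshold", t)) for user, t in threshold_updates]
    tagged += [(user, ("value", v)) for user, v in values]
    return tagged


def filter_values(
    user: str, tagged: Tuple[str, float], thresholds: Dict[str, float]
) -> Optional[Tuple[str, float]]:
    """Per-key stateful step: update the threshold or test a value against it."""
    kind, number = tagged
    if kind == "threshold":
        thresholds[user] = number  # overwrite stored state; emit nothing
        return None
    limit = thresholds.get(user)
    if limit is not None and number > limit:
        return (user, number)  # value exceeds the user's threshold
    return None  # no threshold yet, or value not above it


def process_batch(
    threshold_updates: List[Tuple[str, float]],
    values: List[Tuple[str, float]],
    thresholds: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Simulate one micro-batch; `thresholds` persists between calls."""
    out = []
    for user, tagged in union_tagged(threshold_updates, values):
        result = filter_values(user, tagged, thresholds)
        if result is not None:
            out.append(result)
    return out


state: Dict[str, float] = {}
batch1 = process_batch([("alice", 10.0), ("bob", 5.0)], [("alice", 11.0), ("bob", 3.0)], state)
batch2 = process_batch([("bob", 2.0)], [("bob", 3.0), ("carol", 99.0)], state)
```

In Spark terms this would roughly correspond to mapping both DStreams to a common keyed type, calling `union`, then `mapWithState` with a `StateSpec.function` that returns an `Option`, followed by a `flatMap` to drop the empty results. That mapping is an untested sketch, not a verified solution.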

What is the best way to filter value-stream based on the state built from user-threshold-updates?

