I want to use window operations in Spark Streaming. Is it guaranteed that each window slides in exactly (slideDuration / batchDuration) more RDDs?
For example, suppose I set batchDuration = 10s in the StreamingContext and slideDuration = 30s in the window operation:
stream.window(Durations.seconds(60), slideDuration).foreachRDD(unionRdd -> ...);
Will unionRdd above contain 3 more batches on each run?
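The scenario in the question can be simulated without Spark. This is a minimal sketch under two assumptions that are not part of the question: batch k (k = 1, 2, ...) is generated k * B seconds after the streaming context starts, and a window evaluated at time t covers the batches whose times fall in the half-open interval (t - W, t]. The function names `batches_in_window` and `simulate` are hypothetical.

```python
# Hypothetical simulation of Spark Streaming windowing (no Spark required).
# Assumption: batch k is generated at k * batch seconds after start, and the
# window evaluated at time t covers batch times in the interval (t - window, t].


def batches_in_window(t, window, batch):
    """Return the generation times of the batches inside the window ending at t."""
    return [k * batch for k in range(1, t // batch + 1) if k * batch > t - window]


def simulate(window=60, slide=30, batch=10, windows=4):
    """Yield (t, batch_times) for the first few window evaluations."""
    for i in range(1, windows + 1):
        t = i * slide  # windows are only evaluated at multiples of the slide
        yield t, batches_in_window(t, window, batch)


if __name__ == "__main__":
    previous = []
    for t, times in simulate():
        # Batches present in this window but not in the previous one.
        new = [x for x in times if x not in previous]
        print(f"t={t}s window={times} new={new}")
        previous = times
```

With the question's settings the first window (t = 30s) holds only 3 batches, and every later window holds 6 batches, 3 of which are new relative to the previous window.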
Answer 0 (score: 0)
The exact number of RDDs in a window can be expressed with the following function:
t: time elapsed since the streaming context started
W: Window duration
S: Sliding duration
B: Batch interval
if t % S == 0 AND S % B == 0 AND W % B == 0
    if t >= W
        N := W / B
    else
        N := t / B
else
    N := null   // no window RDD is processed at time t
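The formula above can be transcribed into a small function. This is a sketch, with `window_rdd_count` as a hypothetical name and all durations as whole seconds. One deliberate difference: instead of returning null when W or S is not a multiple of B, it raises an error, because Spark itself refuses to create a windowed DStream whose window or slide duration is not a multiple of the parent's batch interval.

```python
from typing import Optional


def window_rdd_count(t: int, window: int, slide: int, batch: int) -> Optional[int]:
    """Number of batch RDDs in the window emitted at elapsed time t (seconds).

    Returns None when no window is emitted at t, mirroring `N := null`.
    """
    # Spark rejects W or S that are not multiples of B when the window is defined.
    if window % batch != 0 or slide % batch != 0:
        raise ValueError("window and slide must be multiples of the batch interval")
    if t % slide != 0:
        return None  # windows are only emitted every `slide` seconds
    if t >= window:
        return window // batch  # a full window
    return t // batch  # start-up phase: the window is only partially filled


if __name__ == "__main__":
    for t in range(10, 130, 10):
        print(t, window_rdd_count(t, window=60, slide=30, batch=10))
```

For the question's settings (B = 10s, S = 30s, W = 60s) this gives 3 RDDs at t = 30s and 6 RDDs from t = 60s on. Because consecutive full windows overlap by W - S = 30s, each new window contains S / B = 3 batches that the previous window did not.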