Exact number of RDDs in a window operation in Spark Streaming

Time: 2016-07-13 19:21:08

Tags: spark-streaming

I want to use window operations in Spark Streaming. Is it guaranteed that every time the window slides, exactly (slideDuration / batchDuration) new RDDs enter it?

For example, suppose I set batchDuration=10s in the StreamingContext and slideDuration=30s in the window operation:

stream.window(60s, slideDuration).foreachRDD(unionRdd -> .....)

Will the unionRdd above contain 3 new batches on each run?

1 answer:

Answer 0 (score: 0)

The exact number of RDDs in a window, N, can be expressed by the following function:

t: Time elapsed
W: Window duration
S: Sliding duration
B: Batch interval 

if t % S == 0 AND S % B == 0 AND W % B == 0
    if t >= W
        N := W / B
    else
        N := t / B
else
    N := null // no RDD is being processed
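
To check the rule against the numbers in the question, here is a minimal sketch of it as a plain Python function. It does not call Spark. The function name `rdds_in_window` and its parameter names are hypothetical, and all durations are assumed to be whole seconds. The divisibility checks mirror the condition above. As far as I know, Spark Streaming itself requires the window and slide durations to be multiples of the batch interval.

```python
from typing import Optional


def rdds_in_window(t: int, window: int, slide: int, batch: int) -> Optional[int]:
    """Return N, the number of batch RDDs in the window emitted at time t.

    All arguments are in seconds (an assumption of this sketch).
    Returns None when no windowed RDD is being processed at time t.
    """
    if window <= 0 or slide <= 0 or batch <= 0:
        raise ValueError("durations must be positive")

    # A window is only emitted on slide boundaries, and both the window
    # and the slide must line up with the batch interval.
    aligned = slide % batch == 0 and window % batch == 0
    if not aligned or t % slide != 0:
        return None

    if t >= window:
        # Steady state: the window is full.
        return window // batch
    # Warm-up: fewer batches have arrived than the window can hold.
    return t // batch


if __name__ == "__main__":
    # Values from the question: B = 10s, S = 30s, W = 60s.
    for t in (30, 40, 60, 90):
        print(t, rdds_in_window(t, window=60, slide=30, batch=10))
```

With the question's settings the first window (t = 30) holds 3 batches, and every window from t = 60 onward holds W / B = 6. Since consecutive windows are S = 30s apart, S / B = 3 of those 6 batches are new and the other 3 overlap with the previous window.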