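I'm experimenting with TestStream to see how late elements are handled, but I'm getting some very interesting and confusing behavior.
Specifically, I add an element "2" with a timestamp inside a window (windowTwo), then advance the watermark past the end of the window but before endOfWindow + lateness, and finally add another element "3" with a timestamp in the same window.
The interesting and confusing part: I expect to see 5 as the sum of all the elements in windowTwo, but the test fails with
Expected: iterable over [<5>] in any order, but: Not matched: <2>
However, if I change the expected sum from 5 to 2, it still fails, with
Expected: iterable over [<2>] in any order, but: Not matched: <5>
What is going on?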
import org.apache.beam.sdk.coders.BigEndianIntegerCoder;
import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestStream;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.*;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;
public class BeamAppTest {
    @Rule
    public final transient TestPipeline pipeline = TestPipeline.create();

    @Test
    @Category(NeedsRunner.class)
    public void testApp() {
        final Duration windowLengthMin = Duration.standardMinutes(10);
        final Duration latenessMin = Duration.standardMinutes(5);
        final Duration oneMin = Duration.standardMinutes(1);

        final Instant windowOneStart = new Instant(0L).plus(Duration.standardMinutes(20));
        final Instant windowOneEnd = windowOneStart.plus(windowLengthMin);
        final IntervalWindow windowOne = new IntervalWindow(windowOneStart, windowOneEnd);

        final Instant windowTwoStart = windowOneEnd;
        final Instant windowTwoEnd = windowTwoStart.plus(windowLengthMin);
        final IntervalWindow windowTwo = new IntervalWindow(windowTwoStart, windowTwoEnd);

        TestStream<Integer> testStream = TestStream.create(BigEndianIntegerCoder.of())
            .addElements(TimestampedValue.of(1, windowOneStart.plus(oneMin))) // early window one
            .advanceWatermarkTo(windowOneEnd) // watermark passes window one
            .addElements(TimestampedValue.of(2, windowTwoStart.plus(oneMin))) // early window two
            .advanceWatermarkTo(windowTwoEnd.plus(latenessMin).minus(oneMin)) // watermark passes window two
            .addElements(TimestampedValue.of(3, windowTwoStart.plus(oneMin))) // late window two
            .advanceProcessingTime(oneMin.plus(oneMin))
            .advanceWatermarkToInfinity();

        PCollection<Integer> means = pipeline.apply(testStream).apply(new CalSum(windowLengthMin, latenessMin));

        PAssert.that(means)
            .inWindow(windowOne)
            .containsInAnyOrder(1);
        PAssert.that(means)
            .inWindow(windowTwo)
            .containsInAnyOrder(2); // change the 2 to 5 here to see magic!!!

        pipeline.run().waitUntilFinish();
    }

    static class CalSum extends PTransform<PCollection<Integer>, PCollection<Integer>> {
        private final Duration WINDOW_LENGTH_MIN;
        private final Duration LATENESS_MIN;

        CalSum(Duration windowLengthMin, Duration latenessMin) {
            WINDOW_LENGTH_MIN = windowLengthMin;
            LATENESS_MIN = latenessMin;
        }

        @Override
        public PCollection<Integer> expand(PCollection<Integer> input) {
            return input
                .apply(Window
                    .<Integer>into(FixedWindows.of(WINDOW_LENGTH_MIN))
                    .withAllowedLateness(LATENESS_MIN)
                    .accumulatingFiredPanes() // accumulating trigger
                    .triggering(AfterWatermark.pastEndOfWindow() // trigger at end of window
                        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                            .plusDelayOf(Duration.standardMinutes(2))) // trigger every 2 min within the window
                        .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
                            .plusDelayOf(Duration.standardMinutes(1))))) // trigger every 1 min after the window
                .apply(Sum.integersGlobally().withoutDefaults());
        }
    }
}
Answer 0 (score: 0)
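As written, given the times at which the elements arrive and the position of the watermark, windowTwo contains two elements: 2 and 5. This is a result of the triggering you have set up: the input 2 arrives with a timestamp of windowTwoStart plus one minute, at which point the watermark is before the end of windowTwo, so the element is on time. The watermark then advances past the end of windowTwo, which causes the AfterWatermark trigger to fire.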
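After this, the input 3 arrives. This is after the watermark has passed the end of the window it belongs to (so the element is late), but before the watermark has passed the end of the window plus the allowed lateness (so the element is not droppable). As a result, when the watermark advances again, the element is output alongside the earlier 2 (because of the accumulating mode you selected), and the two are combined into the 5 that you observe.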
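The on-time pane (which you can match against with PAssert.that(means).inOnTimePane(windowTwo)) contains only the value 2; over the lifetime of the window both 2 and 5 are produced, so the inWindow assertion is checked against [2, 5].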
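To make the pane arithmetic concrete, here is a minimal sketch in plain Java that models how the per-pane sums come out. It is a simplified illustration only, not Beam's implementation: the class `PaneModel` and the method `firePanes` are hypothetical names, and the model assumes each inner list holds the elements that arrived before one trigger firing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of pane outputs for a single window.
// It does not use Beam; it only mirrors the arithmetic described above.
public class PaneModel {

    /**
     * Returns the sum emitted at each trigger firing.
     * In accumulating mode every pane includes all elements seen so far;
     * in discarding mode every pane includes only the new elements.
     */
    public static List<Integer> firePanes(List<List<Integer>> arrivalsPerFiring,
                                          boolean accumulating) {
        List<Integer> emitted = new ArrayList<>();
        int carried = 0; // sum of everything emitted in earlier panes
        for (List<Integer> arrivals : arrivalsPerFiring) {
            int sum = accumulating ? carried : 0;
            for (int value : arrivals) {
                sum += value;
            }
            emitted.add(sum);
            carried = sum;
        }
        return emitted;
    }

    public static void main(String[] args) {
        // windowTwo: element 2 arrives before the on-time firing,
        // element 3 arrives late, before the late firing.
        List<List<Integer>> windowTwo = List.of(List.of(2), List.of(3));
        System.out.println(firePanes(windowTwo, true));  // prints [2, 5]
        System.out.println(firePanes(windowTwo, false)); // prints [2, 3]
    }
}
```

Under this model the accumulating window produces both 2 and 5, which is why an assertion on a single value fails either way. In the test from the question, the whole-window check would presumably pass if written as PAssert.that(means).inWindow(windowTwo).containsInAnyOrder(2, 5).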