For testing purposes, I use the following custom source:
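import org.apache.flink.streaming.api.functions.source.SourceFunction

class ThrottledSource[T](
    data: Array[T],
    throttling: Int,
    beginWaitingTime: Int = 0,
    endWaitingTime: Int = 0
) extends SourceFunction[T] {

  // cancel() is invoked from a different thread than run(), hence @volatile
  @volatile private var isRunning = true
  private var offset = 0

  override def run(ctx: SourceFunction.SourceContext[T]): Unit = {
    // Initial delay (ms) before the first element is emitted
    Thread.sleep(beginWaitingTime)
    val lock = ctx.getCheckpointLock
    while (isRunning && offset < data.length) {
      // Emit the element and advance the offset under the checkpoint lock
      lock.synchronized {
        ctx.collect(data(offset))
        offset += 1
      }
      // Pause (ms) between consecutive elements
      Thread.sleep(throttling)
    }
    // Keep the source alive (ms) after the last element
    Thread.sleep(endWaitingTime)
  }

  override def cancel(): Unit = isRunning = false
}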
and use it in my test like this:
val controlStream = new ThrottledSource[Control](
  data = Array(c1, c2),
  throttling = 0,
  endWaitingTime = 10000
)
val events = new ThrottledSource[Event](
  data = Array(e1, e2, e3, e4, e5),
  throttling = 1000,
  beginWaitingTime = 2000,
  endWaitingTime = 2000
)
val dataStream = env.addSource(events)
env.addSource(controlStream)
  .connect(dataStream)
  .process(MyProcessFunction)
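My intention is to receive all control elements first (which is why I don't specify any beginWaitingTime or throttling for the control source). In MyProcessFunction's processElement1 and processElement2, I print each element as it arrives. Most of the time I get the two control elements first, as expected, but to my surprise I sometimes get data elements first, even though the data source waits two seconds before it starts emitting. Can anyone explain this to me?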
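MyProcessFunction itself is not shown above. As a rough sketch of what it does, and assuming it is an `object` extending Flink's `CoProcessFunction`, it could look like the following. The `Control` and `Event` case classes and the `String` output type are hypothetical stand-ins here, since the real types are not part of the question.

```scala
import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.util.Collector

// Hypothetical stand-ins for the Control and Event types used in the test.
case class Control(id: String)
case class Event(id: String)

// Sketch only: prints each element together with the input it arrived on.
// The String output type is an assumption; nothing is emitted downstream.
object MyProcessFunction extends CoProcessFunction[Control, Event, String] {

  // Called for elements of the first connected stream (the control source).
  override def processElement1(
      value: Control,
      ctx: CoProcessFunction[Control, Event, String]#Context,
      out: Collector[String]
  ): Unit =
    println(s"control: $value")

  // Called for elements of the second connected stream (the data source).
  override def processElement2(
      value: Event,
      ctx: CoProcessFunction[Control, Event, String]#Context,
      out: Collector[String]
  ): Unit =
    println(s"event: $value")
}
```

With this shape, the order of the printed lines directly reflects the order in which the operator receives elements from the two inputs.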