Calculating the duration of consecutive events in Flink

Time: 2019-06-01 11:44:18

Tags: apache-flink

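I have a stream of device state changes, e.g. case class DeviceState(ts: Long, state: Int). A device sends its state only when the state changes. So, for example, the stream might look like this: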

ts | state
----------
 0 | ONLINE
 3 | OFFLINE
11 | ONLINE
19 | OFFLINE

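(In the real code ts is a Unix timestamp in milliseconds; I have simplified it for the example.) I want to split this stream into tumbling windows of 10 ticks and compute the total duration of each state within each window. For example, if the punctuation (watermark) is emitted at tick 45, the result should look like this: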

 window | state   | duration
-----------------------------
 0 - 10 | ONLINE  | 3
 0 - 10 | OFFLINE | 7
10 - 20 | OFFLINE | 2
10 - 20 | ONLINE  | 8
20 - 30 | OFFLINE | 10
30 - 40 | OFFLINE | 10

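Is it possible to do this kind of duration calculation in Flink? I think it could be done with a custom reduce function, but I can't figure out how to emit the last state so that it appears in every window (in the example above the last state arrives at tick 19, but it should still be counted in windows 20-30, 30-40, and so on).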

1 answer:

Answer 0 (score: 0)

With Flink's window API, a window doesn't exist until an event is assigned to it, which makes what you are trying to do more difficult.

One solution might be to use a ProcessFunction with a timer to mix into your stream a third type of event that's only used to trigger the windows that would otherwise be empty.
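To make the first idea concrete without committing to any particular Flink API, here is a minimal plain-Scala sketch of what the injected events could look like. It assumes the filler simply repeats the last known state at every window boundary; FillerSketch, withFillers and the 1/0 encoding of ONLINE/OFFLINE are hypothetical names and choices made for this example, not something from the question. In a real job the timer would emit these copies one at a time as the watermark advances, rather than over a finished Seq.

```scala
// DeviceState mirrors the case class from the question.
// Assumption for this sketch: state 1 = ONLINE, state 0 = OFFLINE.
case class DeviceState(ts: Long, state: Int)

object FillerSketch {
  // Returns the events plus one synthetic copy of the last known state at
  // each window boundary, so that no completed window is left empty.
  // Only boundaries of windows that end at or before the watermark are filled.
  def withFillers(events: Seq[DeviceState], watermark: Long, size: Long): Seq[DeviceState] = {
    val sorted = events.sortBy(_.ts)
    sorted.headOption.toSeq.flatMap { first =>
      // First boundary after the first event, and the end of the last complete window.
      val firstBoundary = first.ts - java.lang.Math.floorMod(first.ts, size) + size
      val lastBoundary = watermark - java.lang.Math.floorMod(watermark, size)
      val fillers = (firstBoundary until lastBoundary by size).flatMap { boundary =>
        // Last state change at or before this boundary.
        sorted.takeWhile(_.ts <= boundary).lastOption
          .map(e => DeviceState(boundary, e.state))
      }
      // distinct drops a filler that coincides with a real event at the boundary.
      (sorted ++ fillers).distinct.sortBy(_.ts)
    }
  }
}
```

With the events from the question and a watermark at tick 45, this adds OFFLINE copies at ticks 10, 20 and 30, so each of the four completed windows starts with an event carrying the state in effect at that moment.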

Another solution would be to do all the work of computing the analytics with a ProcessFunction (with some state and timers), rather than windows.
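The bookkeeping behind the second idea can be sketched in plain Scala, with no Flink dependency. This is only an illustration of the arithmetic under stated assumptions: DurationSketch, durations and WindowDuration are hypothetical names, ONLINE/OFFLINE are encoded as 1/0, and the whole input is processed at once. In a ProcessFunction, the last seen state would presumably be kept in keyed state, and the per-window rows would be emitted from a timer registered for each window end.

```scala
// DeviceState mirrors the case class from the question.
// Assumption for this sketch: state 1 = ONLINE, state 0 = OFFLINE.
case class DeviceState(ts: Long, state: Int)
case class WindowDuration(start: Long, end: Long, state: Int, duration: Long)

object DurationSketch {
  // Total time spent in each state per tumbling window of `size` ticks,
  // for every window that ends at or before the watermark.
  def durations(events: Seq[DeviceState], watermark: Long, size: Long): Seq[WindowDuration] = {
    val sorted = events.sortBy(_.ts)
    if (sorted.isEmpty) Seq.empty
    else {
      val firstStart = sorted.head.ts - java.lang.Math.floorMod(sorted.head.ts, size)
      val lastEnd = watermark - java.lang.Math.floorMod(watermark, size)
      // Each state holds from its own timestamp until the next change;
      // the last state is carried forward to the end of the last complete window.
      val untils = sorted.map(_.ts).drop(1) :+ lastEnd
      val intervals = sorted.zip(untils).map { case (e, until) => (e.ts, until, e.state) }

      (firstStart until lastEnd by size).flatMap { start =>
        val end = start + size
        intervals
          // Overlap of the state interval with the window [start, end).
          .map { case (from, until, state) =>
            state -> (math.min(until, end) - math.max(from, start))
          }
          .filter(_._2 > 0)
          .groupBy(_._1)
          .map { case (state, parts) => WindowDuration(start, end, state, parts.map(_._2).sum) }
          .toSeq
          .sortBy(_.state)
      }
    }
  }
}
```

Calling DurationSketch.durations with the four events from the question, a watermark of 45 and a window size of 10 yields the six rows of the expected table in the question, including the OFFLINE rows for windows 20-30 and 30-40 that contain no events of their own.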