When I run a simple Flink application that uses a KeyedStream, I see the event latency vary between 0 and 100 ms. Here is the program:
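import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class KeyedLatencyTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // addSink() returns a DataStreamSink, not a DataStream, so the chain is not assigned
        env.addSource(new SourceFunction<Long>() {
            private volatile boolean running = true;

            @Override
            public void run(SourceContext<Long> sourceContext) throws Exception {
                while (running) {
                    // emit the current wall-clock time as the event payload
                    synchronized (sourceContext.getCheckpointLock()) {
                        sourceContext.collect(System.currentTimeMillis());
                    }
                    // sleep outside the checkpoint lock so checkpoints are not blocked
                    Thread.sleep(1000);
                }
            }

            @Override
            public void cancel() {
                running = false;
            }
        }).keyBy(new KeySelector<Long, Long>() {
            @Override
            public Long getKey(Long l) throws Exception {
                return l;
            }
        }).addSink(new SinkFunction<Long>() {
            @Override
            public void invoke(Long l) throws Exception {
                // latency = wall-clock time at the sink minus time stamped at the source
                long diff = System.currentTimeMillis() - l;
                System.out.println("in Sink: diff=" + diff);
            }
        });
        env.execute();
    }
}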
The output is:
in Sink: diff=0
in Sink: diff=2
in Sink: diff=4
in Sink: diff=4
in Sink: diff=5
in Sink: diff=7
in Sink: diff=9
in Sink: diff=9
in Sink: diff=11
in Sink: diff=12
in Sink: diff=14
in Sink: diff=14
in Sink: diff=16
in Sink: diff=17
in Sink: diff=18
in Sink: diff=19
in Sink: diff=21
in Sink: diff=22
in Sink: diff=24
in Sink: diff=24
in Sink: diff=26
in Sink: diff=27
in Sink: diff=29
in Sink: diff=29
in Sink: diff=31
in Sink: diff=32
in Sink: diff=34
in Sink: diff=34
in Sink: diff=36
in Sink: diff=37
in Sink: diff=39
in Sink: diff=40
in Sink: diff=41
in Sink: diff=43
in Sink: diff=45
in Sink: diff=45
in Sink: diff=47
in Sink: diff=48
in Sink: diff=50
in Sink: diff=50
in Sink: diff=52
in Sink: diff=53
in Sink: diff=55
in Sink: diff=57
in Sink: diff=57
in Sink: diff=59
in Sink: diff=60
in Sink: diff=61
in Sink: diff=62
in Sink: diff=63
in Sink: diff=65
in Sink: diff=66
in Sink: diff=67
in Sink: diff=69
in Sink: diff=70
in Sink: diff=72
in Sink: diff=72
in Sink: diff=74
in Sink: diff=76
in Sink: diff=77
in Sink: diff=78
in Sink: diff=79
in Sink: diff=81
in Sink: diff=82
in Sink: diff=83
in Sink: diff=84
in Sink: diff=86
in Sink: diff=87
in Sink: diff=88
in Sink: diff=89
in Sink: diff=91
in Sink: diff=92
in Sink: diff=94
in Sink: diff=94
in Sink: diff=96
in Sink: diff=97
in Sink: diff=99
in Sink: diff=99
in Sink: diff=0
in Sink: diff=2
in Sink: diff=3
in Sink: diff=4
in Sink: diff=4
in Sink: diff=5
in Sink: diff=7
in Sink: diff=9
in Sink: diff=9
in Sink: diff=11
in Sink: diff=12
in Sink: diff=14
in Sink: diff=14
in Sink: diff=16
in Sink: diff=17
in Sink: diff=18
in Sink: diff=19
in Sink: diff=21
in Sink: diff=22
in Sink: diff=24
in Sink: diff=24
in Sink: diff=26
in Sink: diff=46
in Sink: diff=48
in Sink: diff=50
in Sink: diff=52
in Sink: diff=53
in Sink: diff=54
in Sink: diff=56
in Sink: diff=58
in Sink: diff=59
in Sink: diff=60
in Sink: diff=62
in Sink: diff=64
in Sink: diff=65
in Sink: diff=66
in Sink: diff=68
in Sink: diff=70
in Sink: diff=71
in Sink: diff=73
in Sink: diff=74
in Sink: diff=76
in Sink: diff=77
in Sink: diff=79
in Sink: diff=81
in Sink: diff=82
in Sink: diff=83
in Sink: diff=85
in Sink: diff=86
in Sink: diff=88
in Sink: diff=88
in Sink: diff=90
in Sink: diff=92
in Sink: diff=92
in Sink: diff=94
in Sink: diff=95
in Sink: diff=97
in Sink: diff=98
in Sink: diff=99
in Sink: diff=0
in Sink: diff=2
in Sink: diff=4
in Sink: diff=4
in Sink: diff=5
in Sink: diff=7
in Sink: diff=9
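As you can see, the latency follows a pattern: it gradually climbs to about 100 ms, drops back to 0, and the cycle repeats. I need the latency to be as low as possible. This example is a simplified version of my real application. Can someone explain what causes this latency and how to reduce it as far as possible?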
Answer:
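The latency comes from the keyBy: it forces a network shuffle, along with serialization and deserialization of every record. The latency is so variable because of network buffering. Records are collected in network buffers that are sent downstream either when they are full or when the buffer timeout expires, and the default timeout is 100 ms, which matches the 0 to 100 ms range you are seeing.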
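You'll want to read the Controlling Latency section of the documentation. The tl;dr is that you should set the network buffer timeout to a small but nonzero value, such as 5 or 10 ms (a value of 0 flushes after every record, which can seriously hurt throughput):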
env.setBufferTimeout(timeoutMillis);
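The sawtooth in the question's output can be reproduced with a back-of-the-envelope model. The plain-Java sketch below involves no Flink code, and `BufferTimeoutModel`, `waitUntilNextFlush` and `latencies` are hypothetical names. It assumes a flush at every multiple of the timeout, a source that emits one record every `periodMs` ms, and no other delays, so each record simply waits for the next flush. It ignores buffers filling up, serialization and scheduling, so it only illustrates the shape of the latency, not real numbers.

```java
import java.util.ArrayList;
import java.util.List;

class BufferTimeoutModel {

    /**
     * Simplified model (not Flink code): a record emitted at emitTimeMs sits in the
     * output buffer until the next periodic flush at a multiple of timeoutMs.
     */
    static long waitUntilNextFlush(long emitTimeMs, long timeoutMs) {
        long sinceLastFlush = emitTimeMs % timeoutMs;
        return sinceLastFlush == 0 ? 0 : timeoutMs - sinceLastFlush;
    }

    /** Modeled latencies for `count` records emitted every periodMs milliseconds, starting at t=0. */
    static List<Long> latencies(int count, long periodMs, long timeoutMs) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(waitUntilNextFlush(i * periodMs, timeoutMs));
        }
        return result;
    }

    public static void main(String[] args) {
        // Period of 998 ms: each record lands 2 ms earlier within the flush cycle than the previous one.
        System.out.println("timeout=100: " + latencies(8, 998, 100)); // [0, 2, 4, 6, 8, 10, 12, 14]
        System.out.println("timeout=10:  " + latencies(8, 998, 10));  // [0, 2, 4, 6, 8, 0, 2, 4]
    }
}
```

With a 100 ms timeout the modeled wait grows by 2 ms per record until it wraps around after 50 records (maximum 98 ms), much like the cycle in the question. With a 10 ms timeout the same model never waits more than 8 ms.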
For details on how Flink's network stack is organized, see A Deep-Dive into Flink's Network Stack on the Flink project blog.
While we're on the subject, other sources of latency can include checkpoint barrier alignment and garbage collection. Setting
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
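disables barrier alignment, at the cost of giving up exactly-once processing semantics (you get at-least-once instead). Alignment only affects operators with more than one input channel, such as the operator downstream of a keyBy when the parallelism is greater than 1.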
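To make the alignment cost concrete, here is a plain-Java toy model of one operator with two input channels. It is not Flink's implementation, and `BarrierAlignmentModel`, `Arrival` and `extraDelays` are hypothetical names. It assumes the classic aligned behavior: once a checkpoint barrier has arrived on one channel, further records on that channel are held until the barrier has arrived on every channel. In at-least-once mode the records are processed as they arrive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BarrierAlignmentModel {

    /** One element reaching the operator: a data record or a checkpoint barrier. */
    static final class Arrival {
        final int channel;
        final long timeMs;
        final boolean isBarrier;

        Arrival(int channel, long timeMs, boolean isBarrier) {
            this.channel = channel;
            this.timeMs = timeMs;
            this.isBarrier = isBarrier;
        }
    }

    /**
     * Extra delay (ms) each record incurs before processing, in processing order.
     * Arrivals must be sorted by time. When alignBarriers is true, records from a channel
     * whose barrier already arrived are held until barriers from all channels have arrived.
     */
    static List<Long> extraDelays(List<Arrival> arrivals, int numChannels, boolean alignBarriers) {
        boolean[] barrierSeen = new boolean[numChannels];
        int barriersSeen = 0;
        List<Arrival> held = new ArrayList<>();
        List<Long> delays = new ArrayList<>();
        for (Arrival a : arrivals) {
            if (a.isBarrier) {
                if (!barrierSeen[a.channel]) {
                    barrierSeen[a.channel] = true;
                    barriersSeen++;
                }
                if (barriersSeen == numChannels) {
                    // alignment complete: release held records, reset for the next checkpoint
                    for (Arrival h : held) {
                        delays.add(a.timeMs - h.timeMs);
                    }
                    held.clear();
                    Arrays.fill(barrierSeen, false);
                    barriersSeen = 0;
                }
            } else if (alignBarriers && barrierSeen[a.channel]) {
                held.add(a); // channel is blocked until alignment completes
            } else {
                delays.add(0L); // processed immediately
            }
        }
        return delays;
    }

    public static void main(String[] args) {
        // Channel 0's barrier arrives at t=10 ms, channel 1's only at t=50 ms.
        List<Arrival> arrivals = List.of(
                new Arrival(0, 10, true),
                new Arrival(0, 20, false),
                new Arrival(1, 25, false),
                new Arrival(0, 30, false),
                new Arrival(1, 50, true));
        System.out.println("exactly-once (aligned): " + extraDelays(arrivals, 2, true));  // [0, 30, 20]
        System.out.println("at-least-once:          " + extraDelays(arrivals, 2, false)); // [0, 0, 0]
    }
}
```

In this model the records on the channel whose barrier arrived early pay for the slow channel, which is why the extra latency depends on how skewed the barriers are.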
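Using the RocksDB state backend reduces the number of objects that have to be garbage collected (since it keeps state off the heap), which improves worst-case latency.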
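If you want to check whether garbage collection lines up with latency spikes before switching state backends, one option (not from the answer above, and independent of Flink) is to sample the JVM's own GC counters through the standard `java.lang.management` API, for example around the work done in a sink. `GcSampler` and its methods are hypothetical names; `getCollectionTime()` and `getCollectionCount()` can return -1 for collectors that do not report them, so those values are skipped.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcSampler {

    /** Accumulated GC time (ms) across all collectors in this JVM since startup. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Accumulated number of collections across all collectors in this JVM since startup. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if undefined for this collector
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long timeBefore = totalGcTimeMs();
        long countBefore = totalGcCount();
        // Allocate many short-lived objects; this may or may not trigger a collection,
        // depending on heap size and JIT optimizations.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[256];
            checksum += garbage.length;
        }
        System.out.println("allocated bytes: " + checksum);
        System.out.println("collections: " + (totalGcCount() - countBefore)
                + ", GC time: " + (totalGcTimeMs() - timeBefore) + " ms");
    }
}
```

Logging these deltas next to the measured `diff` values makes it easy to see whether the slowest records coincide with collections.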
Also, you can enable object reuse:
env.getConfig().enableObjectReuse();
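This tells the runtime to reuse user objects for better performance. Keep in mind that it can cause bugs when user functions are not aware of this behavior, for example if they keep references to input objects or modify them.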
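The kind of bug meant here can be shown without Flink. The sketch below simulates a runtime that writes every incoming record into one shared mutable object, and a user function that buffers its inputs. `ObjectReuseHazard`, `Event` and `bufferTimestamps` are hypothetical names, and the simulation is only an analogy for what object reuse can do, not Flink's internals.

```java
import java.util.ArrayList;
import java.util.List;

class ObjectReuseHazard {

    /** Hypothetical mutable record type standing in for a POJO the runtime could reuse. */
    static final class Event {
        long timestamp;
    }

    /**
     * Feeds each timestamp through one shared Event instance and buffers the inputs.
     * Without a defensive copy every buffered entry is the same object.
     */
    static List<Long> bufferTimestamps(long[] incoming, boolean defensiveCopy) {
        Event shared = new Event();
        List<Event> buffer = new ArrayList<>();
        for (long ts : incoming) {
            shared.timestamp = ts; // the simulated runtime overwrites the reused instance
            if (defensiveCopy) {
                Event copy = new Event();
                copy.timestamp = shared.timestamp;
                buffer.add(copy);   // safe: the function owns this object
            } else {
                buffer.add(shared); // unsafe under reuse: keeps a reference to the shared instance
            }
        }
        List<Long> result = new ArrayList<>();
        for (Event e : buffer) {
            result.add(e.timestamp);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] incoming = {1L, 2L, 3L};
        System.out.println("no copy:   " + bufferTimestamps(incoming, false)); // [3, 3, 3]
        System.out.println("with copy: " + bufferTimestamps(incoming, true));  // [1, 2, 3]
    }
}
```

A function that only reads its input inside the call and emits new objects is unaffected; one that stores inputs across calls needs a copy like the one above.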
If you're interested in measuring latency, see Latency Tracking, and the section on latency in Monitoring Apache Flink Applications 101.
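If you keep the question's own sink-side measurement instead, summarizing the `diff` values by percentile shows the worst case more clearly than scanning the log. Below is a minimal nearest-rank percentile sketch in plain Java; `LatencyPercentiles` and `percentile` are hypothetical names, and this is not Flink's metrics implementation.

```java
import java.util.Arrays;

class LatencyPercentiles {

    /** Nearest-rank percentile (0 < p <= 100) of latency samples in ms. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank; multiply before dividing to limit floating-point error
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // A handful of the diff values printed by the question's sink
        long[] diffs = {0, 2, 4, 4, 5, 7, 9, 9, 11, 12, 46, 48, 97, 98, 99};
        System.out.println("p50=" + percentile(diffs, 50)
                + " p95=" + percentile(diffs, 95)
                + " max=" + percentile(diffs, 100)); // p50=9 p95=99 max=99
    }
}
```

Comparing these numbers before and after lowering the buffer timeout is a simple way to confirm the change helped.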