Flink: multiple windows on the same data

Time: 2019-01-30 12:11:06

Tags: streaming apache-flink flink-streaming

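My Flink application does the following:

  1. Source: reads data from Kafka in the form of records
  2. Split: splits the stream based on certain criteria
  3. Window: a 10-second time window that aggregates the records into one bulk record
  4. Sink: dumps these bulk records to Elasticsearch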

The issue I am facing is that the Flink consumer cannot hold the data for 10 seconds and throws the following exception:

Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Size of the state is larger than the maximum permitted memory-backed state. Size=18340663 , maxSize=5242880
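
For scale, the two numbers in the message can be converted to MiB: the state is about 17.5 MiB, roughly 3.5 times the 5 MiB limit. The sketch below only does that arithmetic. The class name `StateSizeCheck` is made up for illustration. The 5 MiB figure equals the default cap of Flink's MemoryStateBackend; that this job runs on that backend with the default cap is an assumption, not something stated in the post.

```java
class StateSizeCheck {
    static final long MIB = 1024L * 1024L;

    /** Converts a byte count to MiB. */
    static double toMiB(long bytes) {
        return (double) bytes / MIB;
    }

    public static void main(String[] args) {
        long size = 18_340_663L;    // "Size" reported in the exception
        long maxSize = 5_242_880L;  // "maxSize" reported in the exception
        System.out.printf("state: %.2f MiB, limit: %.2f MiB, ratio: %.2f%n",
                toMiB(size), toMiB(maxSize), (double) size / maxSize);
    }
}
```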

I cannot use countWindow, because if records arrive too slowly, the Elasticsearch sink could be delayed for a long time.

My question:

Is it possible to apply an OR of TimeWindow and CountWindow, which works as follows:

> if ( recordCount is 500 OR 10 seconds have elapsed)
>           then dump data to flink

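2 Answers:

Answer 0: (score: 0)

Not directly. However, you can use a GlobalWindow with custom trigger logic. Take a look at the source of Flink's count trigger (CountTrigger) for reference.

Your trigger logic would look something like this.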

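// Element count of the current batch, kept in Flink managed state.
// Sum is a ReduceFunction<Long> that adds two values, as in CountTrigger.
private final ReducingStateDescriptor<Long> stateDesc =
    new ReducingStateDescriptor<>("count", new Sum(), LongSerializer.INSTANCE);
// Timestamp of the pending timer. Held in managed state rather than in a plain field,
// because one trigger instance is shared by all keys and a field is not checkpointed.
private final ValueStateDescriptor<Long> timerDesc =
    new ValueStateDescriptor<>("timer", LongSerializer.INSTANCE);

@Override
public TriggerResult onElement(String element, long timestamp, GlobalWindow globalWindow, TriggerContext triggerContext) throws Exception {

    ReducingState<Long> count = triggerContext.getPartitionedState(stateDesc);
    ValueState<Long> timer = triggerContext.getPartitionedState(timerDesc);

    // Increment the window counter by one when an element is received
    count.add(1L);

    // Start the timer when the first element of a batch is received
    if (count.get() == 1) {
        long triggerTimestamp = triggerContext.getCurrentProcessingTime() + 10000; // 10 seconds after the first element
        timer.update(triggerTimestamp);
        triggerContext.registerProcessingTimeTimer(triggerTimestamp); // onProcessingTime fires the window at this time
    }

    // Or trigger the window when the number of elements in it reaches 500
    if (count.get() >= 500) {
        // Delete the timer, clear the state and fire the window.
        // FIRE emits the window but keeps its contents; wrap the trigger in
        // PurgingTrigger or return FIRE_AND_PURGE to emit each record only once.
        triggerContext.deleteProcessingTimeTimer(timer.value());
        timer.clear();
        count.clear();
        return TriggerResult.FIRE;
    }

    return TriggerResult.CONTINUE;
}

@Override
public TriggerResult onProcessingTime(long time, GlobalWindow globalWindow, TriggerContext triggerContext) throws Exception {
    // 10 seconds passed before 500 elements arrived: reset the state and fire
    triggerContext.getPartitionedState(stateDesc).clear();
    triggerContext.getPartitionedState(timerDesc).clear();
    return TriggerResult.FIRE;
}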
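
The count-or-timeout rule that this trigger implements can also be checked without a Flink cluster. Below is a minimal plain-Java model of that rule, offered as a sketch rather than as Flink code: `CountOrTimeoutBatcher`, `onElement` and `onTimer` are hypothetical names, the current time is passed in explicitly so the behaviour is deterministic, and unlike a Flink trigger it holds the buffered elements itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the "N elements OR T milliseconds" rule (hypothetical, not a Flink class). */
class CountOrTimeoutBatcher<T> {
    private final int maxCount;
    private final long timeoutMillis;
    private final List<T> buffer = new ArrayList<>();
    private long deadline = -1; // -1 means no timer is pending

    CountOrTimeoutBatcher(int maxCount, long timeoutMillis) {
        this.maxCount = maxCount;
        this.timeoutMillis = timeoutMillis;
    }

    /** Adds an element; returns the finished batch if the count limit is reached, else null. */
    List<T> onElement(T element, long nowMillis) {
        buffer.add(element);
        if (buffer.size() == 1) {
            deadline = nowMillis + timeoutMillis; // start the timer on the first element
        }
        return buffer.size() >= maxCount ? flush() : null;
    }

    /** Called when time advances; returns the batch if the deadline has passed, else null. */
    List<T> onTimer(long nowMillis) {
        boolean expired = deadline >= 0 && nowMillis >= deadline;
        return expired ? flush() : null;
    }

    /** Hands out the buffered elements, empties the buffer and cancels the pending timer. */
    private List<T> flush() {
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        deadline = -1;
        return batch;
    }
}
```

With `new CountOrTimeoutBatcher<String>(500, 10_000L)`, a batch is handed out either by the 500th call to `onElement` or by the first `onTimer` call at least 10 seconds after the first element of that batch, whichever comes first. Because the buffer is emptied on every flush, the model corresponds to a purging trigger.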

Answer 1: (score: 0)

You could also use the RocksDB state backend, but a custom trigger would be the better solution.