How can I make a sliding window in Apache Flink slide only after the window size has been reached?

Asked: 2017-04-22 10:39:58

Tags: apache-flink flink-streaming

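// Imports used by this method (the surrounding class, slf4jLogger and buySideVolumePressure are not shown)
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;
import org.json.JSONObject;

private DataStream<Double> buySideVolumeWMA(DataStream<String> buyPressureTradeStream) {

    final int windowSize = 3;
    final int windowSlide = 1;

    // Count-based sliding window over the whole (non-keyed) stream
    DataStream<Double> buySideVolumeWMAStream = buyPressureTradeStream
            .countWindowAll(windowSize, windowSlide)
            .apply(new AllWindowFunction<String, Double, GlobalWindow>() {

                @Override
                public void apply(GlobalWindow window, Iterable<String> values, Collector<Double> out)
                        throws Exception {
                    double buySideVolumeWMA = 0.0;
                    int weight = windowSize;
                    int numerator = 1; // oldest element gets weight 1, newest gets weight windowSize

                    for (String tradeString : values) {
                        JSONObject json = new JSONObject(tradeString);
                        // getDouble also accepts integer-valued JSON numbers; a (Double) cast would throw
                        double tradeVolume = json.getDouble("Volume");
                        buySideVolumeWMA += (tradeVolume * numerator) / weight;
                        slf4jLogger.info("tradeVolume " + tradeVolume + " , numerator , " + numerator
                                + " weight , " + weight + " buySideVolumeWMA " + buySideVolumeWMA);
                        numerator++;
                    }

                    out.collect(buySideVolumeWMA / 2);
                    buySideVolumePressure = buySideVolumeWMA / 2;
                    // slf4jLogger.info("buySideVolumePressure :" + buySideVolumePressure);
                }
            });

    buySideVolumeWMAStream.print().setParallelism(5);

    return buySideVolumeWMAStream;
}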
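The code divides each weighted term by windowSize and then the total by 2. For windowSize = 3 that is a division by 3 * 2 = 6 = 1 + 2 + 3, the sum of the weights, so the result is a true weighted moving average only when the window holds exactly 3 elements. The plain-Java sketch below (no Flink involved; the class and method names are hypothetical) compares that formula with the general form sum(i * x_i) / (n(n + 1) / 2) on a full and on a partial window.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: weighted moving average over the contents of one window,
// where the oldest element has weight 1 and the newest has weight n.
public class WeightedMovingAverage {

    // General form: sum(i * x_i) / (1 + 2 + ... + n) = sum(i * x_i) / (n(n + 1) / 2)
    static double wma(List<Double> values) {
        int n = values.size();
        double weightedSum = 0.0;
        for (int i = 0; i < n; i++) {
            weightedSum += (i + 1) * values.get(i);
        }
        return weightedSum / (n * (n + 1) / 2.0);
    }

    // The question's formula: divide each term by windowSize, then the total by 2
    static double questionFormula(List<Double> values, int windowSize) {
        double result = 0.0;
        int numerator = 1;
        for (double v : values) {
            result += (v * numerator) / windowSize;
            numerator++;
        }
        return result / 2;
    }

    public static void main(String[] args) {
        List<Double> full = Arrays.asList(10.0, 20.0, 30.0);
        System.out.println(wma(full));                    // about 23.33 (140 / 6)
        System.out.println(questionFormula(full, 3));     // about 23.33 as well
        List<Double> partial = Arrays.asList(10.0);
        System.out.println(wma(partial));                 // 10.0
        System.out.println(questionFormula(partial, 3));  // about 1.67, i.e. only 1/6 of the volume
    }
}
```

So the early, partial windows do not just arrive too soon; with this formula they also produce values that are not averages at all.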

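In this program I use a window size of 3 and a slide of 1. I want the window to start sliding only once it has received 3 elements from the stream, and only then slide by 1. However, my program starts sliding as soon as it receives the first element, and then slides on every element it receives. So how can I make it wait until it has received 3 elements and only then slide by 1?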
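As far as I know, Flink builds countWindowAll(size, slide) from GlobalWindows, a CountTrigger.of(slide) and a CountEvictor.of(size) (worth checking DataStream.countWindowAll in your Flink version). That matches the behaviour described above: with slide = 1 the trigger fires on every element, including the first ones, so the window function sees 1, then 2, then 3 elements. The sketch below is a plain-Java model of that behaviour, not Flink code; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Plain-Java model (no Flink) of a count-based sliding window: one global
// buffer, a trigger that fires every `slide` elements, and an evictor that
// keeps only the newest `size` elements before the window function runs.
public class CountWindowSimulation {

    static List<List<Integer>> firings(List<Integer> input, int size, int slide) {
        Deque<Integer> buffer = new ArrayDeque<>();
        List<List<Integer>> fired = new ArrayList<>();
        int sinceLastFire = 0;
        for (int element : input) {
            buffer.addLast(element);
            sinceLastFire++;
            if (sinceLastFire == slide) {        // trigger: fire every `slide` elements
                sinceLastFire = 0;
                while (buffer.size() > size) {   // evictor: keep the newest `size`
                    buffer.removeFirst();
                }
                fired.add(new ArrayList<>(buffer)); // what the window function would see
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // size = 3, slide = 1, as in the question
        System.out.println(firings(Arrays.asList(1, 2, 3, 4, 5), 3, 1));
        // prints [[1], [1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
    }
}
```

The first two firings are the "too early" slides the question describes.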

2 Answers:

Answer 0 (score: 1)

You can add an offset to the window. It is the third argument of the window assigner's of(...) call. In my view, that way you can start later.

Example from the documentation:

// sliding processing-time windows offset by -8 hours
input
    .keyBy(<key selector>)
    .window(SlidingProcessingTimeWindows.of(Time.hours(12), Time.hours(1), Time.hours(-8)))
    .<windowed transformation>(<window function>);

For details, see: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html
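To see what the offset parameter changes, here is a plain-Java sketch of how a time-based sliding assigner can compute the windows that contain a given timestamp. It is modeled, from memory, on Flink's TimeWindow.getWindowStartWithOffset, so treat the exact formula as an assumption; the class and method names are hypothetical. Note that the offset belongs to the time-based assigners; countWindowAll(size, slide), as used in the question, takes no offset argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: which [start, end) sliding windows contain `timestamp`?
// All values share one time unit (e.g. minutes).
public class SlidingWindowOffset {

    static List<long[]> windowsFor(long timestamp, long size, long slide, long offset) {
        // Start of the latest window containing the timestamp, aligned to the slide shifted by offset
        long lastStart = timestamp - Math.floorMod(timestamp - offset, slide);
        List<long[]> windows = new ArrayList<>();
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    static String format(List<long[]> windows) {
        StringBuilder sb = new StringBuilder();
        for (long[] w : windows) {
            sb.append('[').append(w[0]).append(", ").append(w[1]).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // size 10, slide 5, element at time 13
        System.out.println(format(windowsFor(13, 10, 5, 0))); // [10, 20) [5, 15)
        System.out.println(format(windowsFor(13, 10, 5, 2))); // [12, 22) [7, 17)
    }
}
```

In both runs the element falls into size / slide = 2 windows; the offset moves where the window boundaries lie on the time axis.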

Answer 1 (score: 0)

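As far as I can tell, as of November 2019 and Flink 1.9.1, this is not a feature of sliding windows. My understanding is that this is because window objects are independent and do not share any state. For example, with a keyed stream, the elements in a window are copied and stored once for every window-and-key pair.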

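The following filter keeps just enough state to ignore the first n messages it receives. On a keyed stream (as in .keyBy(...)), a separate counter is kept for each key, because that is how Flink manages ValueState objects.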

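  import org.apache.flink.api.common.functions.RichFilterFunction
  import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
  import org.apache.flink.configuration.Configuration
  import org.apache.flink.streaming.api.scala._ // provides createTypeInformation

  /**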
   * This filter suppresses the first n messages (inclusive of n). This behavior may be desired for use with sliding
   * windows when no output is desired until the full size of the window is reached.
   *
   * Example usage:
   * .filter(new SuppressFirstNFromSlidingWindow[(String, Int)](5))
   */
  class SuppressFirstNFromSlidingWindow[T](nToSuppress: Int) extends RichFilterFunction[T] {

    private var state_allowAll: ValueState[Boolean] = _
    private var state_numberSkipped: ValueState[Int] = _

    override def filter(value: T): Boolean = {

      // Unset state reads as null, which Scala unboxes to false (Boolean) and 0 (Int)
      if (state_allowAll.value()) return true

      val numberSkipped = state_numberSkipped.value()
      if (numberSkipped < nToSuppress) {
        state_numberSkipped.update(numberSkipped + 1)
        false
      } else {
        state_allowAll.update(true)
        true
      }
    }

    override def open(parameters: Configuration): Unit = {

      state_allowAll = getRuntimeContext.getState(
        new ValueStateDescriptor[Boolean]("allowAll", createTypeInformation[Boolean])
      )

      state_numberSkipped = getRuntimeContext.getState(
        new ValueStateDescriptor[Int]("numberSkipped", createTypeInformation[Int])
      )
    }
  }
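A plain-Java model (no Flink; names are hypothetical) of applying this idea to the window output of the question, next to a second technique: a size check inside the window function itself. In the question's AllWindowFunction, that check would mean counting the elements of `values` and returning without calling out.collect when there are fewer than windowSize. Using n = windowSize - 1 for the filter matches the slide = 1 case; the size check does not depend on the slide. As far as I know, ValueState is keyed state, so the Scala filter needs a keyed stream; on the non-keyed output of countWindowAll it would need a keyBy first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model (no Flink) of two ways to drop the partial windows that
// countWindowAll(3, 1) produces for the first two elements.
public class SuppressPartialWindows {

    // Same idea as SuppressFirstNFromSlidingWindow: drop the first n results, pass the rest
    static <T> List<T> suppressFirstN(List<T> results, int n) {
        List<T> kept = new ArrayList<>();
        int skipped = 0;
        for (T r : results) {
            if (skipped < n) {
                skipped++;
            } else {
                kept.add(r);
            }
        }
        return kept;
    }

    // Alternative: in the window function, emit only when the window is full
    static <T> List<List<T>> onlyFullWindows(List<List<T>> windows, int size) {
        List<List<T>> kept = new ArrayList<>();
        for (List<T> w : windows) {
            if (w.size() == size) {
                kept.add(w);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Window contents as countWindowAll(3, 1) hands them to the window function
        List<List<Integer>> windows = Arrays.asList(
                Arrays.asList(1), Arrays.asList(1, 2), Arrays.asList(1, 2, 3),
                Arrays.asList(2, 3, 4), Arrays.asList(3, 4, 5));
        System.out.println(suppressFirstN(windows, 3 - 1)); // [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
        System.out.println(onlyFullWindows(windows, 3));    // [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    }
}
```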