Getting the previous window's data in Dataflow

Time: 2017-07-12 21:25:52

Tags: java google-cloud-dataflow dataflow sliding-window

I am trying to build an alerting mechanism, and I want to detect a drop in the average value between two windows.

I was happy to find the TrafficRoutes example, especially when I saw that it says:

  

"A 'slowdown' occurs if a supermajority of speeds in a sliding window are less than the reading of the previous window."

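I looked at the code, but could not understand why it means we are getting the previous value from the previous window. Since I have no experience with sliding windows so far, I thought I might be missing something.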

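As far as I understand, implementing such a mechanism, with or without sliding windows, does not give access to data from previous windows.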

Any idea what I am missing? Is there a way to get values from the previous window?

I am running on GCP Dataflow with SDK 1.9.0.

Please advise,

Shushu

1 answer:

Answer 0 (score: 2)

My assumptions:

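  • Your alerting system has its data partitioned into "metrics", each identified by a "metric id".
  • The value of a metric at a given time is a Double.
  • You receive the metric data as a PCollection<KV<String, Double>>, where the String is the metric id, the Double is the metric value, and each element has an appropriate implicit timestamp (if it does not, you can assign one with the WithTimestamps transform).
  • You want to compute a sliding average of each metric over 5-minute intervals starting every 1 minute, and to take some action whenever the average for the interval starting at T+1min is smaller than the average for the interval starting at T.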
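To make the last assumption concrete, here is a minimal plain-Java sketch with no Dataflow dependency. It shows which sliding windows a single timestamp lands in, and what the per-window averages look like for one metric. The class and method names (SlidingWindowSketch, windowStarts, slidingAverages) are made up for illustration, and times are plain minute counts rather than SDK timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; hypothetical names, not Dataflow SDK code.
class SlidingWindowSketch {
  // Start times (in minutes) of every window of length `size` that begins
  // on a multiple of `period` and contains `timestamp`.
  static List<Long> windowStarts(long timestamp, long size, long period) {
    List<Long> starts = new ArrayList<>();
    // Latest window start that is <= timestamp.
    long lastStart = timestamp - Math.floorMod(timestamp, period);
    for (long start = lastStart; start > timestamp - size; start -= period) {
      starts.add(start);
    }
    return starts;
  }

  // Average of one metric's readings per sliding window, keyed by window start.
  static TreeMap<Long, Double> slidingAverages(
      long[] minutes, double[] values, long size, long period) {
    TreeMap<Long, double[]> sumAndCount = new TreeMap<>();
    for (int i = 0; i < minutes.length; i++) {
      for (long start : windowStarts(minutes[i], size, period)) {
        double[] acc = sumAndCount.computeIfAbsent(start, k -> new double[2]);
        acc[0] += values[i]; // running sum
        acc[1] += 1;         // running count
      }
    }
    TreeMap<Long, Double> averages = new TreeMap<>();
    for (Map.Entry<Long, double[]> e : sumAndCount.entrySet()) {
      averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
    }
    return averages;
  }

  public static void main(String[] args) {
    // A reading at minute 7 belongs to five overlapping 5-minute windows.
    System.out.println(windowStarts(7, 5, 1)); // [7, 6, 5, 4, 3]
    System.out.println(
        slidingAverages(new long[] {0, 1}, new double[] {10, 20}, 2, 1));
  }
}
```

Each reading contributes to several overlapping windows, so the window starting at T and the window starting at T+1min share most of their data. This is why comparing consecutive averages needs an explicit cross-window step.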

You can do it like this:

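// Imports assume the Apache Beam Java SDK package layout, which matches
// the @ProcessElement style used below. Dataflow SDK 1.x places its
// classes under com.google.cloud.dataflow.sdk.* instead.
import com.google.common.collect.Ordering;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.Mean;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.SlidingWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;

PCollection<KV<String, Double>> metricValues = ...;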
// Collection of (metric, timestamped 5-minute average)
// windowed into the same 5-minute windows as the input,
// where timestamp is assigned as the beginning of the window.
PCollection<KV<String, TimestampedValue<Double>>>
  metricSlidingAverages = metricValues
    .apply(Window.<KV<String, Double>>into(
        SlidingWindows.of(Duration.standardMinutes(5))
                      .every(Duration.standardMinutes(1))))
    .apply(Mean.<String, Double>perKey())
    .apply(ParDo.of(new ReifyWindowFn()));

// Rewindow the previous collection into global window so we can
// do cross-window comparisons.
// For each metric, an unsorted list of (timestamp, average) pairs.
PCollection<KV<String, Iterable<TimestampedValue<Double>>>>
  metricAverageSequences = metricSlidingAverages
    .apply(Window.<KV<String, TimestampedValue<Double>>>into(
        new GlobalWindows()))
    // We need to group the data by key again since the effective grouping
    // key has changed (remember, GBK implicitly groups by key and window,
    // and the window is now the global window).
    .apply(GroupByKey.<String, TimestampedValue<Double>>create());

// A DoFn must be wrapped in ParDo.of(...) before it can be applied.
metricAverageSequences.apply(ParDo.of(new DetectAnomaliesFn()));

...

class ReifyWindowFn extends DoFn<
    KV<String, Double>, KV<String, TimestampedValue<Double>>> {
  @ProcessElement
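  public void process(ProcessContext c, BoundedWindow w) {
    // This DoFn makes the implicit window of the element explicit
    // and extracts the starting timestamp of the window.
    // BoundedWindow itself only exposes maxTimestamp(); SlidingWindows
    // assigns IntervalWindow instances, whose start() is the beginning.
    c.output(KV.of(
      c.element().getKey(),
      TimestampedValue.of(
          c.element().getValue(), ((IntervalWindow) w).start())));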
  }
}

class DetectAnomaliesFn extends DoFn<
    KV<String, Iterable<TimestampedValue<Double>>>, Void> {
  @ProcessElement
  public void process(ProcessContext c) {
    String metricId = c.element().getKey();
    // Sort the (timestamp, average) pairs by timestamp.
    List<TimestampedValue<Double>> averages = Ordering.natural()
        .onResultOf(TimestampedValue::getTimestamp)
        .sortedCopy(c.element().getValue());
    // Scan for anomalies.
    for (int i = 1; i < averages.size(); ++i) {
      if (averages.get(i).getValue() < averages.get(i-1).getValue()) {
        // Detected anomaly! Could do something with it,
        // e.g. publish to a third-party system or emit into
        // a PCollection.
      }
    }
  }
}

Note that I have not tested this code, but it should give you enough conceptual guidance to accomplish the task.
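The comparison step can also be tried on its own, outside a pipeline. The following is a minimal plain-Java sketch of the "sort by window start, then compare each average with the one before it" logic. The names DropDetectionSketch and findDrops are hypothetical, and window starts are represented as plain long values rather than SDK timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; hypothetical names, not Dataflow SDK code.
class DropDetectionSketch {
  // Given window start -> average (in any order), returns the start times
  // of windows whose average is lower than that of the preceding window.
  static List<Long> findDrops(Map<Long, Double> averagesByWindowStart) {
    List<Long> drops = new ArrayList<>();
    Double previous = null;
    // Copying into a TreeMap sorts the entries by window start.
    for (Map.Entry<Long, Double> e
        : new TreeMap<>(averagesByWindowStart).entrySet()) {
      if (previous != null && e.getValue() < previous) {
        drops.add(e.getKey());
      }
      previous = e.getValue();
    }
    return drops;
  }
}
```

For example, averages of 10.0, 12.0, 5.0 and 7.0 for windows starting at 0, 1, 2 and 3 would flag only the window starting at 2. A real alert would probably also want a minimum drop size, so that small fluctuations do not fire.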