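I'm trying to build an alerting mechanism, and I want to detect a drop in the average value between two windows.
I was happy to find the TrafficRoutes example, especially when I saw that it says:
"A 'slowdown' occurs if a supermajority of speeds in a sliding window are less than the reading of the previous window."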
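I looked at the code, but I couldn't work out where it actually gets the value from the previous window. Since I have no experience with sliding windows so far, I thought I might be missing something.
As far as I can tell, implementing this kind of mechanism, with or without sliding windows, does not give you access to data from previous windows.
Any idea what I'm missing? Is there some way to get values from the previous window?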
I'm running on GCP Dataflow with SDK 1.9.0.
Please advise,
Shushu
Answer 0 (score: 2)
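My assumptions:
Your data is a PCollection<KV<String, Double>>, where the String is the metric ID and the Double is the metric value, and each element has an appropriate implicit timestamp (if it doesn't, you can assign one with the WithTimestamps transform).
You could then do something like this: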
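// Imports assume the Apache Beam Java SDK package layout; adjust for your SDK version.
import java.util.List;
import com.google.common.collect.Ordering;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.Mean;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.SlidingWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;

PCollection<KV<String, Double>> metricValues = ...;
// Collection of (metric, timestamped 5-minute average),
// windowed into the same sliding 5-minute windows as the input,
// where the timestamp is assigned as the beginning of the window.
PCollection<KV<String, TimestampedValue<Double>>>
    metricSlidingAverages = metricValues
        .apply(Window.<KV<String, Double>>into(
            SlidingWindows.of(Duration.standardMinutes(5))
                .every(Duration.standardMinutes(1))))
        .apply(Mean.<String, Double>perKey())
        .apply(ParDo.of(new ReifyWindowFn()));
// Rewindow the previous collection into the global window so we can
// do cross-window comparisons.
// For each metric, an unsorted list of (timestamp, average) pairs.
PCollection<KV<String, Iterable<TimestampedValue<Double>>>>
    metricAverageSequences = metricSlidingAverages
        .apply(Window.<KV<String, TimestampedValue<Double>>>into(
            new GlobalWindows()))
        // We need to group the data by key again since the grouping key
        // has changed (remember, GBK implicitly groups by key and window).
        .apply(GroupByKey.<String, TimestampedValue<Double>>create());
// DetectAnomaliesFn is a DoFn, so it has to be wrapped in ParDo.of(...).
metricAverageSequences.apply(ParDo.of(new DetectAnomaliesFn()));
...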
class ReifyWindowFn extends DoFn<
    KV<String, Double>, KV<String, TimestampedValue<Double>>> {
  @ProcessElement
  public void process(ProcessContext c, BoundedWindow w) {
    // This DoFn makes the implicit window of the element explicit
    // and extracts the starting timestamp of the window.
    // SlidingWindows assigns IntervalWindows, so the cast is safe here.
    Instant windowStart = ((IntervalWindow) w).start();
    c.output(KV.of(
        c.element().getKey(),
        TimestampedValue.of(c.element().getValue(), windowStart)));
  }
}
class DetectAnomaliesFn extends DoFn<
    KV<String, Iterable<TimestampedValue<Double>>>, Void> {
  @ProcessElement
  public void process(ProcessContext c) {
    String metricId = c.element().getKey();
    // Sort the (timestamp, average) pairs by timestamp.
    // The types are spelled out explicitly to help type inference.
    List<TimestampedValue<Double>> averages = Ordering.<Instant>natural()
        .onResultOf((TimestampedValue<Double> tv) -> tv.getTimestamp())
        .sortedCopy(c.element().getValue());
    // Scan for anomalies: an average lower than the previous window's.
    for (int i = 1; i < averages.size(); ++i) {
      if (averages.get(i).getValue() < averages.get(i - 1).getValue()) {
        // Detected anomaly for metricId! Could do something with it,
        // e.g. publish to a third-party system or emit into
        // a PCollection.
      }
    }
  }
}
Note that I haven't tested this code, but it should give you enough conceptual guidance to accomplish the task.
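If you want to try the comparison step without running a pipeline, here is a minimal plain-Java sketch of the same kind of scan that DetectAnomaliesFn performs. The names WindowDropSketch, WindowAverage, and findDrops are hypothetical and not part of any SDK; WindowAverage is only a stand-in for TimestampedValue<Double>. It assumes that a "drop" simply means an average strictly lower than the one for the preceding window.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class WindowDropSketch {

    /** One (window start, average) pair; a hypothetical stand-in for TimestampedValue<Double>. */
    static final class WindowAverage {
        final long windowStartMillis;
        final double average;

        WindowAverage(long windowStartMillis, double average) {
            this.windowStartMillis = windowStartMillis;
            this.average = average;
        }
    }

    /** Returns the start times of windows whose average is below the preceding window's average. */
    static List<Long> findDrops(List<WindowAverage> unsorted) {
        // Grouping gives no ordering guarantee, so sort a copy by window start first.
        List<WindowAverage> sorted = new ArrayList<>(unsorted);
        sorted.sort(Comparator.comparingLong((WindowAverage w) -> w.windowStartMillis));

        List<Long> drops = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            // Compare each window with the one that starts immediately before it.
            if (sorted.get(i).average < sorted.get(i - 1).average) {
                drops.add(sorted.get(i).windowStartMillis);
            }
        }
        return drops;
    }

    public static void main(String[] args) {
        // Averages arrive out of order, as they would after a GroupByKey.
        List<WindowAverage> averages = new ArrayList<>();
        averages.add(new WindowAverage(120_000L, 40.0));
        averages.add(new WindowAverage(0L, 50.0));
        averages.add(new WindowAverage(60_000L, 55.0));
        averages.add(new WindowAverage(180_000L, 45.0));
        System.out.println(findDrops(averages)); // prints [120000]
    }
}
```

Only the window starting at 120000 ms is reported, because its average (40.0) is below the previous window's (55.0). A real alert would probably compare against a threshold rather than flag every decrease; that part is left out of this sketch.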