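I am trying to count the elements of a stream over windows of different sizes (the window size is carried in the stream data), so I use a custom WindowAssigner and an AggregateFunction. However, the state is huge (window sizes range from one hour to 30 days).
As I understand it, the aggregation state should only store the intermediate result.
Is something wrong with my approach?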
public class ElementProcessingTime extends WindowAssigner<Element, TimeWindow> {
    @Override
    public Collection<TimeWindow> assignWindows(Element element, long timestamp, WindowAssignerContext context) {
        // Windows slide every 10 seconds; the window size is taken from the element
        // (getTime() is converted from minutes to milliseconds).
        long slide = Time.seconds(10).toMilliseconds();
        long size = element.getTime() * 60 * 1000;
        // Use the current processing time instead of the element's timestamp.
        timestamp = context.getCurrentProcessingTime();
        List<TimeWindow> windows = new ArrayList<>((int) (size / slide));
        long lastStart = TimeWindow.getWindowStartWithOffset(timestamp, 0, slide);
        // Assign the element to every sliding window that covers the current time,
        // which is about size / slide windows per element.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new TimeWindow(start, start + size));
        }
        return windows;
    }

    @Override
    public Trigger<FactorCalDetail, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return ElementTimeTrigger.create();
    }

    @Override
    public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
        return new TimeWindow.Serializer();
    }

    @Override
    public boolean isEventTime() {
        return false;
    }
}
public class CountAggregate implements AggregateFunction<FactorCalDetail, AggregateResult, AggregateResult> {
    @Override
    public AggregateResult createAccumulator() {
        // Start every window's accumulator at zero.
        AggregateResult result = new AggregateResult();
        result.setResult(0.0);
        return result;
    }

    @Override
    public AggregateResult add(FactorCalDetail value, AggregateResult accumulator) {
        // Record the key and time span, and bump the running result.
        accumulator.setKey(value.getGroupKey());
        accumulator.addResult();
        accumulator.setTimeSpan(value.getTimeSpan());
        return accumulator;
    }

    @Override
    public AggregateResult getResult(AggregateResult accumulator) {
        return accumulator;
    }

    @Override
    public AggregateResult merge(AggregateResult a, AggregateResult b) {
        // Combine two partial results only when they belong to the same key.
        if (a.getKey().equals(b.getKey())) {
            a.setResult(a.getResult() + b.getResult());
        }
        return a;
    }
}
env.addSource(source)
    .keyBy(Element::getKey)
    .window(new ElementProcessingTime())
    .aggregate(new CountAggregate())
    .addSink(new RedisCustomizeSink(redisProperties));
Answer 0 (score: 0)
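You don't say what the source is; it will have its own state that has to be persisted. You also don't say how many unique keys there are. As the number of unique keys grows, even a small per-key state adds up to a huge total. If the problem does turn out to be growth of the aggregator state, you could try splitting the windowing logic into a chain of two windows: the first aggregates per hour, and the second combines those hourly aggregates over the desired time range.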
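To make the state-size argument concrete, here is a rough back-of-the-envelope sketch in plain Java (no Flink dependency). It counts how many window accumulators are alive per key, using the figures from the question's assigner (10-second slide, window sizes from one hour to 30 days). `WindowStateEstimate` and its method names are hypothetical, and the two-stage estimate assumes the second window slides by one hour, which the answer does not specify. It is an approximation, not a measurement of real Flink state.

```java
import java.time.Duration;

/** Hypothetical helper: rough count of live window accumulators, not a Flink API. */
public class WindowStateEstimate {

    /**
     * Number of sliding windows each element falls into.
     * Assumes size is a multiple of slide, as in the question
     * (sizes in whole minutes, slide of 10 seconds).
     */
    public static long windowsPerElement(Duration size, Duration slide) {
        return size.toMillis() / slide.toMillis();
    }

    /** Single-stage design: every key keeps one accumulator per live sliding window. */
    public static long singleStageAccumulators(Duration size, Duration slide, long keys) {
        return keys * windowsPerElement(size, slide);
    }

    /**
     * Two-stage design: one open hourly tumbling window per key, plus sliding
     * windows over the hourly results. The one-hour slide of the second stage
     * is an assumption made for this sketch.
     */
    public static long twoStageAccumulators(Duration size, long keys) {
        Duration hour = Duration.ofHours(1);
        return keys * (1 + windowsPerElement(size, hour));
    }

    public static void main(String[] args) {
        Duration slide = Duration.ofSeconds(10);
        long keys = 1_000; // illustrative key count, not taken from the question

        // 1 hour / 10 s = 360 windows per element
        System.out.println(windowsPerElement(Duration.ofHours(1), slide));
        // 30 days / 10 s = 259200 windows per element
        System.out.println(windowsPerElement(Duration.ofDays(30), slide));
        // 1000 keys * 259200 = 259200000 accumulators
        System.out.println(singleStageAccumulators(Duration.ofDays(30), slide, keys));
        // 1000 keys * (1 + 720) = 721000 accumulators
        System.out.println(twoStageAccumulators(Duration.ofDays(30), keys));
    }
}
```

Under these assumptions, each accumulator is small, but a 30-day window with a 10-second slide keeps roughly 259,200 of them per key, which is where pre-aggregating per hour would help. The trade-off is that results for the long windows would then update hourly instead of every 10 seconds.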