Having an equivalent to HOP_START inside an aggregation primitive in Flink

Asked: 2019-02-13 11:16:23

Tags: apache-flink flink-streaming windowing apache-calcite flink-sql

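I'm trying to compute an exponentially decaying moving average over a hopping window in Flink SQL. Inside the aggregation I need access to one of the window's bounds, HOP_START: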

SELECT                                                                              
  lb_index one_key,
-- I have access to this one:
  HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) start_time,
-- Aggregation primitive:
  SUM(
    Y * EXP(TIMESTAMPDIFF(
      SECOND, 
      proctime, 
-- This one throws:
      HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)
  )))
FROM write_position                                                                
GROUP BY lb_index, HOP(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)

I'm getting the following stack trace:

11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using Aggregate(groupBy: (lb_index), window: (SlidingGroupWindow('w$, 'proctime, 5000.millis, 50.millis)), select: (lb_index, SUM($f2) AS Y, start('w$) AS w$start, end('w$) AS w$end, proctime('w$) AS w$proctime))
11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using Calc(select: (lb_index, proctime, *(payload.Y, EXP(/(CAST(/INT(Reinterpret(-(HOP_START(PROCTIME(proctime), 50, 5000), PROCTIME(proctime))), 1000)), 1000))) AS $f2))
11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using rel#459:DataStreamScan.DATASTREAM.true.Acc(table=[_DataStreamTable_0])
Exception in thread "main" org.apache.flink.table.codegen.CodeGenException: Unsupported call: HOP_START 
If you think this function should be supported, you can create an issue and start a discussion for it.
    at org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
    at org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:1027)
    at org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:66)

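The exception says HOP_START is not supported here, even though the same call works outside the SUM aggregation. That is what makes me think this is a scoping issue.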

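Now, the thing is: I could transform this expression and do the final processing outside of the aggregation, since exp(x + y) = exp(x) * exp(y). But I'm stuck with TIMESTAMPDIFF (which did wonders in my previous issue), because I haven't found a way to cast a TIME ATTRIBUTE to a NUMERIC type. Also, I'm not comfortable taking the exponential of UNIX timestamps, even if I scale them down.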
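To make the factorization concrete, here is a small numeric sketch in Python rather than Flink SQL. It only checks the identity exp(s - t) = exp(s - r) * exp(r - t), where s is the window start, t is a row's timestamp, and r is a fixed reference point. The sample values, the `reference` origin, and the function names are all hypothetical and not part of the query above.

```python
import math

def decayed_sum_direct(samples, window_start):
    """Sum of y * exp(window_start - t): the form used inside SUM(...) in the query."""
    return sum(y * math.exp(window_start - t) for t, y in samples)

def decayed_sum_factored(samples, window_start, reference):
    """Same quantity, with the window-start factor applied after the aggregation."""
    # The aggregated term no longer depends on window_start.
    inner = sum(y * math.exp(reference - t) for t, y in samples)
    return math.exp(window_start - reference) * inner

# Hypothetical (timestamp in seconds, Y) samples falling in one 5-second window.
samples = [(100.0, 2.0), (101.5, 3.0), (103.0, 1.0), (104.9, 4.0)]
window_start = 100.0
reference = 90.0  # arbitrary fixed origin, assumed to be close to the data

direct = decayed_sum_direct(samples, window_start)
factored = decayed_sum_factored(samples, window_start, reference)
print(math.isclose(direct, factored, rel_tol=1e-9))
```

The two results agree only up to floating-point error, and the aggregated term shrinks toward zero as the timestamps move away from the reference, which is the numerical concern with raw UNIX timestamps.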

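Anyway, this workaround is a bit hacky, and there may be another way. I don't know how to adjust the scoping in this SQL snippet so that the expression still "lives" in the window scope and has access to the start time without throwing.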

1 Answer:

Answer 0: (score: 0)

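I suggest you try using HOP_PROCTIME() rather than HOP_START(). The differences are explained here, but the upshot is that you will have a proctime attribute rather than a timestamp, which I expect will make TIMESTAMPDIFF happy.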