I want to use the TUMBLE(time_attr, interval) window function in Flink SQL, but I don't know how to set "time_attr" based on my data.
Below is one record from my Kafka source. It is in JSON format, and the body field contains the user logs:
{
  "body": [
    "user1,url1,2018-10-23 00:00:00;user2,url2,2018-10-23 00:01:00;user3,url3,2018-10-23 00:02:00"
  ]
}
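In this record, log entries are separated by ';' and the three fields of each entry (user, url, time) by ','. A minimal, stdlib-only sketch of that split, runnable without a Flink cluster; the names BodyParseDemo, LogEntry and parseBody are hypothetical, and the time is kept as text here.

```java
import java.util.ArrayList;
import java.util.List;

public class BodyParseDemo {
    // One log entry: "user,url,yyyy-MM-dd HH:mm:ss" (hypothetical holder type)
    public static final class LogEntry {
        public final String user;
        public final String url;
        public final String time;

        LogEntry(String user, String url, String time) {
            this.user = user;
            this.url = url;
            this.time = time;
        }
    }

    // Entries are separated by ';', fields by ','
    public static List<LogEntry> parseBody(String body) {
        List<LogEntry> entries = new ArrayList<>();
        for (String line : body.split(";")) {
            if (line.isEmpty()) {
                continue; // skip empty entries, e.g. from ";;"
            }
            String[] fields = line.split(",", 3); // split each entry once, at most 3 fields
            if (fields.length != 3) {
                throw new IllegalArgumentException("Malformed log entry: " + line);
            }
            entries.add(new LogEntry(fields[0], fields[1], fields[2]));
        }
        return entries;
    }

    public static void main(String[] args) {
        String body = "user1,url1,2018-10-23 00:00:00;user2,url2,2018-10-23 00:01:00;"
                + "user3,url3,2018-10-23 00:02:00";
        for (LogEntry e : parseBody(body)) {
            System.out.println(e.user + " " + e.url + " " + e.time);
        }
    }
}
```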
I use LATERAL TABLE with a user-defined TableFunction to flat-map the source into a new table, log, and I want to group by time and username. Here is my code:
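import java.text.ParseException;
import java.text.SimpleDateFormat;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

// Splits the body into (username, url, time in epoch millis) rows.
public class BodySplitFun extends TableFunction<Tuple3<String, String, Long>> {
    private final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    public void eval(Row bodyRow) {
        String body = bodyRow.getField(0).toString();
        for (String line : body.split(";")) {
            String[] fields = line.split(",");
            try {
                // the time is emitted as a Long (BIGINT in SQL)
                collect(new Tuple3<>(fields[0], fields[1], sdf.parse(fields[2]).getTime()));
            } catch (ParseException e) {
                throw new RuntimeException("Cannot parse time: " + fields[2], e);
            }
        }
    }
}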
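The eval method above emits the time as a Long, which SQL types as BIGINT, and that is exactly the argument type the validator rejects in the error below. A sketch of an alternative conversion using java.time that can return a java.sql.Timestamp (SQL TIMESTAMP) instead, assuming the log times are UTC (the record does not say; SimpleDateFormat uses the JVM default zone). LogTimeConversion and its methods are hypothetical names. Returning a TIMESTAMP would address the type mismatch, but as far as I know Flink's streaming SQL additionally requires the first TUMBLE argument to be a time attribute (a rowtime or proctime column), which a column produced by a table function is not, so this alone may not be sufficient.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class LogTimeConversion {
    // DateTimeFormatter is immutable and thread-safe, so it can be a shared constant
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Epoch milliseconds, interpreting the text as UTC (assumption)
    public static long toEpochMillis(String sTime) {
        return LocalDateTime.parse(sTime, FORMAT).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // java.sql.Timestamp maps to SQL TIMESTAMP rather than BIGINT
    public static Timestamp toTimestamp(String sTime) {
        return new Timestamp(toEpochMillis(sTime));
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2018-10-23 00:01:00")); // 1540252860000
    }
}
```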
tblEnv.registerFunction("bodySplit", new BodySplitFun());
// a SELECT goes through sqlQuery(); sqlUpdate() only accepts INSERT statements
Table result = tblEnv.sqlQuery(
    "SELECT COUNT(username) " +
    "FROM (" +
    "  SELECT username, url, sTime " +
    "  FROM mySource " +
    "  LEFT JOIN LATERAL TABLE(bodySplit(body)) AS T(username, url, sTime) ON TRUE" +
    ") AS log " +
    "GROUP BY TUMBLE(log.sTime, INTERVAL '1' MINUTE), log.username");
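Conceptually, this query assigns each row to a one-minute tumbling window and counts rows per window and username. A Flink-free sketch of that grouping on epoch-millisecond times, assuming non-negative timestamps and windows aligned to the epoch (start = ts - ts % size, which as far as I know matches Flink's default zero offset). TumbleCountDemo and its methods are hypothetical, and the sketch ignores watermarks, out-of-order events and late data, which Flink handles in a real streaming job.

```java
import java.util.Map;
import java.util.TreeMap;

public class TumbleCountDemo {
    static final long WINDOW_SIZE_MS = 60_000L; // INTERVAL '1' MINUTE

    // Start of the tumbling window that contains ts (non-negative epoch millis)
    public static long windowStart(long ts, long size) {
        return ts - (ts % size);
    }

    // Counts rows per (window start, user), mimicking
    // GROUP BY TUMBLE(sTime, INTERVAL '1' MINUTE), username.
    // The key is "windowStart|user" to keep the sketch short.
    public static Map<String, Long> countPerWindowAndUser(String[] users, long[] times) {
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < users.length; i++) {
            String key = windowStart(times[i], WINDOW_SIZE_MS) + "|" + users[i];
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] users = {"user1", "user1", "user2"};
        // 2018-10-23 00:00:00, 00:00:30 and 00:01:00 UTC
        long[] times = {1540252800000L, 1540252830000L, 1540252860000L};
        System.out.println(countPerWindowAndUser(users, times));
        // {1540252800000|user1=2, 1540252860000|user2=1}
    }
}
```

The first two user1 rows fall into the window starting at 00:00:00, while the user2 row at 00:01:00 opens the next window, since a window covers [start, start + size).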
When I run the program, I get the following error:
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply 'TUMBLE' to arguments of type 'TUMBLE(<BIGINT>, <INTERVAL DAY>)'. Supported form(s): 'TUMBLE(<DATETIME>, <DATETIME_INTERVAL>)'
'TUMBLE(<DATETIME>, <DATETIME_INTERVAL>, <TIME>)'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:572)
... 49 more
How can I use the sTime field of the log table for the grouping?