I have a Kafka topic that receives events of the form {timestamp, word, channel_id}.
I need to write a KSQL query that returns the top K words said in a given channel over the last half hour.
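To make the requirement concrete, here is a minimal Python sketch of the result I am after, written outside KSQL. The function name `top_k_words`, the sample events, and the use of epoch-second timestamps are all assumptions for illustration only; this describes the expected output, not a KSQL solution.

```python
from collections import Counter

WINDOW_SECONDS = 30 * 60  # half an hour


def top_k_words(events, channel_id, k, now):
    """Return the k most frequent words said in channel_id during the
    WINDOW_SECONDS ending at `now` (timestamps assumed to be epoch seconds)."""
    counts = Counter(
        e["word"]
        for e in events
        if e["channel_id"] == channel_id
        and now - WINDOW_SECONDS < e["timestamp"] <= now
    )
    # most_common sorts by count, highest first
    return counts.most_common(k)


# Hypothetical sample events, not real data from the topic
events = [
    {"timestamp": 1000, "word": "hello", "channel_id": "mail"},
    {"timestamp": 1200, "word": "kafka", "channel_id": "mail"},
    {"timestamp": 1500, "word": "hello", "channel_id": "mail"},
    {"timestamp": 1600, "word": "hello", "channel_id": "chat"},
    {"timestamp": 2900, "word": "kafka", "channel_id": "mail"},
    {"timestamp": 3000, "word": "kafka", "channel_id": "mail"},
]

result = top_k_words(events, "mail", 2, now=3000)
print(result)
```

In other words, the count per word is only an intermediate step; what I want emitted is the ranked list of the K highest counts for the chosen channel.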
Here is what I have done so far:
1- Create a stream for the topic
CREATE STREAM WORDEVENTS WITH (KAFKA_TOPIC='words',VALUE_FORMAT='AVRO');
2- Filter for the channel I want
CREATE STREAM FILTERED_WORDEVENTS WITH (KAFKA_TOPIC='words_in_mail', VALUE_FORMAT='AVRO') AS SELECT WORD FROM WORDEVENTS WHERE CHANNEL_ID LIKE 'mail';
Beyond that I am less sure how to proceed, but I can run this:
SELECT WORD, COUNT(*) AS COUNT_TOTAL FROM FILTERED_WORDEVENTS WINDOW HOPPING (SIZE 30 MINUTES, ADVANCE BY 5 SECONDS) GROUP BY WORD;
That works fine, but when I try to apply the TOPK function on top of it, it does not work:
SELECT WORD, topk(COUNT(*), 2) AS COUNT_TOTAL FROM FILTERED_WORDEVENTS WINDOW HOPPING (SIZE 30 MINUTES, ADVANCE BY 5 SECONDS) GROUP BY WORD;
It fails with:
Caused by: Can't find any functions with the name 'COUNT'
I also tried creating a stream/table for the grouped counts first, and then computing the top K over that:
CREATE TABLE COUNT_WORDS_LAST_HOUR AS SELECT WORD, COUNT(*) AS COUNT_TOTAL FROM FILTERED_WORDEVENTS WINDOW HOPPING (SIZE 30 MINUTES, ADVANCE BY 5 SECONDS) GROUP BY WORD;
But it complains that TOPK cannot be applied to a table.
How can I solve this use case?