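I'm trying to implement a volume average in KSQL.
Kafka currently ingests data from a producer into the topic "KLINES". This data uses a consistent format across multiple markets. I then create a stream from it like so: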
CREATE STREAM KLINESTREAM (market VARCHAR, open DOUBLE, high DOUBLE, low DOUBLE, close DOUBLE, volume DOUBLE, start_time BIGINT, close_time BIGINT, event_time BIGINT) \
WITH (VALUE_FORMAT='JSON', KAFKA_TOPIC='KLINES', TIMESTAMP='event_time', KEY='market');
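For reference, a record on the KLINES topic would look roughly like the following. This is a plain-Python sketch, not a Kafka client. The helper name `make_kline_message` and the sample values are hypothetical. It assumes the producer sets the Kafka message key to the market string, because KEY='market' tells KSQL that the message key holds the same value as the market column.

```python
import json

def make_kline_message(market, open_, high, low, close, volume,
                       start_time, close_time, event_time):
    """Build a hypothetical (key, value) pair for the KLINES topic.

    The value is JSON whose field names match the KLINESTREAM columns.
    Assumption: the message key is the market string, matching KEY='market'.
    """
    value = {
        "market": market,
        "open": open_,      # trailing underscore avoids shadowing builtin open()
        "high": high,
        "low": low,
        "close": close,
        "volume": volume,
        "start_time": start_time,
        "close_time": close_time,
        "event_time": event_time,  # epoch ms; TIMESTAMP='event_time' uses this
    }
    return market.encode("utf-8"), json.dumps(value).encode("utf-8")

# Made-up sample values for illustration only.
key, value = make_kline_message("EXAMPLEMARKET", 1.0, 1.2, 0.9, 1.1, 44.0,
                                1560647400000, 1560647459999, 1560647412620)
print(key, value)
```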
Next, I create a table that calculates the average volume for each market over the most recent 20 minutes, like so:
CREATE TABLE AVERAGE_VOLUME_TABLE_BY_MARKET AS \
SELECT CEIL(SUM(volume) / COUNT(*)) AS volume_avg, market FROM KLINESTREAM \
WINDOW HOPPING (SIZE 20 MINUTES, ADVANCE BY 5 SECONDS) \
GROUP BY market;
SELECT * FROM AVERAGE_VOLUME_TABLE_BY_MARKET LIMIT 1;
For clarity, this produces:
1560647412620 | EXAMPLEMARKET : Window{start=1560647410000 end=-} | 44.0 | EXAMPLEMARKET
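To see why the table key includes a window, it helps to count the windows. With SIZE 20 MINUTES and ADVANCE BY 5 SECONDS, hopping windows start at every multiple of 5 seconds, so each record falls into 20 × 60 / 5 = 240 overlapping windows. The plain-Python sketch below models that assignment. It assumes windows are aligned to the epoch, which matches the start=1560647410000 shown above. `hopping_window_starts` is a hypothetical helper, not a KSQL function.

```python
SIZE_MS = 20 * 60 * 1000  # SIZE 20 MINUTES
ADVANCE_MS = 5 * 1000     # ADVANCE BY 5 SECONDS

def hopping_window_starts(ts_ms, size_ms=SIZE_MS, advance_ms=ADVANCE_MS):
    """Return the start (ms) of every hopping window containing ts_ms, newest first.

    Assumes epoch-aligned windows: starts are multiples of advance_ms and
    each window covers the half-open interval [start, start + size_ms).
    """
    start = (ts_ms // advance_ms) * advance_ms  # newest window containing ts_ms
    starts = []
    while start > ts_ms - size_ms and start >= 0:
        starts.append(start)
        start -= advance_ms
    return starts

starts = hopping_window_starts(1560647412620)  # row time from the output above
print(len(starts), starts[0])  # prints: 240 1560647410000
```

Under this model, the window printed above (start=1560647410000) is the newest of the 240 windows. At that row time it can only contain records from the last few seconds, not the full trailing 20 minutes.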
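I'd like to have a KSQL table that represents the current "kline" state of each market and also includes the rolling average volume calculated in the "AVERAGE_VOLUME_TABLE_BY_MARKET" KTable, so that I can compare the current volume against the rolling average volume.
I tried joining them like so: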
SELECT K.market, K.open, K.high, K.low, K.close, K.volume, V.volume_avg \
FROM KLINESTREAM K \
LEFT JOIN AVERAGE_VOLUME_TABLE_BY_MARKET V \
ON K.market = V.market;
But this obviously results in an error, because the key of "AVERAGE_VOLUME_TABLE_BY_MARKET" contains both the time window and the market:
A serializer (key: org.apache.kafka.streams.kstream.TimeWindowedSerializer) is not compatible to the actual key type (key type: java.lang.String). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
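The error follows from the key shape. The sketch below models AVERAGE_VOLUME_TABLE_BY_MARKET in plain Python. The helper names are hypothetical, it uses the same 20-minute/5-second hopping parameters, and it assumes epoch-aligned windows. Every aggregate is stored under a (market, window_start) pair. A lookup by the market string alone, which is what the stream-table join attempts, therefore finds nothing.

```python
import math
from collections import defaultdict

SIZE_MS = 20 * 60 * 1000  # SIZE 20 MINUTES
ADVANCE_MS = 5 * 1000     # ADVANCE BY 5 SECONDS

def window_starts(ts_ms):
    """Start times of all (epoch-aligned) hopping windows that contain ts_ms."""
    start = (ts_ms // ADVANCE_MS) * ADVANCE_MS
    starts = []
    while start > ts_ms - SIZE_MS and start >= 0:
        starts.append(start)
        start -= ADVANCE_MS
    return starts

def windowed_volume_avg(records):
    """Mimic CEIL(SUM(volume) / COUNT(*)) ... GROUP BY market over hopping windows.

    records: iterable of (market, event_time_ms, volume) tuples.
    The resulting key is (market, window_start), not market alone.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for market, ts_ms, volume in records:
        for start in window_starts(ts_ms):
            sums[(market, start)] += volume
            counts[(market, start)] += 1
    return {key: math.ceil(sums[key] / counts[key]) for key in sums}

table = windowed_volume_avg([
    ("EXAMPLEMARKET", 1560647412620, 44.0),  # made-up sample records
    ("EXAMPLEMARKET", 1560647413000, 40.0),
])
print(table[("EXAMPLEMARKET", 1560647410000)])  # prints: 42
print("EXAMPLEMARKET" in table)                 # prints: False
```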
Am I approaching this problem correctly?
What I want to achieve is:
Windowed Aggregate KTable + Kline Stream ->
KTable representing current market state
including average volume from the KTable
Is it possible to show the current market state like this in KSQL? Or do I have to use KStreams or another library to accomplish this?
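Semantically, the target is a per-market "latest kline plus trailing average" row. The sketch below shows that result in plain Python rather than KSQL or Kafka Streams. `MarketState` and its `on_kline` method are hypothetical. Events are assumed to arrive in event-time order per market. The trailing window is taken as the 20 minutes ending at the current event, with older entries evicted. This is a sliding window, not the hopping windows used in the table above.

```python
import math
from collections import defaultdict, deque

WINDOW_MS = 20 * 60 * 1000  # trailing 20 minutes

class MarketState:
    """Hypothetical in-memory model of a 'current market state' table."""

    def __init__(self, window_ms=WINDOW_MS):
        self.window_ms = window_ms
        self.history = defaultdict(deque)  # market -> deque of (ts_ms, volume)
        self.state = {}                     # market -> latest kline + volume_avg

    def on_kline(self, kline):
        market, ts = kline["market"], kline["event_time"]
        hist = self.history[market]
        hist.append((ts, kline["volume"]))
        # Keep only entries in (ts - window_ms, ts]; assumes in-order events.
        while hist and hist[0][0] <= ts - self.window_ms:
            hist.popleft()
        avg = math.ceil(sum(v for _, v in hist) / len(hist))
        self.state[market] = {**kline, "volume_avg": avg}
        return self.state[market]

# Made-up events; timestamps are small numbers for readability.
ms = MarketState()
ms.on_kline({"market": "EXAMPLEMARKET", "event_time": 0, "volume": 10.0})
ms.on_kline({"market": "EXAMPLEMARKET", "event_time": 60_000, "volume": 30.0})
row = ms.on_kline({"market": "EXAMPLEMARKET", "event_time": 1_230_000, "volume": 50.0})
print(row["volume_avg"], row["volume"] > row["volume_avg"])  # prints: 40 True
```

Because each update carries both volume and volume_avg, comparing them is a plain field comparison on the same row.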
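Here is a good aggregation example: https://www.confluent.io/stream-processing-cookbook/ksql-recipes/aggregating-data
Applied to that example, how would I compare incoming data against the aggregate once the aggregated data arrives in the KSQL table?
Cheers, James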
Answer:
I believe what you may be looking for is LATEST_BY_OFFSET:
CREATE TABLE AVERAGE_VOLUME_TABLE_BY_MARKET AS
SELECT
market,
LATEST_BY_OFFSET(volume) AS volume,
CEIL(SUM(volume) / COUNT(*)) AS volume_avg
FROM KLINESTREAM
WINDOW HOPPING (SIZE 20 MINUTES, ADVANCE BY 5 SECONDS)
GROUP BY market;
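As a rough model of what the answer's query computes, the plain-Python sketch below works on each (market, window_start) group. It keeps the volume from the last record in offset order, mirroring LATEST_BY_OFFSET(volume), together with CEIL(SUM(volume) / COUNT(*)). It assumes a ksqlDB version that provides LATEST_BY_OFFSET, and the helper names are hypothetical. Both values come out of the same aggregation, so no join is required. As the model shows, the result is still keyed per window, just like the original table.

```python
import math
from collections import defaultdict

def window_starts(ts_ms, size_ms, advance_ms):
    """Start times of every epoch-aligned hopping window containing ts_ms."""
    start = (ts_ms // advance_ms) * advance_ms
    starts = []
    while start > ts_ms - size_ms and start >= 0:
        starts.append(start)
        start -= advance_ms
    return starts

def latest_and_avg(records, size_ms=20 * 60 * 1000, advance_ms=5 * 1000):
    """Model of the answer's query for records given in offset order.

    records: (market, event_time_ms, volume) tuples.
    "volume" mirrors LATEST_BY_OFFSET(volume): the last value seen in offset order.
    "volume_avg" mirrors CEIL(SUM(volume) / COUNT(*)).
    """
    latest = {}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for market, ts_ms, volume in records:
        for start in window_starts(ts_ms, size_ms, advance_ms):
            key = (market, start)
            latest[key] = volume  # a later offset overwrites an earlier one
            sums[key] += volume
            counts[key] += 1
    return {key: {"volume": latest[key],
                  "volume_avg": math.ceil(sums[key] / counts[key])}
            for key in latest}

rows = latest_and_avg([
    ("EXAMPLEMARKET", 1560647412620, 44.0),  # made-up sample records
    ("EXAMPLEMARKET", 1560647413000, 40.0),
])
print(rows[("EXAMPLEMARKET", 1560647410000)])  # prints: {'volume': 40.0, 'volume_avg': 42}
```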