Input: ClickHouse
Table A: business_dttm (DateTime), amount (Float)
I need to calculate a moving sum over 15 minutes (or over the last 3 records) at each business_dttm.
For example:
| amount | business_dttm       | moving_sum |
|--------|---------------------|------------|
| 0.3    | 2018-11-19 13:00:00 |            |
| 0.3    | 2018-11-19 13:05:00 |            |
| 0.4    | 2018-11-19 13:10:00 | 1          |
| 0.5    | 2018-11-19 13:15:00 | 1.2        |
| 0.6    | 2018-11-19 13:15:00 | 1.5        |
| 0.7    | 2018-11-19 13:20:00 | 1.8        |
| 0.8    | 2018-11-19 13:25:00 | 2.1        |
| 0.9    | 2018-11-19 13:25:00 | 2.4        |
| 0.5    | 2018-11-19 13:30:00 | 2.2        |
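As a sanity check on the expected output: the moving_sum column is the count-based window, i.e. the current amount plus the two amounts before it in business_dttm order (for example, 1.2 = 0.3 + 0.4 + 0.5), with the first two rows left blank. A small plain-Python reference (not ClickHouse code; the function name `moving_sum` is hypothetical) reproduces it:

```python
from typing import List, Optional

def moving_sum(amounts: List[float], window_size: int = 3) -> List[Optional[float]]:
    """Sum of the current row and the (window_size - 1) rows before it.

    Assumes `amounts` is already ordered by business_dttm. Rows that do not
    yet have window_size rows behind them (counting themselves) get None,
    matching the blank cells in the table.
    """
    out: List[Optional[float]] = []
    for i in range(len(amounts)):
        if i + 1 < window_size:
            out.append(None)
        else:
            out.append(sum(amounts[i + 1 - window_size : i + 1]))
    return out

amounts = [0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
# Round to hide floating-point noise such as 1.2000000000000002.
print([None if s is None else round(s, 10) for s in moving_sum(amounts)])
# prints [None, None, 1.0, 1.2, 1.5, 1.8, 2.1, 2.4, 2.2]
```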
Unfortunately, we don't have window functions, and ClickHouse doesn't support joins on non-equality conditions.
How can this be done without a cross join?
Answer (score: 4)
If the window size is small, you can do the following:

```sql
SELECT
    sum(window.2) AS amount,
    max(dttm) AS business_dttm,
    sum(amt) AS moving_sum
FROM
(
    SELECT
        arrayJoin([(rowNumberInAllBlocks(), amount), (rowNumberInAllBlocks() + 1, 0), (rowNumberInAllBlocks() + 2, 0)]) AS window,
        amount AS amt,
        business_dttm AS dttm
    FROM
    (
        SELECT
            amount,
            business_dttm
        FROM A
        ORDER BY business_dttm
    )
)
GROUP BY window.1
HAVING count() = 3
ORDER BY window.1;
```

The first two rows are missing from the result: their groups have fewer than 3 members, and ClickHouse does not collapse such incomplete aggregates into NULL, so HAVING count() = 3 filters them out. You can prepend them afterwards.
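To see why this query works, here is a plain-Python model of its three steps (a sketch, not ClickHouse code; `fan_out_moving_sum` and `Row` are hypothetical names). The arrayJoin copies each row under the keys r, r + 1, r + 2; grouping by that key gathers a row together with its two predecessors, and only the copy keyed by the row itself keeps its amount in `window.2`:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (business_dttm, amount), sorted by business_dttm

def fan_out_moving_sum(rows: List[Row], window_size: int = 3) -> List[Tuple[float, str, float]]:
    # arrayJoin step: row r is emitted window_size times under the keys
    # r, r + 1, ..., r + window_size - 1 (window.1). Only the copy keyed r
    # keeps the amount in window.2 (the others carry 0); all keep amt and dttm.
    groups: Dict[int, List[Tuple[float, str, float]]] = defaultdict(list)
    for r, (dttm, amount) in enumerate(rows):  # r plays rowNumberInAllBlocks()
        for offset in range(window_size):
            window_2 = amount if offset == 0 else 0.0
            groups[r + offset].append((window_2, dttm, amount))

    result = []
    # GROUP BY window.1 ... HAVING count() = window_size ... ORDER BY window.1
    for key in sorted(groups):
        members = groups[key]
        if len(members) != window_size:
            continue  # incomplete groups at both ends are dropped
        result.append((
            sum(m[0] for m in members),  # sum(window.2): the row's own amount
            max(m[1] for m in members),  # max(dttm): the row's own timestamp (input is sorted)
            sum(m[2] for m in members),  # sum(amt): the moving sum
        ))
    return result

rows: List[Row] = [
    ("2018-11-19 13:00:00", 0.3),
    ("2018-11-19 13:05:00", 0.3),
    ("2018-11-19 13:10:00", 0.4),
    ("2018-11-19 13:15:00", 0.5),
    ("2018-11-19 13:15:00", 0.6),
    ("2018-11-19 13:20:00", 0.7),
    ("2018-11-19 13:25:00", 0.8),
    ("2018-11-19 13:25:00", 0.9),
    ("2018-11-19 13:30:00", 0.5),
]

for amount, dttm, total in fan_out_moving_sum(rows):
    print(amount, dttm, round(total, 10))
```

The model also makes the cost visible: every input row is materialized window_size times before grouping, which is why this approach only suits small windows.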
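It is still possible to compute the moving sum for an arbitrary window size with the following variant. Adjust window_size as needed (3 in this example):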
```sql
-- Note: rowNumberInAllBlocks() gives wrong results if declared inside the WITH block, because it is stateful
WITH
    (
        SELECT arrayCumSum(groupArray(amount))
        FROM
        (
            SELECT amount
            FROM A
            ORDER BY business_dttm
        )
    ) AS arr,
    3 AS window_size
SELECT
    amount,
    business_dttm,
    if(rowNumberInAllBlocks() + 1 < window_size, NULL, arr[rowNumberInAllBlocks() + 1] - arr[rowNumberInAllBlocks() + 1 - window_size]) AS moving_sum
FROM
(
    SELECT
        amount,
        business_dttm
    FROM A
    ORDER BY business_dttm
)
```
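This variant replaces the fan-out with prefix sums: `arr` holds the running total once, and each row's moving sum is the difference of two of its entries, arr[i] - arr[i - window_size] for the 1-based position i. A plain-Python sketch of that arithmetic follows (not ClickHouse code; `prefix_sum_moving_sum` is a hypothetical name). The leading 0.0 models the assumption that `arr[0]`, which is out of range because ClickHouse arrays are 1-based, evaluates to the type's default value 0 for the row at position window_size:

```python
from itertools import accumulate
from typing import List, Optional

def prefix_sum_moving_sum(amounts: List[float], window_size: int = 3) -> List[Optional[float]]:
    # arr = arrayCumSum(groupArray(amount)), stored 1-based; arr[0] = 0.0 is an
    # assumed stand-in for ClickHouse's out-of-range default value.
    arr = [0.0] + list(accumulate(amounts))
    out: List[Optional[float]] = []
    for row_number in range(len(amounts)):  # plays rowNumberInAllBlocks(), 0-based
        pos = row_number + 1                # 1-based position of this row in arr
        if pos < window_size:
            out.append(None)                # the if(..., NULL, ...) branch
        else:
            out.append(arr[pos] - arr[pos - window_size])
    return out

amounts = [0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
print([None if s is None else round(s, 10) for s in prefix_sum_moving_sum(amounts)])
```

The work per row is constant regardless of window_size, but subtracting two large running totals can lose floating-point precision on long series, which the row-by-row sums of the first query avoid.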
A general caveat: both approaches are far from optimal, but they show off ClickHouse capabilities that go beyond standard SQL.