ClickHouse moving average

Time: 2018-11-21 16:29:16

Tags: moving-average clickhouse

Input: ClickHouse

Table A: business_dttm (datetime), amount (float)

I need to calculate a moving sum over 15 minutes (or over the last 3 records) at each business_dttm.

For example:

amount business_dttm     moving sum
0.3 2018-11-19 13:00:00  
0.3 2018-11-19 13:05:00
0.4 2018-11-19 13:10:00  1
0.5 2018-11-19 13:15:00  1.2
0.6 2018-11-19 13:15:00  1.5
0.7 2018-11-19 13:20:00  1.8
0.8 2018-11-19 13:25:00  2.1
0.9 2018-11-19 13:25:00  2.4
0.5 2018-11-19 13:30:00  2.2

Unfortunately, ClickHouse has no window functions and no joins on non-equality conditions.

How can I do this without a cross join?

1 answer:

Answer 0 (score: 4)

If the window size is countably small, you can do something like this:

SELECT
    sum(window.2) AS amount,
    max(dttm) AS business_dttm,
    sum(amt) AS moving_sum
FROM
(
    SELECT
        arrayJoin([(rowNumberInAllBlocks(), amount), (rowNumberInAllBlocks() + 1, 0), (rowNumberInAllBlocks() + 2, 0)]) AS window,
        amount AS amt,
        business_dttm AS dttm
    FROM
    (
        SELECT
            amount,
            business_dttm
        FROM A
        ORDER BY business_dttm
    )
)
GROUP BY window.1
HAVING count() = 3
ORDER BY window.1;

The first two rows are ignored because ClickHouse doesn't collapse aggregates into null. You can add them back later.

Update:

It is still possible to compute a moving sum for arbitrary window sizes. Adjust window_size as needed (3 in this example):

-- Note, rowNumberInAllBlocks is incorrect if declared inside with block due to being stateful
WITH
    (
        SELECT arrayCumSum(groupArray(amount))
        FROM
        (
            SELECT
                amount
            FROM A
            ORDER BY business_dttm
        )
    ) AS arr,
    3 AS window_size
SELECT
    amount,
    business_dttm,
    if(rowNumberInAllBlocks() + 1 < window_size, NULL, arr[rowNumberInAllBlocks() + 1] - arr[rowNumberInAllBlocks() + 1 - window_size]) AS moving_sum
FROM
(
    SELECT
        amount,
        business_dttm
    FROM A
    ORDER BY business_dttm
)
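
To make the arithmetic behind the cumulative-sum query easier to follow, here is a minimal Python sketch of the same idea outside ClickHouse: build a running total once, then obtain each window as the difference of two running totals. The function name moving_sum and the leading zero in the prefix list are illustrative assumptions, not part of the answer's query.

```python
from itertools import accumulate
from typing import List, Optional


def moving_sum(amounts: List[float], window_size: int = 3) -> List[Optional[float]]:
    """Trailing moving sum via prefix sums; None until a full window exists."""
    # prefix[i] holds the sum of the first i amounts. The leading 0.0 keeps
    # prefix[i - window_size] valid when i == window_size.
    prefix = [0.0] + list(accumulate(amounts))
    result: List[Optional[float]] = []
    for i in range(1, len(amounts) + 1):  # 1-based position, like rowNumberInAllBlocks() + 1
        if i < window_size:
            result.append(None)  # not enough preceding rows yet
        else:
            result.append(prefix[i] - prefix[i - window_size])
    return result


# Amounts from the question, already ordered by business_dttm.
amounts = [0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
rounded = [None if s is None else round(s, 1) for s in moving_sum(amounts)]
print(rounded)
```

Each window costs one subtraction regardless of window_size, which is why this form scales to arbitrary window sizes, whereas the arrayJoin form emits window_size tuples per input row.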

Fair warning: both approaches are far from optimal, but they show off the unique power of ClickHouse beyond SQL.