Can I query the hourly increment of a cumulative column in ClickHouse?

Date: 2019-12-13 01:25:06

Tags: clickhouse

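I want to store the event time and the total amount of electricity generated, sampled every 30 seconds. The total is never reset to zero: it is the running sum since the meter was started, not the amount generated in the last 30 seconds.

Is there a way to query aggregates (perhaps not just sum or average) of the electricity generated per day, per week, or per month?

Or should this be done by designing an AggregatingMergeTree table?

I don't need to keep every record; the daily, weekly, and monthly aggregates are enough.

For example: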

create table meter_record (
   event_time Datetime,
   generated_total Int64
)

1 Answer:

Answer 0: (score: 2)

Let's suppose you want to calculate the median, average, and dispersion aggregates for this table:

CREATE TABLE meter_record (
   event_time Datetime,
   generated_total Int64  
)
ENGINE = MergeTree
PARTITION BY (toYYYYMM(event_time))
ORDER BY (event_time);

Use AggregatingMergeTree to calculate the required aggregates:

CREATE MATERIALIZED VIEW meter_aggregates_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day) 
AS
SELECT  
  toDate(toStartOfDay(event_time)) AS day, 
  /* min and max of the cumulative total within the day (the endpoints of the day's interval) */
  minState(generated_total) min_generated_total,
  maxState(generated_total) max_generated_total,
  /* specific aggregates */
  medianState(generated_total) AS totalMedian,
  avgState(generated_total) AS totalAvg,
  varPopState(generated_total) AS totalDispersion
  /* ... */
FROM meter_record
GROUP BY day;

To get daily / weekly / monthly aggregates (and any other day-based aggregate, such as quarterly or yearly), use these queries:

/* daily report */
SELECT 
  day,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE day >= '2019-02-05' and day < '2019-07-01'*/
GROUP BY day;

/* weekly report */
SELECT 
  toStartOfWeek(day, 1) monday,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE day >= '2019-02-05' and day < '2019-07-01'*/
GROUP BY monday;

/* monthly report */
SELECT 
  toStartOfMonth(day) month,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE day >= '2019-02-05' and day < '2019-07-01'*/
GROUP BY month;

/* get daily / weekly / monthly reports in one query (thanks @Denis Zhuravlev for the advice) */
SELECT
  day,
  toStartOfWeek(day, 1) AS week,
  toStartOfMonth(day) AS month,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE (day >= '2019-05-01') AND (day < '2019-06-01')*/
GROUP BY month, week, day WITH ROLLUP
ORDER BY day, week, month;
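
The question asks for the increment of the cumulative column, which the reports above do not compute directly. A minimal sketch, assuming the meter reading never decreases: the amount generated within a period can be approximated as the period's maximum minus its minimum. The output column names generated_in_hour, generated_in_day and generated_in_week are illustrative and not part of the original answer. Since the view is keyed by day, the hourly variant reads from the raw table instead.

```sql
/* hourly increment, computed from the raw table */
SELECT
  toStartOfHour(event_time) AS hour,
  max(generated_total) - min(generated_total) AS generated_in_hour
FROM meter_record
GROUP BY hour
ORDER BY hour;

/* daily increment, computed from the stored min/max states */
SELECT
  day,
  maxMerge(max_generated_total) - minMerge(min_generated_total) AS generated_in_day
FROM meter_aggregates_mv
GROUP BY day
ORDER BY day;

/* weekly increment: merge the daily states over each week */
SELECT
  toStartOfWeek(day, 1) AS monday,
  maxMerge(max_generated_total) - minMerge(min_generated_total) AS generated_in_week
FROM meter_aggregates_mv
GROUP BY monday
ORDER BY monday;
```

Note that max minus min leaves out whatever was generated between the last reading of one period and the first reading of the next. With 30-second sampling that gap is small, but if it matters, take the difference between the maxima of consecutive periods instead.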

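Remarks:

  • You noted that you don't need the raw data, so you can set the engine of the meter_record table to Null, clean up meter_record manually (see DROP PARTITION), or define a TTL to do it automatically

  • Deleting raw data is bad practice, because it makes it impossible to calculate new aggregates over historical data, to restore existing aggregates, and so on

  • The materialized view meter_aggregates_mv will contain only the data inserted into meter_record after the view was created. To change this behavior, use POPULATE in the view definition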
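
The remarks above can be sketched as the following statements. These are illustrative alternatives rather than a script to run as a whole: the one-month retention interval, the partition value 201901, and the view name meter_daily_bounds_mv are assumptions chosen for the example.

```sql
/* Option 1: keep no raw data at all. Rows inserted into a Null table
   are discarded, but attached materialized views still receive them. */
CREATE TABLE meter_record (
   event_time DateTime,
   generated_total Int64
)
ENGINE = Null;

/* Option 2: keep the MergeTree table and drop an old monthly partition manually
   (201901 is a hypothetical value of toYYYYMM(event_time)). */
ALTER TABLE meter_record DROP PARTITION 201901;

/* Option 3: let ClickHouse expire raw rows automatically
   (the 1 month retention period is an assumption). */
ALTER TABLE meter_record MODIFY TTL event_time + INTERVAL 1 MONTH;

/* Backfilling a view with rows that already exist in meter_record:
   POPULATE goes between the storage clauses and AS. */
CREATE MATERIALIZED VIEW meter_daily_bounds_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day)
POPULATE
AS
SELECT
  toDate(event_time) AS day,
  minState(generated_total) AS min_generated_total,
  maxState(generated_total) AS max_generated_total
FROM meter_record
GROUP BY day;
```

Rows inserted into meter_record while a POPULATE view is being created may be missed by the view, so it is safer to run it while inserts are paused.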