分组间隙填充在Postgresql / Timescaledb中

时间:2018-08-04 09:52:55

标签: sql postgresql time-series

我有来自不同设备的测量数据,例如 Device_A Device_B 。我为每个设备测量温度和湿度。有时会缺少某些或所有度量: +---------------------+-------------+-------------+-------+ | ts | device_type | measurement | value | +---------------------+-------------+-------------+-------+ | 2018-04-30 23:59:59 | Device_A | Temperature | 10.1 | | 2018-04-30 23:59:59 | Device_A | Humidity | 66 | | 2018-04-30 23:59:59 | Device_B | Temperature | 19.1 | | 2018-05-03 23:59:59 | Device_A | Temperature | 12.1 | | 2018-05-03 23:59:59 | Device_B | Humidity | 67 | | 2018-05-03 23:59:59 | Device_B | Temperature | 16.1 | | 2018-05-04 23:59:59 | Device_A | Temperature | 17 | | 2018-05-04 23:59:59 | Device_A | Humidity | 63 | | 2018-05-04 23:59:59 | Device_B | Temperature | 12.1 | | 2018-05-04 23:59:59 | Device_B | Humidity | 73 | +---------------------+-------------+-------------+-------+

我想获取每天的平均温度和湿度,没有数据时,我希望将其设置为0(或任何其他任意值)-有趣的观点是在2018-05-01和2018-05-02 +---------------------+-------------+-------+ | date | measurement | mean | +---------------------+-------------+-------+ | 2018-04-30 23:59:59 | Humidity | 66 | | 2018-04-30 23:59:59 | Temperature | 14.6 | | 2018-05-01 23:59:59 | Temperature | 0 | | 2018-05-01 23:59:59 | Humidity | 0 | | 2018-05-02 23:59:59 | Temperature | 0 | | 2018-05-02 23:59:59 | Humidity | 0 | | 2018-05-03 23:59:59 | Humidity | 67 | | 2018-05-03 23:59:59 | Temperature | 14.1 | | 2018-05-04 23:59:59 | Humidity | 68 | | 2018-05-04 23:59:59 | Temperature | 14.55 | +---------------------+-------------+-------+

我尝试使用here描述的间隙填充,但是在测量列中被NULL值卡住了。另外,我每天只得到一行,而使用NULL测量则根本没有任何值。理想情况下,我希望每天获得2行-一条带有温度,另一条带有湿度,且两者的值都设置为0。

有没有什么方法可以生成上述输出?我知道将数据从“长”格式转换为“宽”格式可以解决我的问题,但是想知道是否还有其他解决方案?

我的代码:

CREATE SCHEMA tmp ;
SET search_path = tmp;

DROP TABLE IF EXISTS sample_data CASCADE;
CREATE TABLE sample_data (
  "ts" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  "device_type" character varying,
  "measurement" character varying,
  "value" DOUBLE PRECISION
);

INSERT INTO sample_data(ts, device_type, measurement, value) VALUES
('2018-04-30 23:59:59', 'Device_A', 'Temperature', 10.1),
('2018-04-30 23:59:59', 'Device_A', 'Humidity', 66.0),
('2018-04-30 23:59:59', 'Device_B', 'Temperature', 19.1),
('2018-05-03 23:59:59', 'Device_A', 'Temperature', 12.1),
('2018-05-03 23:59:59', 'Device_B', 'Humidity', 67.0),
('2018-05-03 23:59:59', 'Device_B', 'Temperature', 16.1),
('2018-05-04 23:59:59', 'Device_A', 'Temperature', 17.0),
('2018-05-04 23:59:59', 'Device_A', 'Humidity', 63.0),
('2018-05-04 23:59:59', 'Device_B', 'Temperature', 12.1),
('2018-05-04 23:59:59', 'Device_B', 'Humidity', 73.0)
;

WITH period AS (
  SELECT date
  FROM generate_series('2018-04-30 23:59:59'::timestamp, 
  '2018-05-04 23:59:59', interval '1 day') date
),
sample AS ( SELECT * FROM sample_data)

SELECT period.date,
      measurement,
      coalesce(sum(sample.value), 0) AS value
FROM period
LEFT JOIN sample ON period.date = sample.ts
GROUP BY
    period.date,
    sample.measurement
ORDER BY period.date,
        sample.measurement
;

输出: +---------------------+-------------+-------+ | date | measurement | mean | +---------------------+-------------+-------+ | 2018-04-30 23:59:59 | Humidity | 66 | | 2018-04-30 23:59:59 | Temperature | 14.6 | | 2018-05-01 23:59:59 | NULL | 0 | | 2018-05-02 23:59:59 | NULL | 0 | | 2018-05-03 23:59:59 | Humidity | 67 | | 2018-05-03 23:59:59 | Temperature | 14.1 | | 2018-05-04 23:59:59 | Humidity | 68 | | 2018-05-04 23:59:59 | Temperature | 14.55 | +---------------------+-------------+-------+

1 个答案:

答案 0 :(得分:0)

刚刚找到答案-周期表还必须包含度量值:

WITH period AS (
  SELECT date, m.measurement
  FROM generate_series('2018-04-30 23:59:59'::timestamp, '2018-05-04 23:59:59', interval '1 day') date
  NATURAL JOIN
    (SELECT DISTINCT measurement FROM sample_data) m
)

SELECT period.date,
      period.measurement,
      coalesce(sum(sample_data.value), 0) AS value
FROM period
LEFT JOIN sample_data ON period.date = sample_data.ts AND period.measurement = sample_data.measurement
GROUP BY
    period.date,
    period.measurement
ORDER BY 
    period.date,
    period.measurement
;