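Get the difference of another field between the first and last timestamps of a group

Time: 2013-12-13 11:28:36

Tags: sql database postgresql

I have a very large table called sensor_values with the columns timestamp, value and sensor_id, and another table called sensors with the columns sensor_id and name.

I regularly run a pivot query to get totals grouped by day, like this: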

SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,
      SUM(CASE WHEN s1.sensor_id = 572 THEN s1.value ELSE 0.0 END) AS "Nickname1",
      SUM(CASE WHEN s1.sensor_id = 542 THEN s1.value ELSE 0.0 END) AS "Nickname2",
      SUM(CASE WHEN s1.sensor_id = 571 THEN s1.value ELSE 0.0 END) AS "Nickname3"
FROM sensor_values s1
WHERE s1.timestamp::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz
AND s1.timestamp::timestamptz <= '2013-10-18T00:00:00+00:00'::timestamptz
AND s1.sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP BY date_trunc('day', s1.timestamp) ORDER BY 1 ; 

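This works okay, if a bit slow. However, is it possible to write a similar query that, instead of summing each group, gets the difference between the latest and earliest timestamp in each grouping, i.e. a day in this case?

This is because I have some sensor data that is ever-increasing (electrical kWh) and I want to know the consumption within a particular time period.

2 answers:

Answer 0 (score: 2)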

Step 1: Release the handbrake

  

... if a bit slow

SELECT to_char(MIN(ts)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,SUM(CASE WHEN sensor_id = 572 THEN value ELSE 0.0 END) AS nickname1
      ,SUM(CASE WHEN sensor_id = 542 THEN value ELSE 0.0 END) AS nickname2
      ,SUM(CASE WHEN sensor_id = 571 THEN value ELSE 0.0 END) AS nickname3
FROM   sensor_values
-- LEFT JOIN sensor_values_cleaned s2 USING (sensor_id, ts)
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND    sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP  BY ts::date
ORDER  BY 1;

Major points

  • Replace reserved words (in standard SQL) used as identifiers:
    timestamp -> ts
    time -> min_time

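  • Since the join is on identical column names, you can use the simpler USING clause in the join condition: USING (sensor_id, ts)
    However, since the second table sensor_values_cleaned is 100% irrelevant to this query, I removed it entirely.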

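  • As @joop already suggested, switch min() and to_char() in the first output column. That way, Postgres can determine the minimum from the original column value, which is generally faster and may be able to use an index. In this particular case, sorting by date is also cheaper than sorting by text, which additionally has to consider collation rules.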

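  • A similar consideration applies to your WHERE condition. Compare your original form:

    WHERE  ts::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz

    with this one:

    WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp

    The second one is sargable and can use a plain index on ts, which makes a big difference on large tables!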

  • Use ts::date instead of date_trunc('day', ts). Simpler, faster, same grouping.

  • Most likely, your second WHERE condition is slightly incorrect. Generally, you would exclude the upper border, so instead of:

    AND    ts <=  '2013-10-18T00:00:00+00:00' ...

    use:

    AND    ts <   '2013-10-18T00:00:00+00:00' ...
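  • When mixing timestamp and timestamptz, be aware of the effects. For instance, your WHERE condition does not cut off at 00:00 local time (unless local time coincides with UTC). Details:
    Ignoring timezones altogether in Rails and PostgreSQL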
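
The points about sargability and time zones can be tried out with a minimal sketch like the following. It assumes the column has already been renamed to ts and has type timestamp (without time zone); the index name and the time zone setting are made up for illustration.

```sql
-- Hypothetical index name. A plain b-tree index on ts is only a candidate
-- when the predicate compares the bare column (no cast applied to ts).
CREATE INDEX sensor_values_ts_idx ON sensor_values (ts);

-- The cast from timestamptz to timestamp depends on the session time zone,
-- so the bound does not fall on local midnight unless the session runs in UTC.
SET timezone = 'Europe/Vienna';   -- assumption, for illustration only
SELECT '2013-10-14T00:00:00+00:00'::timestamptz::timestamp AS lower_bound;
-- lower_bound: 2013-10-14 02:00:00

-- Inspect the plan to see whether the index is actually chosen.
EXPLAIN
SELECT count(*)
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp;
```

Whether the planner really picks the index depends on table statistics and on how selective the time range is, so the EXPLAIN output is worth checking against your own data.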

Step 2: Your request

  

... the difference between the latest and earliest timestamp in each grouping

I assume you mean: ... the difference between the values at the latest and earliest timestamps ...
Otherwise it would be much simpler.
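
For comparison, the literal reading (the difference between the timestamps themselves) could be sketched roughly as follows, assuming the column is named ts as in Step 1:

```sql
-- Interval between the latest and earliest row of each day,
-- across all selected sensors.
SELECT ts::date          AS day
      ,max(ts) - min(ts) AS ts_range
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND    sensor_id IN (540, 541, 542, 571, 572, 573)
GROUP  BY ts::date
ORDER  BY 1;
```

Subtracting two timestamp values yields an interval, so plain aggregates are enough here and no window functions are needed.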

Use window functions, in particular first_value() and last_value(). Be careful with the combination: in this case you need a non-standard window frame for last_value(). Compare:
PostgreSQL aggregate or window function to return just the last value
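
To see why the frame matters, here is a small self-contained sketch with made-up values (the VALUES list and column aliases are hypothetical). With ORDER BY and no explicit frame, the default frame ends at the current row, so last_value() just returns the current row's value.

```sql
SELECT ts
      ,value
      ,last_value(value) OVER (ORDER BY ts) AS last_default_frame
      ,last_value(value) OVER (ORDER BY ts
                               RANGE BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM  (VALUES
         (timestamp '2013-10-14 00:00', 100.0)
        ,(timestamp '2013-10-14 12:00', 105.5)
        ,(timestamp '2013-10-14 23:45', 112.0)
      ) AS t(ts, value)
ORDER  BY ts;
-- last_default_frame: 100.0, 105.5, 112.0  (follows the current row)
-- last_full_frame:    112.0, 112.0, 112.0  (true last value of the partition)
```

first_value() does not need the extended frame, because the default frame always starts at the first row of the partition.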

I combine this with DISTINCT ON, which is more convenient in this case than GROUP BY (which would need another subquery level):

SELECT DISTINCT ON (ts::date, sensor_id)
       ts::date AS day
      ,to_char((min(ts)  OVER (PARTITION BY ts::date))::timestamptz
              ,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,sensor_id
      ,last_value(value)    OVER (PARTITION BY ts::date, sensor_id ORDER BY ts
                     RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts)
                                                                   AS val_range
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
AND    sensor_id IN (540, 541, 542, 571, 572, 573)
ORDER  BY ts::date, sensor_id;

-> SQLfiddle demo.
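
A different technique, not the one used in this answer, may be enough if the counters truly never decrease within the queried range (no resets or corrections). Under that assumption, max(value) - min(value) per day equals last minus first, and the original pivot style with CASE still works. A sketch, reusing the nicknames from the question:

```sql
-- Assumption: value only ever increases per sensor within each day,
-- so max - min is the same as last - first. Not valid after a counter reset.
SELECT ts::date AS day
      ,max(CASE WHEN sensor_id = 572 THEN value END)
     - min(CASE WHEN sensor_id = 572 THEN value END) AS nickname1
      ,max(CASE WHEN sensor_id = 542 THEN value END)
     - min(CASE WHEN sensor_id = 542 THEN value END) AS nickname2
      ,max(CASE WHEN sensor_id = 571 THEN value END)
     - min(CASE WHEN sensor_id = 571 THEN value END) AS nickname3
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND    sensor_id IN (572, 542, 571)
GROUP  BY ts::date
ORDER  BY 1;
```

The CASE expressions have no ELSE branch on purpose: rows of other sensors become NULL and are ignored by max() and min(), whereas ELSE 0.0 would distort the minimum.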

Step 3: Pivot table

Building on the query above, I use crosstab() from the additional module tablefunc:

SELECT * FROM crosstab(
   $$SELECT DISTINCT ON (1,3)
            ts::date AS day
           ,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
           ,sensor_id
           ,last_value(value)    OVER (PARTITION BY ts::date, sensor_id ORDER BY ts RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range
     FROM   sensor_values
     WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
     AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
     AND    sensor_id IN (540, 541, 542, 571, 572, 573)
     ORDER  BY 1, 3$$

   ,$$VALUES (540), (541), (542), (571), (572), (573)$$
   )
AS ct (day date, min_time text, s540 numeric, s541 numeric, s542 numeric, s571 numeric, s572 numeric, s573 numeric);

Returns (and much faster than before):

    day     |         min_time         | s540  | s541  | s542  | s571  | s572  | s573
------------+--------------------------+-------+-------+-------+-------+-------+-------
 2013-10-14 | 2013-10-14 03:00:00 CEST | 18.82 | 18.98 | 19.97 | 19.47 | 17.56 | 21.27
 2013-10-15 | 2013-10-15 00:15:00 CEST | 22.59 | 24.20 | 22.90 | 21.27 | 22.75 | 22.23
 2013-10-16 | 2013-10-16 00:16:00 CEST | 23.74 | 22.52 | 22.23 | 23.22 | 23.03 | 22.98
 2013-10-17 | 2013-10-17 00:17:00 CEST | 21.68 | 24.54 | 21.15 | 23.58 | 23.04 | 21.94
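
Note that crosstab() is only available once the tablefunc module has been installed in the database. On PostgreSQL 9.1 or later, that is a one-time step along these lines (it typically requires sufficient privileges and the contrib package on the server):

```sql
-- Installs crosstab() and the other tablefunc functions
-- into the current database.
CREATE EXTENSION IF NOT EXISTS tablefunc;
```

The column definition list after AS ct (...) has to match the query result: the row name and extra column first, then one column per entry in the VALUES list, in the same order.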

Answer 1 (score: 0)

Try replacing

SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,

with:

SELECT to_char(MIN(s1.timestamp)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS zztime,

or even:

SELECT MIN(s1.timestamp) AS zztime,

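, since the date/timestamp format you specify is more or less the default.

This avoids taking the minimum of a computed expression.

BTW: timestamp and time are both reserved words (type names) in (Postgres) SQL. Try to avoid using them as identifiers.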
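
If renaming is an option, a one-off statement like the following sketch would do it. The new name ts is only a suggestion (it matches the name used in the other answer), and any application code or saved queries that refer to the old name would have to be updated as well.

```sql
-- Double quotes make sure the old name is read as an identifier,
-- not as the type name.
ALTER TABLE sensor_values RENAME COLUMN "timestamp" TO ts;
```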