Postgres grouped results with window function lag return 0 rows

Time: 2018-01-02 12:14:30

Tags: sql postgresql postgresql-9.3

I am trying to write a query that ignores the first and the last row of its result. To do that, I tried window functions, with the following query:

SELECT lag(timestamp_min)    OVER (ORDER BY timestamp_min) AS timestamp_min,
       lag(type)             OVER (ORDER BY timestamp_min) AS type,
       lag(sum_first_medium) OVER (ORDER BY timestamp_min)
FROM (SELECT to_timestamp(
                floor(
                   (extract('epoch' FROM TIMESTAMP) / 300)
                ) * 300
             ) AS timestamp_min,
             type,
             floor(sum(medium[1])) AS sum_first_medium
      FROM default_dataset
      WHERE type = 'ap_clients.wlan0'
        AND timestamp > current_timestamp - INTERVAL '85 minutes'
        AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
      GROUP BY timestamp_min, type) lagme
OFFSET 2;
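The lag-plus-OFFSET 2 idea can be checked outside the database. The Python sketch below emulates it on a plain list. The helper names lag and lag_then_offset and the sample values are hypothetical. It also assumes the grouped rows come out in timestamp_min order, which the query above does not strictly guarantee because it has no outer ORDER BY.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def lag(values: Sequence[T], offset: int = 1) -> List[Optional[T]]:
    """Emulate SQL lag(value, offset): each row sees the value `offset` rows earlier, or None."""
    return [values[i - offset] if i >= offset else None for i in range(len(values))]


def lag_then_offset(values: Sequence[T], skip: int = 2) -> List[Optional[T]]:
    """Emulate `SELECT lag(x) OVER (ORDER BY ...) ... OFFSET skip` on rows already in window order."""
    return lag(values)[skip:]


# Hypothetical grouped rows, already sorted by timestamp_min.
print(lag_then_offset(["g1", "g2", "g3", "g4", "g5"]))  # ['g2', 'g3', 'g4']
print(lag_then_offset(["g1", "g2"]))                    # []
```

With n groups, the trick returns groups 2 to n-1, so it does drop the first and the last group. With two or fewer groups it returns nothing, and the same happens when the inner query matches no rows at all.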

The problem is that this query returns nothing:

ws_controller_hist=> SELECT lag(timestamp_min) OVER (ORDER BY timestamp_min) AS timestamp_min, lag(type) OVER (ORDER BY timestamp_min) AS type, lag(sum_first_medium) OVER (ORDER BY timestamp_min) FROM (SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) AS timestamp_min, type, floor(sum(medium[1])) AS sum_first_medium FROM default_dataset WHERE type = 'ap_clients.wlan0' AND timestamp > current_timestamp - INTERVAL '85 minutes' AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638' GROUP BY timestamp_min, type) lagme OFFSET 2;
 timestamp_min | type | lag
---------------+------+-----
(0 rows)

But I do have data of type 'ap_clients.wlan0':

ws_controller_hist=> select * from default_dataset where type ='ap_clients.wlan0' order by timestamp desc limit 3;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organization_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+--------------------------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
(3 rows)

I need the query to return the sum of all medium values over the last hour, grouped into 5-minute intervals.
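The bucketing expression used in all of these queries, to_timestamp(floor(extract('epoch' FROM timestamp) / 300) * 300), rounds each timestamp down to a multiple of 300 seconds. The Python sketch below reproduces that grouping for illustration only. The names bucket_start and sum_first_medium and the sample readings are hypothetical. It assumes timezone-aware UTC timestamps and treats medium as a list of numbers.

```python
from collections import defaultdict
from datetime import datetime, timezone
from math import floor
from typing import Dict, Iterable, List, Tuple

BUCKET_SECONDS = 300  # the 300 used in the queries: 5 minutes


def bucket_start(ts: datetime) -> datetime:
    """Mirror to_timestamp(floor(extract('epoch' FROM ts) / 300) * 300) for an aware datetime."""
    epoch = ts.timestamp()
    return datetime.fromtimestamp(floor(epoch / BUCKET_SECONDS) * BUCKET_SECONDS, tz=timezone.utc)


def sum_first_medium(rows: Iterable[Tuple[datetime, List[float]]]) -> Dict[datetime, int]:
    """Return floor(sum(medium[1])) per 5-minute bucket, ordered by bucket start."""
    sums: Dict[datetime, float] = defaultdict(float)
    for ts, medium in rows:
        # PostgreSQL arrays are 1-based, so medium[1] corresponds to medium[0] here.
        sums[bucket_start(ts)] += medium[0]
    return {start: floor(total) for start, total in sorted(sums.items())}


# Hypothetical readings (not taken from the table above).
readings = [
    (datetime(2018, 1, 2, 10, 16, 8, tzinfo=timezone.utc), [4.0]),
    (datetime(2018, 1, 2, 10, 21, 8, tzinfo=timezone.utc), [3.0]),
    (datetime(2018, 1, 2, 10, 23, 30, tzinfo=timezone.utc), [2.5]),
]
for start, total in sum_first_medium(readings).items():
    print(start.isoformat(), total)
```

The two readings at 10:21:08 and 10:23:30 both fall into the 10:20:00 bucket, and their sum 5.5 is floored to 5.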

My first attempt was to skip the first record with OFFSET 1. To skip the last one, I tried to exclude the newest id (ordered by timestamp DESC) with a subquery:

ws_controller_hist=>  
SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) 
AS timestamp_min,
       TYPE,
       floor(sum(medium[1]))
FROM default_dataset
WHERE TYPE LIKE 'ap_clients.wlan0'
  AND TIMESTAMP > CURRENT_TIMESTAMP - interval '85 minutes'
  AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
  AND id NOT IN
    (SELECT id
     FROM default_dataset
     ORDER BY TIMESTAMP DESC
     LIMIT 1)
GROUP BY timestamp_min,
         TYPE
ORDER BY timestamp_min ASC
OFFSET 1;

     timestamp_min      |       type       | floor
------------------------+------------------+-------
 2017-12-19 14:20:00+00 | ap_clients.wlan0 |    38
 2017-12-19 14:25:00+00 | ap_clients.wlan0 |    37
 2017-12-19 14:30:00+00 | ap_clients.wlan0 |    39
 2017-12-19 14:35:00+00 | ap_clients.wlan0 |    42
 2017-12-19 14:40:00+00 | ap_clients.wlan0 |    43
 2017-12-19 14:45:00+00 | ap_clients.wlan0 |    44
 2017-12-19 14:50:00+00 | ap_clients.wlan0 |    45
 2017-12-19 14:55:00+00 | ap_clients.wlan0 |    45
 2017-12-19 15:00:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:05:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:10:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:15:00+00 | ap_clients.wlan0 |    52
 2017-12-19 15:20:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:25:00+00 | ap_clients.wlan0 |    53
 2017-12-19 15:30:00+00 | ap_clients.wlan0 |    49
 2017-12-19 15:35:00+00 | ap_clients.wlan0 |    39
 2017-12-19 15:40:00+00 | ap_clients.wlan0 |    16

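However, this query does not skip the last record: I get exactly the same records as without the subquery "AND id NOT IN (SELECT id FROM default_dataset ORDER BY timestamp DESC LIMIT 1)".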
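One plausible explanation is that SELECT id FROM default_dataset ORDER BY timestamp DESC LIMIT 1 is not filtered by type or organization, so it removes the single newest row of the whole table. Even when that row does belong to this series, removing one raw row before GROUP BY only makes the last 5-minute bucket smaller. It does not drop the bucket. The Python sketch below illustrates this reasoning with a hypothetical Row type and made-up values. It is an emulation, not the real table.

```python
from math import floor
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    id: str
    epoch: int     # seconds since 1970-01-01 UTC
    type: str
    medium_1: int  # stands in for medium[1]


def grouped_sums(rows: List[Row], wanted_type: str) -> Dict[int, int]:
    """Map each 5-minute bucket start (as epoch) to the sum of medium_1 for one type."""
    out: Dict[int, int] = {}
    for r in rows:
        if r.type == wanted_type:
            bucket = floor(r.epoch / 300) * 300
            out[bucket] = out.get(bucket, 0) + r.medium_1
    return dict(sorted(out.items()))


def drop_latest_raw_row(rows: List[Row]) -> List[Row]:
    """Mirror `id NOT IN (SELECT id ... ORDER BY timestamp DESC LIMIT 1)` with no type/org filter."""
    latest = max(rows, key=lambda r: r.epoch)
    return [r for r in rows if r.id != latest.id]


W = "ap_clients.wlan0"
rows = [
    Row("a", 600, W, 5),
    Row("b", 900, W, 7),
    Row("d", 950, W, 2),
    Row("c", 1000, "other.type", 1),  # newest row overall belongs to another type
]
print(grouped_sums(rows, W))                       # {600: 5, 900: 9}
print(grouped_sums(drop_latest_raw_row(rows), W))  # {600: 5, 900: 9}: unchanged
print(grouped_sums(drop_latest_raw_row(rows[:3]), W))  # {600: 5, 900: 7}: last bucket kept
```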

If I query the rows of type 'ap_clients.wlan0' directly, I get:

ws_controller_hist=> select * from default_dataset where organization_id='ce4b69af-bdce-4f1b-ba71-dd03544205d5' and type ='ap_clients.wlan0' order by timestamp desc limit 5;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organization_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+--------------------------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 e67baf28-6d5b-43a5-85e2-fcf2d04a0b2e | 2018-01-02 10:06:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 c7ce16ce-9cda-423f-b32b-f4d6dce859e6 | 2018-01-02 10:01:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}

What should I do?

1 answer:

Answer (score: 1)

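A simple solution is to use the lag and lead window functions with an argument that can never be NULL. Then lag returns NULL only for the first row and lead returns NULL only for the last row, so you can keep just the rows where both are NOT NULL: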

SELECT
    t2.timestamp_min,
    t2.type,
    t2.sum_first_medium
FROM (
    SELECT
        t1.*,
        lead(1) OVER(ORDER BY t1.timestamp_min) AS lead,
        lag(1) OVER(ORDER BY t1.timestamp_min) AS lag
    FROM (
        SELECT
            to_timestamp(
              floor(
                (extract('epoch' FROM TIMESTAMP) / 300)
              ) * 300
            ) AS timestamp_min,
            type,
            floor(sum(medium[1])) AS sum_first_medium
        FROM default_dataset
        WHERE
            type = 'ap_clients.wlan0'
            AND timestamp > current_timestamp - INTERVAL '85 minutes'
            AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
        GROUP BY timestamp_min, type
    ) t1
) t2
WHERE
    t2.lag IS NOT NULL -- Only first row will return NULL, skip it
    AND t2.lead IS NOT NULL -- Only last row will return NULL, skip it
ORDER BY t2.timestamp_min;

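Note that I used lead(1) and lag(1) only because 1 is a non-NULL expression. You can use any non-NULL expression, or even a column, as long as that column is guaranteed to be NOT NULL.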
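The Python sketch below shows why the argument must be non-NULL. It is an emulation on a list with hypothetical helper names, not PostgreSQL itself. With a constant argument, None can only mean "no previous/next row". With a nullable column, a NULL neighbour also produces None, so rows in the middle are dropped by mistake.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def with_lag_lead(rows: Sequence[T]) -> List[Tuple[T, Optional[int], Optional[int]]]:
    """Attach lag(1) and lead(1) to each row, as the answer's t2 subquery does."""
    n = len(rows)
    return [(row, 1 if i > 0 else None, 1 if i < n - 1 else None) for i, row in enumerate(rows)]


def drop_first_and_last(rows: Sequence[T]) -> List[T]:
    """WHERE lag IS NOT NULL AND lead IS NOT NULL, with a constant (never NULL) argument."""
    return [row for row, lag_v, lead_v in with_lag_lead(rows) if lag_v is not None and lead_v is not None]


def drop_first_and_last_by_column(values: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Same filter, but using lag(col)/lead(col) on a column that can be NULL (None)."""
    n = len(values)
    kept = []
    for i, v in enumerate(values):
        lag_v = values[i - 1] if i > 0 else None
        lead_v = values[i + 1] if i < n - 1 else None
        if lag_v is not None and lead_v is not None:
            kept.append(v)
    return kept


data = [1, None, 3, 4, 5]
print(drop_first_and_last(data))            # [None, 3, 4]
print(drop_first_and_last_by_column(data))  # [None, 4]: row 3 lost because its lag is NULL
```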

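Another possible solution is to apply two row_number() calls, one with ORDER BY timestamp_min ASC and one with ORDER BY timestamp_min DESC, and then keep only the rows where neither number is 1. However, this requires two sorts of the data set (one ASC, one DESC), whereas the lag/lead solution needs only one, although it may be harder to understand.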
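The row_number() alternative can be sketched the same way. The Python below is only an emulation with hypothetical helper names. Each sorted call stands in for one of the two sorts PostgreSQL would need. The sketch assumes timestamp_min values are unique, which holds here because the query groups by timestamp_min and filters on a single type.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def row_number(keys: Sequence, descending: bool = False) -> List[int]:
    """Emulate row_number() OVER (ORDER BY key [DESC]); result is aligned with the input order."""
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=descending)  # one sort
    numbers = [0] * len(keys)
    for rank, i in enumerate(order, start=1):
        numbers[i] = rank
    return numbers


def drop_first_and_last_rn(keys: Sequence[T]) -> List[T]:
    """Keep rows whose ASC and DESC row numbers are both <> 1."""
    asc = row_number(keys)
    desc = row_number(keys, descending=True)
    return [k for k, a, d in zip(keys, asc, desc) if a != 1 and d != 1]


print(drop_first_and_last_rn([30, 10, 40, 20]))  # [30, 20]: smallest (10) and largest (40) removed
```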