PostgreSQL non-standard select (subsets)

Posted: 2013-04-22 16:27:32

Tags: sql postgresql

I have some records:

+---+--------+---------------------------+
| id| Data   |            Time           |
+---+--------+---------------------------+
| 1 | 1      | 2013-04-22 16:18:07       |
| 2 | 1      | 2013-04-22 16:18:17       |
| 3 | 2      | 2013-04-22 16:18:27       |
| 4 | 2      | 2013-04-22 16:18:37       |
| 5 | 1      | 2013-04-22 16:18:47       |
| 6 | 1      | 2013-04-22 16:18:57       |
| 7 | 1      | 2013-04-22 16:19:07       |
| 8 | 3      | 2013-04-22 16:19:17       |
| 9 | 3      | 2013-04-22 16:19:27       |
| 10| 1      | 2013-04-22 16:19:37       |
| 11| 2      | 2013-04-22 16:19:47       |
| 12| 2      | 2013-04-22 16:19:57       |
| 13| 3      | 2013-04-22 16:20:07       |
| 14| 3      | 2013-04-22 16:20:17       |
+---+--------+---------------------------+

How can I get these records?

+---+--------+---------------------------+
| id| Data   |            Time           |
+---+--------+---------------------------+
| 1 | 1      | 2013-04-22 16:18:07       |
| 3 | 2      | 2013-04-22 16:18:27       |
| 5 | 1      | 2013-04-22 16:18:47       |
| 8 | 3      | 2013-04-22 16:19:17       |
| 10| 1      | 2013-04-22 16:19:37       |
| 11| 2      | 2013-04-22 16:19:47       |
| 13| 3      | 2013-04-22 16:20:07       |
+---+--------+---------------------------+

I want to select the first entry of each subgroup, but if I use DISTINCT I get this set of records:

+---+--------+---------------------------+
| id| Data   |            Time           |
+---+--------+---------------------------+
| 1 | 1      | 2013-04-22 16:18:07       |
| 3 | 2      | 2013-04-22 16:18:27       |
| 8 | 3      | 2013-04-22 16:19:17       |
+---+--------+---------------------------+

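3 Answers:

Answer 0 (score: 2)

The problem here is that you need to define the groups you are looking at. The data values are repeated across different groups.

Here is one way to identify each group. Assign a sequential number to every row, ordered by time. Then assign a second sequential number within each data value, also ordered by time. Within a run of consecutive rows that share the same data value, the difference between these two numbers is constant.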
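To make the arithmetic concrete, here is a minimal sketch that computes both sequence numbers and their difference for the sample rows. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 and later support ROW_NUMBER() as a window function); the table name t and the column types are illustrative assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

# The question's sample: ids 1..14, one row every 10 seconds from 16:18:07.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)
ROWS = [(i + 1, d, str(START + timedelta(seconds=10 * i))) for i, d in enumerate(DATA)]

conn = sqlite3.connect(":memory:")
# ISO-formatted TEXT sorts chronologically, so ORDER BY time behaves like a timestamp.
conn.execute("CREATE TABLE t (id INTEGER, data INTEGER, time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Overall sequence number, per-data sequence number, and their difference.
STEPS = conn.execute("""
    SELECT id, data,
           ROW_NUMBER() OVER (ORDER BY time) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY data ORDER BY time) AS rn_data,
           ROW_NUMBER() OVER (ORDER BY time)
             - ROW_NUMBER() OVER (PARTITION BY data ORDER BY time) AS diff
    FROM t
    ORDER BY time
""").fetchall()

for row in STEPS:
    print(row)  # (id, data, rn_all, rn_data, diff)
```

In this sample, rows 3-4 (data 2) and rows 5-7 (data 1) both get a difference of 2. The difference on its own therefore does not identify a run. The pair (data, difference) does.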

The following applies this idea to your data. Once the groups are identified, it uses GROUP BY to get the data:

select data, MIN(time) as time
from (select t.*,
             (ROW_NUMBER() over (order by time) -
              ROW_NUMBER() over (partition by data order by time)
             ) as thegroup
      from t
     ) t
group by data, thegroup  -- data is needed: different data values can share a difference
order by MIN(time)
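A minimal sketch, again assuming sqlite3 as a stand-in for PostgreSQL, that runs this grouping on the sample rows. It compares grouping by the difference alone with grouping by (data, difference). The helper string INNER is illustrative only.

```python
import sqlite3
from datetime import datetime, timedelta

# The question's sample: ids 1..14, one row every 10 seconds from 16:18:07.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)
ROWS = [(i + 1, d, str(START + timedelta(seconds=10 * i))) for i, d in enumerate(DATA)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, data INTEGER, time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Inner query: tag every row with the difference of the two row numbers.
INNER = """
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY time)
             - ROW_NUMBER() OVER (PARTITION BY data ORDER BY time) AS thegroup
    FROM t
"""

# Grouping by the difference alone merges runs that happen to share it.
WRONG = conn.execute(
    f"SELECT MIN(data), MIN(time) FROM ({INNER}) AS g GROUP BY thegroup"
).fetchall()

# Grouping by (data, thegroup) yields one row per run of equal data values.
FIRSTS = conn.execute(
    f"SELECT data, MIN(time) FROM ({INNER}) AS g "
    "GROUP BY data, thegroup ORDER BY MIN(time)"
).fetchall()

print(len(WRONG), "rows when grouping by thegroup only")
for row in FIRSTS:
    print(row)
```

Grouping by thegroup alone returns 6 rows, because the data 2 run (ids 3-4) and the data 1 run (ids 5-7) both have a difference of 2. Adding data to the key gives the 7 expected rows.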

If you want to keep more columns, you can number the rows within each group and take the first one:

select id, data, time
from (select t.*,
             -- number the rows inside each run; seqnum = 1 is the run's first row
             ROW_NUMBER() over (partition by data, thegroup order by time) as seqnum
      from (select t.*,
                   (ROW_NUMBER() over (order by time) -
                    ROW_NUMBER() over (partition by data order by time)
                   ) as thegroup
            from t
           ) t
     ) t
where seqnum = 1
order by time
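A sketch of the keep-all-columns variant on the same sample, again run through sqlite3 as a stand-in for PostgreSQL. The rows are numbered per (data, thegroup), and only seqnum = 1 is kept, so whole rows come back, including id.

```python
import sqlite3
from datetime import datetime, timedelta

# The question's sample: ids 1..14, one row every 10 seconds from 16:18:07.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)
ROWS = [(i + 1, d, str(START + timedelta(seconds=10 * i))) for i, d in enumerate(DATA)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, data INTEGER, time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

FIRSTS = conn.execute("""
    SELECT id, data, time
    FROM (SELECT g.*,
                 -- position of each row inside its run of equal data values
                 ROW_NUMBER() OVER (PARTITION BY data, thegroup ORDER BY time) AS seqnum
          FROM (SELECT t.*,
                       ROW_NUMBER() OVER (ORDER BY time)
                         - ROW_NUMBER() OVER (PARTITION BY data ORDER BY time) AS thegroup
                FROM t) AS g) AS s
    WHERE seqnum = 1
    ORDER BY time
""").fetchall()

for row in FIRSTS:
    print(row)  # (id, data, time)
```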

You can also do this with Postgres's DISTINCT ON syntax.

Answer 1 (score: 1)

Here is a simpler and more efficient version:

SELECT 
  *
FROM 
  (
    SELECT 
      id, 
      data, 
      time, 
      lag( id, 1 ) over( partition by data ORDER BY id ) as prev_id
    FROM t 
  ) t
WHERE 
  prev_id is null 
  OR id - prev_id > 1
ORDER BY
  id

Since you need the first row of each group, I used the PostgreSQL window function lag() to generate a column named prev_id, as shown below. The table covers only the records whose data is 1; analogous tables are produced for the other data values.

+----+---------+
| id | prev_id |
+----+---------+
| 1  | NULL    |  valid: lag is NULL
| 2  | 1       |
| 5  | 2       |  valid: id - prev_id > 1
| 6  | 5       |
| 7  | 6       |
| 10 | 7       |  valid: id - prev_id > 1
+----+---------+

A row is treated as the first row of its group (the WHERE clause is true) if either of these two conditions holds: prev_id is NULL, or id - prev_id > 1.
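A minimal sketch that runs this lag() query on the sample rows via sqlite3, standing in for PostgreSQL. It also runs a variant that is not part of this answer. In that variant, lag(data) over (order by time) compares each row with the immediately preceding row, so it does not depend on id values being gap-free.

```python
import sqlite3
from datetime import datetime, timedelta

# The question's sample: ids 1..14, one row every 10 seconds from 16:18:07.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)
ROWS = [(i + 1, d, str(START + timedelta(seconds=10 * i))) for i, d in enumerate(DATA)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, data INTEGER, time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# The answer's query: previous id with the same data; a jump > 1 starts a new run.
LAG_BY_ID = conn.execute("""
    SELECT id, data, time
    FROM (SELECT id, data, time,
                 LAG(id, 1) OVER (PARTITION BY data ORDER BY id) AS prev_id
          FROM t) AS x
    WHERE prev_id IS NULL OR id - prev_id > 1
    ORDER BY id
""").fetchall()

# Variant (not from the answer): a run starts wherever data differs from the previous row.
LAG_BY_DATA = conn.execute("""
    SELECT id, data, time
    FROM (SELECT id, data, time,
                 LAG(data) OVER (ORDER BY time) AS prev_data
          FROM t) AS x
    WHERE prev_data IS NULL OR prev_data <> data
    ORDER BY id
""").fetchall()

for row in LAG_BY_ID:
    print(row)
```

Both queries return ids 1, 3, 5, 8, 10, 11 and 13 for this sample. The id - prev_id > 1 test assumes that ids are consecutive with no gaps. If an id inside a run were missing (a deleted row, say), the next row would be reported as a new group start. The lag(data) variant would not report it.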

SQLFIDDLE

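Answer 2 (score: 0)

Use GROUP BY on data and time instead of DISTINCT.

"GROUP BY data" groups the rows by the data field, but if you add "and time", each data group is also grouped by time.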