I have some records:
+----+------+---------------------+
|    | Data | Time                |
+----+------+---------------------+
| 1  | 1    | 2013-04-22 16:18:07 |
| 2  | 1    | 2013-04-22 16:18:17 |
| 3  | 2    | 2013-04-22 16:18:27 |
| 4  | 2    | 2013-04-22 16:18:37 |
| 5  | 1    | 2013-04-22 16:18:47 |
| 6  | 1    | 2013-04-22 16:18:57 |
| 7  | 1    | 2013-04-22 16:19:07 |
| 8  | 3    | 2013-04-22 16:19:17 |
| 9  | 3    | 2013-04-22 16:19:27 |
| 10 | 1    | 2013-04-22 16:19:37 |
| 11 | 2    | 2013-04-22 16:19:47 |
| 12 | 2    | 2013-04-22 16:19:57 |
| 13 | 3    | 2013-04-22 16:20:07 |
| 14 | 3    | 2013-04-22 16:20:17 |
+----+------+---------------------+
How can I get these records?
+----+------+---------------------+
|    | Data | Time                |
+----+------+---------------------+
| 1  | 1    | 2013-04-22 16:18:07 |
| 3  | 2    | 2013-04-22 16:18:27 |
| 5  | 1    | 2013-04-22 16:18:47 |
| 8  | 3    | 2013-04-22 16:19:17 |
| 10 | 1    | 2013-04-22 16:19:37 |
| 11 | 2    | 2013-04-22 16:19:47 |
| 13 | 3    | 2013-04-22 16:20:07 |
+----+------+---------------------+
I want to select the first entry of each subgroup, but if I use DISTINCT I only get these records:
+----+------+---------------------+
|    | Data | Time                |
+----+------+---------------------+
| 1  | 1    | 2013-04-22 16:18:07 |
| 3  | 2    | 2013-04-22 16:18:27 |
| 8  | 3    | 2013-04-22 16:19:17 |
+----+------+---------------------+
Answer 0 (score: 2)
The problem here is that you need to define the groups you are looking at. The "data" values repeat across different groups.
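Here is one way to find each group. Assign a sequential number to every row, ordered by time. Then assign a second sequential number within each data value, also ordered by time. Across a run of consecutive rows with the same data value, the difference between these two numbers stays constant.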
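To make the idea concrete, here is a minimal plain-Python sketch (not SQL) of the two row numbers, assuming the rows are already sorted by time so that list position stands in for `ROW_NUMBER() over (order by time)`. The helpers `island_keys` and `first_row_ids` are hypothetical names for illustration. On this sample the difference 2 occurs both for data = 1 (rows 5 to 7) and for data = 2 (rows 3 and 4), so a group is identified by the pair (data, difference), not by the difference alone.

```python
# The Data column of the question's 14 rows, already in time order.
# Assumption: 1-based list position stands in for ROW_NUMBER() over (order by time).
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]


def island_keys(values):
    """Return (value, rn_all - rn_value) for each row, in order."""
    count_per_value = {}
    keys = []
    for rn_all, v in enumerate(values, start=1):
        # rn_value mimics ROW_NUMBER() over (partition by data order by time)
        count_per_value[v] = count_per_value.get(v, 0) + 1
        keys.append((v, rn_all - count_per_value[v]))
    return keys


def first_row_ids(values):
    """1-based ids of the earliest row in each (value, difference) group."""
    seen = set()
    ids = []
    for row_id, key in enumerate(island_keys(values), start=1):
        if key not in seen:  # first time this group appears = its earliest row
            seen.add(key)
            ids.append(row_id)
    return ids


print(island_keys(DATA))
print(first_row_ids(DATA))  # [1, 3, 5, 8, 10, 11, 13]
```

Within one data value the difference only grows, and it grows exactly when rows with other values intervene, which is why each (value, difference) pair marks one contiguous run.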
The following applies this idea to your data. Once the groups are identified, it uses group by to get the data:
select data, MIN(time) as time
from (select t.*,
             (ROW_NUMBER() over (order by time) -
              ROW_NUMBER() over (partition by data order by time)
             ) as thegroup
      from t
     ) t
-- group on data as well: the same difference can occur for different data values
group by data, thegroup
order by time
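One way to check this approach is to run it against the question's sample rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for PostgreSQL (an assumption: SQLite 3.25 or newer is needed for window functions). It assumes a table named `t` with columns `id`, `data` and `time`, and stores times as ISO text so they sort chronologically. The query groups on `data` together with `thegroup`.

```python
import sqlite3
from datetime import datetime, timedelta

# Question's sample: Data values for rows 1..14, one row every 10 seconds.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, data integer, time text)")
conn.executemany(
    "insert into t (id, data, time) values (?, ?, ?)",
    [(i + 1, d, (START + timedelta(seconds=10 * i)).strftime("%Y-%m-%d %H:%M:%S"))
     for i, d in enumerate(DATA)],
)

QUERY = """
select data, MIN(time) as time
from (select t.*,
             (ROW_NUMBER() over (order by time) -
              ROW_NUMBER() over (partition by data order by time)
             ) as thegroup
      from t
     ) t
group by data, thegroup   -- data is part of the key, not just the difference
order by time
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Grouping on `thegroup` alone would merge rows 3 and 4 (data = 2) with rows 5 to 7 (data = 1), because both runs have a difference of 2.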
If you want to keep more columns, you can enumerate the rows in each group and take the first one:
select data, time
from (select t.*,
             -- number the rows inside each group; the earliest row gets 1
             ROW_NUMBER() over (partition by data, thegroup order by time) as seqnum
      from (select t.*,
                   (ROW_NUMBER() over (order by time) -
                    ROW_NUMBER() over (partition by data order by time)
                   ) as thegroup
            from t
           ) t
     ) t
where seqnum = 1
order by time
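The enumerate-then-filter variant can be checked the same way. This sketch again assumes Python's `sqlite3` with an in-memory SQLite (3.25+) database standing in for PostgreSQL, and a table `t(id, data, time)`; `id` is the name Answer 1 gives the question's unnamed first column. It also selects `id` to show that extra columns survive, which the plain `group by` version cannot do.

```python
import sqlite3
from datetime import datetime, timedelta

# Question's sample: Data values for rows 1..14, one row every 10 seconds.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, data integer, time text)")
conn.executemany(
    "insert into t (id, data, time) values (?, ?, ?)",
    [(i + 1, d, (START + timedelta(seconds=10 * i)).strftime("%Y-%m-%d %H:%M:%S"))
     for i, d in enumerate(DATA)],
)

QUERY = """
select id, data, time
from (select t.*,
             ROW_NUMBER() over (partition by data, thegroup order by time) as seqnum
      from (select t.*,
                   (ROW_NUMBER() over (order by time) -
                    ROW_NUMBER() over (partition by data order by time)
                   ) as thegroup
            from t
           ) t
     ) t
where seqnum = 1   -- keep only the earliest row of each group
order by time
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```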
You can also do this with PostgreSQL's DISTINCT ON syntax.
Answer 1 (score: 1)
Here is a simpler, more efficient version:
SELECT
*
FROM
(
SELECT
id,
data,
time,
lag( id, 1 ) over( partition by data ORDER BY id ) as prev_id
FROM t
) t
WHERE
prev_id is null
OR id - prev_id > 1
ORDER BY
id
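To see which rows this `lag()` query keeps, here is a sketch that runs it on the question's sample through Python's `sqlite3`, assuming an in-memory SQLite (3.25+) database as a stand-in for PostgreSQL and a table `t(id, data, time)`. Note that the `id - prev_id > 1` test relies on `id` being gap-free and in the same order as `time`, as it is in the sample.

```python
import sqlite3
from datetime import datetime, timedelta

# Question's sample: Data values for ids 1..14, one row every 10 seconds.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, data integer, time text)")
conn.executemany(
    "insert into t (id, data, time) values (?, ?, ?)",
    [(i + 1, d, (START + timedelta(seconds=10 * i)).strftime("%Y-%m-%d %H:%M:%S"))
     for i, d in enumerate(DATA)],
)

QUERY = """
SELECT *
FROM (
    SELECT id, data, time,
           lag(id, 1) over (partition by data ORDER BY id) as prev_id
    FROM t
) t
WHERE prev_id is null        -- first row ever seen for this data value
   OR id - prev_id > 1       -- other rows intervened since the previous one
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()  # (id, data, time, prev_id); NULL -> None
for row in rows:
    print(row)
```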
Since you need the first row of each group, I used the PostgreSQL window function lag() to generate a column called prev_id, as shown below. (The table below covers only the records whose data is 1; similar tables are produced for the other data values.)
+----+---------+
| id | prev_id |
+----+---------+
| 1  | NULL    |  This row is valid because lag is NULL
| 2  | 1       |
| 5  | 2       |  This row is valid because the difference between previous_id and current_id is > 1
| 6  | 5       |
| 7  | 6       |
| 10 | 7       |  This row is valid because the difference between previous_id and current_id is > 1
+----+---------+
I consider a row valid if either of these two conditions holds: lag is NULL, or id - lag > 1.
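The `lag(id, 1) over (partition by data order by id)` step can also be mimicked in plain Python, which makes the per-value tracking explicit. This is a sketch assuming the question's ids 1 to 14 with no gaps; `prev_ids` is a hypothetical helper, and `None` plays the role of SQL NULL.

```python
# Data values for ids 1..14 from the question (assumed gap-free ids).
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]


def prev_ids(values):
    """Return (id, data, prev_id), where prev_id is the last earlier id with the same data."""
    last_id_for = {}  # data value -> most recent id seen with that value
    result = []
    for row_id, v in enumerate(values, start=1):
        result.append((row_id, v, last_id_for.get(v)))  # None when no earlier row
        last_id_for[v] = row_id
    return result


rows = prev_ids(DATA)
print([(i, p) for i, v, p in rows if v == 1])
# [(1, None), (2, 1), (5, 2), (6, 5), (7, 6), (10, 7)]

# Keep a row when lag is NULL or when other rows intervened (id - prev_id > 1).
valid = [i for i, v, p in rows if p is None or i - p > 1]
print(valid)  # [1, 3, 5, 8, 10, 11, 13]
```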
Answer 2 (score: 0)
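Use GROUP BY on data and time instead of DISTINCT.
"GROUP BY data" groups the rows by their data field, but if you add "and time", it also groups each data group by time.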