How to group a collection of elements by whether or not they have consecutive values?

Asked: 2015-03-08 12:33:16

Tags: sql postgresql gaps-and-islands

Given a table like the one below, I want to get the ids that have rows for at least three consecutive years.

+---------+--------+
|    id   |  year  |
+---------+--------+
|    2    |   2003 |
|    2    |   2004 |
|    1    |   2005 |
|    2    |   2005 |
|    1    |   2007 |
|    1    |   2008 |
+---------+--------+

The result here would of course be:

+---------+         
|   id    |         
+---------+         
|    2    |         
+---------+      

Any input on how to construct a query for this would be great.

3 answers:

Answer 0: (score: 1)

You can use the JOIN approach (a self-join):

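SELECT DISTINCT t1.id                 -- DISTINCT: a run longer than three years would otherwise repeat the id
FROM tbl t1
JOIN tbl t2 ON t2.year = t1.year + 1  -- same id, the following year
           AND t1.id = t2.id
JOIN tbl t3 ON t3.year = t1.year + 2  -- same id, two years later
           AND t1.id = t3.id;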


Answer 1: (score: 1)

This works, and it can be quite fast when you have at least an index on the id field:

WITH t1 AS (
    SELECT  *
    FROM    (VALUES
            (2,2003),
            (2,2004),
            (1,2005),
            (2,2005),
            (1,2007),
            (1,2008)
            ) v(id, year) 
)
SELECT  DISTINCT t1.id
FROM    t1 -- your tablename
    JOIN t1 AS t2 ON t1.id = t2.id AND t1.year + 1 = t2.year
    JOIN t1 AS t3 ON t1.id = t3.id AND t1.year + 2 = t3.year;
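
The index this answer relies on could be created as sketched below. The table name tbl and the index name tbl_id_year_idx are assumptions for illustration. A composite index on (id, year) is one option that covers both join conditions; the answer itself only asks for an index on id.

```sql
-- Hypothetical names: adjust the table and index names to your schema.
-- Not needed if (id, year) is already the PRIMARY KEY or has a UNIQUE
-- constraint, since those create an index of their own.
CREATE INDEX tbl_id_year_idx ON tbl (id, year);
```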

Answer 2: (score: 1)

The combination (id, year) is UNIQUE

This is typically guaranteed by a PRIMARY KEY, a UNIQUE constraint, or a unique index.

This is a generic solution for any minimum number of consecutive rows:

SELECT DISTINCT id
FROM  (
   SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
   FROM   tbl
   ) sub
GROUP  BY id, grp
HAVING count(*) > 2;  -- minimum: 3

This should be faster than repeated self-joins, because it needs only a single scan of the base table. Test performance with EXPLAIN ANALYZE.
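
To see why subtracting row_number() from year identifies runs, it can help to look at the intermediate values. The sketch below inlines the sample rows from the question as a CTE named tbl (an assumption, only to keep the example self-contained) and prints what the inner subquery computes. Within one id, consecutive years advance in step with the row number, so their difference stays constant for the whole run.

```sql
WITH tbl(id, year) AS (   -- sample rows from the question, inlined for illustration
   VALUES (2,2003), (2,2004), (1,2005), (2,2005), (1,2007), (1,2008)
)
SELECT id, year
     , row_number() OVER (PARTITION BY id ORDER BY year)        AS rn
     , year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM   tbl
ORDER  BY id, year;

-- id | year | rn | grp
--  1 | 2005 |  1 | 2004
--  1 | 2007 |  2 | 2005
--  1 | 2008 |  3 | 2005
--  2 | 2003 |  1 | 2002
--  2 | 2004 |  2 | 2002
--  2 | 2005 |  3 | 2002
```

Only id 2 has a group (grp = 2002) containing three rows, so it is the only id that passes HAVING count(*) > 2.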

Related answer with a detailed explanation:

The combination (id, year) is not UNIQUE

You can make the rows unique in a first step, by grouping on (id, year) before numbering them.

SELECT DISTINCT id
FROM  (
   SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
   FROM   tbl
   GROUP  BY id, year
   ) sub
GROUP  BY id, grp
HAVING count(*) > 2;  -- minimum: 3


Alternatively, you could use the window function dense_rank() instead of row_number() and then count(DISTINCT year), but I don't see any benefit to that approach.
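
For comparison, the dense_rank() alternative might look like the sketch below. The inlined tbl CTE, including its duplicate (2, 2004) row, is hypothetical data added for illustration. dense_rank() gives duplicate years the same rank, so they land in the same grp, and count(DISTINCT year) keeps them from inflating the length of the run.

```sql
WITH tbl(id, year) AS (   -- hypothetical rows; (2, 2004) appears twice
   VALUES (2,2003), (2,2004), (2,2004), (1,2005), (2,2005), (1,2007), (1,2008)
)
SELECT DISTINCT id
FROM  (
   SELECT id, year
        , year - dense_rank() OVER (PARTITION BY id ORDER BY year) AS grp
   FROM   tbl
   ) sub
GROUP  BY id, grp
HAVING count(DISTINCT year) > 2;  -- minimum: 3 distinct consecutive years
```

With these rows the query returns only id 2: its years 2003, 2004 and 2005 all map to grp 2002, and the duplicate 2004 is counted once.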

Understanding the sequence of events in a SELECT query is key:
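
As a rough illustration of that sequence, the sketch below annotates the inner query of the non-unique variant with a simplified logical evaluation order. The inlined tbl CTE is hypothetical sample data, and the numbered list is a summary rather than the full set of steps. The point is that GROUP BY runs before window functions, so row_number() numbers the already de-duplicated rows.

```sql
-- Simplified logical order of a SELECT:
--   1. FROM   2. WHERE   3. GROUP BY / aggregates   4. HAVING
--   5. window functions and the other SELECT-list expressions
--   6. DISTINCT   7. ORDER BY   8. LIMIT
WITH tbl(id, year) AS (   -- hypothetical rows; (2, 2004) appears twice
   VALUES (2,2003), (2,2004), (2,2004), (2,2005)
)
SELECT id
     , year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp  -- step 5
FROM   tbl              -- step 1
GROUP  BY id, year      -- step 3: duplicates collapse before rows are numbered
ORDER  BY id, grp;      -- step 7
```

Because the duplicate row is gone by the time row_number() runs, all three remaining rows get grp = 2002.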