So, given a table like the one below, I want to grab the ids that have rows
for at least three consecutive years.
+---------+--------+
| id      | year   |
+---------+--------+
| 2       | 2003   |
| 2       | 2004   |
| 1       | 2005   |
| 2       | 2005   |
| 1       | 2007   |
| 1       | 2008   |
+---------+--------+
The result here would, of course, be:
+---------+
| id      |
+---------+
| 2       |
+---------+
Any input on how to construct a query to accomplish this would be great.
Answer 0 (score: 1)
You can use the JOIN
approach (a self-join):
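SELECT DISTINCT t1.id  -- DISTINCT: an id with a run of 4+ years would otherwise appear more than once
FROM tbl t1
JOIN tbl t2 ON t2.year = t1.year + 1   -- the following year exists for the same id
AND t1.id = t2.id
JOIN tbl t3 ON t3.year = t1.year + 2   -- and the year after that
AND t1.id = t3.id;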
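To check the self-join against the question's sample data, here is a minimal sketch that loads the rows into an in-memory SQLite database. SQLite and the helper name `ids_with_three_consecutive_years` are assumptions for illustration only; the SQL is the self-join above, with `DISTINCT` so that ids with longer runs are reported once.

```python
import sqlite3

# Sample data from the question; table name "tbl" follows the answer.
ROWS = [(2, 2003), (2, 2004), (1, 2005), (2, 2005), (1, 2007), (1, 2008)]

SELF_JOIN_SQL = """
SELECT DISTINCT t1.id
FROM tbl t1
JOIN tbl t2 ON t2.id = t1.id AND t2.year = t1.year + 1
JOIN tbl t3 ON t3.id = t1.id AND t3.year = t1.year + 2
"""


def ids_with_three_consecutive_years(rows):
    """Hypothetical helper: load rows into SQLite and run the self-join."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (id INTEGER, year INTEGER)")
        con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
        return sorted(r[0] for r in con.execute(SELF_JOIN_SQL))
    finally:
        con.close()


print(ids_with_three_consecutive_years(ROWS))  # [2]
```

Id 2 matches via 2003, 2004, 2005; id 1 has a gap at 2006, so its best run is only two years.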
Answer 1 (score: 1)
This works and can be fast, as long as you have at least an index on the id field:
WITH t1 AS ( -- sample data from the question; drop this CTE when querying a real table
SELECT *
FROM (VALUES
(2,2003),
(2,2004),
(1,2005),
(2,2005),
(1,2007),
(1,2008)
) v(id, year)
)
SELECT DISTINCT t1.id
FROM t1 -- replace t1 with your table name (here and in both joins)
JOIN t1 AS t2 ON t1.id = t2.id AND t1.year + 1 = t2.year
JOIN t1 AS t3 ON t1.id = t3.id AND t1.year + 2 = t3.year;
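The `DISTINCT` in this query is not cosmetic: every starting year that completes a three-year window produces one row. The sketch below runs the same joins in SQLite (an assumption; the answer's `VALUES` CTE is replaced by a real table named `t1`) on hypothetical data where id 3 has four consecutive years. The index on `id` mirrors the answer's remark and does not affect the result.

```python
import sqlite3

# Hypothetical data: id 3 has four consecutive years, so two starting
# years (2001 and 2002) each complete a three-year window.
ROWS = [(3, 2001), (3, 2002), (3, 2003), (3, 2004)]

JOIN_BODY = """
FROM t1
JOIN t1 AS t2 ON t1.id = t2.id AND t1.year + 1 = t2.year
JOIN t1 AS t3 ON t1.id = t3.id AND t1.year + 2 = t3.year
"""


def run(select_clause):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t1 (id INTEGER, year INTEGER)")
        con.execute("CREATE INDEX t1_id ON t1 (id)")  # index on the id field
        con.executemany("INSERT INTO t1 VALUES (?, ?)", ROWS)
        return [r[0] for r in con.execute(select_clause + JOIN_BODY)]
    finally:
        con.close()


without_distinct = run("SELECT t1.id")
with_distinct = run("SELECT DISTINCT t1.id")
print(without_distinct)  # [3, 3]
print(with_distinct)     # [3]
```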
Answer 2 (score: 1)
Assuming (id, year)
is UNIQUE,
which is typically guaranteed by a PRIMARY KEY
or UNIQUE
constraint or a unique index.
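As a small illustration of that guarantee, this sketch declares a composite `PRIMARY KEY (id, year)` in SQLite (the engine and table layout are assumptions) and shows that a second row with the same `(id, year)` is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A composite PRIMARY KEY makes each (id, year) pair unique.
con.execute("CREATE TABLE tbl (id INTEGER, year INTEGER, PRIMARY KEY (id, year))")
con.execute("INSERT INTO tbl VALUES (2, 2003)")
try:
    con.execute("INSERT INTO tbl VALUES (2, 2003)")  # same (id, year) again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
con.close()

print(duplicate_rejected)  # True
```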
Here is a generic solution for any minimum number of consecutive rows:
SELECT DISTINCT id
FROM (
-- within each id, consecutive years yield the same grp value
SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM tbl
) sub
GROUP BY id, grp
HAVING count(*) > 2; -- minimum: 3
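The trick is that `year - row_number()` stays constant while the years increase by exactly 1, and jumps whenever there is a gap. Here is a pure-Python sketch of the same computation (the function names `islands` and `ids_with_min_run` are hypothetical), assuming unique `(id, year)` pairs as the answer does:

```python
from itertools import groupby

ROWS = [(2, 2003), (2, 2004), (1, 2005), (2, 2005), (1, 2007), (1, 2008)]


def islands(rows):
    """Return {(id, grp): [years]} with grp = year - row_number()."""
    result = {}
    # PARTITION BY id ORDER BY year: sort by (id, year), number rows per id.
    for id_, group in groupby(sorted(rows), key=lambda r: r[0]):
        for rn, (_, year) in enumerate(group, start=1):
            result.setdefault((id_, year - rn), []).append(year)
    return result


def ids_with_min_run(rows, min_len=3):
    # GROUP BY id, grp HAVING count(*) >= min_len, then DISTINCT id.
    return sorted({id_ for (id_, _), years in islands(rows).items()
                   if len(years) >= min_len})


print(islands(ROWS))
# {(1, 2004): [2005], (1, 2005): [2007, 2008], (2, 2002): [2003, 2004, 2005]}
print(ids_with_min_run(ROWS))  # [2]
```

Changing `min_len` corresponds to changing the `HAVING` threshold; for example, a minimum of 2 would also return id 1.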
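This should be faster than repeated self-joins, because only a single scan over the base table is needed. Test performance with EXPLAIN ANALYZE.
Related answer with a detailed explanation: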
If (id, year)
is not UNIQUE,
you can make it unique in a first step:
SELECT DISTINCT id
FROM (
SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM tbl
GROUP BY id, year  -- collapse duplicates; the window function runs after this grouping
) sub
GROUP BY id, grp
HAVING count(*) > 2; -- minimum: 3
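Why the extra `GROUP BY` matters: a repeated year consumes a row number without advancing the year, which splits one run into two groups. This sketch runs both queries in SQLite (an assumption; window functions need SQLite 3.25 or later) on hypothetical data where id 4 covers 2001 to 2003 but has 2002 twice. With duplicates, the plain query can only miss runs; it does not produce false positives.

```python
import sqlite3

# Hypothetical data: 2002 appears twice for id 4.
ROWS = [(4, 2001), (4, 2002), (4, 2002), (4, 2003)]

NAIVE_SQL = """
SELECT DISTINCT id FROM (
  SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
  FROM tbl
) sub
GROUP BY id, grp
HAVING count(*) > 2
"""

DEDUP_SQL = """
SELECT DISTINCT id FROM (
  SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
  FROM tbl
  GROUP BY id, year  -- collapse duplicate (id, year) rows before numbering
) sub
GROUP BY id, grp
HAVING count(*) > 2
"""


def run(sql, rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (id INTEGER, year INTEGER)")
        con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
        return sorted(r[0] for r in con.execute(sql))
    finally:
        con.close()


print(run(NAIVE_SQL, ROWS))  # []   (grp values 2000, 2000, 1999, 1999)
print(run(DEDUP_SQL, ROWS))  # [4]  (grp 2000 for 2001, 2002, 2003)
```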
Alternatively, you could use the window function dense_rank()
instead of row_number()
and then count(DISTINCT year)
, but I don't see a benefit in that approach.
Understanding the sequence of events in a SELECT
query is key: