https://www.db-fiddle.com/f/2bzoKxbU2gznwwmQpMmjp5/0
(The actual database is Microsoft SQL Server 2014.)
The fiddle above shows what I am trying to do.
CREATE TABLE IF NOT EXISTS table1 (
    id nvarchar(5) NOT NULL,
    year int(4) NOT NULL,
    PRIMARY KEY (id, year)
);
INSERT INTO table1 (id, year) VALUES
    ('A', '2013'),
    ('A', '2014'),
    ('A', '2017'),
    ('A', '2018'),
    ('B', '2016'),
    ('B', '2017'),
    ('B', '2018'),
    ('C', '2016'),
    ('D', '2014'),
    ('D', '2016'),
    ('D', '2018');
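This is roughly the data I'm working with. For each id, I want to find the number of consecutive/sequential records in the run of years that also includes 2018 in the year column.
My thought process so far is this: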
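Since the actual database is SQL Server 2014, note that the setup script above uses MySQL syntax: `CREATE TABLE IF NOT EXISTS` and `int(4)` are not valid T-SQL. A rough T-SQL equivalent of the same sample data, as a sketch only (the `dbo` schema and the `object_id` existence check are assumptions, not part of the original fiddle), might look like this:

```sql
-- Create and populate the sample table only if it does not exist yet.
-- Assumes the default dbo schema; adjust to your environment.
if object_id('dbo.table1', 'U') is null
begin
    create table dbo.table1 (
        id   nvarchar(5) not null,
        year int         not null,
        primary key (id, year)
    );

    insert into dbo.table1 (id, year) values
        ('A', 2013), ('A', 2014), ('A', 2017), ('A', 2018),
        ('B', 2016), ('B', 2017), ('B', 2018),
        ('C', 2016),
        ('D', 2014), ('D', 2016), ('D', 2018);
end;
```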
select id, count(*)
from table1
group by id;

select main.id,
       case when in_2018.id is not null
            then count(*)
            else 0
       end
from table1 as main
left join table1 as in_2018
       on in_2018.id = main.id
      and in_2018.year = 2018
group by main.id;
/*
Want a table:
A | 2
B | 3
C | 0
D | 1
Per id: count of records in the run of consecutive (single-step) years that includes 2018
*/
Obviously, these don't return consecutive rows, just the row count for ids that meet the 2018 criterion.
I tried another kind of check:
case when count(*) = max(year) - min(year) +1,
In my sample data this works for id B, because all of B's years are sequential, but it doesn't handle the broken (gapped) patterns of the other ids.
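To see why this check breaks down, it can help to put both sides of the comparison next to each other per id. A minimal sketch against the sample table (the aliases `row_count` and `year_span` are illustrative names only):

```sql
-- Compare the number of rows per id with the width of its year range.
-- The two values only match when an id has no gaps at all.
select id,
       count(*)                  as row_count,
       max(year) - min(year) + 1 as year_span
from table1
group by id
order by id;
-- With the sample data:
-- A | 4 | 6   (gap at 2015-2016, so the check fails although 2017-2018 is a valid run)
-- B | 3 | 3
-- C | 1 | 1   (passes the check, but has no 2018 row)
-- D | 3 | 5
```

The check looks at each id as a whole, so it cannot isolate the one run that contains 2018.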
Answer 0 (score: 0)
In SQL Server, you can use row_number() to solve this:
select top (1) id, count(*)
from (select t.*,
             row_number() over (partition by id order by year) as seqnum
      from table1 t
     ) t
group by id, (year - seqnum)
having sum(case when year = 2018 then 1 else 0 end) > 0
order by count(*) desc;
This uses the observation that year - seqnum is constant across consecutive years.
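The observation is easier to follow when the intermediate values are listed. The following sketch shows them for the sample data (the alias `grp` is an illustrative name, not part of the answer):

```sql
-- Show the row number per id and the derived grouping key.
-- Rows of the same id that share a grp value form one run of consecutive years.
select id,
       year,
       row_number() over (partition by id order by year) as seqnum,
       year - row_number() over (partition by id order by year) as grp
from table1
order by id, year;
-- id | year | seqnum | grp
-- A  | 2013 | 1      | 2012
-- A  | 2014 | 2      | 2012
-- A  | 2017 | 3      | 2014
-- A  | 2018 | 4      | 2014
-- B  | 2016 | 1      | 2015
-- B  | 2017 | 2      | 2015
-- B  | 2018 | 3      | 2015
-- C  | 2016 | 1      | 2015
-- D  | 2014 | 1      | 2013
-- D  | 2016 | 2      | 2014
-- D  | 2018 | 3      | 2015
```

Whenever a year is skipped, year grows by more than seqnum does, so the difference jumps and a new group starts.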
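In databases that don't support window functions, the simplest solution is probably a correlated subquery that performs the same calculation: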
select id, count(*)
from (select t.*,
             (select count(*)
              from table1 tt
              where tt.id = t.id and tt.year <= t.year
             ) as seqnum
      from table1 t
     ) t
group by id, (year - seqnum)
having sum(case when year = 2018 then 1 else 0 end) > 0
order by count(*) desc
fetch first 1 row only;
Here is a db<>fiddle.
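As written, both queries keep only the single longest run across all ids (`top (1)` / `fetch first`). If the goal is the per-id table from the question, including 0 for ids without a 2018 row, one possible variation is to drop the limit and left join the runs back to the list of ids. This is a sketch under those assumptions, not the answer's own query; the names `numbered`, `run_with_2018`, and `consecutive_years` are illustrative:

```sql
with numbered as (
    -- Number each id's years; year - seqnum is constant within a consecutive run.
    select id,
           year,
           row_number() over (partition by id order by year) as seqnum
    from table1
),
run_with_2018 as (
    -- Keep only the run that contains 2018 (at most one per id,
    -- because (id, year) is the primary key).
    select id, count(*) as run_length
    from numbered
    group by id, (year - seqnum)
    having sum(case when year = 2018 then 1 else 0 end) > 0
)
select ids.id,
       coalesce(r.run_length, 0) as consecutive_years
from (select distinct id from table1) as ids
left join run_with_2018 as r
       on r.id = ids.id
order by ids.id;
-- With the sample data:
-- A | 2
-- B | 3
-- C | 0
-- D | 1
```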
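Answer 1 (score: 0)
I see Gordon beat me to it, and with a much shorter query. But I got this far, so I'll post it anyway. I think the general idea is roughly the same, but I don't rely on any non-standard features (I think), and I hope I can make up for the extra code by adding some comments, to make it even longer. ;-)
Each subquery can be run on its own, so you can see step by step how the query zooms in on the result.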
select
    id,
    max(span) as nr_of_years
from
    ( -- This inner query gives all the valid ranges, but they have to be deduplicated.
      -- For instance, it can give B 2017-2018 while there is also B 2016-2018, which has precedence.
      -- That's why the outer query uses max, to get the longest range.
      select
          s.id,
          s.year,
          s.otheryear,
          s.span,
          s.rows_in_span
      from
          ( -- Find all possible 'spans' of years between two rows with the same id.
            -- Also find how many rows are in that span. They should match.
            select
                a.id,
                a.year,
                b.year as otheryear,
                a.year - b.year + 1 as span,
                ( select count(*) from table1 c
                  where
                      c.id = a.id and
                      c.year >= b.year and
                      c.year <= a.year) as rows_in_span
            from
                table1 a
                join table1 b on b.id = a.id and b.year <= a.year -- like a cross join, but per id
          ) s
      where
          -- If they are not equal, at least one year is missing between the lowest and highest year in the span.
          s.span = s.rows_in_span and
          -- Keep only ranges that actually contain 2018; other gapless ranges are valid, but out of scope.
          2018 between s.otheryear and s.year
    ) f
group by
    f.id
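As an illustration of running one subquery on its own, here is the innermost one restricted to id 'A'. The `where a.id = 'A'` filter and the `order by` are additions for readability and are not part of the answer's query:

```sql
-- Innermost subquery: every pair of years (otheryear <= year) for one id,
-- with the width of the span and the number of rows that fall inside it.
select
    a.id,
    a.year,
    b.year as otheryear,
    a.year - b.year + 1 as span,
    ( select count(*) from table1 c
      where
          c.id = a.id and
          c.year >= b.year and
          c.year <= a.year) as rows_in_span
from
    table1 a
    join table1 b on b.id = a.id and b.year <= a.year
where
    a.id = 'A'
order by
    a.year, b.year;
-- id | year | otheryear | span | rows_in_span
-- A  | 2013 | 2013      | 1    | 1
-- A  | 2014 | 2013      | 2    | 2
-- A  | 2014 | 2014      | 1    | 1
-- A  | 2017 | 2013      | 5    | 3
-- A  | 2017 | 2014      | 4    | 2
-- A  | 2017 | 2017      | 1    | 1
-- A  | 2018 | 2013      | 6    | 4
-- A  | 2018 | 2014      | 5    | 3
-- A  | 2018 | 2017      | 2    | 2
-- A  | 2018 | 2018      | 1    | 1
```

Only the rows where span equals rows_in_span are gapless ranges. Of those, the ranges that reach 2018 are 2017-2018 and 2018-2018, and taking the maximum span gives 2 for A.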
In the fiddle you can see that it also works on Postgres (you can switch between databases; I deliberately kept the create statement crude to allow this):