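I need to find the missing numbers in sets of sequences grouped by year and department. For example, I have the following data in a table: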
╔══════╤══════╤═════╗
║ YEAR │ DEPT │ NUM ║
╠══════╪══════╪═════╣
║ 2016 │ 1 │ 1 ║
╟──────┼──────┼─────╢
║ 2016 │ 1 │ 2 ║
╟──────┼──────┼─────╢
║ 2016 │ 1 │ 4 ║
╟──────┼──────┼─────╢
║ 2016 │ 2 │ 10 ║
╟──────┼──────┼─────╢
║ 2016 │ 2 │ 12 ║
╟──────┼──────┼─────╢
║ 2016 │ 2 │ 13 ║
╟──────┼──────┼─────╢
║ 2015 │ 3 │ 6 ║
╟──────┼──────┼─────╢
║ 2015 │ 3 │ 8 ║
╟──────┼──────┼─────╢
║ 2015 │ 3 │ 9 ║
╟──────┼──────┼─────╢
║ 2015 │ 2 │ 24 ║
╟──────┼──────┼─────╢
║ 2015 │ 2 │ 26 ║
╟──────┼──────┼─────╢
║ 2015 │ 2 │ 27 ║
╚══════╧══════╧═════╝
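Typically, I would LEFT JOIN to a TALLY table, but I'd like to retain the YEAR and DEPT in which the missing value occurs. Something like the following is what I would normally use, but I'm not sure how to carry along the year and department that each missing value corresponds to, especially since the MIN and MAX values can vary by YEAR and DEPT.
DECLARE @MIN INT = (SELECT MIN(NUM) FROM DOCUMENTS)
DECLARE @MAX INT = (SELECT MAX(NUM) FROM DOCUMENTS)
SELECT
    T.NUM AS 'MISSING'
FROM
    TALLY T
    LEFT JOIN DOCUMENTS D
        ON T.NUM = D.NUM
WHERE
    D.NUM IS NULL
    -- filter on the tally value; D.NUM is NULL for the rows being kept
    AND T.NUM BETWEEN @MIN AND @MAX
My expected output is as follows:
╔══════╤══════╤═════════════╗
║ YEAR │ DEPT │ MISSING_NUM ║
╠══════╪══════╪═════════════╣
║ 2016 │ 1 │ 3 ║
╟──────┼──────┼─────────────╢
║ 2016 │ 2 │ 11 ║
╟──────┼──────┼─────────────╢
║ 2015 │ 3 │ 7 ║
╟──────┼──────┼─────────────╢
║ 2015 │ 2 │ 25 ║
╚══════╧══════╧═════════════╝
I think I might need to create a TALLY table that includes YEAR, DEPT and NUM columns, but it would hold billions of rows, since my years range from 1800 to 2016 and there are 15 different departments, with NUM ranging from 1 to 100 million for some of these departments. Hence, I don't think this is the most efficient or practical approach.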
Answer 0 (score: 2)
If at most one consecutive value can be missing at a time, you can do:
select t.year, t.dept, t.num + 1
from t
where t.num < (select max(t2.num) from t t2 where t2.year = t.year and t2.dept = t.dept) and
not exists (select 1
from t t2
where t2.year = t.year and t2.dept = t.dept and
t.num + 1 = t2.num
);
In SQL Server 2012+, this can be simplified to:
select year, dept, num + 1 as num
from (select t.*, lead(num) over (partition by year, dept order by num) as next_num
from t
) t
where next_num <> num + 1; -- Note: this handles the final num where `next_num` is `NULL`
This approach can actually be generalized to find ranges of missing values. Assuming you are using SQL Server 2012+, then:
select year, dept, num + 1 as start_missing, next_num - 1 as end_missing
from (select t.*, lead(num) over (partition by year, dept order by num) as next_num
from t
) t
where next_num <> num + 1; -- Note: this handles the final num where `next_num` is `NULL`
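To try the range query against the sample data from the question, here is a minimal sketch that runs it from Python. It is only an illustration under stated assumptions: SQLite (3.25 or later, for window functions) stands in for SQL Server, the table is named t as in the answer, and the helper name find_gaps and the added ORDER BY are hypothetical additions, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (year, dept, num).
ROWS = [
    (2016, 1, 1), (2016, 1, 2), (2016, 1, 4),
    (2016, 2, 10), (2016, 2, 12), (2016, 2, 13),
    (2015, 3, 6), (2015, 3, 8), (2015, 3, 9),
    (2015, 2, 24), (2015, 2, 26), (2015, 2, 27),
]

# The answer's query, plus an ORDER BY so the output order is deterministic.
GAP_QUERY = """
select year, dept, num + 1 as start_missing, next_num - 1 as end_missing
from (select t.*,
             lead(num) over (partition by year, dept order by num) as next_num
      from t
     ) t
where next_num <> num + 1
order by year desc, dept
"""


def find_gaps(rows):
    """Load rows into an in-memory table t and return the missing ranges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (year integer, dept integer, num integer)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(GAP_QUERY).fetchall()
    conn.close()
    return result


for row in find_gaps(ROWS):
    print(row)
```

Each returned tuple is (year, dept, start_missing, end_missing). When a single number is missing, start and end are equal; a wider gap such as 1 followed by 5 is reported once as the range 2 to 4, so no tally table is needed.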
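Answer 1 (score: 0)
One approach is to use a recursive CTE that generates every number between the minimum and maximum NUM for each year and department combination, then left join the generated numbers against the table to find the missing ones. (The query below names the year column yr.)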
with t1 as (select yr,dept,max(num) maxnum, min(num) minnum
from t
group by yr,dept)
,x as (select yr, dept, minnum, maxnum from t1
union all
select yr, dept, minnum+1, maxnum
from x
where minnum < maxnum
)
select x.yr,x.dept,x.minnum as missing_num
from x
left join t on t.yr=x.yr and t.dept=x.dept and t.num = x.minnum
where t.num is null
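order by 1,2,3
option (maxrecursion 0) -- SQL Server stops a recursive CTE after 100 levels by default; 0 removes the limit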
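The recursive approach can be checked the same way. The sketch below is an illustration under assumptions: SQLite stands in for SQL Server, so the CTE is introduced with WITH RECURSIVE (SQL Server uses plain WITH) and there is no recursion-limit hint. The helper name find_missing is hypothetical.

```python
import sqlite3

# Sample rows from the question: (yr, dept, num).
ROWS = [
    (2016, 1, 1), (2016, 1, 2), (2016, 1, 4),
    (2016, 2, 10), (2016, 2, 12), (2016, 2, 13),
    (2015, 3, 6), (2015, 3, 8), (2015, 3, 9),
    (2015, 2, 24), (2015, 2, 26), (2015, 2, 27),
]

# Same logic as the answer: expand each (yr, dept) group into every number
# between its min and max, then keep the numbers with no matching row in t.
MISSING_QUERY = """
with recursive t1 as (
    select yr, dept, max(num) as maxnum, min(num) as minnum
    from t
    group by yr, dept
),
x as (
    select yr, dept, minnum, maxnum from t1
    union all
    select yr, dept, minnum + 1, maxnum
    from x
    where minnum < maxnum
)
select x.yr, x.dept, x.minnum as missing_num
from x
left join t on t.yr = x.yr and t.dept = x.dept and t.num = x.minnum
where t.num is null
order by 1, 2, 3
"""


def find_missing(rows):
    """Load rows into an in-memory table t and return each missing number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (yr integer, dept integer, num integer)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(MISSING_QUERY).fetchall()
    conn.close()
    return result


for row in find_missing(ROWS):
    print(row)
```

Unlike the LEAD-based query, this returns one row per missing number, which matches the expected output in the question. The cost is that it generates every number in each group's range, so it may be slow for groups whose NUM spans millions of values.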