Find missing and out-of-sequence records

Date: 2019-11-20 00:19:51

Tags: sql sql-server tsql

Schema:

Simplified: a table listing ID numbers together with their versions and statuses:

CREATE TABLE archive
    ([id] int, [version] int, [status] varchar(1));

INSERT INTO archive
    ([id], [version], [status])
VALUES
    (1, 1, 'A'),
    (1, 2, 'S'),
    (1, 3, 'T'),
    (1, 4, 'A'),
    (2, 2, 'T'),
    (2, 4, 'T'),
    (3, 1, 'A'),
    (3, 3, 'A');

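Problem:

Some records are missing part of their history (versions). Every ID should start at version 1, and its version numbers should be consecutive (unlike IDs 2 and 3 in the schema above).

Desired output

A list of all IDs showing their existing versions as well as the versions that were "skipped". For the sample data above, the output should look like this: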

id | ver | check
---+-----+------
  1|   1 |   1
  1|   2 |   2
  1|   3 |   3
  1|   4 |   4
  2| NULL|   1
  2|   2 |   2
  2| NULL|   3
  2|   4 |   4
  3|   1 |   1
  3| NULL|   2
  3|   3 |   3

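My efforts so far:

The problem is similar to this one, but there is no fixed "Table2" as in that already-answered question. The number of versions for each record is not known in advance.

So far, I have come up with the following: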

SELECT sub.id, sub.ver, sub.seq
FROM (
      SELECT CASE WHEN a.id IS NULL THEN b.id ELSE a.id END as 'id', b.version as 'ver', a.seq as 'seq'
      FROM (select *,
                   row_number() over (partition by id order by version asc) as seq
              from archive) a
      FULL OUTER JOIN archive b ON a.id=b.id AND a.seq=b.version) sub
ORDER BY sub.id, sub.ver, sub.seq

This gets me almost there (the resulting output was posted as an image in the original question).

Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 3)

This can be done with a recursive CTE.

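with cte as (
  -- anchor: one row per id, counter starts at 1, carrying that id's highest version
  select 1 as ctr, id, max(version) as version from archive group by id
  union all
  -- recursive step: keep counting up until the counter reaches the highest version
  select ctr + 1, id, version from cte
  where ctr < version
)
-- left join back to archive so that skipped versions show up as NULL
select t1.id, t2.version as ver, t1.ctr as [check] from cte t1
left join archive t2 on t2.id = t1.id and t1.ctr = t2.version
order by t1.id, t1.ctr;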

See the dbfiddle.
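One caveat with the recursive approach: SQL Server stops a recursive CTE after 100 recursion levels by default, so an ID whose highest version is well above 100 would make the query fail with an error. A minimal sketch, assuming the same archive table as in the question, that lifts the limit with the MAXRECURSION query hint:

```sql
with cte as (
  -- anchor: one row per id, counter starts at 1
  select 1 as ctr, id, max(version) as version from archive group by id
  union all
  -- recursive step: one extra row per expected version
  select ctr + 1, id, version from cte
  where ctr < version
)
select t1.id, t2.version as ver, t1.ctr as [check]
from cte t1
left join archive t2 on t2.id = t1.id and t2.version = t1.ctr
order by t1.id, t1.ctr
option (maxrecursion 0);  -- 0 removes the limit; values from 1 to 32767 set an explicit cap
```

Whether to remove the limit entirely or set a generous cap is a judgment call: a cap still protects against a runaway query if the data contains an unexpectedly large version number.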

Answer 1: (score: 1)

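Here is another version that uses a numbers table and is not affected by any CTE recursion limit. A numbers table supports a much larger range of values than a recursive CTE can provide.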

-- Create a numbers table. This table can be generated each time 
-- or stored in a static table. Numbers tables are wonderful things.
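DROP TABLE IF EXISTS #Numbers;  -- DROP TABLE IF EXISTS requires SQL Server 2016 or later
SELECT
    ROW_NUMBER() OVER (ORDER BY n1.[object_id]) AS [number]
INTO
    #Numbers
FROM
    [sys].[objects] AS n1
    CROSS JOIN [sys].[objects] AS n2  -- the cross join multiplies the row count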

-- Calculate the "range" of version numbers for each [id]
;WITH [range]
AS
(
    SELECT
        [id]
        ,1 AS [min_version]
        ,MAX([version]) AS [max_version]
    FROM
        [archive] AS a
    GROUP BY
        [id]
), [expected]
AS
(
    SELECT
        DISTINCT
        a.[id]
        ,n.[number]
    FROM
        #Numbers AS n
        INNER JOIN [range] AS a
            ON n.number BETWEEN a.[min_version] AND a.[max_version]
)
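-- Left join the expected versions to the actual ones; skipped versions come back as NULL
SELECT
    e.[id]
    ,a.[version] AS [ver]
    ,e.[number] AS [check]
FROM
    [expected] AS e
    LEFT OUTER JOIN [archive] AS a
        ON e.[id] = a.[id]
        AND e.[number] = a.[version]
ORDER BY
    e.[id]
    ,e.[number];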
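
On newer servers the temporary numbers table may not be needed. The following is only a sketch, assuming SQL Server 2022 or later (database compatibility level 160), where the built-in GENERATE_SERIES function is available, and assuming the same archive table as in the question. It builds the expected version range per ID directly:

```sql
-- Assumption: SQL Server 2022+ with compatibility level 160 (GENERATE_SERIES available)
SELECT
    r.[id]
    ,a.[version] AS [ver]
    ,s.[value] AS [check]   -- GENERATE_SERIES exposes its numbers in a column named [value]
FROM
    (
        -- highest version per id; the expected range is 1 .. max_version
        SELECT [id], MAX([version]) AS [max_version]
        FROM [archive]
        GROUP BY [id]
    ) AS r
    CROSS APPLY GENERATE_SERIES(1, r.[max_version]) AS s
    LEFT OUTER JOIN [archive] AS a
        ON a.[id] = r.[id]
        AND a.[version] = s.[value]
ORDER BY
    r.[id]
    ,s.[value];
```

On older versions, the numbers-table query above remains the portable choice.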