Combining pairs of integers based on a common element in SQL Server

Asked: 2018-12-13 21:04:46

Tags: sql sql-server tsql

I have a table in SQL Server with two columns that contain integers, for example:

 1  2 
 2  7 
 5  7 
 7 10 
10 11 
12 13 
13 14

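I need to group them based on the integers they have in common. In this example the pairs 1, 2 and 2, 7 fall into the first group because they share the integer 2. The pair 5, 7 also falls into the first group because it shares 7 with 2, 7. The pair 7, 10 falls into the first group as well, and so does 10, 11.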

However, the pair 12, 13 has no member in common with the first group, so it starts a group of its own, and 13, 14 is added to that group.

So the output is (group, integer):

1 1
1 2
1 5
1 7
1 10
1 11
2 12
2 13
2 14

I hope the principle is clear. The names of the groups do not matter.

The table is already filtered so that the left integer is smaller than the right integer, and every row is unique.

To achieve this I have written the following T-SQL code, which uses a recursive query ([dbo].[CDI] is the source table):

CREATE TABLE [dbo].[CDI_proxy]
(
    [ID1] [bigint] NOT NULL,
    [ID2] [bigint] NOT NULL,
    primary key ([ID1], [ID2])
) ON [PRIMARY]

CREATE TABLE [dbo].[CDL]
(
    [ID1] [bigint] NOT NULL,
    [ID2] [bigint] NOT NULL,
    [cnt] [int] NOT NULL default(0),
    primary key ([ID1], [ID2])
) ON [PRIMARY]

create nonclustered index IX_1
on [dbo].[CDL] ([cnt]) include ([ID1], [ID2])

CREATE TABLE [dbo].[CDR]
(
    [ID1] [bigint] NOT NULL,
    [ID2] [bigint] NULL
) ON [PRIMARY]

insert into [dbo].[CDI_proxy]
    select
        d1.ID1,
        d1.ID2
    from
        [dbo].[CDI] d1

    union

    select
        d2.ID2,
        d2.ID1
    from
        [dbo].[CDI] d2;


WITH cte([ID1], [ID2], LVL) AS
(
    --Anchor Member
    (SELECT
         d1.ID1,
         d1.ID2,
         0 as LVL
     FROM 
         [dbo].[CDI_proxy] d1
    )

    UNION ALL

    --Recursive Member
    SELECT 
        r.[ID1],
        cte.[ID2],
        LVL + 1 AS LVL
    FROM
        [dbo].[CDI_proxy] r
    INNER JOIN
        cte ON r.ID2 = cte.ID1
)
INSERT INTO [dbo].[CDL]([ID1], [ID2])
    SELECT DISTINCT
        cte1.[ID1],
        cte1.[ID2]
    FROM 
        cte cte1
    OPTION (MAXRECURSION 0)

UPDATE [dbo].[CDL]
SET [cnt] = ag.cnt
FROM 
    (SELECT cdl.ID1, COUNT(cdl.ID2) AS cnt
     FROM [dbo].CDL cdl
     GROUP BY cdl.ID1) ag
WHERE ag.ID1 = [CDL].ID1

INSERT INTO [dbo].[CDR] ([ID1], [ID2])
    SELECT
        [ID1], [ID2]
    FROM
        (SELECT 
             cdl.*,
             ROW_NUMBER() OVER (PARTITION BY cdl.ID2 
                                ORDER BY cdl.cnt DESC, cdl.ID1 DESC) rnk
         FROM 
             [dbo].[CDL] cdl) cdl
    WHERE 
        rnk = 1

I ran this script on approximately 5 million rows, and it ran for 3 hours without finishing (I verified this). If I change part of the script like this

 --Recursive Member
    SELECT r.[ID1],
            cte.[ID2],
            LVL + 1 AS LVL
    FROM [dbo].[CDI_proxy] r
        inner join
        cte ON r.ID2=cte.ID1
        where LVL > 5

)

INSERT INTO ...

then it runs for 3 minutes, and when I look at the query results with

select id1, count(*) cnt
from dbo.CDR
group by id1
having count(*) > 5
order by cnt desc

the largest group has only 8 members.

I suspect that my query goes into infinite recursion when LVL is less than 5. Is that possible? If so, what should I do about it?

Or is my code wrong?

1 answer:

Answer 0: (score: 1)

This is an example of traversing an undirected graph: each edge can be followed in both directions. In SQL Server, this is a bit messy.

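The idea is to start with each edge. Then add another edge at either end of the path, comparing nodes to make sure that no cycle is generated. This can be done recursively.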
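
To see why the node comparison matters, here is a minimal Python sketch of the same path-extension idea, run on the sample pairs from the question. The names `build_adjacency`, `count_paths`, `check_cycles` and `max_level` are hypothetical and exist only for this illustration. Because every edge is stored in both directions, a path can always step back along the edge it just used, so without the check the set of paths never becomes empty. The `max_level` cap is there only so the unchecked variant stops in this demo.

```python
from collections import defaultdict

# Sample pairs from the question.
EDGES = [(1, 2), (2, 7), (5, 7), (7, 10), (10, 11), (12, 13), (13, 14)]


def build_adjacency(edges):
    """Store every edge in both directions (an undirected graph)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def count_paths(edges, check_cycles, max_level):
    """Extend paths one edge per level; return the number of paths per level.

    check_cycles=True refuses to revisit a node that is already on the path.
    max_level is a safety cap (an assumption of this demo, not part of the
    technique) so that the unchecked variant terminates.
    """
    adjacency = build_adjacency(edges)
    # Level 0: every edge, in both directions.
    frontier = [(a, b) for a in adjacency for b in adjacency[a]]
    counts = [len(frontier)]
    while frontier and len(counts) <= max_level:
        next_frontier = []
        for path in frontier:
            for neighbour in adjacency[path[-1]]:
                if check_cycles and neighbour in path:
                    continue  # this step would close a cycle
                next_frontier.append(path + (neighbour,))
        frontier = next_frontier
        counts.append(len(frontier))
    return counts


print("with cycle check:   ", count_paths(EDGES, True, 10))
print("without cycle check:", count_paths(EDGES, False, 10))
```

With the check, the per-level counts drop to zero once the longest simple path has been reached, and the loop ends on its own. Without it, the counts stay positive at every level until the cap stops the loop.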

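Then you can take the minimum node value that appears in any path through a node and use that value to identify the group (the connected subgraph) the node belongs to.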
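
The labelling rule can be sketched outside SQL in a few lines of Python. This is only an illustration of "use the lowest node as the group label", not a translation of the query: the function name `group_by_lowest_node` is hypothetical, and a breadth-first search is swapped in for the recursive path enumeration.

```python
from collections import defaultdict, deque

# Sample pairs from the question.
EDGES = [(1, 2), (2, 7), (5, 7), (7, 10), (10, 11), (12, 13), (13, 14)]


def group_by_lowest_node(edges):
    """Map each node to the smallest node value reachable from it."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    group_of = {}
    for start in sorted(adjacency):
        if start in group_of:
            continue
        # Nodes are tried in ascending order, so `start` is the smallest
        # node of a component that has not been labelled yet.
        group_of[start] = start
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in group_of:
                    group_of[neighbour] = start
                    queue.append(neighbour)
    return group_of


groups = group_by_lowest_node(EDGES)
for node in sorted(groups):
    print(groups[node], node)
```

For the sample data the nodes 1, 2, 5, 7, 10 and 11 get the label 1, and the nodes 12, 13 and 14 get the label 12. The labels differ from the 1 and 2 used in the question, which is fine because the group names do not matter.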

Here is some code:

with t as (
      select *
      from (values (1, 2), (2, 7), (5, 7), (7, 10), (10, 11), (12, 13), (13, 14)) v(x, y)
     ),
     tt as (
      select v.x, v.y
      from t cross apply
           (values (x, y), (y, x)) v(x, y)
     ),
     cte as (
      select (case when tt.x < tt.y then tt.x else tt.y end) as lowest, v.val, tt.x, tt.y, convert(varchar(max), concat(',', tt.x, ',', tt.y, ',')) as vals
      from tt cross apply
           (values (x), (y)) v(val)
      union all
      select (case when tt.y < cte.lowest then tt.y else cte.lowest end) as lowest, cte.val, cte.x, tt.y, concat(cte.vals, tt.y, ',') as vals
      from cte join
           tt
           on cte.y = tt.x and cte.vals not like concat('%,', tt.y, ',%')
      union all
      select (case when tt.x < cte.lowest then tt.x else cte.lowest end) as lowest, cte.val, tt.x, cte.y, concat(cte.vals, tt.x, ',') as vals
      from cte join
           tt
           on cte.x = tt.y and cte.vals not like concat('%,', tt.x, ',%')
     )
select min(lowest) as grp, val
from cte
group by val;

There is also a db<>fiddle.
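
The query above enumerates paths, and the number of paths may grow quickly on a table with millions of rows. If the pairs can be exported from SQL Server, one alternative is a union-find (disjoint-set) pass over the rows. The sketch below is a different technique from the recursive CTE, shown only under that assumption; `connected_groups` is a hypothetical name.

```python
def connected_groups(pairs):
    """Return (group, integer) rows, with groups numbered 1, 2, ..."""
    parent = {}

    def find(node):
        """Return the representative of node's set, compressing the path."""
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the way directly at the root.
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller value as the representative of the merged set.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    numbering = {}  # representative -> dense group number
    rows = []
    for node in sorted(parent):
        root = find(node)
        group = numbering.setdefault(root, len(numbering) + 1)
        rows.append((group, node))
    return rows


pairs = [(1, 2), (2, 7), (5, 7), (7, 10), (10, 11), (12, 13), (13, 14)]
for group, node in connected_groups(pairs):
    print(group, node)
```

On the sample pairs this prints the same nine rows as the expected output in the question, with groups 1 and 2. Each pair is processed once, so there is no recursion that could run away.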