I have a table in SQL Server with two columns of integers, for example:
1 2
2 7
5 7
7 10
10 11
12 13
13 14
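I need to group the rows by the integers they have in common. In this example, (1, 2) and (2, 7) go into the first group because they share the integer 2. (5, 7) also goes into the first group because it shares 7. (7, 10) joins the first group as well, and so does (10, 11).
But 12 and 13 have no member in common with the first group, so they start a group of their own, and (13, 14) is added to it.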
So the output (group, member) is:
1 1
1 2
1 5
1 7
1 10
1 11
2 12
2 13
2 14
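I hope the principle is clear. The names of the groups do not matter.
The table is already filtered so that the left integer is smaller than the right integer and every row is unique.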
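For anyone who wants to reproduce the example, the sample rows can be loaded as sketched below. This is only an illustration: the temp table name #CDI_sample is hypothetical, the bigint types are assumed from the other tables in this post, and the two constraints simply restate the "left smaller than right" and "unique rows" conditions.

```sql
-- Hypothetical stand-in for [dbo].[CDI]; a temp table so no real object is touched.
CREATE TABLE #CDI_sample
(
    ID1 bigint NOT NULL,
    ID2 bigint NOT NULL,
    PRIMARY KEY (ID1, ID2),   -- every row is unique
    CHECK (ID1 < ID2)         -- left integer is smaller than the right integer
);

-- The seven pairs from the example above.
INSERT INTO #CDI_sample (ID1, ID2)
VALUES (1, 2), (2, 7), (5, 7), (7, 10), (10, 11), (12, 13), (13, 14);
```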
To do the grouping, I wrote the following T-SQL code using a recursive query ([dbo].[CDI] is the source table):
CREATE TABLE [dbo].[CDI_proxy]
(
[ID1] [bigint] NOT NULL,
[ID2] [bigint] NOT NULL,
primary key ([ID1], [ID2])
) ON [PRIMARY]
CREATE TABLE [dbo].[CDL]
(
[ID1] [bigint] NOT NULL,
[ID2] [bigint] NOT NULL,
[cnt] [int] NOT NULL default(0),
primary key ([ID1], [ID2])
) ON [PRIMARY]
create nonclustered index IX_1
on [dbo].[CDL] ([cnt]) include ([ID1], [ID2])
CREATE TABLE [dbo].[CDR]
(
[ID1] [bigint] NOT NULL,
[ID2] [bigint] NULL
) ON [PRIMARY]
insert into [dbo].[CDI_proxy]
select
d1.ID1,
d1.ID2
from
[dbo].[CDI] d1
union
select
d2.ID2,
d2.ID1
from
[dbo].[CDI] d2;
WITH cte([ID1], [ID2], LVL) AS
(
--Anchor Member
(SELECT
d1.ID1,
d1.ID2,
0 as LVL
FROM
[dbo].[CDI_proxy] d1
)
UNION ALL
--Recursive Member
SELECT
r.[ID1],
cte.[ID2],
LVL + 1 AS LVL
FROM
[dbo].[CDI_proxy] r
INNER JOIN
cte ON r.ID2 = cte.ID1
)
INSERT INTO [dbo].[CDL]([ID1], [ID2])
SELECT DISTINCT
cte1.[ID1],
cte1.[ID2]
FROM
cte cte1
OPTION (MAXRECURSION 0)
UPDATE [dbo].[CDL]
SET [cnt] = ag.cnt
FROM
(SELECT cdl.ID1, COUNT(cdl.ID2) AS cnt
FROM [dbo].CDL cdl
GROUP BY cdl.ID1) ag
WHERE ag.ID1 = [CDL].ID1
INSERT INTO [dbo].[CDR] ([ID1], [ID2])
SELECT
[ID1], [ID2]
FROM
(SELECT
cdl.*,
ROW_NUMBER() OVER (PARTITION BY cdl.ID2
ORDER BY cdl.cnt DESC, cdl.ID1 DESC) rnk
FROM
[dbo].[CDL] cdl) cdl
WHERE
rnk = 1
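I am running this script on approximately 5 million rows, and after 3 hours it is still running with no end in sight (I verified this). If I change part of the script like this: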
--Recursive Member
SELECT r.[ID1],
cte.[ID2],
LVL + 1 AS LVL
FROM [dbo].[CDI_proxy] r
inner join
cte ON r.ID2=cte.ID1
where LVL > 5
)
INSERT INTO ...
then it runs for 3 minutes, and when I check the result with this query:
select id1, count(*) cnt
from dbo.CDR
group by id1
having count(*) > 5
order by cnt desc
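the largest group has only 8 members.
I suspect that my query goes into infinite recursion when LVL is less than 5. Is that possible? If so, what can I do about it?
Or is my code wrong?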
Answer 0 (score: 1)
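This is an example of traversing an undirected graph: the edges can be followed in both directions. In SQL Server, this gets a bit messy.
The idea is to start from each edge, then add another edge at either end, comparing nodes to make sure that no cycle is generated. This can be done recursively.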
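To see why the cycle check matters, here is a minimal sketch of what happens without it. It assumes a single edge stored in both directions, the way [dbo].[CDI_proxy] stores every edge in the question. The inline `proxy` data and the cap `LVL < 10` are illustrative assumptions added only so that the demo terminates.

```sql
WITH proxy (ID1, ID2) AS
(
    -- One undirected edge, stored in both directions (as in CDI_proxy).
    SELECT 1, 2
    UNION ALL
    SELECT 2, 1
),
cte (ID1, ID2, LVL) AS
(
    -- Anchor member: every stored edge.
    SELECT ID1, ID2, 0
    FROM proxy
    UNION ALL
    -- Recursive member: same join as in the question, no cycle check.
    SELECT r.ID1, cte.ID2, cte.LVL + 1
    FROM proxy r
    INNER JOIN cte ON r.ID2 = cte.ID1
    WHERE cte.LVL < 10   -- artificial cap (assumption); remove it and the recursion never stops
)
SELECT LVL, COUNT(*) AS rows_at_level
FROM cte
GROUP BY LVL
ORDER BY LVL;
```

Every level from 0 to 10 holds 2 rows, because each row always finds the mirrored edge to step back along. Without the cap there is always another level, so with `MAXRECURSION 0` the query never finishes.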
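Then you can take the smallest node value that appears on any path through a node and use that value to identify the connected subgraph (the group) the node belongs to.
Here is some code: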
with t as (
select *
from (values (1, 2), (2, 7), (5, 7), (7, 10), (10, 11), (12, 13), (13, 14)) v(x, y)
),
tt as (
select v.x, v.y
from t cross apply
(values (x, y), (y, x)) v(x, y)
),
cte as (
select (case when tt.x < tt.y then tt.x else tt.y end) as lowest, v.val, tt.x, tt.y, convert(varchar(max), concat(',', tt.x, ',', tt.y, ',')) as vals
from tt cross apply
(values (x), (y)) v(val)
union all
select (case when tt.y < cte.lowest then tt.y else cte.lowest end) as lowest, cte.val, cte.x, tt.y, concat(cte.vals, tt.y, ',') as vals
from cte join
tt
on cte.y = tt.x and cte.vals not like concat('%,', tt.y, ',%')
union all
select (case when tt.x < cte.lowest then tt.x else cte.lowest end) as lowest, cte.val, tt.x, cte.y, concat(cte.vals, tt.x, ',') as vals
from cte join
tt
on cte.x = tt.y and cte.vals not like concat('%,', tt.x, ',%')
)
select min(lowest) as grp, val
from cte
group by val;
There is also a db<>fiddle.
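The recursive query above enumerates paths, which may grow quickly on millions of rows. As a possible alternative, here is a hedged sketch of a different technique: iterative label propagation in a WHILE loop. It is not the method used in the query above. The temp tables #edges and #grp and the variable @changed are hypothetical names; #edges stands in for [dbo].[CDI].

```sql
-- Hypothetical sample data; #edges stands in for [dbo].[CDI].
CREATE TABLE #edges (ID1 bigint NOT NULL, ID2 bigint NOT NULL, PRIMARY KEY (ID1, ID2));
INSERT INTO #edges (ID1, ID2)
VALUES (1, 2), (2, 7), (5, 7), (7, 10), (10, 11), (12, 13), (13, 14);

-- Every node starts in its own group, labelled with its own value.
CREATE TABLE #grp (node bigint NOT NULL PRIMARY KEY, grp bigint NOT NULL);
INSERT INTO #grp (node, grp)
SELECT ID1, ID1 FROM #edges
UNION
SELECT ID2, ID2 FROM #edges;

DECLARE @changed int = 1;

WHILE @changed > 0
BEGIN
    -- Each node takes the smallest label among its neighbours, if that label is smaller.
    UPDATE g
    SET grp = m.min_grp
    FROM #grp g
    INNER JOIN
    (
        SELECT e.node, MIN(n.grp) AS min_grp
        FROM (SELECT ID1 AS node, ID2 AS neighbour FROM #edges
              UNION ALL
              SELECT ID2, ID1 FROM #edges) e      -- edges in both directions
        INNER JOIN #grp n ON n.node = e.neighbour
        GROUP BY e.node
    ) m ON m.node = g.node
    WHERE m.min_grp < g.grp;

    -- Stop once a full pass changes nothing.
    SET @changed = @@ROWCOUNT;
END;

SELECT grp, node
FROM #grp
ORDER BY grp, node;
```

On the sample data this labels 1, 2, 5, 7, 10 and 11 with group 1, and 12, 13 and 14 with group 12. Labels only ever decrease, so the loop terminates. The number of passes grows with the longest chain of edges within a group, so this is a sketch to measure against real data rather than a guaranteed improvement.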