I have a table like the one shown below:
record similar_record
rec_1 rec_2
rec_3 rec_4
rec_2 rec_3
rec_5 rec_7
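The data above shows which pairs of records are similar. For example, in this data set rec_1 is similar to rec_2, rec_2 is similar to rec_3, and rec_3 is similar to rec_4, so all four must go into one group. rec_5 and rec_7 are similar, so they form a second group. We need to generate group identifiers; they do not have to be integers.
I am trying to write a SQL query in MySQL that produces the following output: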
group record
1 rec_1
1 rec_2
1 rec_3
1 rec_4
2 rec_5
2 rec_7
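The records do not have to be on separate rows. A result built with GROUP_CONCAT, with each group's records joined by some separator, would also be fine.
Can someone help me with the query?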
Answer 0 (score: 1)
Here is a recursive brute-force approach. It works on MySQL 8 and also on MariaDB 10.2:
create table graph (
    node1 varchar(50),
    node2 varchar(50)
);
insert into graph (node1, node2) values
    ('rec_1', 'rec_2'),
    ('rec_3', 'rec_4'),
    ('rec_2', 'rec_3'),
    ('rec_5', 'rec_7');
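with recursive numerated as (
    -- give every row (edge) of graph its own number
    select g.*, ROW_NUMBER() OVER (PARTITION BY null ORDER BY node1) as grp
    from graph g
), normalized as (
    -- one row per (edge number, endpoint)
    select grp, node1 as node from numerated
    union distinct
    select grp, node2 as node from numerated
), rcte as (
    -- start: each node carries the number of its own edge as grp1
    select n.grp as grp1, n.*
    from normalized n
    union all
    -- step: pass grp1 to the other endpoint of a higher-numbered edge sharing the node
    select rcte.grp1 as grp1, n2.grp, n2.node
    from rcte
    join normalized n1 on n1.node = rcte.node and n1.grp > rcte.grp
    join normalized n2 on n2.node <> n1.node and n2.grp = n1.grp
), cte4 as (
    -- keep the smallest edge number that reached each node
    select node, min(grp1) as grp1
    from rcte
    group by node
)
-- renumber the remaining edge numbers as 1, 2, ...
select DENSE_RANK() OVER (PARTITION BY null ORDER BY grp1) as grp, node
from cte4
order by grp, node;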
Result:
grp | node
----|------
1 | rec_1
1 | rec_2
1 | rec_3
1 | rec_4
2 | rec_5
2 | rec_7
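
The question also allows one row per group built with GROUP_CONCAT, and group identifiers that are not integers. The recursive step above only follows a shared record into a higher-numbered row, so some inputs can be split incorrectly: with the pairs (rec_1, rec_4), (rec_2, rec_3), (rec_3, rec_4), rec_2 appears to end up in a group of its own. The following is a minimal sketch of a different technique, not the query above: a plain transitive closure over undirected edges, using the smallest record name in each group as its identifier. It assumes MySQL 8 and the graph table created earlier; the names edges, reach, labeled, src, dst, root, grp_key and records are illustrative only.

```sql
-- Sketch only: assumes MySQL 8 and the graph(node1, node2) table created above.
with recursive edges as (
    -- treat every pair as an undirected edge by listing both directions
    select node1 as src, node2 as dst from graph
    union distinct
    select node2 as src, node1 as dst from graph
), reach as (
    -- every record reaches itself
    select src as root, src as node from edges
    union distinct
    -- extend each known path by one edge; UNION DISTINCT drops
    -- rows already seen, so cycles do not recurse forever
    select r.root, e.dst
    from reach r
    join edges e on e.src = r.node
), labeled as (
    -- the smallest record name that reaches a node identifies its group
    select node, min(root) as grp_key
    from reach
    group by node
)
select grp_key,
       GROUP_CONCAT(node order by node separator ',') as records
from labeled
group by grp_key
order by grp_key;
```

On the sample data this should return two rows: rec_1 with rec_1,rec_2,rec_3,rec_4 and rec_5 with rec_5,rec_7. Note that reach holds one row per pair of connected records, so it grows quadratically with group size, and that GROUP_CONCAT output is cut off at group_concat_max_len (1024 bytes by default in MySQL).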