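In the table below, each row represents a connection between a pair of elements.
For example: A is connected to B, B to C, C to D and G, and G to H.
All connected elements therefore share the same group, for example one named "1".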
e1 e2 group
A B 1
B C 1
C D 1
C G 1
E F 2
I E 2
H G 1
J K 3
K L 3
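How can I write an efficient algorithm in R (or SQL) to compute the unknown 'group' column for a table that contains only the e1, e2 connections?
Answer 0 (score: 1)
Your data is a graph, defined by its edge list, and you want its connected components.
That is what the clusters function in the igraph package computes.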
# Sample data
d <- structure(c("A", "B", "C", "C", "E", "I", "H", "J", "K", "B",
"C", "D", "G", "F", "E", "G", "K", "L"), .Dim = c(9L, 2L), .Dimnames = list(
NULL, c("e1", "e2")))
library(igraph)
# Build a graph from the two-column edge list
g <- graph.edgelist( as.matrix(d) )
# Connected components: one membership id per vertex
clusters(g)
# $membership
# [1] 1 1 1 1 1 2 2 2 1 3 3 3
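The membership vector above is per vertex, while the question asks for a group per row. As a minimal sketch, assuming igraph is not available, the same labelling can be done in base R with a small union-find structure. The helper name edge_groups is hypothetical and not part of igraph; groups are numbered in order of first appearance. Cycles in the edge list are harmless here because an edge whose endpoints already share a root is simply skipped.

```r
# Hypothetical helper (not from igraph): label the connected components of an
# edge list with a union-find structure, using base R only.
edge_groups <- function(e1, e2) {
  # Node names in order of first appearance (e1[1], e2[1], e1[2], e2[2], ...)
  nodes <- unique(c(rbind(e1, e2)))
  a <- match(e1, nodes)                  # integer index of each edge's first end
  b <- match(e2, nodes)                  # integer index of each edge's second end
  parent <- seq_along(nodes)             # every node starts as its own root

  # Follow parent pointers to the root, shortening the path on the way
  find <- function(i) {
    while (parent[i] != i) {
      parent[i] <<- parent[parent[i]]
      i <- parent[i]
    }
    i
  }

  # Merge the two components joined by each edge (smaller index becomes root)
  for (k in seq_along(a)) {
    ra <- find(a[k])
    rb <- find(b[k])
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  roots <- vapply(seq_along(nodes), find, integer(1))
  node_group <- match(roots, unique(roots))   # renumber roots as 1, 2, 3, ...
  data.frame(e1 = e1, e2 = e2, group = node_group[a], stringsAsFactors = FALSE)
}

# The sample edges from the question
e1 <- c("A", "B", "C", "C", "E", "I", "H", "J", "K")
e2 <- c("B", "C", "D", "G", "F", "E", "G", "K", "L")
res <- edge_groups(e1, e2)
print(res)
```

For the sample data this should reproduce the group column from the question's table. With igraph installed, indexing the membership vector by the vertex position of e1 achieves the same result with less code.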
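Answer 1 (score: 0)
A recursive CTE (this works in Postgres; Oracle needs minor changes). Note: without countermeasures, some cycles cannot be avoided, and they lead to infinite recursion.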
CREATE TABLE pairs
( e1 varchar NOT NULL
, e2 varchar NOT NULL
, PRIMARY KEY (e1,e2)
);
INSERT INTO pairs(e1,e2) VALUES
('A' , 'B' )
, ('B','C' )
, ('C','D' )
, ('C','G' )
, ('E','F' )
, ('I','E' )
, ('H','G' )
, ('J','K' )
, ('K','L' )
;
WITH RECURSIVE tree AS (
WITH dpairs AS (
SELECT e1 AS one, e2 AS two FROM pairs WHERE e1 < e2
UNION ALL
SELECT e2 AS one, e1 AS two FROM pairs WHERE e1 > e2
)
SELECT dp.one AS opa
, dp.one AS one
, dp.two AS two
FROM dpairs dp
WHERE NOT EXISTS ( SELECT *
FROM dpairs nx
WHERE nx.two = dp.one
AND nx.one < dp.one
)
UNION ALL
SELECT tr.opa AS opa
, dp.one AS one
, dp.two AS two
FROM tree tr
JOIN dpairs dp ON dp.one = tr.two AND dp.two <> tr.opa AND dp.two <> tr.one
)
SELECT opa,one,two
, dense_rank() OVER (ORDER BY opa) AS rnk
FROM tree
ORDER BY opa, one,two
;
Result:
opa | one | two | rnk
-----+-----+-----+-----
A | A | B | 1
A | B | C | 1
A | C | D | 1
A | C | G | 1
A | G | H | 1
E | E | F | 2
E | E | I | 2
J | J | K | 3
J | K | L | 3
(9 rows)