Below is the query I created to count the common strongly connected (two-way connected) neighbors of two users:
DECLARE @monthly_connections_test TABLE (
calling_party VARCHAR(50)
, called_party VARCHAR(50))
INSERT INTO @monthly_connections_test
SELECT 'z1', 'z2'
UNION ALL SELECT 'z1', 'z3'
UNION ALL SELECT 'z1', 'z4'
UNION ALL SELECT 'z1', 'z5'
UNION ALL SELECT 'z1', 'z6'
UNION ALL SELECT 'z2', 'z1'
UNION ALL SELECT 'z2', 'z4'
UNION ALL SELECT 'z2', 'z5'
UNION ALL SELECT 'z2', 'z7'
UNION ALL SELECT 'z3', 'z1'
UNION ALL SELECT 'z4', 'z7'
UNION ALL SELECT 'z5', 'z1'
UNION ALL SELECT 'z5', 'z2'
UNION ALL SELECT 'z7', 'z4'
UNION ALL SELECT 'z7', 'z2'
SELECT t1.user1, t1.user2,
0 AS calling_calling, 0 AS calling_called,
0 AS called_calling, 0 AS called_called,
COUNT(*) AS both_directions
FROM (SELECT relevant_monthly_connections.calling_party AS user1,
relevant_monthly_connections_1.calling_party AS user2,
relevant_monthly_connections.called_party AS calledUser
FROM @monthly_connections_test relevant_monthly_connections
INNER JOIN @monthly_connections_test AS relevant_monthly_connections_1
ON relevant_monthly_connections.called_party = relevant_monthly_connections_1.called_party
AND relevant_monthly_connections.calling_party < relevant_monthly_connections_1.calling_party
) t1
INNER JOIN @monthly_connections_test AS relevant_monthly_connections_2
ON relevant_monthly_connections_2.called_party = t1.user1
AND relevant_monthly_connections_2.calling_party = t1.calledUser
GROUP BY t1.user1, t1.user2
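Now I want to count the strongly connected neighbors of user1 OR user2. For example, for the pair (z1, z2) the number of strongly connected neighbors is 3: z1 is strongly connected to z2, z3 and z5, where z2 is ignored because it is one of the nodes of the pair; z2 is strongly connected to z1, z5 and z7, where z1 is ignored for the same reason; and count({z3, z5} U {z5, z7}) is 3.
Does anyone know how to write a query that counts all nodes strongly connected to either node of each pair (the query must compute the neighbor count for every record automatically)?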
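To make the definition concrete, here is a minimal Python sketch (not part of the original question) that computes the same quantity with plain sets. The names `CONNECTIONS`, `strong_neighbors` and `pair_neighbors` are my own, chosen only for illustration; the data is copied from the test table above.

```python
# Sample data copied from @monthly_connections_test: (calling_party, called_party)
CONNECTIONS = [
    ("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z5"), ("z1", "z6"),
    ("z2", "z1"), ("z2", "z4"), ("z2", "z5"), ("z2", "z7"),
    ("z3", "z1"),
    ("z4", "z7"),
    ("z5", "z1"), ("z5", "z2"),
    ("z7", "z4"), ("z7", "z2"),
]


def strong_neighbors(user, connections):
    """Users that `user` calls and that also call `user` back."""
    edges = set(connections)
    return {b for (a, b) in edges if a == user and (b, a) in edges}


def pair_neighbors(user1, user2, connections):
    """Strong neighbors of user1 OR user2, excluding the pair itself."""
    union = strong_neighbors(user1, connections) | strong_neighbors(user2, connections)
    return union - {user1, user2}


print(sorted(pair_neighbors("z1", "z2", CONNECTIONS)))  # ['z3', 'z5', 'z7']
```

The count asked for is simply the size of that set, which is 3 for the pair (z1, z2).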
Edit #1:
The following query returns a table of all two-way connections:
WITH bidirectionalConnections AS
(
SELECT calling_party AS user1, called_party AS user2 FROM @monthly_connections_test WHERE calling_party < called_party
INTERSECT
SELECT called_party AS user1, calling_party AS user2 FROM @monthly_connections_test
)
SELECT user1, user2 FROM bidirectionalConnections
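Now, for each pair of nodes, the table bidirectionalConnections must be checked to find how many nodes are strongly connected to the first or the second node of the pair.
The pairs and the neighbor counts in the result must be generated automatically.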
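One detail to keep in mind with this representation: because each two-way connection is stored once (smaller name in user1), a node's strong neighbors can sit in either column. The following Python sketch illustrates that under my own naming (`bidirectional_pairs` and `neighbors_from_pairs` are hypothetical helpers, not something from the query above).

```python
# Sample data copied from @monthly_connections_test: (calling_party, called_party)
CONNECTIONS = [
    ("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z5"), ("z1", "z6"),
    ("z2", "z1"), ("z2", "z4"), ("z2", "z5"), ("z2", "z7"),
    ("z3", "z1"),
    ("z4", "z7"),
    ("z5", "z1"), ("z5", "z2"),
    ("z7", "z4"), ("z7", "z2"),
]


def bidirectional_pairs(connections):
    """Each two-way connection once, with the smaller name first."""
    edges = set(connections)
    return {(a, b) for (a, b) in edges if a < b and (b, a) in edges}


def neighbors_from_pairs(user, pairs):
    """Look the user up in BOTH columns, since each pair is stored only once."""
    as_first = {b for (a, b) in pairs if a == user}
    as_second = {a for (a, b) in pairs if b == user}
    return as_first | as_second


pairs = bidirectional_pairs(CONNECTIONS)
print(sorted(neighbors_from_pairs("z7", pairs)))  # ['z2', 'z4']
```

For z7, looking only at the first column would find nothing, because z7 appears only as the second element of (z2, z7) and (z4, z7).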
Edit #2:
Here is a picture of the graph described by the @monthly_connections_test table:
So the neighbors strongly connected to z1 OR z2 are z3, z5, z7
z1,z3:z2,z5
z1,z4:z2,z3,z5,z7
...
z1,z7:z2,z3,z4,z5
...
The result table should have the following format:
user1, user2, total_neighbors_count
z1, z2, 3
z1, z3, 2
z1, z4, 4
...
z1, z7, 4
...
Thanks!
P.S.
I posted a similar question, How to use JOIN instead of UNION to count the neighbors of “A OR B”?, but it is not the same, so I hope this question will not be treated as a duplicate.
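Answer 0 (score: 2)
I believe the query shown below produces the desired result. I have structured the query so that each stage of the pipeline is explicit, which has the added side effect of giving the query optimizer strong hints about how to minimize the size of the intermediate row sets. See the comments in the query itself for what each stage does.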
;WITH
-- identify the strongly connected parties
-- both directions are included here for later convenience
stronglyConnected AS (
SELECT DISTINCT
l.calling_party AS party1
, l.called_party AS party2
FROM @monthly_connections_test AS l
INNER JOIN @monthly_connections_test AS r
ON r.calling_party = l.called_party
AND r.called_party = l.calling_party
)
-- identify all of the parties that participated in a strong connection
, uniqueParties AS (
SELECT DISTINCT party1 AS party FROM stronglyConnected
)
-- make all unique pairs of such parties
, allPairs AS (
SELECT
u1.party AS party1
, u2.party AS party2
FROM uniqueParties AS u1
CROSS JOIN uniqueParties AS u2
WHERE u1.party < u2.party
)
-- find the neighbours of each pair
, pairNeighbors AS (
SELECT DISTINCT
p.party1
, p.party2
, sc.party2 AS neighbor
FROM allPairs AS p
INNER JOIN stronglyConnected AS sc
ON sc.party1 IN (p.party1, p.party2)
AND sc.party2 NOT IN (p.party1, p.party2)
)
-- count the neighbours of each pair
, neighbourCounts AS (
SELECT
party1 AS user1
, party2 AS user2
, COUNT(*) AS total_neighborCount
FROM pairNeighbors
GROUP BY
party1
, party2
)
-- show the final result
SELECT * FROM neighbourCounts ORDER BY 1, 2
-- handy for testing, debugging and answering other queries:
-- SELECT * FROM stronglyConnected ORDER BY 1, 2
-- SELECT * FROM uniqueParties ORDER BY 1
-- SELECT * FROM allPairs ORDER BY 1, 2
-- SELECT * FROM pairNeighbors ORDER BY 1, 2
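As a cross-check of the pipeline, here is a rough Python sketch that mirrors the CTE stages one by one on the sample data and compares a few rows with the expected output from Edit #2. It is only an illustration of the logic, not a replacement for the SQL; `neighbor_counts` is a name I made up.

```python
from itertools import combinations

# Sample data copied from @monthly_connections_test: (calling_party, called_party)
CONNECTIONS = [
    ("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z5"), ("z1", "z6"),
    ("z2", "z1"), ("z2", "z4"), ("z2", "z5"), ("z2", "z7"),
    ("z3", "z1"),
    ("z4", "z7"),
    ("z5", "z1"), ("z5", "z2"),
    ("z7", "z4"), ("z7", "z2"),
]


def neighbor_counts(connections):
    edges = set(connections)
    # stronglyConnected: keep both directions of every two-way connection
    strongly_connected = {(a, b) for (a, b) in edges if (b, a) in edges}
    # uniqueParties: everyone who takes part in a strong connection
    parties = sorted({a for (a, _) in strongly_connected})
    counts = {}
    # allPairs: combinations() of a sorted list yields party1 < party2
    for p1, p2 in combinations(parties, 2):
        # pairNeighbors: neighbors of either party, excluding the pair itself
        neighbors = {
            b for (a, b) in strongly_connected
            if a in (p1, p2) and b not in (p1, p2)
        }
        # neighbourCounts: like the INNER JOIN, pairs without neighbors are dropped
        if neighbors:
            counts[(p1, p2)] = len(neighbors)
    return counts


counts = neighbor_counts(CONNECTIONS)
for pair in [("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z7")]:
    print(pair, counts[pair])
```

On the sample data the four pairs printed here come out as 3, 2, 4 and 4, which matches the rows listed in Edit #2.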
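Answer 1 (score: 1)
I think the sample query you provided in the question is wrong (based on the description): it returns z5 - z7 as a strongly connected pair, even though that combination does not exist in the sample data at all. I believe this is a correct implementation: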
SELECT calling.*
FROM @monthly_connections_test AS calling
WHERE EXISTS ( SELECT 1
FROM @monthly_connections_test AS called
WHERE calling.calling_party = called.called_party
AND calling.called_party = called.calling_party
)
AND calling.calling_party < calling.called_party
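I have extended this implementation to give you what you want. It is not a particularly pretty solution, and it should be tested on a larger data set because it may not scale well. I used SQL 2008 variable syntax because your other question referred to SQL 2008.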
DECLARE @user1 varchar(50) = 'z1'
DECLARE @user2 varchar(50) = 'z2'
;WITH strongCTE
AS
(
SELECT calling.calling_party AS c1,
calling.called_party AS c2
FROM @monthly_connections_test AS calling
WHERE EXISTS ( SELECT 1
FROM @monthly_connections_test AS called
WHERE calling.calling_party = called.called_party
AND calling.called_party = called.calling_party
)
-- no calling_party < called_party filter here: both directions are kept so that every strong neighbor of a user shows up as c2
)
SELECT COUNT(1) AS ConnectedNeighboursToUser1orUser2
FROM
(
SELECT c2
FROM strongCTE
WHERE c1 = @user1
AND c2 NOT IN (@user1,@user2)
GROUP BY c1,c2
UNION
SELECT c2
FROM strongCTE
WHERE c1 = @user2
AND c2 NOT IN (@user1,@user2)
GROUP BY c1,c2
) AS x
Answer 2 (score: 1)
Based on Edit #2, the following query gives the results listed there:
declare @party1 varchar(50)
declare @party2 varchar(50)
--Since we're only interested in strong connections, we can treat both parties as calling_party in the following queries
select @party1 = 'z1', @party2 = 'z7'
select
distinct mt.called_party
from
@monthly_connections_test mt
inner join
@monthly_connections_test mt2
on
mt.called_party = mt2.calling_party and
mt.calling_party = mt2.called_party
where
mt.calling_party in (@party1,@party2) and
not mt.called_party in (@party1,@party2)
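To get the count, use COUNT(distinct mt.called_party) in the select clause.
The following lists the counts for every connected pair. I think it gets trickier if we need to avoid duplicates of strongly connected pairs: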
select grp.called_party,grp.calling_party,COUNT(distinct mt.called_party )
from
(select
CASE WHEN calling_party < called_party THEN calling_party ELSE called_party END as calling_party,
CASE WHEN calling_party < called_party THEN called_party ELSE calling_party END as called_party
FROM @monthly_connections_test) grp,
@monthly_connections_test mt
inner join
@monthly_connections_test mt2
on
mt.called_party = mt2.calling_party and
mt.calling_party = mt2.called_party
where
mt.calling_party in (grp.called_party,grp.calling_party) and
not mt.called_party in (grp.called_party,grp.calling_party)
group by grp.called_party,grp.calling_party
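One thing worth checking with this last query is which pairs it enumerates. As far as I can tell from reading it, the grp subquery takes its pairs from the rows of the connections table, so only users with at least one call between them form a pair. The small Python sketch below (my own illustration; `connected_pairs` is a hypothetical helper) shows what that means for the sample data.

```python
# Sample data copied from @monthly_connections_test: (calling_party, called_party)
CONNECTIONS = [
    ("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z5"), ("z1", "z6"),
    ("z2", "z1"), ("z2", "z4"), ("z2", "z5"), ("z2", "z7"),
    ("z3", "z1"),
    ("z4", "z7"),
    ("z5", "z1"), ("z5", "z2"),
    ("z7", "z4"), ("z7", "z2"),
]


def connected_pairs(connections):
    """Pairs with at least one call between them, smaller name first."""
    return {(min(a, b), max(a, b)) for (a, b) in connections}


pairs = connected_pairs(CONNECTIONS)
print(("z1", "z6") in pairs)  # True: z1 called z6, so the pair is enumerated
print(("z1", "z7") in pairs)  # False: there is no call between z1 and z7
```

Since Edit #2 lists a row for z1, z7, pairs of users who never called each other would presumably have to be generated some other way (for example with a cross join over the distinct users) if they are required in the output.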