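How do I count the bidirectionally connected neighbors of A OR B?

Time: 2010-12-03 11:56:30

Tags: sql sql-server graph social-networking

Here is a query I wrote to count the common strongly connected (connected in both directions) neighbors of two users: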

DECLARE @monthly_connections_test TABLE (
  calling_party VARCHAR(50)
  , called_party VARCHAR(50))

INSERT INTO @monthly_connections_test
          SELECT 'z1', 'z2'
UNION ALL SELECT 'z1', 'z3'
UNION ALL SELECT 'z1', 'z4'
UNION ALL SELECT 'z1', 'z5'
UNION ALL SELECT 'z1', 'z6'
UNION ALL SELECT 'z2', 'z1'
UNION ALL SELECT 'z2', 'z4'
UNION ALL SELECT 'z2', 'z5'
UNION ALL SELECT 'z2', 'z7'
UNION ALL SELECT 'z3', 'z1'
UNION ALL SELECT 'z4', 'z7'
UNION ALL SELECT 'z5', 'z1'
UNION ALL SELECT 'z5', 'z2'
UNION ALL SELECT 'z7', 'z4'
UNION ALL SELECT 'z7', 'z2'

SELECT  t1.user1, t1.user2,
        0 AS calling_calling, 0 AS calling_called, 
        0 AS called_calling, 0 AS called_called, 
        COUNT(*) AS both_directions
  FROM (SELECT relevant_monthly_connections.calling_party AS user1, 
               relevant_monthly_connections_1.calling_party AS user2,
               relevant_monthly_connections.called_party AS calledUser
          FROM @monthly_connections_test relevant_monthly_connections 
            INNER JOIN @monthly_connections_test AS relevant_monthly_connections_1 
               ON    relevant_monthly_connections.called_party  = relevant_monthly_connections_1.called_party 
                 AND relevant_monthly_connections.calling_party < relevant_monthly_connections_1.calling_party
       ) t1 
     INNER JOIN @monthly_connections_test AS relevant_monthly_connections_2
       ON     relevant_monthly_connections_2.called_party  = t1.user1
          AND relevant_monthly_connections_2.calling_party = t1.calledUser
  GROUP BY t1.user1, t1.user2

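Now I want to count the strongly connected neighbors of user1 OR user2. For example, for the pair (z1, z2) the number of strongly connected neighbors is 3: z1 is strongly connected to z2, z3 and z5 (z2 is ignored because it is one of the nodes of the pair), and z2 is strongly connected to z1, z5 and z7 (z1 is ignored for the same reason), so count((z3, z5) U (z5, z7)) is 3.

Does anyone know how to write a query that counts all nodes strongly connected to either node of each pair? The query must compute the neighbor count for every record automatically.

Edit #1:

The following query returns a table of all bidirectional connections: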

WITH bidirectionalConnections AS
(
SELECT calling_party AS user1, called_party AS user2 FROM @monthly_connections_test WHERE calling_party < called_party
INTERSECT
SELECT called_party AS user1, calling_party AS user2 FROM @monthly_connections_test
)
SELECT user1, user2 FROM bidirectionalConnections

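Now, for each pair of nodes, the table bidirectionalConnections has to be checked to find how many nodes are strongly connected to the first or the second node of the pair.

The pairs and the neighbor counts in the result must be generated automatically.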

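Edit #2:

Here is a picture of the graph described by the @monthly_connections_test table: (image not available)

So the neighbors strongly connected to z1 OR z2 are z3, z5, z7

z1,z3: z2,z5

z1,z4: z2,z3,z5,z7

...

z1,z7: z2,z3,z4,z5

...

The result table should have the following format: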

user1, user2, total_neighbors_count
z1, z2, 3
z1, z3, 2
z1, z4, 4
...
z1, z7, 4
...

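Thank you!

P.S.

I posted a similar question, How to use JOIN instead of UNION to count the neighbors of “A OR B”?, but it is not the same, so I hope this question will not be treated as a duplicate.

3 answers:

Answer 0: (score: 2)

I believe the query shown below will produce the desired result. I have structured the query so that each stage of the pipeline is explicit, which has the added side effect of giving the query optimizer strong hints about how to minimize the size of the intermediate row sets. See the comments in the query itself for a description of each stage.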

;WITH
  -- identify the strongly connected parties
  -- both directions are included here for later convenience
  stronglyConnected AS (
    SELECT DISTINCT
      l.calling_party AS party1
    , l.called_party AS party2
    FROM @monthly_connections_test AS l
    INNER JOIN @monthly_connections_test AS r
      ON r.calling_party = l.called_party
      AND r.called_party = l.calling_party
  )
  -- identify all of the parties that participated in a strong connection
, uniqueParties AS (
    SELECT DISTINCT party1 AS party FROM stronglyConnected
  )
  -- make all unique pairs of such parties
, allPairs AS (
    SELECT
      u1.party AS party1
    , u2.party AS party2
    FROM uniqueParties AS u1
    CROSS JOIN uniqueParties AS u2
    WHERE u1.party < u2.party
  )
  -- find the neighbours of each pair
, pairNeighbors AS (
    SELECT DISTINCT
      p.party1
    , p.party2
    , sc.party2 AS neighbor
    FROM allPairs AS p
    INNER JOIN stronglyConnected AS sc
      ON sc.party1 IN (p.party1, p.party2)
      AND sc.party2 NOT IN (p.party1, p.party2)
  )
  -- count the neighbours of each pair
, neighbourCounts AS (
    SELECT
      party1 AS user1
    , party2 AS user2
    , COUNT(*) AS total_neighborCount
    FROM pairNeighbors
    GROUP BY
      party1
    , party2
  )
-- show the final result
SELECT * FROM neighbourCounts ORDER BY 1, 2
-- handy for testing, debugging and answering other queries:
-- SELECT * FROM stronglyConnected ORDER BY 1, 2
-- SELECT * FROM uniqueParties ORDER BY 1
-- SELECT * FROM allPairs ORDER BY 1, 2
-- SELECT * FROM pairNeighbors ORDER BY 1, 2
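
One way to sanity-check the join condition used in pairNeighbors is to run it against the four pairs whose counts are spelled out in the question's Edit #2. The following is only a sketch under some assumptions: it uses row-constructor INSERT syntax (SQL Server 2008 or later), and the `expected` CTE and the `expected_count` / `actual_count` column names are made up for illustration and are not part of the answer's query.

```sql
-- Sample data copied from the question.
DECLARE @monthly_connections_test TABLE (
    calling_party VARCHAR(50)
  , called_party  VARCHAR(50));

INSERT INTO @monthly_connections_test (calling_party, called_party)
VALUES ('z1','z2'), ('z1','z3'), ('z1','z4'), ('z1','z5'), ('z1','z6'),
       ('z2','z1'), ('z2','z4'), ('z2','z5'), ('z2','z7'),
       ('z3','z1'), ('z4','z7'),
       ('z5','z1'), ('z5','z2'),
       ('z7','z4'), ('z7','z2');

;WITH stronglyConnected AS (
    -- both directions of every two-way connection, as in the answer
    SELECT DISTINCT
        l.calling_party AS party1
      , l.called_party  AS party2
    FROM @monthly_connections_test AS l
    INNER JOIN @monthly_connections_test AS r
        ON  r.calling_party = l.called_party
        AND r.called_party  = l.calling_party
)
, expected AS (
    -- hypothetical helper: the rows listed explicitly in the question
    SELECT 'z1' AS user1, 'z2' AS user2, 3 AS expected_count
    UNION ALL SELECT 'z1', 'z3', 2
    UNION ALL SELECT 'z1', 'z4', 4
    UNION ALL SELECT 'z1', 'z7', 4
)
SELECT
    e.user1
  , e.user2
  , e.expected_count
  , COUNT(DISTINCT sc.party2) AS actual_count  -- neighbours of either node, pair members excluded
FROM expected AS e
LEFT JOIN stronglyConnected AS sc
    ON  sc.party1 IN (e.user1, e.user2)
    AND sc.party2 NOT IN (e.user1, e.user2)
GROUP BY e.user1, e.user2, e.expected_count
ORDER BY e.user1, e.user2;
```

The LEFT JOIN is a deliberate difference from pairNeighbors: a pair with no strongly connected neighbours would still appear here with a count of 0, whereas an INNER JOIN would drop it.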

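Answer 1: (score: 1)

I think the sample query you gave in the question is wrong (based on the description): it returns z5 - z7 as a strongly connected pair, although that combination does not exist in the sample data at all. I believe this is a correct implementation: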

SELECT calling.*
FROM    @monthly_connections_test AS calling
WHERE   EXISTS  (   SELECT 1
                    FROM @monthly_connections_test AS called
                    WHERE   calling.calling_party   = called.called_party
                    AND     calling.called_party    = called.calling_party
        )
AND     calling.calling_party   < calling.called_party  

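I have extended this implementation to give you what you want. It is not a particularly pretty solution and should be tested on a larger data set, because it may not scale well. I used SQL 2008 variable notation because your other question referred to SQL 2008.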

DECLARE @user1 varchar(50) = 'z1'
DECLARE @user2 varchar(50) = 'z2'


;WITH strongCTE
AS
(
    SELECT  calling.calling_party AS c1,
            calling.called_party AS c2
    FROM    @monthly_connections_test AS calling
    WHERE   EXISTS  (   SELECT 1
                        FROM @monthly_connections_test AS called
                        WHERE   calling.calling_party   = called.called_party
                        AND     calling.called_party    = called.calling_party
            )
    -- no calling_party < called_party filter here: both directions are kept,
    -- so neighbours that sort before @user1 / @user2 are also found
)
SELECT COUNT(1) AS ConnectedNeighboursToUser1orUser2
FROM
(
    SELECT  c2
    FROM    strongCTE
    WHERE   c1 = @user1
    AND     c2 NOT IN (@user1,@user2)
    GROUP BY c1,c2

    UNION

    SELECT  c2
    FROM    strongCTE
    WHERE   c1 = @user2
    AND     c2 NOT IN (@user1,@user2)
    GROUP BY c1,c2
) AS x

Answer 2: (score: 1)

Based on Edit 2, the following query gives the results that were listed:

declare @party1 varchar(50)
declare @party2 varchar(50)

--Since we're only interested in strong connections, we can treat both parties as calling_party in the following queries

select @party1 = 'z1', @party2 = 'z7'

select
    distinct mt.called_party 
from
    @monthly_connections_test mt
        inner join
    @monthly_connections_test mt2
        on
            mt.called_party = mt2.calling_party and
            mt.calling_party = mt2.called_party
where
    mt.calling_party in (@party1,@party2) and
    not mt.called_party in (@party1,@party2)

To get the count, use COUNT(distinct mt.called_party) in the select clause.
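
Written out in full, that change might look like the sketch below. It assumes SQL Server 2008 or later (for the row-constructor INSERT) and repeats the question's sample data so that the batch is self-contained; the `total_neighbors_count` alias is borrowed from the result format requested in the question.

```sql
-- Sample data copied from the question.
DECLARE @monthly_connections_test TABLE (
    calling_party VARCHAR(50)
  , called_party  VARCHAR(50));

INSERT INTO @monthly_connections_test (calling_party, called_party)
VALUES ('z1','z2'), ('z1','z3'), ('z1','z4'), ('z1','z5'), ('z1','z6'),
       ('z2','z1'), ('z2','z4'), ('z2','z5'), ('z2','z7'),
       ('z3','z1'), ('z4','z7'),
       ('z5','z1'), ('z5','z2'),
       ('z7','z4'), ('z7','z2');

DECLARE @party1 VARCHAR(50), @party2 VARCHAR(50);
SELECT @party1 = 'z1', @party2 = 'z7';

-- Count the distinct parties that have a two-way connection with
-- @party1 or @party2, excluding the two parties themselves.
SELECT COUNT(DISTINCT mt.called_party) AS total_neighbors_count
FROM @monthly_connections_test AS mt
INNER JOIN @monthly_connections_test AS mt2
    ON  mt.called_party  = mt2.calling_party
    AND mt.calling_party = mt2.called_party
WHERE mt.calling_party IN (@party1, @party2)
  AND mt.called_party NOT IN (@party1, @party2);
```

For the pair (z1, z7) on this sample data the distinct neighbours are z2, z3, z4 and z5, so the query returns 4, which matches the row `z1, z7, 4` in the question.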


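The following lists the group counts for every connected pair. I think it gets trickier if we need to avoid duplicates of strongly connected pairs: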

select grp.called_party,grp.calling_party,COUNT(distinct mt.called_party )
from
    (select
          CASE WHEN calling_party < called_party THEN calling_party ELSE called_party END as calling_party,CASE WHEN calling_party < called_party THEN called_party ELSE calling_party END as called_party FROM @monthly_connections_test) grp,
    @monthly_connections_test mt
        inner join
    @monthly_connections_test mt2
        on
            mt.called_party = mt2.calling_party and
            mt.calling_party = mt2.called_party
where
    mt.calling_party in (grp.called_party,grp.calling_party) and
    not mt.called_party in (grp.called_party,grp.calling_party)
group by grp.called_party,grp.calling_party