Recursive query problem: retrieving a cluster of directed connections

Time: 2010-11-02 14:39:43

Tags: sql recursion social-networking

To avoid selecting the same connection more than once, I tried using a CHARINDEX() = 0 condition in the following way:

WITH Cluster(calling_party, called_party, link_strength, Path)
AS
(SELECT
    calling_party,
    called_party,
    link_strength,
    CONVERT(varchar(max), calling_party + '.' + called_party) AS Path
FROM
    monthly_connections_test
WHERE
    link_strength > 0.1 AND
    calling_party = 'b'
UNION ALL
SELECT
    mc.calling_party,
    mc.called_party,
    mc.link_strength,
    CONVERT(varchar(max), cl.Path + '.' + mc.calling_party + '.' + mc.called_party) AS Path
FROM
    monthly_connections_test mc
INNER JOIN Cluster cl ON
    (
        mc.called_party = cl.called_party OR
        mc.called_party = cl.calling_party
    ) AND
    (
        CHARINDEX(cl.called_party + '.' + mc.calling_party, Path) = 0 AND
        CHARINDEX(cl.called_party + '.' + mc.called_party, Path) = 0
    )
WHERE
    mc.link_strength > 0.1
)
SELECT
    calling_party,
    called_party,
    link_strength,
    Path
FROM
    Cluster OPTION (maxrecursion 30000)

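However, the condition does not serve its purpose, because the same rows are still selected multiple times.

The actual goal here is to retrieve the entire cluster of connections that the selected user (user b in the example) belongs to.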

EDIT1:

I tried modifying the query as follows:

With combined_users AS
 (SELECT calling_party CALLING, called_party CALLED, link_strength FROM dbo.monthly_connections_test WHERE link_strength > 0.1),
 related_users1 AS
 (
 SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLING = 'a1'
 UNION ALL
 SELECT c.CALLING, c.CALLED, c.link_strength,
    convert(varchar(max),r.path  + c.CALLED + '.') path 
        from combined_users c 
        join related_users1 r 
        ON (c.CALLING = r.CALLED) and CHARINDEX(c.CALLING + '.' + c.CALLED + '.', r.path)= 0 

        ),
related_users2 AS
(
SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLED = 'a1'
UNION ALL
 SELECT c.CALLING, c.CALLED, c.link_strength,
    convert(varchar(max),r.path  + c.CALLING + '.') path 
        from combined_users c 
        join related_users2 r 
        ON c.CALLED = r.CALLING and CHARINDEX('.' + c.CALLING + '.' + c.CALLED, r.path)= 0
)
        SELECT CALLING, CALLED, link_strength, path FROM
        (SELECT CALLING, CALLED, link_strength, path FROM related_users1 UNION SELECT CALLING, CALLED, link_strength, path FROM related_users2) r OPTION (MAXRECURSION 30000)

To test the query, I created the cluster below:

[image: diagram of the test cluster]

The query above returned the following table:

a1  a2  1.0000000   .a1.a2.
a11 a13 1.0000000   .a12.a1.a13.a11.
a12 a1  1.0000000   .a12.a1.
a13 a12 1.0000000   .a12.a1.a13.
a14 a13 1.0000000   .a12.a1.a13.a14.
a15 a14 1.0000000   .a12.a1.a13.a14.a15.
a2  a10 1.0000000   .a1.a2.a10.
a2  a3  1.0000000   .a1.a2.a3.
a3  a4  1.0000000   .a1.a2.a3.a4.
a3  a6  1.0000000   .a1.a2.a3.a6.
a4  a8  1.0000000   .a1.a2.a3.a4.a8.
a4  a9  1.0000000   .a1.a2.a3.a4.a9.

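The query evidently returns the connections leading away from the selected node and the connections leading towards it. The problem is a change of direction: for example, the connections (a7, a4) and (a11, a10) are not selected, because the direction changes relative to the starting node.

Does anyone know how to modify the query so that it includes all connections?

Thanks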

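3 answers:

Answer 0: (score: 1)

OK, there are a few things to discuss here.

Zerothly, I have PostgreSQL, so that's what all of this was written and run on; I'm trying to stick to standard SQL, so it should carry over to SQL Server with small changes (for one, SQL Server writes WITH rather than WITH RECURSIVE).

Firstly, if you're only interested in calls with a link strength greater than 0.1, then say so: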

-- like calls, but only strong enough to be interesting
create view strong_calls (calling_party, called_party, link_strength)
as (
  select calling_party, called_party, link_strength
  from monthly_connections_test
  where link_strength > 0.1
);

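From now on, we'll talk in terms of this view.

Secondly, you say:

"The actual goal here is to retrieve the entire cluster of connections that the selected user (user b in the example) belongs to."

If that's true, why are you computing paths? If you only want to know the set of connections, you can do this: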

with recursive cluster (calling_party, called_party, link_strength)
as (
  (
    select calling_party, called_party, link_strength
    from strong_calls
    where calling_party = 'b'
  )
  union
  (
    select c.calling_party, c.called_party, c.link_strength
    from cluster this, strong_calls c
    where c.calling_party = this.called_party
    or c.called_party = this.calling_party
  )
)
select *
from cluster;
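To see why this form needs no path bookkeeping, here is a minimal sketch that runs essentially the same query against an in-memory SQLite database from Python. SQLite is only a stand-in chosen because it ships with Python; the sample rows (a cycle b -> c -> d -> b plus one weak link) and the function name cluster_edges are assumptions made for illustration, not data from the question.

```python
import sqlite3

def cluster_edges(conn, start):
    """Return the set of (calling_party, called_party) rows reached from `start`."""
    sql = """
        WITH RECURSIVE cluster(calling_party, called_party, link_strength) AS (
            SELECT calling_party, called_party, link_strength
            FROM strong_calls
            WHERE calling_party = :start
            UNION  -- UNION (not UNION ALL) discards rows already produced
            SELECT c.calling_party, c.called_party, c.link_strength
            FROM cluster AS cl, strong_calls AS c
            WHERE c.calling_party = cl.called_party
               OR c.called_party = cl.calling_party
        )
        SELECT calling_party, called_party FROM cluster
    """
    return set(conn.execute(sql, {"start": start}))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_connections_test (
        calling_party TEXT, called_party TEXT, link_strength REAL);
    -- hypothetical sample data: a cycle b -> c -> d -> b plus one weak link
    INSERT INTO monthly_connections_test VALUES
        ('b', 'c', 1.0), ('c', 'd', 1.0), ('d', 'b', 1.0), ('d', 'e', 0.05);
    CREATE VIEW strong_calls AS
        SELECT calling_party, called_party, link_strength
        FROM monthly_connections_test
        WHERE link_strength > 0.1;
""")

edges = cluster_edges(conn, "b")
print(sorted(edges))
```

Because UNION removes duplicates, the recursion stops once no new rows appear, even though the sample data contains a cycle. Note that the join condition still only follows chains forwards or backwards, so this sketch says nothing about connections that share a caller or a callee with an already selected row.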

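Thirdly, maybe you don't actually want to find the connections in the cluster; you want to find which people are in the cluster, and what the shortest path from the target to each of them is. In that case, you can do this: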

with recursive cluster (party, path)
as (
  select cast('b' as character varying), cast('b' as character varying)
  union
  (
    select (case
      when this.party = c.calling_party then c.called_party
      when this.party = c.called_party then c.calling_party
    end), (this.path || '.' || (case
      when this.party = c.calling_party then c.called_party
      when this.party = c.called_party then c.calling_party
    end))
    from cluster this, strong_calls c
    where (this.party = c.calling_party and position(c.called_party in this.path) = 0)
    or (this.party = c.called_party and position(c.calling_party in this.path) = 0)
  )
)
select party, path
from cluster
where not exists (
  select *
  from cluster c2
  where cluster.party = c2.party
  and (
    char_length(cluster.path) > char_length(c2.path)
    or (char_length(cluster.path) = char_length(c2.path)) and (cluster.path > c2.path)
  )
)
order by party, path;
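For cross-checking results like these outside the database, the same "who is in the cluster, and by what shortest path" question can be answered with a plain breadth-first search. This is a different technique from the recursive query above, shown only as a rough sketch; the function name shortest_paths and the sample calls are assumptions, and links are treated as undirected just as the query treats them.

```python
from collections import deque

def shortest_paths(edges, start):
    """Breadth-first search over undirected links; returns {party: dotted path}."""
    neighbours = {}
    for caller, callee in edges:
        neighbours.setdefault(caller, set()).add(callee)
        neighbours.setdefault(callee, set()).add(caller)

    paths = {start: [start]}
    queue = deque([start])
    while queue:
        party = queue.popleft()
        for other in sorted(neighbours.get(party, ())):
            if other not in paths:  # the first visit in BFS is a shortest path
                paths[other] = paths[party] + [other]
                queue.append(other)
    return {party: ".".join(path) for party, path in paths.items()}

# hypothetical sample calls: (calling_party, called_party)
calls = [("b", "c"), ("c", "d"), ("d", "b"), ("e", "d")]
result = shortest_paths(calls, "b")
for party in sorted(result):
    print(party, result[party])
```

Each party is visited once, so the search never enumerates the longer alternative paths that the path-building query has to generate and then filter out.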

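As you can see, you were on the right track.

If you really do want a list of all the calls in a cluster, with paths, then, er, I'll get back to you!

Edit: Bear in mind that the queries that don't construct paths will have very different performance characteristics from those that do. Roughly speaking, the non-path queries do O(n) work (probably in about O(log n) iteration steps), because they visit each node in the cluster, but the path-building queries do much more (O(n!), perhaps?), because they have to visit every path through the graph. If your clusters are about the size of the one in the example, you'll be fine, but if they are much bigger, you may find the running time prohibitive.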
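To get a feel for that gap, the sketch below counts every simple path (no repeated node) starting from one node of a fully connected graph, which is roughly the worst case for a path-building query. The helper names and the choice of a complete graph are assumptions for illustration; real call graphs are usually much sparser.

```python
def count_simple_paths(neighbours, start):
    """Count every simple path (no repeated node) that begins at `start`."""
    def walk(node, visited):
        total = 1  # the path that ends at this node
        for other in neighbours[node]:
            if other not in visited:
                total += walk(other, visited | {other})
        return total
    return walk(start, {start})

def complete_graph(n):
    """Adjacency lists for n nodes where everyone is linked to everyone else."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

for n in range(2, 8):
    print(n, "nodes:", count_simple_paths(complete_graph(n), 0), "paths")
```

The node count grows by one per step while the path count grows roughly factorially, which is why a path-free query stays cheap on clusters where a path-building one does not.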

Answer 1: (score: 0)

CHARINDEX('b.d', 'b.c.d.b') = 0, because there is a 'c.' in between.
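The effect is easy to reproduce outside the database. The sketch below imitates CHARINDEX with a small Python helper (the helper is an assumption made for illustration, and it ignores collation rules) and also shows why the edited query in the question wraps every name in dots.

```python
def charindex(needle, haystack):
    """Mimic T-SQL CHARINDEX: 1-based position of needle, or 0 if absent."""
    return haystack.find(needle) + 1

# 'b' and 'd' are both on the path, but never next to each other,
# so the guard reports "not seen yet" and the row is selected again.
not_adjacent = charindex("b.d", "b.c.d.b")

# Without a delimiter on both sides, 'a1' is found inside 'a11'.
false_hit = charindex("a1", "a11.a13")

# Wrapping each name in dots makes the match exact.
exact = charindex(".a1.", ".a11.a13.")

print(not_adjacent, false_hit, exact)
```

A guard built from two adjacent names only detects that exact pair as a contiguous substring; it does not detect that either party has already been visited.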

Reformatted for readability:

WITH cluster(calling_party, called_party, link_strength, PATH) 
     AS (SELECT calling_party, 
                called_party, 
                link_strength, 
                CONVERT(VARCHAR(MAX), calling_party + '.' + called_party) AS 
                PATH 
         FROM   monthly_connections_test 
         WHERE  link_strength > 0.1 
                AND calling_party = 'b' 
         UNION ALL 
         SELECT mc.calling_party, 
                mc.called_party, 
                mc.link_strength, 
                CONVERT(VARCHAR(MAX), cl.PATH + '.' + mc.calling_party + '.' + 
                mc.called_party) 
                AS PATH 
         FROM   monthly_connections_test mc 
                INNER JOIN cluster cl 
                  ON ( mc.called_party = cl.called_party 
                        OR mc.called_party = cl.calling_party ) 
                     AND ( Charindex(cl.called_party + '.' + mc.calling_party, 
                           PATH) 
                           = 0 
                           AND Charindex(cl.called_party + '.' + 
                               mc.called_party, 
                               PATH) 
                               = 
                               0 ) 
         WHERE  mc.link_strength > 0.1) 
SELECT calling_party, 
       called_party, 
       link_strength, 
       PATH 
FROM   cluster 
OPTION (MAXRECURSION 30000) 

Answer 2: (score: 0)

To address your edited question: if you want to ignore the directionality of the links, try:

create view symmetric_users (calling_party, called_party, link_strength)
as (
  select calling_party, called_party, link_strength from monthly_connections_test
  union
  select called_party , calling_party, link_strength from monthly_connections_test
)

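Then point your query at that view.

If you have users who call each other, there will be two rows for each ordered pair of users. You should be able to filter those out by picking the stronger one.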
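As a rough illustration of pointing a recursive query at the symmetric view, the sketch below builds the view in an in-memory SQLite database and collects every party reachable from a starting user. SQLite stands in for the real server only because it ships with Python; the sample rows, the members CTE and the function name cluster_members are assumptions, and the 0.1 threshold is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_connections_test (
        calling_party TEXT, called_party TEXT, link_strength REAL);
    -- hypothetical sample: a7 -> a4 points "against" the walk from a1,
    -- and x1 -> x2 belongs to an unrelated cluster
    INSERT INTO monthly_connections_test VALUES
        ('a1', 'a2', 1.0), ('a2', 'a3', 1.0), ('a3', 'a4', 1.0),
        ('a7', 'a4', 1.0), ('a12', 'a1', 1.0), ('x1', 'x2', 1.0);
    CREATE VIEW symmetric_users AS
        SELECT calling_party, called_party, link_strength
        FROM monthly_connections_test
        UNION
        SELECT called_party, calling_party, link_strength
        FROM monthly_connections_test;
""")

def cluster_members(conn, start):
    """Return every party reachable from `start`, ignoring link direction."""
    sql = """
        WITH RECURSIVE members(party) AS (
            SELECT :start
            UNION  -- deduplicates, so cycles cannot loop forever
            SELECT s.called_party
            FROM members AS m
            JOIN symmetric_users AS s ON s.calling_party = m.party
            WHERE s.link_strength > 0.1
        )
        SELECT party FROM members ORDER BY party
    """
    return [row[0] for row in conn.execute(sql, {"start": start})]

members = cluster_members(conn, "a1")
print(members)
```

Because every link appears in both orientations, a single join condition is enough to pick up a7 through the a7 -> a4 row. The connections themselves can then be read from the original table by keeping the rows whose calling_party is in the member list.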