如何在SQL中的弧/边集中识别组/簇?

时间:2017-07-20 11:00:33

标签: sql teradata

我有像'haves'这样的弧/边:

Node1   Node2
A       B
B       C
D       E

这里A连接到B和B连接到C.D连接到E.换句话说,这些'想要'中显示了2个组/簇:

Node1   Node2   Cluster
A       B       1
B       C       1
D       E       2

我可以使用SQL来识别这些组/群集吗?我想这涉及自联接,但我看不出如何编写这个SQL。任何反馈都将非常感激。谢谢!

2 个答案:

答案 0 :(得分:2)

假设根据提供的示例数据,数据中没有类似A->B->C->A的循环,以下是将从nodes表返回所需输出的查询。

WITH RECURSIVE NodeCluster (node1,node2,Cluster1) AS
      (SELECT node1,
              node2,
              Rank() Over (
                           ORDER BY node1)
       FROM nodes AS n1
       WHERE NOT EXISTS
           (SELECT *
            FROM nodes AS n2
            WHERE n1.node1 = n2.node2)
       UNION ALL SELECT N1.node1,
                        N1.node2,
                        NodeCluster.Cluster1
       FROM nodes n1,
            NodeCluster
       WHERE NodeCluster.node2=n1.node1 )
    SELECT *
    FROM NodeCluster
    ORDER BY Cluster1,
             node1,
             node2;

在种子查询中,选择所有起始节点并在rankedasc,以按升序为数据分配群集号。

在提供的数据上,以下是输出。

Node1 | Node2  | Cluster1
-------------------------
A       B        1
B       C        1
D       E        2

为了重新保证,样本数据中添加了更多数据,如下所示。

Node1 | Node2
-------------
A        B   
B        C   
D        E   
E        F   
F        G   
H        I   
I        J   
J        K   
L        M  

查询产生了以下输出。

Node1 | Node2  | Cluster1
-------------------------
A       B         1
B       C         1
D       E         2
E       F         2
F       G         2
H       I         3
I       J         3
J       K         3
L       M         4

Teradata SQL Assistantbteq模式下,使用teradataANSI成功测试了解决方案查询。

希望这会有所帮助。

答案 1 :(得分:1)

试试这个:

    DECLARE @Edges TABLE
    (
      ID INT ,
      Node1 CHAR(1) ,
      Node2 CHAR(1)
    );
INSERT  INTO @Edges
        ( Node1, Node2 )
VALUES  ( 'A', 'B' ),
        ( 'B', 'C' ),
        ( 'D', 'E' );
WITH    CTE_Nodes
          AS ( SELECT   Node1 AS Node
               FROM     @Edges
               UNION
               SELECT   Node2 AS Node
               FROM     @Edges
             ),
        CTE_Pairs
          AS ( SELECT   Node1 ,
                        Node2
               FROM     @Edges
               WHERE    Node1 <> Node2
               UNION
               SELECT   Node2 AS Node1 ,
                        Node1 AS Node2
               FROM     @Edges
               WHERE    Node1 <> Node2
             ),
        CTE_Recursive
          AS ( SELECT   CAST(CTE_Nodes.Node AS VARCHAR(8000)) AS AnchorNode ,
                        Node1 ,
                        Node2 ,
                        CAST(',' + Node1 + ',' + Node2 + ',' AS VARCHAR(8000)) AS NodePath ,
                        1 AS Lvl
               FROM     CTE_Pairs
                        INNER JOIN CTE_Nodes ON CTE_Nodes.Node = CTE_Pairs.Node1
               UNION ALL
               SELECT   CTE_Recursive.AnchorNode ,
                        CTE_Pairs.Node1 ,
                        CTE_Pairs.Node2 ,
                        CAST(CTE_Recursive.NodePath + CTE_Pairs.Node2 + ',' AS VARCHAR(8000)) AS NodePath ,
                        CTE_Recursive.Lvl + 1 AS Lvl
               FROM     CTE_Pairs
                        INNER JOIN CTE_Recursive ON CTE_Recursive.Node2 = CTE_Pairs.Node1
               WHERE    CTE_Recursive.NodePath NOT LIKE CAST('%,'
                        + CTE_Pairs.Node2 + ',%' AS VARCHAR(8000))
             ),
        CTE_RecursionResult
          AS ( SELECT   AnchorNode ,
                        Node1 ,
                        Node2
               FROM     CTE_Recursive
             ),
        CTE_CleanResult
          AS ( SELECT   AnchorNode ,
                        Node1 AS Node
               FROM     CTE_RecursionResult
               UNION
               SELECT   AnchorNode ,
                        Node2 AS Node
               FROM     CTE_RecursionResult
             )
    SELECT  Edges.Node1 ,
            Edges.Node2 ,
            DENSE_RANK() OVER ( ORDER BY CASE WHEN CA_Data.XML_Value IS NULL
                                              THEN Edges.Node1
                                              ELSE CA_Data.XML_Value
                                         END ) AS Cluster
    FROM    @Edges Edges
            CROSS APPLY ( SELECT    CTE_CleanResult.Node + ','
                          FROM      CTE_CleanResult
                          WHERE     CTE_CleanResult.AnchorNode = Edges.Node1
                          ORDER BY  CTE_CleanResult.Node
                        FOR
                          XML PATH('') ,
                              TYPE
                        ) AS CA_XML ( XML_Value )
            CROSS APPLY ( SELECT    CA_XML.XML_Value.value('.',
                                                           'NVARCHAR(MAX)')
                        ) AS CA_Data ( XML_Value );

SQL Fiddle DEMO