Identify/compare sets of rows within groups

Time: 2016-12-07 13:30:20

Tags: sql sql-server tsql

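I have a problem that looked easy to solve, but it is turning out to be harder than I expected.

Simplifying: I need a way to identify unique sets of rows within groups defined by another column. In the basic example, the source table contains only three columns: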

routeID nodeID nodeName 
   1       1      a
   1       2      b
   2       1      a
   2       2      b
   3       1      a
   3       2      b
   4       1      a
   4       2      c
   5       1      a
   5       2      c
   6       1      a
   6       2      b
   6       3      d
   7       1      a
   7       2      b
   7       3      d 

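So the routeID column identifies the set of nodes that defines a route.

What I need to do is group the routes so that each unique sequence of nodes is represented by a single routeID.

In my real case I tried using window functions to add columns that help identify the node sequences, but I still do not see how to extract the unique sequences and group the routes.

As the end result I want only the unique routes: for example, routes 1, 2 and 3 collapsed into one route.

Do you have any idea how to help me?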

EDIT:

Another table that I would like to join with the one from the example might look like this:

journeyID nodeID nodeName routeID
    1        1       a       1
    1        2       b       1
    2        1       a       1
    2        2       b       1
    3        1       a       4
    3        2       c       4 
    ...........................
    ...........................

2 Answers:

Answer 0: (score: 0)

You can try this idea:

DECLARE @DataSource TABLE
(
    [routeID] TINYINT 
   ,[nodeID] TINYINT
   ,[nodeName] CHAR(1)
);

INSERT INTO @DataSource ([routeID], [nodeID], [nodeName])
VALUES   (1, 1, 'a')
        ,(1, 2, 'b')
        ,(2, 1, 'a')
        ,(2, 2, 'b')
        ,(3, 1, 'a')
        ,(3, 2, 'b')
        ,(4, 1, 'a')
        ,(4, 2, 'c')
        ,(5, 1, 'a')
        ,(5, 2, 'c')
        ,(6, 1, 'a')
        ,(6, 2, 'b')
        ,(6, 3, 'd')
        ,(7, 1, 'a')
        ,(7, 2, 'b')
        ,(7, 3, 'd');


SELECT DS.[routeID]
      ,nodes.[value]
      ,ROW_NUMBER() OVER (PARTITION BY nodes.[value] ORDER BY [routeID]) AS [rowID]
FROM 
(   
    -- getting unique route ids
    SELECT DISTINCT [routeID]
    FROM @DataSource DS
) DS ([routeID])
CROSS APPLY
(
    -- for each route id, build a CSV list of its node names (ordered by nodeID)
    SELECT STUFF
    (
        (
            SELECT ',' + [nodeName]
            FROM @DataSource DSI
            WHERE DSI.[routeID] = DS.[routeID]
            ORDER BY [nodeID]
            FOR XML PATH(''), TYPE
        ).value('.', 'VARCHAR(MAX)')
        ,1
        ,1
        ,''
    )
) nodes ([value]);

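For each routeID, this query returns the comma-separated list of its node names in nodeID order ([value]) and a [rowID] that numbers the routes sharing the same list, starting at 1 for the lowest routeID.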

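So you only need to filter on rowID = 1. Of course, you can change the code to match your business criteria (for example, to show the last routeID with a given node list rather than the first).

Also, the ROW_NUMBER function cannot be used directly in a WHERE clause, so you need to wrap the query (here in a CTE) before filtering: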

WITH DataSource AS
(
    SELECT DS.[routeID]
          ,nodes.[value]
          ,ROW_NUMBER() OVER (PARTITION BY nodes.[value] ORDER BY [routeID]) AS [rowID]
    FROM 
    (   
        -- getting unique route ids
        SELECT DISTINCT [routeID]
        FROM @DataSource DS
    ) DS ([routeID])
    CROSS APPLY
    (
        -- for each route id, build a CSV list of its node names (ordered by nodeID)
        SELECT STUFF
        (
            (
                SELECT ',' + [nodeName]
                FROM @DataSource DSI
                WHERE DSI.[routeID] = DS.[routeID]
                ORDER BY [nodeID]
                FOR XML PATH(''), TYPE
            ).value('.', 'VARCHAR(MAX)')
            ,1
            ,1
            ,''
        )
    ) nodes ([value])
)
SELECT DS2.*
FROM DataSource DS1
INNER JOIN @DataSource DS2
    ON DS1.[routeID] = DS2.[routeID]
WHERE DS1.[rowID] = 1;

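On SQL Server 2017 or later, the same comparison of node lists can be sketched more compactly with STRING_AGG in place of the FOR XML PATH construct. This is only a sketch under that version assumption; the CTE name RouteNodes and the column alias nodeNames are illustrative, and the sample data is repeated so that the batch runs on its own:

```sql
-- Sample data, same as above
DECLARE @DataSource TABLE
(
    [routeID] TINYINT
   ,[nodeID] TINYINT
   ,[nodeName] CHAR(1)
);

INSERT INTO @DataSource ([routeID], [nodeID], [nodeName])
VALUES (1, 1, 'a'), (1, 2, 'b')
      ,(2, 1, 'a'), (2, 2, 'b')
      ,(3, 1, 'a'), (3, 2, 'b')
      ,(4, 1, 'a'), (4, 2, 'c')
      ,(5, 1, 'a'), (5, 2, 'c')
      ,(6, 1, 'a'), (6, 2, 'b'), (6, 3, 'd')
      ,(7, 1, 'a'), (7, 2, 'b'), (7, 3, 'd');

-- Assumption: SQL Server 2017+ (STRING_AGG is not available in earlier versions)
WITH RouteNodes AS
(
    -- one row per route with its node names concatenated in nodeID order
    SELECT [routeID]
          ,STRING_AGG(CAST([nodeName] AS VARCHAR(MAX)), ',')
               WITHIN GROUP (ORDER BY [nodeID]) AS [nodeNames]
    FROM @DataSource
    GROUP BY [routeID]
)
-- keep the lowest routeID for each distinct node list
SELECT MIN([routeID]) AS [routeID]
      ,[nodeNames]
FROM RouteNodes
GROUP BY [nodeNames];
```

With the sample data this should return routeID 1 for a,b, routeID 4 for a,c and routeID 6 for a,b,d. As with the FOR XML version, the comparison relies on node names not containing the separator character.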

Answer 1: (score: 0)

OK, let's use some recursion to build the full node list for each routeID.

First, let's populate the source table and the journeys table:

 -- your source  
declare @r as table (routeID int, nodeID int, nodeName char(1))

-- your other table  
declare @j as table (journeyID int, nodeID int, nodeName char(1), routeID int) 

 -- temp results table  
declare @routes as table (routeID int primary key, nodeNames varchar(1000))

;with
s as (
    select *
    from (
        values
        (1,       1,      'a'),
        (1,       2,      'b'),
        (2,       1,      'a'),
        (2,       2,      'b'),
        (3,       1,      'a'),
        (3,       2,      'b'),
        (4,       1,      'a'),
        (4,       2,      'c'),
        (5,       1,      'a'),
        (5,       2,      'c'),
        (6,       1,      'a'),
        (6,       2,      'b'),
        (6,       3,      'd'),
        (7,       1,      'a'),
        (7,       2,      'b'),
        (7,       3,      'd') 
    ) s  (routeID, nodeID, nodeName)
)
insert into @r
select *
from s

;with
s as (
    select *
    from (
        values 
        (1,        1,       'a',       1),
        (1,        2,       'b',       1),
        (2,        1,       'a',       1),
        (2,        2,       'b',       1),
        (3,        1,       'a',       4),
        (3,        2,       'c',       4)
    ) s  (journeyID, nodeID, nodeName, routeID)
)
insert into @j
select *
from s

Now let's extract the routes:

;with
d as (
    select *, row_number() over (partition by r.routeID order by r.nodeID desc) n2
    from @r r
),
r as (
    select d.*, cast(nodeName as varchar(1000)) Names, cast(0 as bigint) i2
    from d
    where nodeId=1
    union all
    select d.*, cast(r.names + ',' + d.nodeName as varchar(1000)), r.n2
    from d
    join r on r.routeID = d.routeID and r.nodeId=d.nodeId-1 
)
insert into @routes
select routeID, Names
from r
where n2=1

The @routes table will look like this:

routeID nodeNames
1       'a,b'
2       'a,b'
3       'a,b'
4       'a,c'
5       'a,c'
6       'a,b,d'
7       'a,b,d'

And now the final output:

-- the unique routes 
select MIN(r.routeID) routeID, nodeNames
from @routes r
group by nodeNames

-- the unique journeys
select MIN(journeyID) journeyID, r.nodeNames
from @j j
inner join @routes r on j.routeID = r.routeID
group by nodeNames

Output:

routeID nodeNames
1       'a,b'
4       'a,c'
6       'a,b,d'

journeyID   nodeNames
1           'a,b'
3           'a,c'
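
To connect this back to the second table from the question, each route can be mapped to one representative routeID per node sequence, and the journeys can then be joined to that mapping. The following is a minimal sketch, assuming @routes has already been filled as shown above (it is re-created here with the same values so that the batch runs on its own); the names RouteMap, originalRouteID and representativeRouteID are made up for illustration:

```sql
-- Node lists per route, as produced by the recursive query above
DECLARE @routes TABLE (routeID INT PRIMARY KEY, nodeNames VARCHAR(1000));
INSERT INTO @routes (routeID, nodeNames)
VALUES (1, 'a,b'), (2, 'a,b'), (3, 'a,b')
      ,(4, 'a,c'), (5, 'a,c')
      ,(6, 'a,b,d'), (7, 'a,b,d');

-- Journeys table from the question's edit (only the rows shown there)
DECLARE @j TABLE (journeyID INT, nodeID INT, nodeName CHAR(1), routeID INT);
INSERT INTO @j (journeyID, nodeID, nodeName, routeID)
VALUES (1, 1, 'a', 1), (1, 2, 'b', 1)
      ,(2, 1, 'a', 1), (2, 2, 'b', 1)
      ,(3, 1, 'a', 4), (3, 2, 'c', 4);

WITH RouteMap AS
(
    -- every route keeps its own row; the window MIN picks the lowest
    -- routeID among all routes that share the same node list
    SELECT routeID
          ,nodeNames
          ,MIN(routeID) OVER (PARTITION BY nodeNames) AS representativeRouteID
    FROM @routes
)
SELECT j.journeyID
      ,j.nodeID
      ,j.nodeName
      ,j.routeID AS originalRouteID
      ,m.representativeRouteID
FROM @j j
INNER JOIN RouteMap m
    ON m.routeID = j.routeID
ORDER BY j.journeyID, j.nodeID;
```

In the sample data the journeys already reference routes 1 and 4, so representativeRouteID equals originalRouteID for every row; a journey stored against route 2 or 3 would map to route 1.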