In a table

Time: 2016-12-08 09:21:58

Tags: sql oracle algorithm merge

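I have a table that contains duplicates. Duplicates are identified as follows:
- the keys must be in the same group (1, 2, 3 or 4)
- p must be the same
- p is an id indicating that these keys are the same

A key can match several times, but only within the same group.

Suppose we have the sample below: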

key,p,group
1,1,1
33,1,1
5,1,1
5,2,1
4,2,1
4,15,1
8,4,1
10,5,1
15,6,1
21,15,1
78,7,1
79,8,2
80,8,2
81,9,3
82,9,3
85,10,4
90,11,1
91,11,1
73,12,1

The output should be:

key,p,group
1,999,1
5,999,1
4,999,1
21,999,1
33,999,1
8,4,1
10,5,1
15,6,1
78,7,1
79,111,2
80,111,2
81,666,3
82,666,3
85,10,4
90,222,1
91,222,1
73,12,1

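Keys 1, 5, 4, 21 and 33 get the same p value (999; this number is just a new ID that groups the duplicates together), because they are in the same group (group = 1) and are linked in a chain: 1, 5 and 33 match (p = 1), 5 and 4 match (p = 2), and 4 and 21 match (p = 15).

As for 90 and 91, even though they are in group 1, they match only each other, because they are not linked (crossed) with any other key in that group.

79 and 80 belong to the same group (group = 2).

8 keeps p = 4, because it does not match any other key in its group (group = 1).

And so on... I am looking for a way to implement this in SQL (Oracle) or as an algorithm...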

Actually, it does not work. If you have this input:

key,p,group
55,9,6
56,10,6
56,11,6
58,9,6
58,11,6

the output is:

key,p,group
55,9,6
56,9,6
58,9,6
56,10,6
58,10,6

whereas I need:

key,p,group
55,9,6
56,9,6
58,9,6
56,9,6
58,9,6

Thanks for your help.

2 answers:

Answer 0: (score: 1)

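If I understand the problem correctly: view the rows as the nodes of an (undirected) graph, where an edge connects two nodes if they have the same p and group values, or the same key and group values. Then find the connected components of that graph, and change the p values so that all nodes in a connected component have the same p value.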
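Since the question also asks for an algorithm, the connected-components idea above can be sketched outside SQL with a union-find (disjoint-set) structure. This is only a minimal sketch in Python, not part of the original answer: the function name `merge_duplicates` and the tuple layout `(key, p, grp)` are assumptions for illustration, and, like the SQL solution, it assumes the smallest p in each component is an acceptable new ID.

```python
def merge_duplicates(rows):
    """Relabel p so that all rows in one connected component share the minimum p.

    rows: list of (key, p, grp) tuples (hypothetical layout for this sketch).
    Returns the distinct (key, new_p, grp) tuples sorted by grp, new_p, key.
    """
    parent = {}

    def find(x):
        # Locate the representative of x's set, then compress the path.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Each row links its key node to its p node inside the same group, so rows
    # sharing (key, grp) or (p, grp) end up in the same component.
    for key, p, grp in rows:
        union(("key", grp, key), ("p", grp, p))

    # Smallest p seen in each component.
    min_p = {}
    for key, p, grp in rows:
        root = find(("p", grp, p))
        min_p[root] = min(p, min_p.get(root, p))

    result = {(key, min_p[find(("key", grp, key))], grp) for key, p, grp in rows}
    return sorted(result, key=lambda r: (r[2], r[1], r[0]))


if __name__ == "__main__":
    sample = [(55, 9, 6), (56, 10, 6), (56, 11, 6), (58, 9, 6), (58, 11, 6)]
    print(merge_duplicates(sample))
```

On the group 6 rows from the question, all three keys fall into one component and receive p = 9.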

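If so, this can be done with a hierarchical query (plus the necessary processing before and after it; the hierarchical query is the main part). In the solution below, I change every p value in a connected component to the MIN of the p values in that component (rather than to a random value). If a "random value" is really wanted, that can be done too, but it is a different problem from this simpler solution (and is probably not needed in the first place).

GROUP is not a good column name, because it is a reserved word in Oracle. I changed it to GRP.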

with
-- begin test data (this is not part of the solution)
     inputs ( key, p, grp ) as (
       select  1,  1, 1 from dual union all
       select 33,  1, 1 from dual union all
       select  5,  1, 1 from dual union all
       select  5,  2, 1 from dual union all
       select  4,  2, 1 from dual union all
       select  4, 15, 1 from dual union all
       select  8,  4, 1 from dual union all
       select 10,  5, 1 from dual union all
       select 15,  6, 1 from dual union all
       select 21, 15, 1 from dual union all
       select 78,  7, 1 from dual union all
       select 79,  8, 2 from dual union all
       select 80,  8, 2 from dual union all
       select 81,  9, 3 from dual union all
       select 82,  9, 3 from dual union all
       select 85, 10, 4 from dual union all
       select 90, 11, 1 from dual union all
       select 91, 11, 1 from dual union all
       select 73, 12, 1 from dual union all
       select 55,  9, 6 from dual union all
       select 56, 10, 6 from dual union all
       select 56, 11, 6 from dual union all
       select 58,  9, 6 from dual union all
       select 58, 11, 6 from dual
     ),
-- end of test data; solution (SQL query) continues below this line
     prep ( grp, parent, child ) as (
       select distinct a.grp, a.p, b.p
       from   inputs a inner join inputs b
                       on a.grp = b.grp and a.key = b.key
     ),
     h ( grp, rt, child ) as (
       select grp, connect_by_root parent, child
       from   prep
       connect by nocycle grp = prior grp and parent = prior child
     )
select distinct i.key, g.new_p as p, i.grp
from   inputs i join (
                       select grp, rt, min(child) as new_p
                       from   h
                       group by grp, rt
                     ) g
                 on g.grp = i.grp and g.rt = i.p
order by grp, p, key   --   optional
;

Output

       KEY          P        GRP
---------- ---------- ----------
         1          1          1
         4          1          1
         5          1          1
        21          1          1
        33          1          1
         8          4          1
        10          5          1
        15          6          1
        78          7          1
        90         11          1
        91         11          1
        73         12          1
        79          8          2
        80          8          2
        81          9          3
        82          9          3
        85         10          4
        55          9          6
        56          9          6
        58          9          6

20 rows selected.
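To make the three stages of the query easier to follow, here is a rough Python analogue. It is a sketch under assumptions rather than a translation of Oracle's execution: `relabel_like_query` is a hypothetical name, the breadth-first walk stands in for `connect by nocycle`, and the input is assumed to be a list of `(key, p, grp)` tuples.

```python
from collections import defaultdict, deque


def relabel_like_query(rows):
    """Mimic the prep / h / min(child) stages with plain Python containers."""
    # "prep": two p values are linked when they share a key within one grp.
    # Every p is also linked to itself, as in the self-join.
    p_by_key = defaultdict(set)
    for key, p, grp in rows:
        p_by_key[(grp, key)].add(p)
    adjacent = defaultdict(set)
    for (grp, _key), p_values in p_by_key.items():
        for a in p_values:
            for b in p_values:
                adjacent[(grp, a)].add((grp, b))

    # "h" plus the aggregate: from each root p, collect every reachable p
    # and keep the smallest one as the new label.
    new_p = {}
    for root in adjacent:
        seen = {root}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for nxt in adjacent[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        new_p[root] = min(p for _grp, p in seen)

    # Final join back to the input rows, with DISTINCT and ORDER BY.
    result = {(key, new_p[(grp, p)], grp) for key, p, grp in rows}
    return sorted(result, key=lambda r: (r[2], r[1], r[0]))


if __name__ == "__main__":
    rows = [(5, 1, 1), (5, 2, 1), (4, 2, 1), (4, 15, 1), (21, 15, 1), (8, 4, 1)]
    for row in relabel_like_query(rows):
        print(row)
```

Working on p values rather than on rows keeps the graph small: the keys only serve to decide which p values are linked, which is what the `prep` self-join on `key` and `grp` appears to do.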

Answer 1: (score: 0)

select KEY, 
       P, 
       GRP, 
       'group of '||count(*) over (partition by p,grp)||' with p value '||p
from key_table

Output:

1   1   1   group of 3 with p value 1
33  1   1   group of 3 with p value 1
5   1   1   group of 3 with p value 1
5   2   1   group of 2 with p value 2
4   2   1   group of 2 with p value 2
8   4   1   group of 1 with p value 4
10  5   1   group of 1 with p value 5
15  6   1   group of 1 with p value 6
78  7   1   group of 1 with p value 7
79  8   2   group of 2 with p value 8
80  8   2   group of 2 with p value 8
81  9   3   group of 2 with p value 9
82  9   3   group of 2 with p value 9
85  10  4   group of 1 with p value 10
91  11  1   group of 2 with p value 11
90  11  1   group of 2 with p value 11
73  12  1   group of 1 with p value 12
4   15  1   group of 2 with p value 15
21  15  1   group of 2 with p value 15

I am not too keen on the random values, but change the output expression as needed.
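For comparison, the window function in this answer counts rows per `(p, grp)` pair only. A small Python sketch of the same counting (the name `label_direct_groups` and the tuple layout are assumptions for illustration) shows that a key with two different p values, such as key 5, still ends up in two separate groups, so chains like 1-5-4-21 are not merged:

```python
from collections import Counter


def label_direct_groups(rows):
    """Attach 'group of N with p value P' to each (key, p, grp) row.

    Equivalent in spirit to count(*) over (partition by p, grp): only rows
    with an identical (p, grp) pair are counted together.
    """
    sizes = Counter((p, grp) for _key, p, grp in rows)
    return [
        (key, p, grp, f"group of {sizes[(p, grp)]} with p value {p}")
        for key, p, grp in rows
    ]


if __name__ == "__main__":
    rows = [(1, 1, 1), (33, 1, 1), (5, 1, 1), (5, 2, 1), (4, 2, 1)]
    for row in label_direct_groups(rows):
        print(row)
```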