Question

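I have a table that contains duplicates. Duplicates are identified as follows:

- the keys must be in the same group (1, 2, 3 or 4)
- p must be the same; p is an id indicating that the keys are the same

A key can match multiple times, but only within the same group.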

Suppose we have the following example:

key,p,group
1,1,1
33,1,1
5,1,1
5,2,1
4,2,1
4,15,1
8,4,1
10,5,1
15,6,1
21,15,1
78,7,1
79,8,2
80,8,2
81,9,3
82,9,3
85,10,4
90,11,1
91,11,1
73,12,1

The output should be:

key,p,group
1,999,1
5,999,1
4,999,1
21,999,1
33,999,1
8,4,1
10,5,1
15,6,1
78,7,1
79,111,2
80,111,2
81,666,3
82,666,3
85,10,4
90,222,1
91,222,1
73,12,1

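Keys 1, 5, 4, 21 and 33 get the same p value (999; this number is just a new id that groups the duplicates together) because they are in the same group (group = 1) and 1, 5 and 33 match (p = 1), 5 and 4 match (p = 2), and 4 and 21 match (p = 15).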

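As for 90 and 91, even though they are in group 1, they match only each other, because they are not linked (crossed) with any other key in that group.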

79 and 80 are in the same group (group = 2).

8 keeps p = 4 because it does not match any other key in its group.

And so on... I am looking for a way to implement this in SQL (Oracle) or as an algorithm...

Actually, it does not work. If you have this input:

key,p,group
55,9,6
56,10,6
56,11,6
58,9,6
58,11,6

the output is:

key,p,group
55,9,6
56,9,6
58,9,6
56,10,6
58,10,6

but I need:

key,p,group
55,9,6
56,9,6
58,9,6
56,9,6
58,9,6

Thanks for your help.

Answer 1

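If I understand the problem correctly: treat the rows as nodes of an (undirected) graph, with an edge connecting two nodes if they have the same p and group values, or the same key and group values. Then find the connected components of this graph, and change the p values so that all nodes in a connected component have the same p value.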
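Since the question also asks for an algorithm, here is a minimal Python sketch of that graph reading, using a union-find (disjoint-set) structure. It is an illustration under assumptions, not the SQL solution itself: the function name `merge_duplicates` and the row layout `(key, p, grp)` are hypothetical, and using the smallest p of each component as the shared value is just one convenient choice.

```python
def merge_duplicates(rows):
    """rows: list of (key, p, grp) tuples (hypothetical layout).

    Returns the set of (key, new_p, grp), where new_p is the smallest p
    found in the row's connected component (an assumed convention).
    """
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Linking each row to the first row seen with the same (grp, key)
    # and the same (grp, p) is enough to connect every such pair.
    first_by_key, first_by_p = {}, {}
    for i, (key, p, grp) in enumerate(rows):
        for seen, tag in ((first_by_key, (grp, key)), (first_by_p, (grp, p))):
            if tag in seen:
                union(seen[tag], i)
            else:
                seen[tag] = i

    # Smallest p per component root.
    min_p = {}
    for i, (_, p, _) in enumerate(rows):
        root = find(i)
        min_p[root] = min(p, min_p.get(root, p))

    return {(key, min_p[find(i)], grp) for i, (key, _, grp) in enumerate(rows)}


sample = [(55, 9, 6), (56, 10, 6), (56, 11, 6), (58, 9, 6), (58, 11, 6)]
print(sorted(merge_duplicates(sample)))
# [(55, 9, 6), (56, 9, 6), (58, 9, 6)]
```

Returning a set collapses rows that become identical once p is rewritten, which plays the same role as a `select distinct` would in SQL.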

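If so, this can be done with a hierarchical query (plus whatever processing is needed before and after; the main part is the hierarchical query). In the solution below, I change all the p values in a connected component to the MIN of the p values in that component (rather than to a random value). If a "random value" is really wanted, that can be done too, but it is a different problem with a much simpler solution (and probably not needed in the first place).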

GROUP is not a good column name, because it is a reserved word in Oracle. I changed it to GRP.

with
-- begin test data (this is not part of the solution)
     inputs ( key, p, grp ) as (
       select  1,  1, 1 from dual union all
       select 33,  1, 1 from dual union all
       select  5,  1, 1 from dual union all
       select  5,  2, 1 from dual union all
       select  4,  2, 1 from dual union all
       select  4, 15, 1 from dual union all
       select  8,  4, 1 from dual union all
       select 10,  5, 1 from dual union all
       select 15,  6, 1 from dual union all
       select 21, 15, 1 from dual union all
       select 78,  7, 1 from dual union all
       select 79,  8, 2 from dual union all
       select 80,  8, 2 from dual union all
       select 81,  9, 3 from dual union all
       select 82,  9, 3 from dual union all
       select 85, 10, 4 from dual union all
       select 90, 11, 1 from dual union all
       select 91, 11, 1 from dual union all
       select 73, 12, 1 from dual union all
       select 55,  9, 6 from dual union all
       select 56, 10, 6 from dual union all
       select 56, 11, 6 from dual union all
       select 58,  9, 6 from dual union all
       select 58, 11, 6 from dual
     ),
-- end of test data; solution (SQL query) continues below this line
     prep ( grp, parent, child ) as (
       select distinct a.grp, a.p, b.p
       from   inputs a inner join inputs b
                       on a.grp = b.grp and a.key = b.key
     ),
     h ( grp, rt, child ) as (
       select grp, connect_by_root parent, child
       from   prep
       connect by nocycle grp = prior grp and parent = prior child
     )
select distinct i.key, g.new_p as p, i.grp
from   inputs i join (
                       select grp, rt, min(child) as new_p
                       from   h
                       group by grp, rt
                     ) g
                 on g.grp = i.grp and g.rt = i.p
order by grp, p, key   --   optional
;

Output:

       KEY          P        GRP
---------- ---------- ----------
         1          1          1
         4          1          1
         5          1          1
        21          1          1
        33          1          1
         8          4          1
        10          5          1
        15          6          1
        78          7          1
        90         11          1
        91         11          1
        73         12          1
        79          8          2
        80          8          2
        81          9          3
        82          9          3
        85         10          4
        55          9          6
        56          9          6
        58          9          6

20 rows selected.
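For readers who want to trace what the query does outside the database, the following Python sketch mirrors its three steps: the `prep` pairs of p values that share a key, the walk over those pairs done by the hierarchical query, and the final join with `min`. It is a rough analogue under assumptions, not a translation Oracle would produce; the function name `remap_p` and the `(key, p, grp)` tuple layout are hypothetical.

```python
from collections import defaultdict


def remap_p(rows):
    """rows: iterable of (key, p, grp) tuples (hypothetical layout)."""
    rows = list(rows)

    # Like "prep": within a grp, two p values are adjacent when some key
    # carries both of them (every p is also adjacent to itself).
    ps_by_key = defaultdict(set)
    for key, p, grp in rows:
        ps_by_key[(grp, key)].add(p)
    adjacent = defaultdict(set)
    for (grp, _), ps in ps_by_key.items():
        for p in ps:
            adjacent[(grp, p)] |= {(grp, q) for q in ps}

    # Like "h" plus the GROUP BY: from each p, walk the adjacency and keep
    # the smallest reachable p. The "seen" set plays the role of NOCYCLE.
    new_p = {}
    for start in list(adjacent):
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in adjacent[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        new_p[start] = min(p for _, p in seen)

    # Like the final join with DISTINCT, ordered by grp, p, key.
    out = {(key, new_p[(grp, p)], grp) for key, p, grp in rows}
    return sorted(out, key=lambda r: (r[2], r[1], r[0]))
```

Calling `remap_p` on the five group-6 rows from the question's follow-up yields `[(55, 9, 6), (56, 9, 6), (58, 9, 6)]`, the same three rows shown for GRP 6 in the output above.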

Answer 2

select KEY, 
       P, 
       GRP, 
       'group of '||count(*) over (partition by p,grp)||' with p value '||p
from key_table

Output:

1   1   1   group of 3 with p value 1
33  1   1   group of 3 with p value 1
5   1   1   group of 3 with p value 1
5   2   1   group of 2 with p value 2
4   2   1   group of 2 with p value 2
8   4   1   group of 1 with p value 4
10  5   1   group of 1 with p value 5
15  6   1   group of 1 with p value 6
78  7   1   group of 1 with p value 7
79  8   2   group of 2 with p value 8
80  8   2   group of 2 with p value 8
81  9   3   group of 2 with p value 9
82  9   3   group of 2 with p value 9
85  10  4   group of 1 with p value 10
91  11  1   group of 2 with p value 11
90  11  1   group of 2 with p value 11
73  12  1   group of 1 with p value 12
4   15  1   group of 2 with p value 15
21  15  1   group of 2 with p value 15

I am not too keen on the random value, but adjust the output as needed.
