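I have a table that contains duplicates. Duplicates are identified as follows:
- the keys must be in the same group (1, 2, 3 or 4)
- p must be the same
- p is an id indicating that the keys are the same

A key can match several times, but only within the same group.
Suppose we have this sample: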
key,p,group
1,1,1
33,1,1
5,1,1
5,2,1
4,2,1
4,15,1
8,4,1
10,5,1
15,6,1
21,15,1
78,7,1
79,8,2
80,8,2
81,9,3
82,9,3
85,10,4
90,11,1
91,11,1
73,12,1
The output should be:
key,p,group
1,999,1
5,999,1
4,999,1
21,999,1
33,999,1
8,4,1
10,5,1
15,6,1
78,7,1
79,111,2
80,111,2
81,666,3
82,666,3
85,10,4
90,222,1
91,222,1
73,12,1
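1, 5, 4, 21 and 33 get the same p value (999; this number is just a new id that groups the duplicates together) because they are in the same group (group = 1), and 1, 5 and 33 match (p = 1), 5 and 4 match (p = 2), and 4 and 21 match (p = 15).
As for 90 and 91, even though they are in group 1, they only match each other, because they are not linked (crossed) with any other key in that group.
79 and 80 belong to the same group (group = 2).
8 keeps p = 4 because it does not match any other key in its group.
And so on... I am looking for a way to implement this in SQL (Oracle) or as an algorithm...
Actually, it does not work. If you have this input: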
key,p,group
55,9,6
56,10,6
56,11,6
58,9,6
58,11,6
the output is:
key,p,group
55,9,6
56,9,6
58,9,6
56,10,6
58,10,6
whereas I need:
key,p,group
55,9,6
56,9,6
58,9,6
56,9,6
58,9,6
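Thanks for your help.
Answer 0 (score: 1)
If I understand the problem correctly: treat the rows as the nodes of an (undirected) graph, with an edge between two nodes if they have the same p and group values, or the same key and group values. Then find the connected components of that graph and change the p values so that all nodes in a connected component have the same p value.
If so, this can be done with a hierarchical query (plus whatever preparation and post-processing is needed; the hierarchical query is the main part). In the solution below, I change all p values in a connected component to the MIN of the p values in that component (rather than a random value); if "random" values are desired, that can be done too, but it is a different problem with simpler solutions (and probably not needed in the first place).
GROUP is not a good column name because it is a reserved word in Oracle. I changed it to GRP.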
with
-- begin test data (this is not part of the solution)
     inputs ( key, p, grp ) as (
       select  1,  1, 1 from dual union all
       select 33,  1, 1 from dual union all
       select  5,  1, 1 from dual union all
       select  5,  2, 1 from dual union all
       select  4,  2, 1 from dual union all
       select  4, 15, 1 from dual union all
       select  8,  4, 1 from dual union all
       select 10,  5, 1 from dual union all
       select 15,  6, 1 from dual union all
       select 21, 15, 1 from dual union all
       select 78,  7, 1 from dual union all
       select 79,  8, 2 from dual union all
       select 80,  8, 2 from dual union all
       select 81,  9, 3 from dual union all
       select 82,  9, 3 from dual union all
       select 85, 10, 4 from dual union all
       select 90, 11, 1 from dual union all
       select 91, 11, 1 from dual union all
       select 73, 12, 1 from dual union all
       select 55,  9, 6 from dual union all
       select 56, 10, 6 from dual union all
       select 56, 11, 6 from dual union all
       select 58,  9, 6 from dual union all
       select 58, 11, 6 from dual
     ),
-- end of test data; solution (SQL query) continues below this line
     -- pairs of p values that share a key within the same group
     prep ( grp, parent, child ) as (
       select distinct a.grp, a.p, b.p
       from   inputs a inner join inputs b
              on a.grp = b.grp and a.key = b.key
     ),
     -- every p value reachable from each starting p value (rt)
     h ( grp, rt, child ) as (
       select     grp, connect_by_root parent, child
       from       prep
       connect by nocycle grp = prior grp and parent = prior child
     )
-- replace each p with the smallest p reachable from it
select distinct i.key, g.new_p as p, i.grp
from   inputs i join (
         select   grp, rt, min(child) as new_p
         from     h
         group by grp, rt
       ) g
       on g.grp = i.grp and g.rt = i.p
order by grp, p, key   -- optional
;
Output:
       KEY          P        GRP
---------- ---------- ----------
         1          1          1
         4          1          1
         5          1          1
        21          1          1
        33          1          1
         8          4          1
        10          5          1
        15          6          1
        78          7          1
        90         11          1
        91         11          1
        73         12          1
        79          8          2
        80          8          2
        81          9          3
        82          9          3
        85         10          4
        55          9          6
        56          9          6
        58          9          6

20 rows selected.
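Since the question also asks for an algorithm, the same connected-components idea can be sketched outside SQL. The following is only a minimal sketch in Python, not a translation of the query above: it swaps the hierarchical query for a union-find structure. The function name `relabel` and the tuple layout `(key, p, grp)` are assumptions made for this illustration. Each row links a "key" node to a "p" node inside its group, so rows that share a key or a p end up in the same component, and every row then receives the smallest p of its component.

```python
def relabel(rows):
    """rows: iterable of (key, p, grp) tuples (hypothetical layout).

    Returns the distinct rows with p replaced by the smallest p of the
    connected component, sorted by grp, p, key.
    """
    rows = list(rows)
    parent = {}

    def find(x):
        # Locate the representative of x, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Nodes are ("k", grp, key) and ("p", grp, p); each row joins its two nodes.
    for key, p, grp in rows:
        union(("k", grp, key), ("p", grp, p))

    # Smallest p value seen in each component.
    min_p = {}
    for key, p, grp in rows:
        root = find(("p", grp, p))
        min_p[root] = min(p, min_p.get(root, p))

    # A set removes duplicate rows, like SELECT DISTINCT in the query.
    out = {(key, min_p[find(("p", grp, p))], grp) for key, p, grp in rows}
    return sorted(out, key=lambda r: (r[2], r[1], r[0]))


sample = [(55, 9, 6), (56, 10, 6), (56, 11, 6), (58, 9, 6), (58, 11, 6)]
print(relabel(sample))
```

For the group 6 sample from the question, all five rows fall into one component (56 links p = 10 and p = 11, 58 links p = 9 and p = 11), so the three keys all end up with p = 9.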
Answer 1 (score: 0)
select KEY,
P,
GRP,
'group of '||count(*) over (partition by p,grp)||' with p value '||p
from key_table
Output:
1 1 1 group of 3 with p value 1
33 1 1 group of 3 with p value 1
5 1 1 group of 3 with p value 1
5 2 1 group of 2 with p value 2
4 2 1 group of 2 with p value 2
8 4 1 group of 1 with p value 4
10 5 1 group of 1 with p value 5
15 6 1 group of 1 with p value 6
78 7 1 group of 1 with p value 7
79 8 2 group of 2 with p value 8
80 8 2 group of 2 with p value 8
81 9 3 group of 2 with p value 9
82 9 3 group of 2 with p value 9
85 10 4 group of 1 with p value 10
91 11 1 group of 2 with p value 11
90 11 1 group of 2 with p value 11
73 12 1 group of 1 with p value 12
4 15 1 group of 2 with p value 15
21 15 1 group of 2 with p value 15
I'm not too keen on random values, but change the output function as needed.
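To make it easier to see what this query does and does not do, here is a rough Python equivalent of `count(*) over (partition by p, grp)`. It is only a sketch: the function name `label_by_partition` and the `(key, p, grp)` tuple layout are assumptions for illustration. It counts rows per (p, grp) pair and attaches the same label text as the query.

```python
from collections import Counter


def label_by_partition(rows):
    """rows: list of (key, p, grp) tuples (hypothetical layout)."""
    # Number of rows in each (p, grp) partition, like the analytic count(*).
    counts = Counter((p, grp) for _, p, grp in rows)
    return [
        (key, p, grp, f"group of {counts[(p, grp)]} with p value {p}")
        for key, p, grp in rows
    ]


rows = [(1, 1, 1), (33, 1, 1), (5, 1, 1), (5, 2, 1), (4, 2, 1)]
for row in label_by_partition(rows):
    print(row)
```

Note that key 5 appears in two different partitions here (p = 1 and p = 2), so grouping by (p, grp) alone does not chain those partitions together the way the expected output in the question does.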