I have a table:
Table1
unique_id user_id user_seq col_name value_val position
1 100 1 test1 100 1
1 100 1 test2 123 1
1 100 1 test1 a 2
1 100 1 test2 text 2
1 100 1 test3 1Rw 2
1 100 1 test4 1Tes 2
2 101 1 test1 1 1
2 101 1 test2 1 1
2 101 1 test3 1 1
2 101 1 test4 1 1
2 101 1 test5 1 1
3 100 1 test1 100 1
3 100 1 test2 123 1
3 100 1 test1 a 2
3 100 1 test2 text 2
3 100 1 test3 1Rw 2
3 100 1 test4 1Tes 2
4 101 1 test1 1 1
4 101 1 test2 1 1
4 101 1 test3 1 1
4 101 1 test4 1 1
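I need to find duplicates based on user_id, user_seq, col_name, value_val and position: the full set of these rows should be exactly the same for different unique_ids.
In the example above, unique_id = 1 and 3 are exactly the same, so they should be returned as output.
unique_id = 2 and 4 differ, because unique_id = 4 has no test5 row, so they should not be caught.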
The output is:
unique_id
1
3
Also, my dataset is very large, around 50 million records, so I need an optimized solution. Any help?
Edit
My table structure:
Name Null? Type
----------- ----- --------------
UNIQUE_ID NUMBER
USER_SEQ VARCHAR2(100)
COL_NAME VARCHAR2(263)
VALUE_VAL VARCHAR2(4000)
POSITION NUMBER
USER_ID NUMBER
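No indexes are available.
Answer 0 (score: 1)
Here's one way: count the rows per unique_id, join every row to the identical rows of a higher unique_id with the same row count, and keep the pairs where every row matched. It returns each group of identical unique_ids as one comma-separated list (the sample data below also includes unique_ids 5, 6 and 7, so it returns 1,3,5 and 4,6,7):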
with sample_data as (select 1 unique_id, 100 user_id, 1 user_seq, 'test1' col_name, '100' value_val, 1 position from dual union all
select 1 unique_id, 100 user_id, 1 user_seq, 'test2' col_name, '123' value_val, 1 position from dual union all
select 1 unique_id, 100 user_id, 1 user_seq, 'test1' col_name, 'a' value_val, 2 position from dual union all
select 1 unique_id, 100 user_id, 1 user_seq, 'test2' col_name, 'text' value_val, 2 position from dual union all
select 1 unique_id, 100 user_id, 1 user_seq, 'test3' col_name, '1Rw' value_val, 2 position from dual union all
select 1 unique_id, 100 user_id, 1 user_seq, 'test4' col_name, '1Tes' value_val, 2 position from dual union all
select 2 unique_id, 101 user_id, 1 user_seq, 'test1' col_name, '1' value_val, 1 position from dual union all
select 2 unique_id, 101 user_id, 1 user_seq, 'test2' col_name, '1' value_val, 1 position from dual union all
select 2 unique_id, 101 user_id, 1 user_seq, 'test3' col_name, '1' value_val, 1 position from dual union all
select 2 unique_id, 101 user_id, 1 user_seq, 'test4' col_name, '1' value_val, 1 position from dual union all
select 2 unique_id, 101 user_id, 1 user_seq, 'test5' col_name, '1' value_val, 1 position from dual union all
select 3 unique_id, 100 user_id, 1 user_seq, 'test1' col_name, '100' value_val, 1 position from dual union all
select 3 unique_id, 100 user_id, 1 user_seq, 'test2' col_name, '123' value_val, 1 position from dual union all
select 3 unique_id, 100 user_id, 1 user_seq, 'test1' col_name, 'a' value_val, 2 position from dual union all
select 3 unique_id, 100 user_id, 1 user_seq, 'test2' col_name, 'text' value_val, 2 position from dual union all
select 3 unique_id, 100 user_id, 1 user_seq, 'test3' col_name, '1Rw' value_val, 2 position from dual union all
select 3 unique_id, 100 user_id, 1 user_seq, 'test4' col_name, '1Tes' value_val, 2 position from dual union all
select 4 unique_id, 101 user_id, 1 user_seq, 'test1' col_name, '1' value_val, 1 position from dual union all
select 4 unique_id, 101 user_id, 1 user_seq, 'test2' col_name, '1' value_val, 1 position from dual union all
select 4 unique_id, 101 user_id, 1 user_seq, 'test3' col_name, '1' value_val, 1 position from dual union all
select 4 unique_id, 101 user_id, 1 user_seq, 'test4' col_name, '1' value_val, 1 position from dual union all
select 6 unique_id, 101 user_id, 1 user_seq, 'test1' col_name, '1' value_val, 1 position from dual union all
select 6 unique_id, 101 user_id, 1 user_seq, 'test2' col_name, '1' value_val, 1 position from dual union all
select 6 unique_id, 101 user_id, 1 user_seq, 'test3' col_name, '1' value_val, 1 position from dual union all
select 6 unique_id, 101 user_id, 1 user_seq, 'test4' col_name, '1' value_val, 1 position from dual union all
select 7 unique_id, 101 user_id, 1 user_seq, 'test1' col_name, '1' value_val, 1 position from dual union all
select 7 unique_id, 101 user_id, 1 user_seq, 'test2' col_name, '1' value_val, 1 position from dual union all
select 7 unique_id, 101 user_id, 1 user_seq, 'test3' col_name, '1' value_val, 1 position from dual union all
select 7 unique_id, 101 user_id, 1 user_seq, 'test4' col_name, '1' value_val, 1 position from dual union all
select 5 unique_id, 100 user_id, 1 user_seq, 'test1' col_name, '100' value_val, 1 position from dual union all
select 5 unique_id, 100 user_id, 1 user_seq, 'test2' col_name, '123' value_val, 1 position from dual union all
select 5 unique_id, 100 user_id, 1 user_seq, 'test1' col_name, 'a' value_val, 2 position from dual union all
select 5 unique_id, 100 user_id, 1 user_seq, 'test2' col_name, 'text' value_val, 2 position from dual union all
select 5 unique_id, 100 user_id, 1 user_seq, 'test3' col_name, '1Rw' value_val, 2 position from dual union all
select 5 unique_id, 100 user_id, 1 user_seq, 'test4' col_name, '1Tes' value_val, 2 position from dual),
-- cnts: each row plus the total number of rows for its unique_id
cnts as (select unique_id,
user_id,
user_seq,
col_name,
value_val,
position,
count(*) over (partition by unique_id) cnt
from sample_data),
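-- res: pairs of unique_ids (id1 < id2) with the same row count, plus the number of matching row pairs
res as (select distinct sd1.unique_id id1,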
sd2.unique_id id2,
sd1.cnt,
count(*) over (partition by sd1.unique_id, sd2.unique_id) total_id1_rows_cnt
from cnts sd1
inner join cnts sd2 on sd1.unique_id < sd2.unique_id
and sd1.user_id = sd2.user_id
and sd1.user_seq = sd2.user_seq
and sd1.col_name = sd2.col_name
and sd1.value_val = sd2.value_val
and sd1.position = sd2.position
and sd1.cnt = sd2.cnt)
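-- keep only full matches (in both places), then list each group once, starting from its lowest unique_id
select id1||','||listagg(id2, ',') within group (order by id2) grouped_unique_ids
from res
where cnt = total_id1_rows_cnt
and id1 not in (select id2
                from res
                where cnt = total_id1_rows_cnt)
group by id1
order by grouped_unique_ids;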
And here's the db<>fiddle showing that it works.
Answer 1 (score: 0)
If performance isn't a concern, how about a self-join?
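with cnts as (select t.*, count(*) over (partition by unique_id) as cnt
              from table1 t)
select distinct a.unique_id
from cnts a join cnts b
on a.user_id = b.user_id
and a.user_seq = b.user_seq
and a.col_name = b.col_name
and a.value_val = b.value_val
and a.position = b.position
and a.cnt = b.cnt                -- both unique_ids have the same number of rows
and a.unique_id <> b.unique_id
group by a.unique_id, b.unique_id, a.cnt
having count(*) = a.cnt          -- every row of a matched a row of b
order by a.unique_id;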
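Answer 2 (score: 0)
Assuming you can concatenate the values into a string, perhaps the simplest approach is the following (sample_data stands for your table):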
select *
from (select unique_id, count(*) over (partition by vals) as cnt
from (select unique_id,
listagg(user_id || ':' || user_seq || ':' || col_name || ':' || value_val || ':' || position, ',') within group (order by user_id, user_seq, col_name, value_val, position) as vals
from sample_data sd
group by unique_id
) sd
) sd
where cnt > 1;
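Here is a db<>fiddle.
Let me emphasize: because of Oracle's limit on string length (a LISTAGG result is a VARCHAR2, so 4000 bytes by default, or 32767 with MAX_STRING_SIZE = EXTENDED), this is not a general solution. But it works for your data and may be a convenient way to solve the problem.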
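Before relying on this approach for the full 50-million-row table, it may be worth checking whether any unique_id would hit that limit. The query below is a rough sketch, not part of the original answer. It assumes the table is called table1 (as in answer 1) and that the default 4000-byte limit applies. For each unique_id it adds up the byte lengths of the pieces the LISTAGG above concatenates, plus the ':' and ',' separators. Any unique_id it returns would make the LISTAGG query fail with ORA-01489.

```sql
-- Hypothetical pre-check: estimated byte length of each unique_id's LISTAGG string.
-- Assumes table name table1 and the default 4000-byte VARCHAR2 limit.
select unique_id, est_bytes
from (select unique_id,
             sum(  nvl(lengthb(to_char(user_id)), 0)
                 + nvl(lengthb(user_seq), 0)
                 + nvl(lengthb(col_name), 0)
                 + nvl(lengthb(value_val), 0)
                 + nvl(lengthb(to_char(position)), 0)
                 + 4)                       -- the four ':' separators inside each row
             + count(*) - 1 as est_bytes    -- the ',' separators between rows
      from table1
      group by unique_id)
where est_bytes > 4000
order by est_bytes desc;
```

The lengths are measured piece by piece instead of on the concatenated row, so the check itself cannot overflow when value_val alone is close to 4000 bytes.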