I have a table of about 20,000 rows like this (seq = sequence):
id seq_num seq_count seq_id a b c d
----------------------------------------------------
1 1 3 A400 1 0 0 0
2 2 3 A400 0 1 0 0
3 3 3 A400 0 0 1 0
4 1 2 V2303 1 1 1 1
5 2 2 V2303 1 1 1 1
6 1 3 G2 1 0 0 0
7 2 3 G2 0 1 0 0
8 3 3 G2 0 0 1 0
9 1 3 U900 1 0 0 0
10 2 3 U900 2 2 1 1
11 3 3 U900 5 3 8 5
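I want to find the seq_ids whose a-b-c-d sequences are duplicated elsewhere in the table, perhaps just printing them with dbms_output.put_line or by any other means. As you can see, seq_id G2 is a duplicate of A400 because all of their rows match, whereas U900 has no duplicate even though one of its rows matches a row in A400 and G2.
Is there a good way to check for duplicates like this on a large data set? I cannot create new tables to hold data temporarily. So far I have been trying to use cursors, but with no luck.
Thanks, and let me know if you need more information about my question.
Answer 0 (score: 0)
Oracle setup: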
CREATE TABLE table_name ( id, seq_num, seq_count, seq_id, a, b, c, d ) AS
SELECT 1, 1, 3, 'A400', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 2, 2, 3, 'A400', 0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 3, 3, 3, 'A400', 0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 4, 1, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 5, 2, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 6, 1, 3, 'G2', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 7, 2, 3, 'G2', 0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 8, 3, 3, 'G2', 0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 9, 1, 3, 'U900', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 10, 2, 3, 'U900', 2, 2, 1, 1 FROM DUAL UNION ALL
SELECT 11, 3, 3, 'U900', 5, 3, 8, 5 FROM DUAL;
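Query:
SELECT s.seq_id,
       t.seq_id AS matched_seq_id
FROM   table_name s
       INNER JOIN
       table_name t
       ON (   s.seq_num   = t.seq_num    -- same position in the sequence
          AND s.seq_count = t.seq_count  -- same sequence length
          AND s.seq_id    < t.seq_id     -- report each pair once, never a self-match
          AND s.a = t.a
          AND s.b = t.b
          AND s.c = t.c
          AND s.d = t.d )
GROUP BY
       t.seq_id,
       s.seq_id
-- keep a pair only when every position in the sequence matched
HAVING COUNT( DISTINCT t.seq_num ) = MAX( t.seq_count );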
Result:
SEQ_ID MATCHED_SEQ_ID
------ --------------
A400 G2
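
If the self-join turns out to be slow on the full table, one possible alternative is to collapse each sequence into a single "fingerprint" string and group on that. The following is only a sketch under some assumptions: it needs LISTAGG (Oracle 11g Release 2 or later), the fingerprint of one sequence must fit in the VARCHAR2 limit (otherwise ORA-01489 is raised), and the names fingerprints, fingerprint and duplicate_seq_ids are made up for illustration. It also treats NULL values in a, b, c, d as equal to each other, which the join above does not.

```sql
-- Sketch: build one string per seq_id from its rows in seq_num order,
-- then report every group of seq_ids that share the same string.
WITH fingerprints AS (
  SELECT seq_id,
         LISTAGG( a || ',' || b || ',' || c || ',' || d, ';' )
           WITHIN GROUP ( ORDER BY seq_num ) AS fingerprint  -- hypothetical column name
  FROM   table_name
  GROUP BY seq_id
)
SELECT LISTAGG( seq_id, ', ' )
         WITHIN GROUP ( ORDER BY seq_id ) AS duplicate_seq_ids
FROM   fingerprints
GROUP BY fingerprint
HAVING COUNT(*) > 1;  -- only fingerprints shared by two or more seq_ids
```

With the sample data above this should return a single row listing A400 and G2 together. Unlike the pairwise join, it returns one row per set of identical sequences, so three identical sequences appear as one row rather than three pairs.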