Oracle PL/SQL: How to find duplicate sequences in a large table?

Date: 2016-04-12 22:22:45

Tags: sql oracle plsql duplicates

I have a table of about 20,000 rows that looks like this (seq = sequence):

id    seq_num   seq_count   seq_id    a    b    c    d
----------------------------------------------------
1     1         3           A400      1    0    0    0
2     2         3           A400      0    1    0    0
3     3         3           A400      0    0    1    0
4     1         2           V2303     1    1    1    1
5     2         2           V2303     1    1    1    1
6     1         3           G2        1    0    0    0
7     2         3           G2        0    1    0    0
8     3         3           G2        0    0    1    0
9     1         3           U900      1    0    0    0
10    2         3           U900      2    2    1    1
11    3         3           U900      5    3    8    5

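I want to find the seq_ids whose a-b-c-d sequence has a duplicate elsewhere in the table, reporting them with dbms_output.put_line or by any other means. As you can see, seq_id G2 is a duplicate of A400 because all of their rows match, but U900 has no duplicate, even though one of its rows matches a row in A400 and G2.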

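Is there a good way to check for duplicates like this on a large data set? I cannot create a new table to hold data temporarily. So far I have been trying to use cursors, without success.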

Thanks, and let me know if you need more information about my problem.

1 answer:

Answer 0 (score: 0)

Oracle setup:

CREATE TABLE table_name ( id, seq_num, seq_count, seq_id, a, b, c, d ) AS
SELECT 1,  1, 3, 'A400',  1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 2,  2, 3, 'A400',  0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 3,  3, 3, 'A400',  0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 4,  1, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 5,  2, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 6,  1, 3, 'G2',    1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 7,  2, 3, 'G2',    0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 8,  3, 3, 'G2',    0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 9,  1, 3, 'U900',  1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 10, 2, 3, 'U900',  2, 2, 1, 1 FROM DUAL UNION ALL
SELECT 11, 3, 3, 'U900',  5, 3, 8, 5 FROM DUAL;

Query:

SELECT  s.seq_id,
        t.seq_id AS matched_seq_id
FROM    table_name s
        INNER JOIN
        table_name t
        ON (    s.seq_num = t.seq_num 
            AND s.seq_count = t.seq_count
            AND s.seq_id   < t.seq_id
            AND s.a = t.a
            AND s.b = t.b
            AND s.c = t.c
            AND s.d = t.d )
GROUP BY
        t.seq_id,
        s.seq_id
HAVING  COUNT( DISTINCT t.seq_num ) = MAX( t.seq_count );

Result:

SEQ_ID MATCHED_SEQ_ID
------ --------------
A400   G2
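
Since the question mentions dbms_output.put_line, the query above could also be wrapped in an anonymous PL/SQL block with a cursor FOR loop. This is only a minimal sketch: it assumes the same table_name as in the setup, the message wording is illustrative, and server output must be enabled in the client (for example with SET SERVEROUTPUT ON in SQL*Plus or SQL Developer) for the lines to be displayed.

```sql
-- Sketch: print each duplicate pair found by the self-join query.
-- Assumes table_name from the setup above; the message text is illustrative.
BEGIN
  FOR r IN (
    SELECT  s.seq_id,
            t.seq_id AS matched_seq_id
    FROM    table_name s
            INNER JOIN
            table_name t
            ON (    s.seq_num   = t.seq_num    -- same position in the sequence
                AND s.seq_count = t.seq_count  -- same sequence length
                AND s.seq_id    < t.seq_id     -- report each pair only once
                AND s.a = t.a
                AND s.b = t.b
                AND s.c = t.c
                AND s.d = t.d )
    GROUP BY
            t.seq_id,
            s.seq_id
    -- keep a pair only if every position of the sequence matched
    HAVING  COUNT( DISTINCT t.seq_num ) = MAX( t.seq_count )
  ) LOOP
    DBMS_OUTPUT.PUT_LINE( r.seq_id || ' is duplicated by ' || r.matched_seq_id );
  END LOOP;
END;
/
```

No extra table is created; the loop only reads from table_name. Note that the equality comparisons on a, b, c and d never match NULL values, so sequences containing NULLs would not be reported as duplicates by this approach.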