How can I efficiently extract a sub-table containing only the rows that have duplicate elements in SQL?

Time: 2019-06-05 13:25:48

Tags: sql oracle duplicates

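The main task is to obtain a sub-table (apologies if that is not the right term) from an existing table, keeping only the rows of interest. A row is of interest if any of its elements has the same value as any element of any other row.

Any suggestions for the best approach, or explanations, would be very helpful.

I have considered running a query that checks each element in every row and then simply taking the union of all the query results.

This is the basis of my attempt, although it may not be efficient. Note that there are 3 columns in total, and I am actually only checking 2 of them (PARTICIPANT_1 and PARTICIPANT_2) for duplicate values.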

SELECT *
FROM team_table
WHERE PARTICIPANT_2 IN (SELECT PARTICIPANT_2
                        FROM team_table
                        GROUP BY PARTICIPANT_2
                        HAVING COUNT(DISTINCT PARTICIPANT_1) > 1)
UNION
SELECT *
FROM team_table
WHERE PARTICIPANT_1 IN (SELECT PARTICIPANT_1
                        FROM team_table
                        GROUP BY PARTICIPANT_1
                        HAVING COUNT(DISTINCT PARTICIPANT_2) > 1)

For the sample table:

startdate   PARTICIPANT_1   PARTICIPANT_2
1-1-19      A               B
1-1-19      A               C
1-1-19      C               D
1-1-19      Q               R
1-1-19      S               T
1-1-19      U               V

the query should produce the following, since A and C are the repeated elements:

startdate   PARTICIPANT_1   PARTICIPANT_2
1-1-19      A               B
1-1-19      A               C
1-1-19      C               D
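
For reference, here is a minimal sketch of what the attempt above returns on the sample table. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle, which is an assumption made only to keep the example self-contained; the query is the attempt with the redundant parentheses and inner subselects removed, which should not change its logic. Each subquery only looks for repeats within a single column, so a value that appears once in PARTICIPANT_1 and once in PARTICIPANT_2 (like C) is not detected.

```python
import sqlite3

# Sample rows from the question: (startdate, PARTICIPANT_1, PARTICIPANT_2).
ROWS = [
    ("1-1-19", "A", "B"),
    ("1-1-19", "A", "C"),
    ("1-1-19", "C", "D"),
    ("1-1-19", "Q", "R"),
    ("1-1-19", "S", "T"),
    ("1-1-19", "U", "V"),
]

# The attempted query, simplified but intended to be logically unchanged.
ATTEMPT = """
SELECT * FROM team_table
WHERE PARTICIPANT_2 IN (SELECT PARTICIPANT_2 FROM team_table
                        GROUP BY PARTICIPANT_2
                        HAVING COUNT(DISTINCT PARTICIPANT_1) > 1)
UNION
SELECT * FROM team_table
WHERE PARTICIPANT_1 IN (SELECT PARTICIPANT_1 FROM team_table
                        GROUP BY PARTICIPANT_1
                        HAVING COUNT(DISTINCT PARTICIPANT_2) > 1)
"""


def run_attempt():
    """Load the sample rows into an in-memory table and run the attempt."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE team_table "
        "(startdate TEXT, PARTICIPANT_1 TEXT, PARTICIPANT_2 TEXT)"
    )
    conn.executemany("INSERT INTO team_table VALUES (?, ?, ?)", ROWS)
    result = sorted(conn.execute(ATTEMPT).fetchall())
    conn.close()
    return result


attempt_rows = run_attempt()
print(attempt_rows)
```

On this data the attempt returns only the A/B and A/C rows; the C/D row is missing, which is the gap I am trying to close.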

2 answers:

Answer 0: (score: 3)

I think this is what you need:

SELECT * FROM team_table t1
WHERE exists (SELECT 1 from team_table t2
               WHERE t1.startdate = t2.startdate -- don't know if you need this
                 -- Get all rows with duplicate values:
                 AND (t2.PARTICIPANT_1 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2)
                   OR t2.PARTICIPANT_2 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2))
                 -- Exclude the record itself:
                 AND (t1.PARTICIPANT_1 != t2.PARTICIPANT_1
                   OR t1.PARTICIPANT_2 != t2.PARTICIPANT_2))
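
As a quick sanity check, this query can be run against the sample table from the question. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Oracle; that substitution, and the in-memory table setup, are assumptions made only so the example is self-contained.

```python
import sqlite3

# Sample rows from the question: (startdate, PARTICIPANT_1, PARTICIPANT_2).
ROWS = [
    ("1-1-19", "A", "B"),
    ("1-1-19", "A", "C"),
    ("1-1-19", "C", "D"),
    ("1-1-19", "Q", "R"),
    ("1-1-19", "S", "T"),
    ("1-1-19", "U", "V"),
]

# Keep a row if some *other* row on the same startdate shares a participant
# with it, in either column.
QUERY = """
SELECT * FROM team_table t1
WHERE EXISTS (SELECT 1 FROM team_table t2
              WHERE t1.startdate = t2.startdate
                AND (t2.PARTICIPANT_1 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2)
                  OR t2.PARTICIPANT_2 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2))
                AND (t1.PARTICIPANT_1 != t2.PARTICIPANT_1
                  OR t1.PARTICIPANT_2 != t2.PARTICIPANT_2))
"""


def rows_sharing_a_participant():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE team_table "
        "(startdate TEXT, PARTICIPANT_1 TEXT, PARTICIPANT_2 TEXT)"
    )
    conn.executemany("INSERT INTO team_table VALUES (?, ?, ?)", ROWS)
    result = sorted(conn.execute(QUERY).fetchall())
    conn.close()
    return result


matching_rows = rows_sharing_a_participant()
print(matching_rows)
```

On the sample data this returns the A/B, A/C and C/D rows. Note that the "exclude the record itself" condition compares values, so two rows that are identical in both participant columns do not count as a match for each other.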

Answer 1: (score: 1)

If you have a unique ID column, you can use:

select tt.*
from team_table tt
where exists (select 1
              from team_table tt2
              where (tt.participant_1 in (tt2.participant_1, tt2.participant_2) or
                     tt.participant_2 in (tt2.participant_1, tt2.participant_2)
                    ) and
                    tt2.id <> tt.id
             );

If you don't have one, you can generate one:

with tt as (
      -- Number every row so that each one gets a distinct seqnum.
      -- (Partitioning by all three columns would give every distinct row seqnum = 1.)
      select t.*,
             row_number() over (order by startdate, participant_1, participant_2) as seqnum
      from team_table t
     )
select tt.*
from tt
where exists (select 1
              from tt tt2
              where (tt.participant_1 in (tt2.participant_1, tt2.participant_2) or
                     tt.participant_2 in (tt2.participant_1, tt2.participant_2)
                    ) and
                    tt2.seqnum <> tt.seqnum
             );
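
A minimal sketch of the generated-ID variant, run against the sample table from the question. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module as a stand-in for Oracle, and it numbers the rows with an `order by` over all three columns so that every row gets its own seqnum. The aliases `a` and `b` are illustrative.

```python
import sqlite3

# Sample rows from the question: (startdate, participant_1, participant_2).
ROWS = [
    ("1-1-19", "A", "B"),
    ("1-1-19", "A", "C"),
    ("1-1-19", "C", "D"),
    ("1-1-19", "Q", "R"),
    ("1-1-19", "S", "T"),
    ("1-1-19", "U", "V"),
]

# Generate a surrogate row number, then keep each row that shares a
# participant with a row carrying a different row number.
QUERY = """
WITH tt AS (
    SELECT t.*,
           ROW_NUMBER() OVER (
               ORDER BY startdate, participant_1, participant_2
           ) AS seqnum
    FROM team_table t
)
SELECT a.startdate, a.participant_1, a.participant_2
FROM tt a
WHERE EXISTS (SELECT 1
              FROM tt b
              WHERE (a.participant_1 IN (b.participant_1, b.participant_2)
                  OR a.participant_2 IN (b.participant_1, b.participant_2))
                AND b.seqnum <> a.seqnum)
"""


def rows_with_generated_id():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE team_table "
        "(startdate TEXT, participant_1 TEXT, participant_2 TEXT)"
    )
    conn.executemany("INSERT INTO team_table VALUES (?, ?, ?)", ROWS)
    result = sorted(conn.execute(QUERY).fetchall())
    conn.close()
    return result


generated_id_rows = rows_with_generated_id()
print(generated_id_rows)
```

On the sample data this returns the A/B, A/C and C/D rows.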