Limit results to rows where a value appears only once

Asked: 2012-02-01 20:59:21

Tags: sql oracle aggregate having

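My query is more complicated than the example here, but I need to return only the rows where a certain field does not appear more than once in the data set.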

ACTIVITY_SK      STUDY_ACTIVITY_SK
100              200
101              201
102              200
100              203

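In this example, I do not want any records with an ACTIVITY_SK of 100 returned, because that ACTIVITY_SK value appears twice in the data set.

The data is a mapping table that is used in many joins, but multiple records like this indicate a data quality problem, so I simply need to drop them from the results rather than let them cause bad joins elsewhere.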

SELECT 
   A.ACTIVITY_SK,
   A.STATUS,
   B.STUDY_ACTIVITY_SK,
   B.NAME,
   B.PROJECT
 FROM
   ACTIVITY A,
   PROJECT B
 WHERE 
   A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK

I have tried something like this:

SELECT 
   A.ACTIVITY_SK,
   A.STATUS,
   B.STUDY_ACTIVITY_SK,
   B.NAME,
   B.PROJECT
 FROM
   ACTIVITY A,
   PROJECT B
 WHERE 
   A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
   AND A.ACTIVITY_SK NOT IN
 (
  SELECT 
     A.ACTIVITY_SK
    FROM
      ACTIVITY A,
      PROJECT B
    WHERE 
    A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
    GROUP BY A.ACTIVITY_SK
    HAVING COUNT(*) > 1
 )

But there must be a cheaper way to do this...

2 answers:

Answer 0: (score: 5)

Something like this might be a bit "cheaper":

SELECT
   A.ACTIVITY_SK,
   A.STATUS,
   B.STUDY_ACTIVITY_SK,
   B.NAME,
   B.PROJECT
FROM
   PROJECT B INNER JOIN
   (SELECT 
       ACTIVITY_SK,
       MIN(STATUS) STATUS
    FROM
      ACTIVITY
    GROUP BY ACTIVITY_SK
    HAVING COUNT(ACTIVITY_SK) = 1 ) A
ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
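
To see why MIN(STATUS) is safe in the inline view, here is a minimal sketch of just the GROUP BY / HAVING step. It assumes Oracle syntax (DUAL) and uses made-up STATUS values; only the ACTIVITY_SK values come from the question. Because HAVING keeps only the groups that contain exactly one row, MIN(STATUS) simply returns that single row's STATUS.

```sql
-- Hypothetical sample rows: the STATUS values are invented for illustration.
WITH activity AS (
  SELECT 100 AS activity_sk, 'OPEN' AS status FROM dual UNION ALL
  SELECT 101, 'OPEN'   FROM dual UNION ALL
  SELECT 102, 'CLOSED' FROM dual UNION ALL
  SELECT 100, 'CLOSED' FROM dual
)
SELECT
   activity_sk,
   MIN(status) AS status      -- one row per surviving group, so MIN is that row's value
FROM activity
GROUP BY activity_sk
HAVING COUNT(*) = 1           -- drops activity_sk = 100, which appears twice
ORDER BY activity_sk;
```

With these rows, only ACTIVITY_SK 101 and 102 remain. Any other column needed from ACTIVITY would have to be wrapped in an aggregate the same way.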

Answer 1: (score: 1)

Another option:

select * from (
  SELECT 
     A.ACTIVITY_SK,
     A.STATUS,
     B.STUDY_ACTIVITY_SK,
     B.NAME,
     B.PROJECT,
     count(distinct a.pk) over (partition by a.activity_sk) AS c
   FROM
     ACTIVITY A,
     PROJECT B
   WHERE 
     A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
) where c = 1;

(where a.pk refers to a unique identifier in the ACTIVITY table)
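
The DISTINCT on a.pk matters because the count is taken after the join: if one ACTIVITY row matches several PROJECT rows, a plain COUNT(*) would treat it as a duplicate. As a rough variant, assuming the duplicates to detect are in ACTIVITY alone and that no unique key is available, the count can be taken before the join instead. This is a sketch, not a tested replacement for the query above.

```sql
SELECT
   a.activity_sk,
   a.status,
   b.study_activity_sk,
   b.name,
   b.project
FROM (
   -- Count rows per ACTIVITY_SK in ACTIVITY only, before any join fan-out.
   SELECT
      activity_sk,
      status,
      COUNT(*) OVER (PARTITION BY activity_sk) AS c
   FROM activity
) a
JOIN project b
   ON a.activity_sk = b.study_activity_sk
WHERE a.c = 1;   -- keep only ACTIVITY_SK values that occur once in ACTIVITY
```

Unlike the GROUP BY version, the analytic count keeps every column of the row, so no MIN() wrapper is needed.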