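My real query is more complex than the example here, but I only need to return rows where a given field does not appear more than once in the data set.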
ACTIVITY_SK STUDY_ACTIVITY_SK
100 200
101 201
102 200
100 203
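In this example, I do not want any records with an ACTIVITY_SK of 100 returned, because that ACTIVITY_SK appears twice in the data set.
The data is a mapping table that is used in many joins, but multiple records like this indicate a data quality problem, so I need to simply drop them from the results rather than let them cause bad joins elsewhere.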
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
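To make the requirement concrete, here is a minimal sketch that loads the four sample rows into an in-memory SQLite database from Python. The table name MAPPING is hypothetical (the question does not name the mapping table), and SQLite is only a stand-in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical table name MAPPING; the four rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MAPPING (ACTIVITY_SK INTEGER, STUDY_ACTIVITY_SK INTEGER)")
conn.executemany(
    "INSERT INTO MAPPING VALUES (?, ?)",
    [(100, 200), (101, 201), (102, 200), (100, 203)],
)

# Keys that occur more than once are the ones to exclude.
duplicated = [row[0] for row in conn.execute(
    "SELECT ACTIVITY_SK FROM MAPPING GROUP BY ACTIVITY_SK HAVING COUNT(*) > 1"
)]

# Rows whose key occurs exactly once are the ones to keep.
kept = conn.execute(
    """
    SELECT ACTIVITY_SK, STUDY_ACTIVITY_SK
    FROM MAPPING
    WHERE ACTIVITY_SK IN (
        SELECT ACTIVITY_SK FROM MAPPING GROUP BY ACTIVITY_SK HAVING COUNT(*) = 1
    )
    ORDER BY ACTIVITY_SK
    """
).fetchall()

print(duplicated)
print(kept)
```

Both rows for ACTIVITY_SK 100 are dropped, not just one of them, which is the behaviour wanted here.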
I have tried something like this:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
AND A.ACTIVITY_SK NOT IN
(
SELECT
A.ACTIVITY_SK
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
GROUP BY A.ACTIVITY_SK
HAVING COUNT(*) > 1
)
But there must be a cheaper way to do this...
Answer 0 (score: 5)
Something like this might be a bit "cheaper":
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
PROJECT B INNER JOIN
(SELECT
ACTIVITY_SK,
MIN(STATUS) STATUS
FROM
ACTIVITY
GROUP BY ACTIVITY_SK
HAVING COUNT(ACTIVITY_SK) = 1 ) A
ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
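A minimal sketch of this approach, run against an in-memory SQLite database from Python so it can be tried quickly. The rows below are made up for illustration (the question gives no sample contents for ACTIVITY or PROJECT), and SQLite is an assumption, not the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: ACTIVITY_SK 100 is duplicated in ACTIVITY.
conn.executescript("""
    CREATE TABLE ACTIVITY (ACTIVITY_SK INTEGER, STATUS TEXT);
    CREATE TABLE PROJECT (STUDY_ACTIVITY_SK INTEGER, NAME TEXT, PROJECT TEXT);
    INSERT INTO ACTIVITY VALUES (100, 'OPEN'), (100, 'CLOSED'), (101, 'OPEN'), (102, 'DONE');
    INSERT INTO PROJECT VALUES (100, 'alpha', 'P1'), (101, 'beta', 'P2'), (102, 'gamma', 'P3');
""")

# The derived table keeps only keys with exactly one ACTIVITY row,
# so duplicated keys never reach the join.
rows = conn.execute(
    """
    SELECT A.ACTIVITY_SK, A.STATUS, B.STUDY_ACTIVITY_SK, B.NAME, B.PROJECT
    FROM PROJECT B
    INNER JOIN (
        SELECT ACTIVITY_SK, MIN(STATUS) AS STATUS
        FROM ACTIVITY
        GROUP BY ACTIVITY_SK
        HAVING COUNT(ACTIVITY_SK) = 1
    ) A ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
    ORDER BY A.ACTIVITY_SK
    """
).fetchall()

for row in rows:
    print(row)
```

MIN(STATUS) is only there to satisfy the GROUP BY. Every group that survives the HAVING clause has exactly one row, so the aggregate simply returns that row's STATUS.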
Answer 1 (score: 1)
Another option:
select * from (
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT,
count(distinct a.pk) over (partition by a.activity_sk) AS c
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
) where c = 1;
(where a.pk is the unique identifier in the ACTIVITY table)
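A rough way to try the window-function idea locally is sketched below, with some loudly stated assumptions. It uses SQLite from Python, which to my knowledge needs version 3.25 or later for window functions and does not accept DISTINCT inside a window aggregate. The sketch therefore uses a variant rather than the query above: it counts ACTIVITY rows per key with COUNT(*) OVER before the join, which should serve the same purpose as counting distinct a.pk after it. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: key 100 is duplicated in ACTIVITY,
# and key 101 legitimately matches two PROJECT rows.
conn.executescript("""
    CREATE TABLE ACTIVITY (ACTIVITY_SK INTEGER, STATUS TEXT);
    CREATE TABLE PROJECT (STUDY_ACTIVITY_SK INTEGER, NAME TEXT, PROJECT TEXT);
    INSERT INTO ACTIVITY VALUES (100, 'OPEN'), (100, 'CLOSED'), (101, 'OPEN'), (102, 'DONE');
    INSERT INTO PROJECT VALUES (100, 'alpha', 'P1'), (101, 'beta', 'P2'),
                               (101, 'delta', 'P4'), (102, 'gamma', 'P3');
""")

# Variant of the answer: count ACTIVITY rows per key before joining,
# so extra PROJECT matches do not inflate the count.
rows = conn.execute(
    """
    SELECT A.ACTIVITY_SK, A.STATUS, B.STUDY_ACTIVITY_SK, B.NAME, B.PROJECT
    FROM (
        SELECT ACTIVITY_SK, STATUS,
               COUNT(*) OVER (PARTITION BY ACTIVITY_SK) AS C
        FROM ACTIVITY
    ) A
    JOIN PROJECT B ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
    WHERE A.C = 1
    ORDER BY A.ACTIVITY_SK, B.NAME
    """
).fetchall()

for row in rows:
    print(row)
```

Key 101 keeps both of its PROJECT matches because only the ACTIVITY side is counted. That is the reason the original query counts distinct a.pk rather than using a plain count(*) over the joined rows.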