I have a simple two-column table. For example, the data can be built with the following:
CREATE TABLE Duplicates
(assignmentid varchar(5), questionid varchar(5));
INSERT INTO Duplicates
(assignmentid, questionid)
VALUES
('aaaaa', '11111'),
('aaaaa', '22222'),
('bbbbb', '22222'),
('bbbbb', '33333'),
('bbbbb', '33333');
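Two of the rows are identical. There is also one question that appears in more than one assignment. The latter is a valid scenario, and I am trying to query all questions that belong to more than one assignment. So my desired output is: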
aaaaa, 22222
bbbbb, 22222
I am able to get this result with:
SELECT main.questionid, sub.assignmentid
FROM (
    SELECT questionid, count(assignmentid) AS AssignmentCount
    FROM (
        SELECT DISTINCT questionid, assignmentid
        FROM Duplicates
    ) sub
    GROUP BY questionid
    HAVING AssignmentCount > 1
) main
INNER JOIN (
    SELECT DISTINCT questionid, assignmentid
    FROM Duplicates
) sub ON main.questionid = sub.questionid;
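As you can see, the DISTINCT subquery is repeated twice. I could avoid this by using a WITH clause, but my understanding is that this does not necessarily mean the subquery is executed only once. So now I am on StackOverflow, asking whether anyone knows a more efficient way to run this query.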
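For reference, the WITH form mentioned above might look roughly like the sketch below. The CTE names pairs and shared are made up for illustration, and the query assumes the Duplicates table created earlier. Whether the engine evaluates pairs once or expands it at each reference depends on the database and its version, so the execution plan (EXPLAIN) is the place to check.

```sql
-- Sketch only: "pairs" and "shared" are illustrative names, not from the original query.
WITH pairs AS (
    -- Collapse exact duplicate rows once.
    SELECT DISTINCT questionid, assignmentid
    FROM Duplicates
),
shared AS (
    -- Questions that appear under more than one assignment.
    SELECT questionid
    FROM pairs
    GROUP BY questionid
    HAVING count(*) > 1
)
SELECT p.assignmentid, p.questionid
FROM pairs p
INNER JOIN shared s ON s.questionid = p.questionid;
```

This writes the DISTINCT step only once in the query text; it does not by itself guarantee that the step runs only once.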
Answer 0: (score: 0)
Just use window functions. One approach is to count the distinct assignments per question and keep the questions that have more than one:
select distinct assignmentid, questionid
from (select d.*,
             count(distinct assignmentid) over (partition by questionid) as cntd
      from duplicates d
     ) d
where cntd > 1;
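Edit:
You can do this without count(distinct), which some databases (PostgreSQL, for example) do not accept as a window function, but it requires an extra subquery:
select distinct assignmentid, questionid
from (select d.*,
             sum((seqnum = 1)::int) over (partition by questionid) as cntd
      from (select d.*,
                   row_number() over (partition by questionid, assignmentid order by questionid) as seqnum
            from duplicates d
           ) d
     ) d
where cntd > 1;
This uses the row number to compute the distinct count: only the first row of each (questionid, assignmentid) pair contributes 1 to the sum.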
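Another window-function variant, offered as a sketch rather than as part of the original answer: a question belongs to more than one assignment exactly when its smallest and largest assignmentid differ, so min() and max() can stand in for the distinct count. The aliases min_a and max_a are illustrative, and rows with a NULL assignmentid are ignored by both aggregates.

```sql
-- Sketch only: min_a and max_a are illustrative aliases.
SELECT DISTINCT assignmentid, questionid
FROM (SELECT d.*,
             min(assignmentid) OVER (PARTITION BY questionid) AS min_a,
             max(assignmentid) OVER (PARTITION BY questionid) AS max_a
      FROM Duplicates d
     ) d
-- Differing min and max means at least two distinct assignments.
WHERE min_a <> max_a;
```

On the sample data, only question 22222 has differing values (aaaaa and bbbbb), so its two rows are the ones returned.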
Answer 1: (score: 0)
You can simplify it to:
select *
from duplicates
where questionid in (select questionid
from duplicates
group by questionid
having count(distinct assignmentid) > 1);
The subquery returns every questionid that is assigned to more than one assignmentid.
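One caveat, stated as an assumption about the data rather than something from the question: on the sample rows this query returns the two desired rows, but because it selects rows as stored, an exact duplicate of a shared pair (say, a second ('aaaaa', '22222') row) would appear twice in the result. If that can happen, a DISTINCT in the outer query is one way to guard against it:

```sql
-- Sketch: same filter as above, with DISTINCT to collapse exact duplicate rows.
SELECT DISTINCT assignmentid, questionid
FROM Duplicates
WHERE questionid IN (SELECT questionid
                     FROM Duplicates
                     GROUP BY questionid
                     HAVING count(DISTINCT assignmentid) > 1);
```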