Best way to avoid a repeated subquery in SQL

Date: 2016-12-30 19:33:46

Tags: sql postgresql query-optimization

I have a simple two-column table. For example, the data can be built with:

CREATE TABLE Duplicates
  (assignmentid varchar(5), questionid varchar(5));

INSERT INTO Duplicates
  (assignmentid, questionid)
VALUES
  ('aaaaa', '11111'),
  ('aaaaa', '22222'),
  ('bbbbb', '22222'),
  ('bbbbb', '33333'),
  ('bbbbb', '33333');

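Two of the rows are identical. There is also a question that appears in more than one assignment. The latter is a valid scenario, and I'm trying to query all questions that belong to more than one assignment. So the output I want is: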

  aaaaa, 22222
  bbbbb, 22222

I was able to get this result with:

SELECT main.questionid, sub.assignmentid 
FROM (
   SELECT questionid, count(assignmentid) AS AssignmentCount 
   FROM ( 
      SELECT DISTINCT questionid, assignmentid 
      FROM Duplicates
   ) sub 
   GROUP BY questionid
   HAVING count(assignmentid) > 1
) main
INNER JOIN (
     SELECT DISTINCT questionid, assignmentid 
     FROM Duplicates
) sub ON main.questionid = sub.questionid;

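As you can see, the DISTINCT subquery appears twice. I could avoid that with a WITH clause, but my understanding is that this doesn't necessarily mean the subquery is executed only once. So I'm asking on StackOverflow whether anyone knows a more efficient way to run this query.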
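For reference, this is roughly what the WITH version could look like: the DISTINCT subquery is written once as a CTE (the name pairs is illustrative) and referenced twice. Per the PostgreSQL documentation, before version 12 a WITH query was always materialized (evaluated once); from 12 on, a non-recursive, side-effect-free WITH query referenced only once may be inlined, while one referenced twice, as here, is still materialized by default unless NOT MATERIALIZED is given. The sketch below runs the SQL with Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption), with an ORDER BY added only to make the output deterministic.

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL here.
ROWS = [
    ("aaaaa", "11111"),
    ("aaaaa", "22222"),
    ("bbbbb", "22222"),
    ("bbbbb", "33333"),
    ("bbbbb", "33333"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Duplicates (assignmentid varchar(5), questionid varchar(5))")
conn.executemany("INSERT INTO Duplicates (assignmentid, questionid) VALUES (?, ?)", ROWS)

# The DISTINCT subquery is written once and referenced twice.
QUERY = """
WITH pairs AS (
    SELECT DISTINCT questionid, assignmentid
    FROM Duplicates
)
SELECT p.questionid, p.assignmentid
FROM pairs p
WHERE p.questionid IN (
    SELECT questionid
    FROM pairs
    GROUP BY questionid
    HAVING count(*) > 1  -- pairs are already distinct, so count(*) suffices
)
ORDER BY p.assignmentid
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [('22222', 'aaaaa'), ('22222', 'bbbbb')]
```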

2 answers:

Answer 0 (score: 0)

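Just use window functions. One way is to compare the number of rows for each question with the number of distinct assignments for it: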

select distinct assignmentid, questionid
from (select d.*,
             count(distinct assignmentid) over (partition by questionid) as cntd,
             count(*) over (partition by questionid) as cnt
      from duplicates d
     ) d
where cntd <> cnt;
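A sketch for checking this against the question's sample data, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later is needed for window functions). The first query computes the same "distinct count <> total count" test with GROUP BY and shows that it singles out 33333, the question with a duplicated row, rather than 22222. The second is one window-function way to find questions spanning more than one assignment: min(assignmentid) and max(assignmentid) over a question's partition differ exactly when that question has at least two distinct assignmentid values. The aliases min_a and max_a are illustrative.

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL here.
ROWS = [
    ("aaaaa", "11111"),
    ("aaaaa", "22222"),
    ("bbbbb", "22222"),
    ("bbbbb", "33333"),
    ("bbbbb", "33333"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Duplicates (assignmentid varchar(5), questionid varchar(5))")
conn.executemany("INSERT INTO Duplicates (assignmentid, questionid) VALUES (?, ?)", ROWS)

# Same per-question comparison as "cntd <> cnt", written with GROUP BY.
DUP_QUERY = """
SELECT questionid
FROM Duplicates
GROUP BY questionid
HAVING count(DISTINCT assignmentid) <> count(*)
"""
dup = conn.execute(DUP_QUERY).fetchall()
print(dup)  # [('33333',)]

# Window-only variant: no DISTINCT inside the window aggregates.
QUERY = """
SELECT DISTINCT assignmentid, questionid
FROM (
    SELECT d.*,
           min(assignmentid) OVER (PARTITION BY questionid) AS min_a,
           max(assignmentid) OVER (PARTITION BY questionid) AS max_a
    FROM Duplicates d
) d
WHERE min_a <> max_a
ORDER BY assignmentid
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [('aaaaa', '22222'), ('bbbbb', '22222')]
```

The min/max comparison also works in PostgreSQL, since it needs no DISTINCT inside the window aggregates.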

Edit:

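PostgreSQL does not support DISTINCT inside a window aggregate, so the count(distinct ...) over (...) above will be rejected. You can do this without count(distinct), but it needs an extra subquery: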

select distinct assignmentid, questionid
from (select d.*,
             -- sum, not count: count() would also count the 0s
             sum((seqnum = 1)::int) over (partition by questionid) as cntd,
             count(*) over (partition by questionid) as cnt
      from (select d.*,
                   row_number() over (partition by questionid, assignmentid order by questionid) as seqnum
            from duplicates d
           ) d
     ) d
where cntd <> cnt;

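This uses row_number() to compute the distinct count: each (questionid, assignmentid) pair gets exactly one row with seqnum = 1.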
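A sketch of the row_number() counting trick on the sample data, again using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25+ for window functions). In SQLite a comparison already yields 0 or 1, so the ::int cast is dropped. The per-question results are grouped only for display; the column counted_flags (an illustrative name) shows why count() over the 0/1 flag cannot work: it counts every non-NULL value, 0 included, and so always equals cnt.

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL here.
ROWS = [
    ("aaaaa", "11111"),
    ("aaaaa", "22222"),
    ("bbbbb", "22222"),
    ("bbbbb", "33333"),
    ("bbbbb", "33333"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Duplicates (assignmentid varchar(5), questionid varchar(5))")
conn.executemany("INSERT INTO Duplicates (assignmentid, questionid) VALUES (?, ?)", ROWS)

QUERY = """
SELECT questionid,
       count(*)          AS cnt,           -- all rows for the question
       sum(seqnum = 1)   AS cntd,          -- one 1 per distinct assignment
       count(seqnum = 1) AS counted_flags  -- counts the 0s too
FROM (
    SELECT d.*,
           row_number() OVER (PARTITION BY questionid, assignmentid
                              ORDER BY questionid) AS seqnum
    FROM Duplicates d
) d
GROUP BY questionid
ORDER BY questionid
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('11111', 1, 1, 1)
# ('22222', 2, 2, 2)
# ('33333', 2, 1, 2)
```

With these numbers, cntd > 1 selects 22222 (the question shared by two assignments), while cntd <> cnt selects 33333 (the question with a duplicated row).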

Answer 1 (score: 0)

You can simplify it to:

select *
from duplicates
where questionid in (select questionid
                     from duplicates
                     group by questionid
                     having count(distinct assignmentid) > 1);

The subquery returns every questionid that is associated with more than one distinct assignmentid.
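A quick check of this query on the sample data, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption), with an ORDER BY added only for deterministic output. It returns exactly the two rows the question asks for, and the DISTINCT subquery from the original query is no longer needed.

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL here.
ROWS = [
    ("aaaaa", "11111"),
    ("aaaaa", "22222"),
    ("bbbbb", "22222"),
    ("bbbbb", "33333"),
    ("bbbbb", "33333"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Duplicates (assignmentid varchar(5), questionid varchar(5))")
conn.executemany("INSERT INTO Duplicates (assignmentid, questionid) VALUES (?, ?)", ROWS)

# Keep rows whose question appears under more than one distinct assignment.
QUERY = """
SELECT *
FROM Duplicates
WHERE questionid IN (SELECT questionid
                     FROM Duplicates
                     GROUP BY questionid
                     HAVING count(DISTINCT assignmentid) > 1)
ORDER BY assignmentid
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [('aaaaa', '22222'), ('bbbbb', '22222')]
```

Because SELECT * keeps exact duplicate rows, a question that is both shared and duplicated would be listed twice; SELECT DISTINCT removes that.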