SQL efficiency: maybe I should use EXISTS?

Time: 2013-03-06 00:11:56

Tags: sql postgresql

Using the following schema:

create table awards(
aid int primary key
, name varchar(100) not null );

create table institutions(
iid int primary key
, name varchar(100) not null );

create table winners(
aid int
, iid int
, year int
, filmname varchar(100)
, personname varchar(100)
, primary key (aid, iid, year)
, foreign key (aid) references awards(aid)
, foreign key (iid) references institutions(iid) );

I wrote the following query:

SELECT nominees.personname as personname, awards.name as award, nominees.year as year 

FROM nominees, institutions, awards WHERE institutions.iid = nominees.iid and 
awards.aid = nominees.aid and personname is not null 

GROUP BY nominees.personname, awards.name, nominees.year 

HAVING ((awards.name, count(DISTINCT institutions.name)) in 
(SELECT awards.name as 
awards, count(DISTINCT institutions.name) 
FROM nominees, awards, institutions 
WHERE nominees.aid = awards.aid and nominees.iid = institutions.iid 
GROUP BY awards.name)) 

ORDER BY nominees.personname, awards.name;

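This query is meant to find everyone who was nominated for an award by every institution that gives that award in a particular year. It takes a person, counts the number of institutions that gave them a single award, and compares that value to the maximum number of institutions that give that award.

The desired output should look like this: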

"personname"    "award" "year"

"Alexandre"     "score" "2011"
"Skyfall"       "song"  "2013"
"Tangled"       "song"  "2011"

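This gives me the result set I want, but I am not sure whether doing it a different way would be more efficient. I tried to get it working with EXISTS, but I did not have much luck.

Main question: is there a more efficient way to write this query?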

1 answer:

Answer 0 (score: 3):

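As usual with complex queries, I use TDQD (Test-Driven Query Design) to work through the problem in stages. Each stage can be tested on its own and its results checked, so you can be sure you are getting the right answer.

I note that you show us three tables; your query uses two of them, but mentions a fourth, nominees. I am assuming that winners and nominees are the same thing, since winners is the schema you gave us and you ask about who won an award from every institution that gave the award in a given year.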
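Testing each stage needs some data to run against. The following is a minimal sketch, assuming the three tables from the question have been created and that winners is the table the queries read from. Every row is invented purely for illustration.

```sql
-- Hypothetical test data: two awards, two institutions, one year.
INSERT INTO awards (aid, name) VALUES
    (1, 'song'),
    (2, 'score');

INSERT INTO institutions (iid, name) VALUES
    (1, 'Institution A'),
    (2, 'Institution B');

-- 'Person P' wins the song award from both institutions in 2013.
-- The score award is split between 'Person Q' and 'Person R'.
INSERT INTO winners (aid, iid, year, filmname, personname) VALUES
    (1, 1, 2013, 'Film X', 'Person P'),
    (1, 2, 2013, 'Film X', 'Person P'),
    (2, 1, 2013, 'Film Y', 'Person Q'),
    (2, 2, 2013, 'Film Z', 'Person R');
```

With these rows, only 'Person P' with the song award in 2013 should survive a query that asks for winners chosen by every institution.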

Stage 1: How many institutions gave a particular award in a given year?

SELECT aid, year, COUNT(*) AS num_awards
  FROM winners
 GROUP BY aid, year;
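Stage 1 counts rows, and that equals the number of institutions only because the primary key (aid, iid, year) allows each institution at most one row per award and year. A quick check of that assumption, in the spirit of testing each stage, might look like this:

```sql
-- Sanity check: list any award/year where the row count differs
-- from the number of distinct institutions.
-- With the primary key (aid, iid, year) in place this returns no rows.
SELECT aid, year,
       COUNT(*)            AS num_rows,
       COUNT(DISTINCT iid) AS num_institutions
  FROM winners
 GROUP BY aid, year
HAVING COUNT(*) <> COUNT(DISTINCT iid);
```

If the real table lacks that key, COUNT(DISTINCT iid) would be the safer expression to use in the stage queries.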

Stage 2: How many times did a person receive a particular award in a given year?

SELECT aid, year, personname, COUNT(*) AS num_person_awards
  FROM winners
 GROUP BY aid, year, personname;

Stage 3: Which rows have the same two counts?

SELECT n.aid, n.year, w.personname
  FROM (SELECT aid, year, COUNT(*) AS num_awards
          FROM winners
         GROUP BY aid, year
       ) AS n
  JOIN (SELECT aid, year, personname, COUNT(*) AS num_person_awards
          FROM winners
         GROUP BY aid, year, personname
       ) AS w
    ON n.aid = w.aid AND n.year = w.year AND n.num_awards = w.num_person_awards;

Stage 4: Replace the award ID with the award name in the result set

SELECT a.name AS awardname, n.year, w.personname
  FROM (SELECT aid, year, COUNT(*) AS num_awards
          FROM winners
         GROUP BY aid, year
       ) AS n
  JOIN (SELECT aid, year, personname, COUNT(*) AS num_person_awards
          FROM winners
         GROUP BY aid, year, personname
       ) AS w
    ON n.aid = w.aid AND n.year = w.year AND n.num_awards = w.num_person_awards
  JOIN awards AS a
    ON a.aid = n.aid;

I have not tested whether this is any faster than your query, but it looks simpler, so I think it has a reasonable chance of running faster.
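One way to compare the two queries, assuming PostgreSQL (as the tags suggest), is to run each under EXPLAIN ANALYZE and look at the reported plan and timings. A sketch for the staged query follows. It selects n.year because the awards table has no year column. Note that EXPLAIN ANALYZE really executes the statement.

```sql
-- Show the actual plan, row counts, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.name AS awardname, n.year, w.personname
  FROM (SELECT aid, year, COUNT(*) AS num_awards
          FROM winners
         GROUP BY aid, year
       ) AS n
  JOIN (SELECT aid, year, personname, COUNT(*) AS num_person_awards
          FROM winners
         GROUP BY aid, year, personname
       ) AS w
    ON n.aid = w.aid AND n.year = w.year AND n.num_awards = w.num_person_awards
  JOIN awards AS a
    ON a.aid = n.aid;
```

Running the original query the same way, on the same data and more than once to even out caching effects, gives a fairer comparison than judging by how the SQL looks.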


Here is how I would format your query:

SELECT nominees.personname AS personname, awards.name AS award, nominees.year AS year
  FROM nominees
  JOIN institutions ON institutions.iid = nominees.iid
  JOIN awards ON awards.aid = nominees.aid
 WHERE personname IS NOT NULL 
 GROUP BY nominees.personname, awards.name, nominees.year 
HAVING (awards.name, COUNT(DISTINCT institutions.name)) IN 
            (SELECT awards.name AS awards, COUNT(DISTINCT institutions.name) 
               FROM nominees, awards, institutions 
              WHERE nominees.aid = awards.aid and nominees.iid = institutions.iid 
              GROUP BY awards.name)
 ORDER BY nominees.personname, awards.name;
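Since the question asks about EXISTS: "won from every institution" can also be phrased as "there is no row for the same award and year that names someone else". The following is a sketch of that idea, not a tested replacement. It assumes the winners table and, like the staged queries, relies on the primary key giving one row per institution per award and year. It also filters out NULL person names, as the original query does.

```sql
-- Keep a winner only if no row for the same award and year
-- names a different person (or no person at all).
SELECT DISTINCT a.name AS awardname, w.year, w.personname
  FROM winners AS w
  JOIN awards AS a
    ON a.aid = w.aid
 WHERE w.personname IS NOT NULL
   AND NOT EXISTS (
         SELECT 1
           FROM winners AS other
          WHERE other.aid = w.aid
            AND other.year = w.year
            -- IS DISTINCT FROM treats NULL as a comparable value
            AND other.personname IS DISTINCT FROM w.personname
       )
 ORDER BY personname, awardname;
```

Whether this beats the counting approach depends on the data and the available indexes, so the query plans are worth comparing before choosing one.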