How to optimize multiple subqueries against the same data set

Date: 2017-10-05 09:04:51

Tags: sql postgresql

Imagine I have a query like the following:

SELECT
  u.ID,
  ( SELECT 
      COUNT(*)
    FROM
      POSTS p
    WHERE
      p.USER_ID = u.ID
      AND p.TYPE = 1
  ) AS interesting_posts,
  ( SELECT 
      COUNT(*)
    FROM
      POSTS p
    WHERE
      p.USER_ID = u.ID
      AND p.TYPE = 2
  ) AS boring_posts,
  ( SELECT 
      COUNT(*)
    FROM
      COMMENTS c
    WHERE
      c.USER_ID = u.ID
      AND c.TYPE = 1
  ) AS interesting_comments,
  ( SELECT 
      COUNT(*)
    FROM
      COMMENTS c
    WHERE
      c.USER_ID = u.ID
      AND c.TYPE = 2
  ) AS boring_comments
FROM
  USERS u;

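(I hope it is correct; I just came up with it and have not tested it.)

I am trying to count each user's interesting and boring posts and comments.

Now, the problem with this query is that we get two sequential scans on each of the posts and comments tables, and I wonder whether there is a way to avoid that.

I could probably LEFT JOIN the users table to posts and comments and do some aggregation, but that would generate a lot of rows before aggregating, and I'm not sure that's a good way to go.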
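
To make the row-multiplication concern concrete, here is a small self-contained sketch. The sample rows and the id columns on posts and comments are assumptions made up for illustration; they are not part of the original schema description.

```sql
-- Hypothetical sample data: one user with 3 posts and 2 comments.
WITH users(id) AS (VALUES (1)),
     posts(id, user_id, type) AS (VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2)),
     comments(id, user_id, type) AS (VALUES (1, 1, 1), (2, 1, 2))
SELECT
  u.id,
  -- Every post is paired with every comment of the same user.
  COUNT(*) AS joined_rows,
  -- Counting directly on the joined rows over-counts posts.
  COUNT(CASE WHEN p.type = 1 THEN 1 END) AS naive_interesting_posts,
  -- Counting distinct post ids compensates, at extra cost.
  COUNT(DISTINCT CASE WHEN p.type = 1 THEN p.id END) AS interesting_posts
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
LEFT JOIN comments c ON c.user_id = u.id
GROUP BY u.id;
```

With this data the join produces 3 x 2 = 6 rows for the user, so the naive count of interesting posts comes out as 4 while the real number is 2. This is why joining both tables first and aggregating afterwards tends to be both wasteful and easy to get wrong.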

2 Answers:

Answer 0: (score: 3)

Aggregate posts and comments separately, then join the results to the users table.

select
  u.id as user_id,
  coalesce(p.interesting, 0) as interesting_posts,
  coalesce(p.boring, 0)      as boring_posts,
  coalesce(c.interesting, 0) as interesting_comments,
  coalesce(c.boring, 0)      as boring_comments
from users u
left join
(
  select
    user_id,
    count(case when type = 1 then 1 end) as interesting,
    count(case when type = 2 then 1 end) as boring
  from posts
  group by user_id
) p on p.user_id = u.id
left join
(
  select
    user_id,
    count(case when type = 1 then 1 end) as interesting,
    count(case when type = 2 then 1 end) as boring
  from comments
  group by user_id
) c on c.user_id = u.id;
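
The same idea can also be written with the aggregate FILTER clause instead of CASE expressions. This is only a sketch of an equivalent formulation, assuming PostgreSQL 9.4 or later; table and column names are the ones from the question.

```sql
-- Sketch assuming PostgreSQL 9.4+, where aggregates accept FILTER.
select
  u.id as user_id,
  coalesce(p.interesting, 0) as interesting_posts,
  coalesce(p.boring, 0)      as boring_posts,
  coalesce(c.interesting, 0) as interesting_comments,
  coalesce(c.boring, 0)      as boring_comments
from users u
left join
(
  -- One pass over posts, one row per user.
  select
    user_id,
    count(*) filter (where type = 1) as interesting,
    count(*) filter (where type = 2) as boring
  from posts
  group by user_id
) p on p.user_id = u.id
left join
(
  -- One pass over comments, one row per user.
  select
    user_id,
    count(*) filter (where type = 1) as interesting,
    count(*) filter (where type = 2) as boring
  from comments
  group by user_id
) c on c.user_id = u.id;
```

Because each derived table already holds at most one row per user, the two left joins cannot multiply rows, and the coalesce calls turn the NULLs for users without posts or comments into zeros.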

Answer 1: (score: 0)

Compare the results and the execution plans (here you scan posts only once):

with c as (
  select distinct
    count(1) filter (where TYPE = 1) over (partition by USER_ID) interesting_posts
  , count(1) filter (where TYPE = 2) over (partition by USER_ID) boring_posts
  , USER_ID
  from POSTS  -- the single scan of posts
)
, p as (
  select USER_ID, max(interesting_posts) interesting_posts, max(boring_posts) boring_posts
  from c
  group by USER_ID
)
SELECT
  u.ID, interesting_posts,boring_posts
  ,  ( SELECT 
      COUNT(*)
    FROM
      COMMENTS c
    WHERE
      c.USER_ID = u.ID
  ) AS comments
FROM
  USERS u
JOIN p on p.USER_ID = u.ID;  -- inner join: users with no posts are omitted
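
One way to do the comparison of execution plans is PostgreSQL's EXPLAIN with the ANALYZE and BUFFERS options. The sketch below wraps a shortened version of the query from the question; the shortening to the two posts columns is my own simplification for illustration.

```sql
-- Prints the actual plan with timings and buffer usage.
-- Note: ANALYZE really executes the statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  u.ID,
  (SELECT COUNT(*) FROM POSTS p
    WHERE p.USER_ID = u.ID AND p.TYPE = 1) AS interesting_posts,
  (SELECT COUNT(*) FROM POSTS p
    WHERE p.USER_ID = u.ID AND p.TYPE = 2) AS boring_posts
FROM USERS u;
```

Running the same EXPLAIN on a rewritten query and looking at how many times posts is scanned (the scan nodes and their loops counts) should show whether the rewrite helps. The actual plans depend on the available indexes, table sizes and statistics, so results on real data may differ from what one would expect.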