Group by day and show days without data, using a LEFT JOIN and a complex query

Time: 2015-08-20 23:05:25

Tags: sql database postgresql join group-by

I have a complex SQL query in PostgreSQL 9.4.4:

SELECT
  p.id,
  p.name,
  p.page_variant_id,
  p.variant_name,
  (
    SELECT COUNT(*) FROM page_views
    INNER JOIN unique_page_visits upv ON upv.id = page_views.unique_page_visit_id
    WHERE page_views.page_id = p.id AND upv.updated_at >= '2015-08-15' AND
          upv.updated_at <= '2015-08-22'
  ) as views_count,
  (
    SELECT COUNT(*) FROM unique_page_visits upv
    WHERE upv.page_id = p.id  AND upv.updated_at >= '2015-08-15' AND
          upv.updated_at <= '2015-08-22'
  ) as page_visits_count,
  (
    SELECT COUNT(*) FROM conversions
    INNER JOIN conversion_goals cg ON cg.id = conversions.conversion_goal_id
    INNER JOIN unique_page_visits upv ON upv.id = conversions.unique_page_visit_id
    WHERE cg.page_id = p.id  AND conversions.updated_at >= '2015-08-15' AND
          conversions.updated_at <= '2015-08-22' AND cg.name = 'popup'
  ) as conversions_count
FROM
  pages p
WHERE
  p.page_variant_id = '25'
ORDER BY
  p.id ASC

Example result:

 id | name | page_variant_id | variant_name | views_count | page_visits_count | conversions_count 
----+------+-----------------+--------------+-------------+-------------------+-------------------
 73 | a    |              25 | Original     |           1 |                 1 |                 1
(1 row)

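I don't know whether this query is written in the best way, but it works. Any improvements are welcome, for example removing the redundancy in the SELECT subqueries: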

AND upv.updated_at >= '2015-08-15' AND upv.updated_at <= '2015-08-22'

The problem is that I have to group the results by day. Every day must appear in the result, even if no rows were found for that day.

I could reuse this code (slightly modified; credit to Erwin Brandstetter):

SELECT *
FROM  (SELECT generate_series('2015-08-15'::date
                            , '2015-08-22'::date
                            , '1 day'::interval)::date) AS d(day)
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= '2007-12-01'::date
   AND    date_col <= '2008-12-06'::date
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;

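The main problem is that I need to LEFT JOIN on the created_at field (cast to date) of the tables page_views, conversions, and unique_page_visits, not of the pages table (which is the main query in the SELECT, as opposed to the subqueries).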

Pseudocode:

SELECT * 
FROM
    (SELECT generate_series('2015-08-15'::date
                          , '2015-08-22'::date
                          , '1 day'::interval)::date) AS d(day)

LEFT JOIN (
  SELECT day_from_subquery_not_from_pages::date AS day
  -- other stuff to return proper results AND conditions
) t USING(day)   

Is this even possible?

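Or should I just split this one big query into separate subqueries (there would be 3 of them...) and combine the results with UNION? Then I could JOIN ON the day from each subquery...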

What is the best way to achieve this?

1 Answer:

Answer 0 (score: 1)

Guessing at the missing details, this query might be what you are looking for:

WITH p AS (
   SELECT '2015-08-15'::date AS a, '2015-08-22'::date AS z  -- enter bounds once
        , id, name, page_variant_id, variant_name
   FROM   pages
   WHERE  page_variant_id = '25'   -- enter ID once
   )
SELECT p.id, p.name, p.page_variant_id, p.variant_name
     , day, v.views_count, pv.page_visits_count, c.conversions_count
FROM   p
     , LATERAL (SELECT day::date FROM generate_series(p.a, p.z, interval '1 day') day) d
LEFT   JOIN (
   SELECT upv.updated_at::date AS day, count(*) AS views_count
   FROM                      p
   JOIN   page_views         pv  ON pv.page_id = p.id
   JOIN   unique_page_visits upv ON upv.id = pv.unique_page_visit_id
   WHERE  upv.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) v USING (day)
LEFT JOIN (
   SELECT upv.updated_at::date AS day, count(*) AS page_visits_count
   FROM                      p
   JOIN   unique_page_visits upv ON upv.page_id = p.id
   WHERE  upv.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) pv USING (day)
LEFT JOIN (
   SELECT c.updated_at::date AS day, count(*) AS conversions_count
   FROM                      p
   JOIN   conversion_goals   cg  ON cg.page_id = p.id
   JOIN   conversions        c   ON c.conversion_goal_id = cg.id
   JOIN   unique_page_visits upv ON upv.id = c.unique_page_visit_id
   WHERE  cg.name = 'popup'
   AND    c.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) c USING (day)
ORDER  BY day;
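
One detail worth checking in a query of this shape: for days with no matching rows, the LEFT JOIN leaves the count columns NULL rather than 0. If zeros are wanted, the counts can be wrapped in COALESCE. The following is a minimal, self-contained sketch of that idea, not part of the original answer; the events CTE and the event_count column are hypothetical stand-ins for the real tables and counts.

```sql
-- Hypothetical sample data: two events on 2015-08-16, one on 2015-08-18.
WITH events(updated_at) AS (
   VALUES (timestamp '2015-08-16 10:00')
        , (timestamp '2015-08-16 17:30')
        , (timestamp '2015-08-18 09:15')
   )
SELECT d.day
     , COALESCE(e.event_count, 0) AS event_count  -- NULL (no rows that day) becomes 0
FROM  (
   -- One row per calendar day, whether or not any event exists.
   SELECT g::date AS day
   FROM   generate_series(timestamp '2015-08-15'
                        , timestamp '2015-08-19'
                        , interval  '1 day') g
   ) d
LEFT   JOIN (
   -- Aggregate before joining, so each day matches at most one row.
   SELECT updated_at::date AS day, count(*) AS event_count
   FROM   events
   GROUP  BY 1
   ) e USING (day)
ORDER  BY d.day;
```

This returns one row for each of the five days, with 0 for the days that have no events. In the query above, the same wrapping would apply to v.views_count, pv.page_visits_count and c.conversions_count.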