使用postgresql在周​​日期间应用窗口计数

时间:2018-03-22 02:18:05

标签: sql postgresql window-functions

我有一张评论表,看起来像这样:

enter image description here

我试图生成一个返回4列的结果:

  1. 周,从星期一开始(yyyy-mm-dd)
  2. 本周回顾的电影数量
  3. 过去30天内审核的不同电影的数量
  4. 在过去30天内有3次或以上评论的不同电影的数量
  5. 我已经完成了1,2和3(我想),但我无法弄清楚如何返回4。

    此查询为我提供了1,2和3:

    WITH week_dates AS (
     SELECT
      date(date_trunc('week', f.updated_at::date)) AS week_date,
      count(*) Movie_Reviews
     FROM Reviews f
     WHERE submitted = TRUE AND about_type = 'Movie'
     GROUP BY week_date
        )
    SELECT
    wd.*, 
     (SELECT 
       count(DISTINCT ff.about_id) Fresh_Reviews
       FROM Reviews ff
       WHERE ff.submitted = TRUE
       AND ff.about_type = 'Movie' -- reviewed within last 30 days
       AND ff.updated_at <= wd.week_date
       AND ff.updated_at > wd.week_date - INTERVAL '30 days'
     ) Freshly_Reviewed
    FROM week_dates wd
    ORDER BY wd.week_date ASC
    

    在过去30天内,有任何帮助会返回包含3个或更多提交评论的不同电影的数量吗?

1 个答案:

答案 0 :(得分:1)

WITH
-- the reviews CTE is here just in place of your reviews table
reviews AS (
  SELECT
    *
  FROM
    (
      VALUES
        ('2014-08-02', 'Movie', True, 'Modern Times'),
        ('2016-10-21', 'Movie', True, 'Enter the Matrix'),
        ('2016-10-22', 'Movie', True, 'Enter the Matrix'),
        ('2016-10-23', 'Movie', True, 'Enter the Matrix'),
        ('2016-11-01', 'Movie', True, 'Citizen Kane'),
        ('2016-11-02', 'Movie', True, 'Citizen Kane'),
        ('2016-11-02', 'Movie', True, 'Citizen Kane'),
        ('2016-11-10', 'Movie', True, 'Blade Runner'),
        ('2016-11-17', 'Album', False, 'The Chronic'),
        ('2018-01-02', 'Movie', True, 'Citizen Kane'),
        ('2018-02-01', 'Movie', True, 'Conquest of Paradise'),
        ('2018-02-15', 'Movie', True, 'Modern Times'),
        ('2018-02-27', 'Movie', True, 'Modern Times'),
        ('2018-03-01', 'Movie', True, 'Citizen Kane'),
        ('2018-03-01', 'Movie', True, 'Modern Times'),
        ('2018-03-02', 'Movie', True, 'Wolf from Wall Street'),
        ('2018-03-02', 'Album', False, 'The Chronic'),
        ('2018-03-03', 'Movie', True, 'Wolf from Wall Street'),
        ('2018-03-12', 'Movie', True, 'Into the Wild')
    ) AS t(updated_at, about_type, submitted, about_id)
  WHERE
    submitted = TRUE
    AND about_type = 'Movie'
),
-- prepare weeks and their movie counts (1)
weeks AS (
  SELECT
    date_trunc('week', updated_at::DATE) AS week_date,
    count(*) AS count_this_week
  FROM reviews
  GROUP BY week_date
)
SELECT
  week_date,
  count_this_week,
  counts.*,
  count_prev_30_distinct_at_least_3.*
FROM
  weeks AS w
  -- lateral join allows us to use the current row of the weeks table
  -- basically as a nested subquery, but more efficiently
  -- similar to nested loop in python for example
  LEFT JOIN LATERAL (
    SELECT
      count(*) AS count_prev_30_all,
      count(DISTINCT r2.about_id) AS count_prev_30_distinct
    FROM
      reviews AS r2
    WHERE
      r2.updated_at::DATE BETWEEN w.week_date - INTERVAL '30 days' AND w.week_date
  ) AS counts ON TRUE
  -- and another just for the (4); the code could be rewritten to use just
  -- lateral join with a bit more effort
  LEFT JOIN LATERAL
  (
    SELECT count(*) AS count_prev_30_distinct_at_least_3
    FROM
      (
        SELECT
          r3.about_id,
          count(*) AS count
        FROM reviews AS r3
        WHERE r3.updated_at :: DATE BETWEEN w.week_date - INTERVAL '30 days' AND w.week_date
        GROUP BY r3.about_id
      ) AS hlp
    WHERE count >= 3
  ) AS count_prev_30_distinct_at_least_3 ON TRUE
ORDER BY week_date;