Querying daily and monthly aggregates in the same query?

Asked: 2018-11-14 16:25:04

Tags: sql google-bigquery

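I want to count daily unique active users by subreddit and day, and then roll those counts up into monthly unique active users by subreddit and month. Each step on its own is simple, but when I try to do both in one combined query, BigQuery tells me I need to group by date_month_day in the second-level subquery, which would make monthly_unique_authors identical to daily_unique_authors. The error: Expression 'date_month_day' is not present in the GROUP BY list [invalidQuery].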

Here is my query so far:

SELECT * FROM
          (
              SELECT *,
                 (daily_unique_authors/monthly_unique_authors) * 1.0 AS ratio,
                 ROW_NUMBER() OVER (PARTITION BY date_month_day ORDER BY ratio DESC) rank 
                 FROM 
                     (
                      SELECT subreddit,
                            date_month_day,
                            daily_unique_authors,
                            SUM(daily_unique_authors) AS monthly_unique_authors,
                            LEFT(date_month_day, 7) as date_month
                            FROM 
                                  (
                                    SELECT subreddit,
                                           LEFT(DATE(SEC_TO_TIMESTAMP(created_utc)), 10) as date_month_day,
                                           COUNT(UNIQUE(author)) as daily_unique_authors
                                    FROM TABLE_QUERY([fh-bigquery:reddit_comments], "table_id CONTAINS \'20\' AND LENGTH(table_id)<8")
                                    GROUP EACH BY subreddit, date_month_day
                                  )
                            GROUP EACH BY subreddit, date_month))

     WHERE rank <= 100
     ORDER BY date_month ASC

Ideally, the final output would look like this:

   subreddit  date_month  date_month_day  daily_unique_users  monthly_unique_users  ratio
1  google     2005-12     2005-12-29                      77                   600  0.128
2  google     2005-12     2005-12-31                      52                   600  0.0866
3  google     2005-12     2005-12-28                      81                   600  0.135
4  google     2005-12     2005-12-27                      73                   600  0.121

1 answer:

Answer 0: (score: 1)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT * FROM (
  SELECT *,
    ROW_NUMBER() OVER(PARTITION BY date_month_day ORDER BY ratio DESC) rank 
  FROM (
    SELECT 
      daily.subreddit subreddit, 
      daily.date_month date_month, 
      date_month_day, 
      daily_unique_authors, 
      monthly_unique_authors,
      1.0 * daily_unique_authors / monthly_unique_authors AS ratio
    FROM (
      SELECT subreddit,
        DATE(TIMESTAMP_SECONDS(created_utc)) AS date_month_day,
        FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_SECONDS(created_utc))) AS date_month,
        COUNT(DISTINCT author) AS daily_unique_authors
      FROM `fh-bigquery.reddit_comments.2018*`
      GROUP BY subreddit, date_month_day, date_month
    ) daily
    JOIN (
      SELECT subreddit,
        FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_SECONDS(created_utc))) AS date_month,
        COUNT(DISTINCT author) AS monthly_unique_authors
      FROM `fh-bigquery.reddit_comments.2018*`
      GROUP BY subreddit, date_month
    ) monthly 
    ON daily.subreddit = monthly.subreddit
    AND daily.date_month = monthly.date_month
  )
)
WHERE rank <= 100
ORDER BY date_month

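Note: I tried to preserve the original logic and structure from the question as much as possible, so that the OP can relate the answer to the question and adjust it further if needed :o)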
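To see why the answer computes the monthly count in its own subquery instead of summing the daily counts, here is a minimal sketch on toy data. It is not BigQuery: it uses Python's built-in sqlite3 module as a stand-in, and the `comments` table, its columns, and the sample authors are all hypothetical. The shape of the query (a daily aggregate joined to a monthly aggregate on subreddit and month) mirrors the answer above.

```python
import sqlite3

# Hypothetical toy data standing in for the reddit_comments tables:
# (subreddit, author, day), with day stored as 'YYYY-MM-DD'.
rows = [
    ("google", "alice", "2005-12-27"),
    ("google", "bob",   "2005-12-27"),
    ("google", "alice", "2005-12-28"),
    ("google", "alice", "2005-12-28"),  # second comment by alice on the same day
    ("google", "carol", "2005-12-28"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (subreddit TEXT, author TEXT, day TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)

# Daily and monthly distinct counts are computed independently, then joined.
query = """
SELECT daily.subreddit,
       daily.date_month,
       daily.date_month_day,
       daily.daily_unique_authors,
       monthly.monthly_unique_authors,
       1.0 * daily.daily_unique_authors / monthly.monthly_unique_authors AS ratio
FROM (
  SELECT subreddit,
         day AS date_month_day,
         substr(day, 1, 7) AS date_month,
         COUNT(DISTINCT author) AS daily_unique_authors
  FROM comments
  GROUP BY subreddit, day
) AS daily
JOIN (
  SELECT subreddit,
         substr(day, 1, 7) AS date_month,
         COUNT(DISTINCT author) AS monthly_unique_authors
  FROM comments
  GROUP BY subreddit, substr(day, 1, 7)
) AS monthly
  ON daily.subreddit = monthly.subreddit
 AND daily.date_month = monthly.date_month
ORDER BY daily.date_month_day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)

# Summing the daily counts counts alice twice (once per day she was active),
# so it overstates the number of distinct authors in the month.
sum_of_daily = sum(r[3] for r in result)
monthly_unique = result[0][4]
print("sum of daily counts:", sum_of_daily, "| monthly unique authors:", monthly_unique)
```

In this toy data each day has 2 unique authors, but the month has only 3, because alice is active on both days. That is why `SUM(daily_unique_authors)` in the original query would not give monthly unique authors even if the GROUP BY error were resolved.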