How do I calculate a rolling average of the day difference between events in BigQuery?

Time: 2019-11-29 16:39:22

Tags: sql google-bigquery

I have an events table like this:

date                  event_category     event_planner

2019-09-22T00:00:00   soccer_night       Marcus
2019-09-25T00:00:00   comedy_night       John
2019-09-28T00:00:00   dance_party        John
2019-10-02T00:00:00   soccer_night       Marcus

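The idea is to get, for each planner, a rolling average of the difference in days between consecutive events. So far I have the number of days since the previous event for each planner and category, computed as:

DATE_DIFF(SAFE_CAST(date AS date), LAG(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner ORDER BY date), day) AS result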

What I expect is something like this:

date                  event_category     event_planner     rolling_avg

2019-09-22T00:00:00   soccer_night       Marcus            0
2019-09-25T00:00:00   comedy_night       John              0
2019-09-28T00:00:00   comedy_night       John              3
2019-10-02T00:00:00   soccer_night       Marcus            10
2019-10-10T00:00:00   comedy_night       John              7

3 Answers:

Answer 0: (score: 2)

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(day, diff), IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day), 0) rolling_avg
FROM (
  SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
  FROM (
    SELECT *, SAFE_CAST(date AS DATE) AS day 
    FROM `project.dataset.table`
  )
)

To apply this to the sample data from the question, put the following CTE in front of the query above

WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2019-09-22T00:00:00' date, 'soccer_night' event_category, 'Marcus' event_planner UNION ALL
  SELECT '2019-09-25T00:00:00', 'comedy_night', 'John' UNION ALL
  SELECT '2019-09-28T00:00:00', 'comedy_night', 'John' UNION ALL
  SELECT '2019-10-02T00:00:00', 'soccer_night', 'Marcus' UNION ALL
  SELECT '2019-10-10T00:00:00', 'comedy_night', 'John' 
)

The result is

Row date                    event_category  event_planner   rolling_avg  
1   2019-09-22 00:00:00 UTC soccer_night    Marcus          0    
2   2019-09-25 00:00:00 UTC comedy_night    John            0    
3   2019-09-28 00:00:00 UTC comedy_night    John            3.0  
4   2019-10-02 00:00:00 UTC soccer_night    Marcus          10.0     
5   2019-10-10 00:00:00 UTC comedy_night    John            7.5    
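
To sanity-check these numbers outside BigQuery, the same logic can be sketched in plain Python. This is only an illustration, not part of the answer's query: the function name `rolling_avg_gaps` is made up here, and the sketch assumes no two events in the same category/planner partition fall on the same day (the SQL default window frame would treat such rows as peers).

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (date, event_category, event_planner)
events = [
    (date(2019, 9, 22), "soccer_night", "Marcus"),
    (date(2019, 9, 25), "comedy_night", "John"),
    (date(2019, 9, 28), "comedy_night", "John"),
    (date(2019, 10, 2), "soccer_night", "Marcus"),
    (date(2019, 10, 10), "comedy_night", "John"),
]

def rolling_avg_gaps(rows):
    """Return each row plus the running mean of day gaps in its partition."""
    last_day = {}             # previous event date per partition (the LAG value)
    gaps = defaultdict(list)  # non-NULL gaps seen so far per partition
    out = []
    for day, category, planner in sorted(rows):
        key = (category, planner)
        if key in last_day:
            gaps[key].append((day - last_day[key]).days)
        last_day[key] = day
        seen = gaps[key]
        # Mirrors IFNULL(AVG(diff), 0): no gaps yet means 0
        avg = sum(seen) / len(seen) if seen else 0
        out.append((day, category, planner, avg))
    return out

results = rolling_avg_gaps(events)
for row in results:
    print(row)
```

The last column comes out as 0, 0, 3.0, 10.0, 7.5, matching the result table: John's third comedy_night averages the gaps of 3 and 12 days.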
  

How should I modify this to use the average over the three most recent events of the same type by the same planner?

#standardSQL
SELECT * EXCEPT(day, diff), 
  IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 0) rolling_avg
FROM (
  SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
  FROM (
    SELECT *, SAFE_CAST(date AS DATE) AS day 
    FROM `project.dataset.table`
  )
)
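
The effect of the `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` frame can be illustrated with a small Python sketch for a single partition. The first three dates are John's comedy_night events from the sample data; the last two dates and the function name `last_three_avg` are hypothetical, added only so the window has something to slide over.

```python
from collections import deque
from datetime import date

def last_three_avg(days):
    """Mean of the day gaps on the current row and the two rows before it."""
    window = deque(maxlen=3)  # frame: 2 preceding rows plus the current row
    prev = None
    out = []
    for day in sorted(days):
        # LAG yields NULL on the first row of a partition
        diff = (day - prev).days if prev is not None else None
        window.append(diff)
        known = [d for d in window if d is not None]  # AVG skips NULLs
        out.append(sum(known) / len(known) if known else 0)  # IFNULL(..., 0)
        prev = day
    return out

john_comedy = [
    date(2019, 9, 25),
    date(2019, 9, 28),
    date(2019, 10, 10),
    date(2019, 10, 13),  # hypothetical extra event
    date(2019, 10, 19),  # hypothetical extra event
]
averages = last_three_avg(john_comedy)
print(averages)  # [0, 3.0, 7.5, 6.0, 7.0]
```

Note that the frame covers three rows of gaps, so once it is full it spans four events; if exactly three events (two gaps) are wanted, a frame of `1 PRECEDING` would be the analogous choice.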

Answer 1: (score: 1)

You can use lag() to compute the previous event date in a subquery, then take the rolling average in the outer query:

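select
    t.*,
    -- running mean of the day gaps; avg() skips the NULL gap on each partition's first row
    avg(date_diff(date, lag_date, day)) over(
        partition by event_category, event_planner order by date
    ) rolling_avg
from (
    select
        t.*,
        -- date of the previous event by the same planner in the same category
        lag(date) over(
            partition by event_category, event_planner order by date
        ) lag_date
    from mytable t
) t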

Answer 2: (score: 0)

For the average, you can use:

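-- (current date - first date) / (rows so far - 1): consecutive gaps telescope,
-- so this equals their running mean; the first row of each partition yields NULL
(DATE_DIFF(SAFE_CAST(date AS date),
           MIN(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner),
           day
          ) / 
 NULLIF(COUNT(*) OVER (PARTITION BY event_category, event_planner ORDER BY date) - 1, 0)
) AS result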
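
The reasoning behind this kind of expression is that consecutive gaps telescope: the sum of all gaps up to a row equals that row's date minus the first date in the partition, so dividing by the number of gaps gives their mean without needing LAG. A minimal Python sketch checking this on John's comedy_night dates from the sample data, assuming the divisor is the number of rows seen so far minus one; both function names are illustrative.

```python
from datetime import date

def mean_gap_telescoped(days):
    """(current - first) / (rows so far - 1); None where the divisor is 0."""
    days = sorted(days)
    return [
        (day - days[0]).days / i if i else None  # NULLIF(count - 1, 0) gives NULL
        for i, day in enumerate(days)
    ]

def mean_gap_direct(days):
    """Running mean of the individual gaps between consecutive dates."""
    days = sorted(days)
    gaps, out = [], [None]
    for prev, cur in zip(days, days[1:]):
        gaps.append((cur - prev).days)
        out.append(sum(gaps) / len(gaps))
    return out

john_comedy = [date(2019, 9, 25), date(2019, 9, 28), date(2019, 10, 10)]
telescoped = mean_gap_telescoped(john_comedy)
direct = mean_gap_direct(john_comedy)
print(telescoped)  # [None, 3.0, 7.5]
```

Both functions return the same values here, which is why a single date difference divided by a running count can stand in for averaging the per-row gaps.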