I have an events table like this:
date event_category event_planner
2019-09-22T00:00:00 soccer_night Marcus
2019-09-25T00:00:00 comedy_night John
2019-09-28T00:00:00 dance_party John
2019-10-02T00:00:00 soccer_night Marcus
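The idea is to get, for each planner, a rolling average of the difference between event dates.
So far I have the gap in days between consecutive events for each planner and category, using: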
DATE_DIFF(SAFE_CAST(date AS date),LAG(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner ORDER BY date), day) AS result
What I expect is something like this:
date event_category event_planner rolling_avg
2019-09-22T00:00:00 soccer_night Marcus 0
2019-09-25T00:00:00 comedy_night John 0
2019-09-28T00:00:00 comedy_night John 3
2019-10-02T00:00:00 soccer_night Marcus 10
2019-10-10T00:00:00 comedy_night John 7
Answer 0 (score: 2)
The following is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(day, diff), IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day), 0) rolling_avg
FROM (
SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
FROM (
SELECT *, SAFE_CAST(date AS DATE) AS day
FROM `project.dataset.table`
)
)
If you apply it to the sample data from the question
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2019-09-22T00:00:00' date, 'soccer_night' event_category, 'Marcus' event_planner UNION ALL
SELECT '2019-09-25T00:00:00', 'comedy_night', 'John' UNION ALL
SELECT '2019-09-28T00:00:00', 'comedy_night', 'John' UNION ALL
SELECT '2019-10-02T00:00:00', 'soccer_night', 'Marcus' UNION ALL
SELECT '2019-10-10T00:00:00', 'comedy_night', 'John'
)
the result is
Row date event_category event_planner rolling_avg
1 2019-09-22 00:00:00 UTC soccer_night Marcus 0
2 2019-09-25 00:00:00 UTC comedy_night John 0
3 2019-09-28 00:00:00 UTC comedy_night John 3.0
4 2019-10-02 00:00:00 UTC soccer_night Marcus 10.0
5 2019-10-10 00:00:00 UTC comedy_night John 7.5
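To sanity-check those numbers outside BigQuery, here is a minimal Python sketch of the same logic: a per-group gap in days (the LAG step) followed by a running mean that skips the missing first gap, since AVG ignores NULLs and IFNULL turns an all-NULL frame into 0. The function name `rolling_avg_gaps` and the tuple layout are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (date, event_category, event_planner)
ROWS = [
    (date(2019, 9, 22), "soccer_night", "Marcus"),
    (date(2019, 9, 25), "comedy_night", "John"),
    (date(2019, 9, 28), "comedy_night", "John"),
    (date(2019, 10, 2), "soccer_night", "Marcus"),
    (date(2019, 10, 10), "comedy_night", "John"),
]


def rolling_avg_gaps(rows):
    """Return one running average of day gaps per row, in date order."""
    last_day = {}                    # previous date per group (the LAG step)
    gaps = defaultdict(list)         # non-NULL gaps seen so far per group
    out = []
    for day, category, planner in sorted(rows):
        key = (category, planner)
        if key in last_day:
            gaps[key].append((day - last_day[key]).days)
        last_day[key] = day
        seen = gaps[key]
        # AVG skips NULLs; an empty frame becomes 0, like IFNULL(..., 0)
        out.append(sum(seen) / len(seen) if seen else 0)
    return out


print(rolling_avg_gaps(ROWS))  # [0, 0, 3.0, 10.0, 7.5]
```

The last value is 7.5 because John's comedy_night gaps are 3 and 12 days, and the default frame averages every gap seen so far in the partition.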
How should I modify this to use the average over the last three events of the same type for the same planner?
#standardSQL
SELECT * EXCEPT(day, diff),
IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 0) rolling_avg
FROM (
SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
FROM (
SELECT *, SAFE_CAST(date AS DATE) AS day
FROM `project.dataset.table`
)
)
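One detail of the frame in that query is worth spelling out: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW covers the current row and the two before it, so it averages at most three gaps, and the first row of each partition contributes a NULL gap that AVG skips. A rough Python sketch of that behaviour for a single planner/category partition follows; the helper `windowed_avg` and the dates are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from collections import deque
from datetime import date


def windowed_avg(days, frame=3):
    """Average of the gaps in the last `frame` rows of one partition."""
    window = deque(maxlen=frame)     # gaps of the current row and 2 preceding
    out, prev = [], None
    for day in sorted(days):
        # First row has no previous date, so its gap is None (NULL in SQL)
        window.append(None if prev is None else (day - prev).days)
        prev = day
        known = [g for g in window if g is not None]
        out.append(sum(known) / len(known) if known else 0)
    return out


# Hypothetical dates for one planner and one category
days = [date(2019, 9, 1), date(2019, 9, 4), date(2019, 9, 10),
        date(2019, 9, 13), date(2019, 9, 22)]
print(windowed_avg(days))  # [0, 3.0, 4.5, 4.0, 6.0]
```

The gaps here are 3, 6, 3 and 9 days. From the fourth row on, the oldest gap falls out of the frame, which is what distinguishes this from the unbounded running average.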
Answer 1 (score: 1)
You can use lag() to get the previous date in a subquery, then compute the rolling average in the outer query:
select
t.*,
avg(date_diff(date, lag_date, day)) over(
partition by event_category, event_planner order by date
) rolling_avg
from (
select
t.*,
lag(date) over(
partition by event_category, event_planner order by date
) lag_date
from mytable t
) t
Answer 2 (score: 0)
For the average, you can use:
(DATE_DIFF(SAFE_CAST(date AS date),
MIN(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner),
day
) /
NULLIF(COUNT(*) OVER (PARTITION BY event_category, event_planner) - 1, 0)
) AS result
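The reason a MIN/COUNT expression can stand in for an average of gaps is that consecutive differences telescope: their sum is just the last date minus the first, so the mean gap over n events is (last - first) / (n - 1). A quick Python check of that identity on the comedy_night/John dates from the sample data (variable names are illustrative):

```python
from datetime import date

# comedy_night / John dates from the sample data
days = [date(2019, 9, 25), date(2019, 9, 28), date(2019, 10, 10)]

# Mean of consecutive gaps, computed the long way
gaps = [(b - a).days for a, b in zip(days, days[1:])]
mean_gap = sum(gaps) / len(gaps)

# Same value from the endpoints only: (max - min) / (count - 1)
shortcut = (max(days) - min(days)).days / (len(days) - 1)

print(gaps, mean_gap, shortcut)  # [3, 12] 7.5 7.5
```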