I'm new to BigQuery and SQL and am still working through grouping problems. Using standard SQL in BigQuery, I want to group data by X days. Here is a data table:
event_id | url | timestamp
-----------------------------------------------------------
xx a.html 2016-10-18 15:55:16 UTC
xx a.html 2016-10-19 16:68:55 UTC
xx a.html 2016-10-25 20:55:57 UTC
yy b.html 2016-10-18 15:58:09 UTC
yy b.html 2016-10-18 08:32:43 UTC
zz a.html 2016-10-20 04:44:22 UTC
zz c.html 2016-10-21 02:12:34 UTC
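Starting from a given date, I want to count the occurrences of each event on each URL in X-day intervals. For example: how would I group this into 3-day intervals, with my first interval starting at 2016-10-18 00:00:00 UTC? Also, can I assign the third day of the interval to each row as its label? Example output: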
event_id | url | count | 3dayIntervalLabel
-----------------------------------------------------------
xx a.html 2 2016-10-20 --> [18th thru 20th]
yy b.html 2 2016-10-20
zz a.html 1 2016-10-20
zz c.html 1 2016-10-23 --> [21st thru 23rd]
xx a.html 1 2016-10-26 --> [24th thru 26th]
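I added three annotations to clarify the 3dayIntervalLabel values.
In general, I want to solve this: starting from date Y, group by X-day intervals, and label each interval with its last date.
Let me know if anything needs further clarification.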
If you're interested, I have also asked related questions on StackOverflow (and gotten answers): one about grouping this data with a rolling window, and my initial question.
Thanks!
Answer 0 (score: 3)
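WITH dailyAggregations AS (
  -- Per-day event counts for each (url, event_id)
  SELECT
    DATE(ts) AS day,  -- ts = the event timestamp column (named timestamp in the question)
    url,
    event_id,
    COUNT(1) AS events
  FROM yourTable
  GROUP BY day, url, event_id
),
calendar AS (
  -- One row per 3-day interval [day, endday]; the end date passed to
  -- GENERATE_DATE_ARRAY must be late enough to cover the latest event
  SELECT day, DATE_ADD(day, INTERVAL 2 DAY) AS endday
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2016-10-18', DATE '2016-11-06', INTERVAL 3 DAY)) AS day
)
SELECT
  event_id,
  url,
  SUM(events) AS `count`,
  c.endday AS `ThreedayIntervalLabel`
FROM calendar AS c
JOIN dailyAggregations AS a
  ON a.day BETWEEN c.day AND c.endday  -- attach each day to the interval containing it
GROUP BY endday, url, event_id
ORDER BY ThreedayIntervalLabel, event_id, url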
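To see why the calendar join above yields the expected output, here is a minimal Python sketch that mirrors the query step by step on the sample rows. It assumes the rows have been reduced to their date part (so the malformed 16:68:55 time plays no role); the helpers `calendar` and `grouped_counts` are hypothetical names chosen for illustration, not part of BigQuery.

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows from the question as (event_id, url, event date).
# Only the date part matters, so the malformed "16:68:55" time is ignored.
EVENTS = [
    ("xx", "a.html", date(2016, 10, 18)),
    ("xx", "a.html", date(2016, 10, 19)),
    ("xx", "a.html", date(2016, 10, 25)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("zz", "a.html", date(2016, 10, 20)),
    ("zz", "c.html", date(2016, 10, 21)),
]


def calendar(first: date, last: date, step_days: int = 3):
    """Hypothetical helper: like GENERATE_DATE_ARRAY(first, last, INTERVAL step DAY),
    paired with each interval's end day (start + step_days - 1)."""
    windows = []
    start = first
    while start <= last:
        windows.append((start, start + timedelta(days=step_days - 1)))
        start += timedelta(days=step_days)
    return windows


def grouped_counts(events, windows):
    """Hypothetical helper mirroring dailyAggregations + the BETWEEN join + SUM."""
    # dailyAggregations: count events per (day, url, event_id)
    daily = Counter((d, url, eid) for eid, url, d in events)
    # JOIN ... ON a.day BETWEEN c.day AND c.endday, then SUM per (event_id, url, endday)
    totals = Counter()
    for start, end in windows:
        for (d, url, eid), n in daily.items():
            if start <= d <= end:
                totals[(eid, url, end)] += n
    return dict(totals)


result = grouped_counts(EVENTS, calendar(date(2016, 10, 18), date(2016, 11, 6)))
# Print in the same order as the query's ORDER BY: label, then event_id, then url
for (eid, url, label), n in sorted(result.items(), key=lambda kv: (kv[0][2], kv[0])):
    print(eid, url, n, label.isoformat())
```

This prints the same five rows as the example output in the question. Because every event day falls in exactly one generated interval, the join never double-counts; the only catch is that the calendar's end date must be extended if the data runs past it.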
Answer 1 (score: 0)
If you have a base date, you can compute each row's interval number directly:
-- days = 0-based index of the 3-day interval counted from the base date 2016-10-18
select floor(date_diff(date(timestamp), date '2016-10-18', day) / 3) as days,
       count(*) as events
from t
group by days
order by days;
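Answer 1's formula generalizes to "X-day intervals starting at date Y": the interval index is floor((days since Y) / X), and the label the question asks for is Y + X * index + (X - 1) days. Below is a minimal Python sketch of that arithmetic, assuming the events are already reduced to (event_id, url, date); `interval_label` and `count_by_interval` are hypothetical helper names.

```python
from collections import Counter
from datetime import date, timedelta


def interval_label(d: date, start: date, x: int) -> date:
    """Return the last day of the x-day interval containing d.

    Intervals are counted from start: index 0 covers start .. start + (x - 1) days.
    Python's // floors like FLOOR() in the query, so days before start get
    negative indexes instead of being folded into interval 0.
    """
    index = (d - start).days // x
    return start + timedelta(days=index * x + (x - 1))


def count_by_interval(events, start: date, x: int) -> dict:
    """Count rows per (event_id, url, interval label)."""
    return dict(Counter((eid, url, interval_label(d, start, x)) for eid, url, d in events))


# Sample rows from the question as (event_id, url, event date).
rows = [
    ("xx", "a.html", date(2016, 10, 18)),
    ("xx", "a.html", date(2016, 10, 19)),
    ("xx", "a.html", date(2016, 10, 25)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("zz", "a.html", date(2016, 10, 20)),
    ("zz", "c.html", date(2016, 10, 21)),
]
counts = count_by_interval(rows, start=date(2016, 10, 18), x=3)
print(counts)
```

Unlike the calendar join, this needs no precomputed list of intervals, so it works for any data range. In BigQuery the same label could be derived with DATE_ADD on the base date, but the `days` value must first be cast to INT64, because FLOOR applied to the FLOAT64 result of `/` returns FLOAT64 and the INTERVAL count must be an integer.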