I have a table like this:
Row time viewCount
1 00:00:00 31
2 00:00:01 44
3 00:00:02 78
4 00:00:03 71
5 00:00:04 72
6 00:00:05 73
7 00:00:06 64
8 00:00:07 70
and I want to aggregate it into 15-minute intervals, like this:
Row time viewCount
1 00:00:00 31
2 00:15:00 445
3 00:30:00 700
4 00:45:00 500
5 01:00:04 121
6 01:15:00 475
...
Please help. Thanks in advance.
Answer 0: (score: 1)
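The following is for BigQuery Standard SQL.
Note: you provided a simplified example of your data, and the query below follows it, so instead of aggregating every 15 minutes it aggregates every 2 seconds. This is so you can easily test and play with it. It can easily be adjusted to 15 minutes by changing SECOND to MINUTE in 3 places and 2 to 15 in 3 places. The example also uses the TIME data type for the time field, since your example is limited to a single 24-hour period; your real data most likely has DATETIME or TIMESTAMP instead. In that case you will also need to replace all TIME_* functions with the corresponding DATETIME_* or TIMESTAMP_* functions.
So, finally, the query is: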
#standardSQL
-- Sample data from the question
WITH `project.dataset.table` AS (
  SELECT TIME '00:00:00' time, 31 viewCount UNION ALL
  SELECT TIME '00:00:01', 44 UNION ALL
  SELECT TIME '00:00:02', 78 UNION ALL
  SELECT TIME '00:00:03', 71 UNION ALL
  SELECT TIME '00:00:04', 72 UNION ALL
  SELECT TIME '00:00:05', 73 UNION ALL
  SELECT TIME '00:00:06', 64 UNION ALL
  SELECT TIME '00:00:07', 70
),
-- Overall time range covered by the data
period AS (
  SELECT MIN(time) min_time, MAX(time) max_time, TIME_DIFF(MAX(time), MIN(time), SECOND) diff
  FROM `project.dataset.table`
),
-- One half-open [start_time, end_time) interval per step
checkpoints AS (
  SELECT TIME_ADD(min_time, INTERVAL step SECOND) start_time, TIME_ADD(min_time, INTERVAL step + 2 SECOND) end_time
  FROM period, UNNEST(GENERATE_ARRAY(0, diff + 2, 2)) step
)
-- Sum viewCount over the rows that fall into each interval
SELECT start_time time, SUM(viewCount) viewCount
FROM checkpoints c
JOIN `project.dataset.table` t
  ON t.time >= c.start_time AND t.time < c.end_time
GROUP BY start_time
ORDER BY start_time
The result is:
Row time viewCount
1 00:00:00 75
2 00:00:02 149
3 00:00:04 145
4 00:00:06 134
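For reference, applying the substitutions described in the note (SECOND to MINUTE, 2 to 15) might look like the sketch below. The inline sample rows are hypothetical and only there so the query can be run on its own; with real data you would drop the first CTE and point at your own table. It also assumes the column is still of type TIME.

```sql
#standardSQL
-- Hypothetical sample data, spaced in minutes rather than seconds
WITH `project.dataset.table` AS (
  SELECT TIME '00:00:00' AS time, 31 AS viewCount UNION ALL
  SELECT TIME '00:07:00', 44 UNION ALL
  SELECT TIME '00:15:00', 78 UNION ALL
  SELECT TIME '00:29:59', 71 UNION ALL
  SELECT TIME '00:45:00', 72
),
-- Overall time range covered by the data, measured in minutes
period AS (
  SELECT MIN(time) AS min_time, TIME_DIFF(MAX(time), MIN(time), MINUTE) AS diff
  FROM `project.dataset.table`
),
-- One half-open [start_time, end_time) interval per 15-minute step
checkpoints AS (
  SELECT
    TIME_ADD(min_time, INTERVAL step MINUTE) AS start_time,
    TIME_ADD(min_time, INTERVAL step + 15 MINUTE) AS end_time
  FROM period, UNNEST(GENERATE_ARRAY(0, diff + 15, 15)) AS step
)
-- Inner join: intervals without any matching rows are dropped
SELECT c.start_time AS time, SUM(t.viewCount) AS viewCount
FROM checkpoints AS c
JOIN `project.dataset.table` AS t
  ON t.time >= c.start_time AND t.time < c.end_time
GROUP BY c.start_time
ORDER BY c.start_time
```

With these sample rows the query should return 75 for 00:00:00, 149 for 00:15:00 and 72 for 00:45:00; the empty 00:30:00 interval does not appear because of the inner join. Two things to keep in mind: the intervals are anchored at the earliest time in the table, so they only line up with the quarter-hour if that first value does, and TIME values wrap around at midnight, so an interval whose end_time crosses midnight would not match any rows. Switching to DATETIME or TIMESTAMP avoids the second issue.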
Answer 1: (score: 1)
Assuming you actually have a TIMESTAMP column, you could use an approach like this:
#standardSQL
SELECT
  TIMESTAMP_SECONDS(
    UNIX_SECONDS(timestamp) -
    MOD(UNIX_SECONDS(timestamp), 15 * 60)
  ) AS time,
  SUM(viewCount) AS viewCount
FROM `project.dataset.table`
GROUP BY time;
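It relies on converting to Unix seconds in order to compute the 15-minute intervals. Note, though, that unlike Mikhail's solution it will not produce zero-valued rows for empty 15-minute intervals (it's not clear whether that matters to you).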
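If empty intervals do matter, one possible way to get them (an assumption on my part, not something spelled out in either answer) is to generate every 15-minute interval between the first and last bucket and LEFT JOIN the floored data onto it. The sample rows and the names ts, bucketed, bounds and all_buckets below are hypothetical. As far as I can tell, the checkpoints query in the other answer uses an inner JOIN as written, so it would need a similar LEFT JOIN to keep empty intervals.

```sql
#standardSQL
-- Hypothetical sample data with a gap between 00:20 and 00:50
WITH sample AS (
  SELECT TIMESTAMP '2020-01-01 00:03:00+00' AS ts, 31 AS viewCount UNION ALL
  SELECT TIMESTAMP '2020-01-01 00:14:59+00', 44 UNION ALL
  SELECT TIMESTAMP '2020-01-01 00:20:00+00', 78 UNION ALL
  SELECT TIMESTAMP '2020-01-01 00:50:00+00', 71
),
-- Floor each timestamp to the start of its 15-minute interval
bucketed AS (
  SELECT
    TIMESTAMP_SECONDS(UNIX_SECONDS(ts) - MOD(UNIX_SECONDS(ts), 15 * 60)) AS bucket,
    viewCount
  FROM sample
),
-- First and last interval that contain data
bounds AS (
  SELECT MIN(bucket) AS first_bucket, MAX(bucket) AS last_bucket
  FROM bucketed
),
-- Every 15-minute interval start between the two, inclusive
all_buckets AS (
  SELECT slot
  FROM bounds,
    UNNEST(GENERATE_TIMESTAMP_ARRAY(first_bucket, last_bucket, INTERVAL 15 MINUTE)) AS slot
)
-- LEFT JOIN keeps intervals without data; IFNULL turns their NULL sum into 0
SELECT a.slot AS time, IFNULL(SUM(b.viewCount), 0) AS viewCount
FROM all_buckets AS a
LEFT JOIN bucketed AS b
  ON b.bucket = a.slot
GROUP BY a.slot
ORDER BY a.slot
```

With these sample rows the result should be 75 for 00:00, 78 for 00:15, 0 for 00:30 and 71 for 00:45 (all on 2020-01-01 UTC). The flooring lands on quarter-hour boundaries because 15 * 60 = 900 divides the number of seconds in a day evenly, so buckets are aligned to UTC clock time rather than to the first row in the data.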