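I am trying, without success, to group consecutive rows (ordered by timestamp) where the difference between adjacent timestamps is less than 60 seconds.
Here is a sample table: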
user  video     timestamp                time_diff
----  --------  -----------------------  ---------
Bob   balldrop  2017-06-01 06:00:00 UTC  null
Bob   balldrop  2017-06-01 06:00:10 UTC  -10
Bob   balldrop  2017-06-01 06:00:30 UTC  -20
Bob   balldrop  2017-06-01 06:00:45 UTC  -15
Bob   balldrop  2017-06-01 06:04:00 UTC  -195
Bob   balldrop  2017-06-01 06:04:30 UTC  -30
Bob   bounce    2017-06-01 06:05:00 UTC  null
Bob   bounce    2017-06-01 06:05:20 UTC  -20
Desired result:
user  video     timestamp                group
----  --------  -----------------------  -----
Bob   balldrop  2017-06-01 06:00:00 UTC  1
Bob   balldrop  2017-06-01 06:00:10 UTC  1
Bob   balldrop  2017-06-01 06:00:30 UTC  1
Bob   balldrop  2017-06-01 06:00:45 UTC  1
Bob   balldrop  2017-06-01 06:04:00 UTC  2
Bob   balldrop  2017-06-01 06:04:30 UTC  2
Bob   bounce    2017-06-01 06:05:00 UTC  3
Bob   bounce    2017-06-01 06:05:20 UTC  3
Answer 0 (score: 2)
For BigQuery Standard SQL, use the following:
#standardSQL
WITH data AS (
  SELECT 'Bob' AS user, 'balldrop' AS video, TIMESTAMP '2017-06-01 06:00:00 UTC' AS ts UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:10 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:30 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:45 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:04:00 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:04:30 UTC' UNION ALL
  SELECT 'Bob', 'bounce', TIMESTAMP '2017-06-01 06:05:00 UTC' UNION ALL
  SELECT 'Bob', 'bounce', TIMESTAMP '2017-06-01 06:05:20 UTC'
)
SELECT
  user, video, ts,
  SUM(diff) OVER(PARTITION BY user ORDER BY ts) AS group_number
FROM (
  SELECT
    user, video, ts,
    IF(TIMESTAMP_DIFF(ts, LAG(ts) OVER(PARTITION BY user, video ORDER BY ts), SECOND) < 60, 0, 1) AS diff
  FROM data
)
-- ORDER BY ts
It is not clear how you want the groups to be numbered across different users, so adjust the PARTITION BY clauses accordingly.
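As one possible adjustment, here is a minimal sketch that restarts the numbering for every (user, video) pair instead of numbering continuously per user. It reuses the sample rows from the question. The CTE name `flagged` and the column name `is_new_group` are illustrative choices, not part of the original query, and whether this numbering is what you want is an assumption.

```sql
#standardSQL
WITH data AS (
  SELECT 'Bob' AS user, 'balldrop' AS video, TIMESTAMP '2017-06-01 06:00:00 UTC' AS ts UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:10 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:30 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:45 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:04:00 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:04:30 UTC' UNION ALL
  SELECT 'Bob', 'bounce', TIMESTAMP '2017-06-01 06:05:00 UTC' UNION ALL
  SELECT 'Bob', 'bounce', TIMESTAMP '2017-06-01 06:05:20 UTC'
),
flagged AS (  -- hypothetical name for the inner subquery
  SELECT
    user, video, ts,
    -- 1 marks the start of a new group: either the first row of a partition
    -- (LAG returns NULL, the comparison is NULL, and IF falls through to 1)
    -- or a gap of 60 seconds or more since the previous row.
    IF(TIMESTAMP_DIFF(ts, LAG(ts) OVER(PARTITION BY user, video ORDER BY ts), SECOND) < 60, 0, 1) AS is_new_group
  FROM data
)
SELECT
  user, video, ts,
  -- The running sum of the flags is the group number; partitioning by
  -- user AND video makes the count restart for each video.
  SUM(is_new_group) OVER(PARTITION BY user, video ORDER BY ts) AS group_number
FROM flagged
ORDER BY user, video, ts
```

With the sample data, this variant numbers the balldrop rows 1, 1, 1, 1, 2, 2 and the bounce rows 1, 1, whereas the query above, which partitions the running sum by user only, continues with group 3 for bounce.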