How to group rows based on a calculation?

Posted: 2017-06-28 22:14:07

Tags: google-bigquery

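I'm having trouble grouping consecutive rows (ordered by timestamp) whose timestamps are less than 60 seconds apart. As the sample shows, a new group should also start whenever the video changes.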

Here is a sample table:

user    video       timestamp                  time_diff
----    --------    -----------------------    ---------
Bob     balldrop    2017-06-01 06:00:00 UTC       null
Bob     balldrop    2017-06-01 06:00:10 UTC       -10
Bob     balldrop    2017-06-01 06:00:30 UTC       -20
Bob     balldrop    2017-06-01 06:00:45 UTC       -15
Bob     balldrop    2017-06-01 06:04:00 UTC       -195
Bob     balldrop    2017-06-01 06:04:30 UTC       -30
Bob     bounce      2017-06-01 06:05:00 UTC       null
Bob     bounce      2017-06-01 06:05:20 UTC       -20
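Reading the sample, the time_diff column appears to be the previous row's timestamp minus the current one, in seconds, and it is null for the first row of each user/video pair. A minimal Python sketch that reproduces the column, assuming the rows arrive sorted by timestamp; `ROWS`, `parse` and `time_diffs` are illustrative names, not part of the question:

```python
from datetime import datetime, timezone

# Sample rows from the question as (user, video, timestamp), sorted by timestamp.
ROWS = [
    ("Bob", "balldrop", "2017-06-01 06:00:00"),
    ("Bob", "balldrop", "2017-06-01 06:00:10"),
    ("Bob", "balldrop", "2017-06-01 06:00:30"),
    ("Bob", "balldrop", "2017-06-01 06:00:45"),
    ("Bob", "balldrop", "2017-06-01 06:04:00"),
    ("Bob", "balldrop", "2017-06-01 06:04:30"),
    ("Bob", "bounce", "2017-06-01 06:05:00"),
    ("Bob", "bounce", "2017-06-01 06:05:20"),
]


def parse(ts: str) -> datetime:
    # The question's timestamps are UTC; strptime returns a naive datetime, so attach UTC.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def time_diffs(rows):
    """Previous timestamp minus current one, in whole seconds; None for the first
    row of each (user, video) pair. Assumes rows are sorted by timestamp."""
    last_seen = {}
    diffs = []
    for user, video, ts in rows:
        current = parse(ts)
        previous = last_seen.get((user, video))
        diffs.append(None if previous is None else int((previous - current).total_seconds()))
        last_seen[(user, video)] = current
    return diffs


if __name__ == "__main__":
    for (user, video, ts), diff in zip(ROWS, time_diffs(ROWS)):
        print(user, video, ts, diff)
```

The diffs come out negative because the earlier timestamp is subtracted from the later one, matching the sign convention in the table.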

Desired result:

user    video       timestamp                  group
----    --------    -----------------------    ---------
Bob     balldrop    2017-06-01 06:00:00 UTC        1
Bob     balldrop    2017-06-01 06:00:10 UTC        1
Bob     balldrop    2017-06-01 06:00:30 UTC        1
Bob     balldrop    2017-06-01 06:00:45 UTC        1
Bob     balldrop    2017-06-01 06:04:00 UTC        2
Bob     balldrop    2017-06-01 06:04:30 UTC        2
Bob     bounce      2017-06-01 06:05:00 UTC        3
Bob     bounce      2017-06-01 06:05:20 UTC        3

1 answer:

Answer 0 (score: 2)

For BigQuery Standard SQL, use the following:

  
#standardSQL
WITH data AS (
  SELECT 'Bob' AS user, 'balldrop' AS video, TIMESTAMP '2017-06-01 06:00:00 UTC' AS ts UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:10 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:30 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:00:45 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:04:00 UTC' UNION ALL
  SELECT 'Bob', 'balldrop', TIMESTAMP '2017-06-01 06:04:30 UTC' UNION ALL
  SELECT 'Bob', 'bounce', TIMESTAMP '2017-06-01 06:05:00 UTC' UNION ALL
  SELECT 'Bob', 'bounce', TIMESTAMP '2017-06-01 06:05:20 UTC' 
)
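SELECT
  user, video, ts,
  -- Running total of the new-group flags; PARTITION BY user restarts the numbering per user
  SUM(diff) OVER(PARTITION BY user ORDER BY ts) AS group_number
FROM (
  SELECT
    user, video, ts,
    -- diff = 1 marks the first row of a new group, 0 a continuation.
    -- For the first row of each user/video pair LAG returns NULL, the comparison is NULL,
    -- and IF falls through to 1, so a change of video always starts a new group.
    IF(TIMESTAMP_DIFF(ts, LAG(ts) OVER(PARTITION BY user, video ORDER BY ts), SECOND) < 60, 0, 1) AS diff
  FROM data
)
-- ORDER BY ts  (uncomment to return the rows in timestamp order)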

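It isn't clear how you want groups numbered across different users. As written, PARTITION BY user in the outer SUM restarts the numbering at 1 for each user, so adjust that PARTITION BY to suit.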
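The query follows the usual gaps-and-islands pattern: the inner query flags each row that starts a new group (1 for the first row of a user/video pair or a gap of 60 seconds or more, otherwise 0), and the outer running SUM turns those flags into group numbers. As a local illustration only, not BigQuery code, the same two steps can be sketched in Python; `group_numbers` and its `restart_per_user` parameter are hypothetical names, with `restart_per_user=False` standing in for dropping PARTITION BY user from the outer SUM.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
GAP_SECONDS = 60  # threshold from the question

# Sample rows from the question as (user, video, timestamp); all times are UTC.
ROWS = [
    ("Bob", "balldrop", "2017-06-01 06:00:00"),
    ("Bob", "balldrop", "2017-06-01 06:00:10"),
    ("Bob", "balldrop", "2017-06-01 06:00:30"),
    ("Bob", "balldrop", "2017-06-01 06:00:45"),
    ("Bob", "balldrop", "2017-06-01 06:04:00"),
    ("Bob", "balldrop", "2017-06-01 06:04:30"),
    ("Bob", "bounce", "2017-06-01 06:05:00"),
    ("Bob", "bounce", "2017-06-01 06:05:20"),
]


def group_numbers(rows, restart_per_user=True):
    """Return (user, video, ts, group) tuples using flag-then-running-sum."""
    # Step 1, like the inner query: walk rows in timestamp order (the fixed-width
    # format sorts correctly as text) and flag 1 when a row is the first of its
    # (user, video) pair or comes GAP_SECONDS or more after the previous one.
    last_ts = {}
    flagged = []
    for user, video, ts in sorted(rows, key=lambda r: r[2]):
        current = datetime.strptime(ts, FMT)
        previous = last_ts.get((user, video))
        continues = previous is not None and (current - previous).total_seconds() < GAP_SECONDS
        flagged.append((user, video, ts, 0 if continues else 1))
        last_ts[(user, video)] = current

    # Step 2, like the outer SUM(...) OVER(...): running total of the flags, either
    # per user ordered by ts (PARTITION BY user) or over all rows ordered by ts.
    if restart_per_user:
        flagged.sort(key=lambda r: (r[0], r[2]))
    totals = {}
    result = []
    for user, video, ts, flag in flagged:
        key = user if restart_per_user else None
        totals[key] = totals.get(key, 0) + flag
        result.append((user, video, ts, totals[key]))
    return result


if __name__ == "__main__":
    for row in group_numbers(ROWS):
        print(*row)
```

With only Bob in the sample, both settings yield 1, 1, 1, 1, 2, 2, 3, 3, matching the desired result; they only diverge once rows from several users are interleaved in time.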