Using LEAD in BigQuery

Time: 2017-12-13 14:30:51

Tags: sql google-bigquery lead

Suppose my table has this structure:

[image: table structure]

I plan to group it by (USER, SEQUENCE) and, for each group, get the LEAD timestamp from the next sequence. This is the output I'm looking for:

[image: expected output]

If possible, can I solve this with the LEAD function, without a JOIN?

2 Answers:

Answer 0 (score: 2)

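The following is for BigQuery Standard SQL.

I'll present two options: one using JOIN (just to show that I understood / reverse-engineered the expected logic correctly), and then a JOIN-less version. Note that I use ts as the field name instead of timestamp.

Using JOIN

#standardSQL
SELECT a.user, a.sequence, MIN(b.ts) ts
FROM (
  -- latest ts within each (user, sequence) group
  SELECT user, sequence, MAX(ts) AS max_ts
  FROM `project.dataset.table`
  GROUP BY user, sequence
) a
-- pair each group with the rows of the next sequence for the same user
LEFT JOIN `project.dataset.table` b
ON a.user = b.user AND b.sequence = a.sequence + 1
-- keep only next-sequence rows at or after max_ts; IFNULL keeps groups with no next sequence
WHERE a.max_ts <= IFNULL(b.ts, a.max_ts)
GROUP BY user, sequence
-- ORDER BY user, sequence

JOIN-less version

#standardSQL
SELECT
  user, sequence,
  (
    -- earliest ts of the next sequence that is later than this group's max_ts
    SELECT ts FROM UNNEST(arr_ts) ts
    WHERE max_ts < ts ORDER BY ts LIMIT 1
  ) ts
FROM (
  SELECT
    user, sequence, max_ts,
    -- pull the next sequence's array of timestamps onto the current row
    LEAD(arr_ts) OVER (PARTITION BY user ORDER BY sequence) arr_ts
  FROM (
    SELECT
      user, sequence, MAX(ts) max_ts,
      ARRAY_AGG(ts ORDER BY ts) arr_ts
    FROM `project.dataset.table`
    GROUP BY user, sequence
  )
)
-- ORDER BY user, sequence

Both versions above can be tested / played with using the dummy data below

WITH `project.dataset.table` AS (
  SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
  SELECT 'user1', 2, 'T2' UNION ALL
  SELECT 'user1', 1, 'T3' UNION ALL
  SELECT 'user1', 1, 'T4' UNION ALL
  SELECT 'user1', 3, 'T5' UNION ALL
  SELECT 'user1', 2, 'T6' UNION ALL
  SELECT 'user1', 3, 'T7' UNION ALL
  SELECT 'user1', 3, 'T8'
)

and both return the result

user   sequence  ts
user1  1         T6
user1  2         T7
user1  3         null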

Answer 1 (score: 1)

Not sure about BigQuery, but in generic SQL it would be written as:

select user, sequence, LEAD (max_timestamp,1) OVER (PARTITION BY user ORDER BY sequence) as timestamp
from (
    select user, sequence, max(timestamp) as max_timestamp
    from table
    group by user, sequence) q1;

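Watch out for reserved words such as table, user, timestamp, etc.

Edit: Yes, forget this answer; I wasn't paying enough attention to the desired output. Mikhail got it right!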
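
To make the difference concrete, here is a rough sketch (not part of the original answer) that runs the plain LEAD-over-MAX idea against the dummy data from the first answer. It assumes BigQuery Standard SQL and the ts column name used there; next_max_ts is a name made up for this illustration.

```sql
#standardSQL
-- Dummy data copied from the first answer
WITH `project.dataset.table` AS (
  SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
  SELECT 'user1', 2, 'T2' UNION ALL
  SELECT 'user1', 1, 'T3' UNION ALL
  SELECT 'user1', 1, 'T4' UNION ALL
  SELECT 'user1', 3, 'T5' UNION ALL
  SELECT 'user1', 2, 'T6' UNION ALL
  SELECT 'user1', 3, 'T7' UNION ALL
  SELECT 'user1', 3, 'T8'
)
SELECT
  user, sequence,
  -- next_max_ts (hypothetical name): the MAX ts of the following sequence
  LEAD(max_ts) OVER (PARTITION BY user ORDER BY sequence) AS next_max_ts
FROM (
  SELECT user, sequence, MAX(ts) AS max_ts
  FROM `project.dataset.table`
  GROUP BY user, sequence
)
ORDER BY user, sequence
```

Tracing this by hand, the per-sequence maxima are T4, T6 and T8, so sequence 1 gets T6, sequence 2 gets T8, and sequence 3 gets NULL. The first answer's logic instead looks for the earliest timestamp of the next sequence that comes after the current maximum, which for sequence 2 should be T7 rather than T8. That is the mismatch this answer's edit refers to.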