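Suppose my table structure is as follows:
I plan to group it by (USER and SEQUENCE) and get the LEAD timestamp of the next sequence. This is the output I'm looking for:
If possible, can I solve this with the LEAD function, without a JOIN?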
Answer 0 (score: 2)
Below is for BigQuery Standard SQL.
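I'll present two options: first a version using a JOIN (just to show that I understood, i.e. reverse-engineered, the expected logic correctly), then the JOIN-less version. Note that I use ts as the field name instead of timestamp.
Using JOIN: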
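#standardSQL
SELECT a.user, a.sequence, MIN(b.ts) ts
FROM (
  -- latest timestamp within each (user, sequence) group
  SELECT user, sequence, MAX(ts) AS max_ts
  FROM `project.dataset.table`
  GROUP BY user, sequence
) a
-- pair each group with every row of the next sequence number
LEFT JOIN `project.dataset.table` b
ON a.user = b.user AND b.sequence = a.sequence + 1
-- keep next-sequence rows at or after max_ts; IFNULL keeps groups with no next sequence (result NULL)
WHERE a.max_ts <= IFNULL(b.ts, a.max_ts)
GROUP BY user, sequence
-- ORDER BY user, sequence
JOIN-less version:
#standardSQL
SELECT
  user, sequence,
  (
    -- earliest timestamp of the next sequence that is later than this group's max
    SELECT ts FROM UNNEST(arr_ts) ts
    WHERE max_ts < ts ORDER BY ts LIMIT 1
  ) ts
FROM (
  SELECT
    user, sequence, max_ts,
    -- LEAD pulls the timestamp array of the next sequence for the same user
    LEAD(arr_ts) OVER (PARTITION BY user ORDER BY sequence) arr_ts
  FROM (
    SELECT
      user, sequence, MAX(ts) max_ts,
      ARRAY_AGG(ts ORDER BY ts) arr_ts
    FROM `project.dataset.table`
    GROUP BY user, sequence
  )
)
-- ORDER BY user, sequence
Both versions above can be tested/played with using the dummy data below, and both return the same result.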
WITH `project.dataset.table` AS (
SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
SELECT 'user1', 2, 'T2' UNION ALL
SELECT 'user1', 1, 'T3' UNION ALL
SELECT 'user1', 1, 'T4' UNION ALL
SELECT 'user1', 3, 'T5' UNION ALL
SELECT 'user1', 2, 'T6' UNION ALL
SELECT 'user1', 3, 'T7' UNION ALL
SELECT 'user1', 3, 'T8'
)
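To see what both queries compute on this dummy data, here is a minimal Python sketch that mirrors their logic step by step. It is not BigQuery; the function names are hypothetical, and the "T1".."T8" strings stand in for real timestamps (they compare correctly only because each label has a single digit).

```python
from collections import defaultdict

# Dummy rows from the WITH clause above: (user, sequence, ts).
# "T1".."T8" are placeholder strings standing in for timestamps; they sort
# correctly only because every label has a single digit.
ROWS = [
    ("user1", 2, "T1"), ("user1", 2, "T2"), ("user1", 1, "T3"),
    ("user1", 1, "T4"), ("user1", 3, "T5"), ("user1", 2, "T6"),
    ("user1", 3, "T7"), ("user1", 3, "T8"),
]


def group_ts(rows):
    """(user, sequence) -> sorted timestamps, like ARRAY_AGG(ts ORDER BY ts)."""
    groups = defaultdict(list)
    for user, seq, ts in rows:
        groups[(user, seq)].append(ts)
    return {key: sorted(values) for key, values in groups.items()}


def join_version(rows):
    """Mirror of the JOIN query."""
    groups = group_ts(rows)
    result = []
    for (user, seq), values in sorted(groups.items()):
        max_ts = values[-1]
        next_rows = groups.get((user, seq + 1), [])  # LEFT JOIN on sequence + 1
        if not next_rows:
            # No match: b.ts is NULL, IFNULL keeps the row, MIN(b.ts) is NULL.
            result.append((user, seq, None))
            continue
        kept = [ts for ts in next_rows if max_ts <= ts]  # the WHERE filter
        if kept:
            result.append((user, seq, min(kept)))
        # else: every joined row is filtered out, so the group disappears.
    return result


def lead_version(rows):
    """Mirror of the JOIN-less LEAD + ARRAY_AGG query."""
    groups = group_ts(rows)
    seqs_by_user = defaultdict(list)
    for user, seq in sorted(groups):
        seqs_by_user[user].append(seq)
    result = []
    for user, seqs in sorted(seqs_by_user.items()):
        for i, seq in enumerate(seqs):
            max_ts = groups[(user, seq)][-1]
            # LEAD(arr_ts): array of the next existing sequence, or NULL at the end.
            next_arr = groups[(user, seqs[i + 1])] if i + 1 < len(seqs) else []
            later = [ts for ts in next_arr if max_ts < ts]  # already sorted
            result.append((user, seq, later[0] if later else None))  # LIMIT 1
    return result


if __name__ == "__main__":
    # Both print [('user1', 1, 'T6'), ('user1', 2, 'T7'), ('user1', 3, None)]
    print(join_version(ROWS))
    print(lead_version(ROWS))
```

The two mirrors agree on this data, but the queries are not equivalent in general: the JOIN version only looks at sequence + 1 and drops a group whose next sequence has no timestamp at or after its max, while LEAD takes the next existing sequence and returns NULL in that case.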
Answer 1 (score: 1)
Not sure about BigQuery, but in general SQL it would be written as:
select user, sequence, LEAD (max_timestamp,1) OVER (PARTITION BY user ORDER BY sequence) as timestamp
from (
select user, sequence, max(timestamp) as max_timestamp
from table
group by user, sequence) q1;
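Watch out for reserved words such as table, user, timestamp, etc.
Edit: yes, forget this answer; I didn't pay enough attention to the required output. Mikhail got it right!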
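For contrast, a minimal Python mirror of the general-SQL query above shows where it diverges, assuming Mikhail's queries capture the intended logic. The function name is hypothetical and the rows are the same dummy data used in the first answer. LEAD(max_timestamp) returns the next sequence's latest timestamp, not the first timestamp after the current sequence ends, so sequence 2 gets T8 here instead of T7.

```python
from collections import defaultdict

# Same dummy rows as in the first answer: (user, sequence, ts).
ROWS = [
    ("user1", 2, "T1"), ("user1", 2, "T2"), ("user1", 1, "T3"),
    ("user1", 1, "T4"), ("user1", 3, "T5"), ("user1", 2, "T6"),
    ("user1", 3, "T7"), ("user1", 3, "T8"),
]


def lead_of_max(rows):
    """Mirror of LEAD(max_timestamp, 1) OVER (PARTITION BY user ORDER BY sequence)
    applied to the per-(user, sequence) maxima."""
    maxima = defaultdict(dict)  # user -> {sequence: MAX(ts)}
    for user, seq, ts in rows:
        current = maxima[user].get(seq)
        if current is None or ts > current:
            maxima[user][seq] = ts
    result = []
    for user in sorted(maxima):
        seqs = sorted(maxima[user])
        for i, seq in enumerate(seqs):
            # The next sequence's MAX, not its first timestamp after this group.
            nxt = maxima[user][seqs[i + 1]] if i + 1 < len(seqs) else None
            result.append((user, seq, nxt))
    return result


if __name__ == "__main__":
    # Prints [('user1', 1, 'T6'), ('user1', 2, 'T8'), ('user1', 3, None)]
    print(lead_of_max(ROWS))
```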