I want to merge adjacent date ranges in BigQuery.
I have a table like this:
ID START END
1 2019-01-18 17:34:58 UTC 2019-02-18 12:14:59 UTC
1 2019-02-18 06:04:39 UTC 2019-02-18 08:05:05 UTC
1 2019-02-18 08:05:05 UTC 2019-02-18 10:06:05 UTC
1 2019-02-18 10:06:05 UTC 2019-02-19 11:16:15 UTC
2 2019-01-19 06:02:29 UTC 2019-01-29 11:02:23 UTC
Since the middle three rows represent a single range split into three parts, I want to combine them so that the table looks like this:
ID START END
1 2019-01-18 17:34:58 UTC 2019-02-18 12:14:59 UTC
1 2019-02-18 06:04:39 UTC 2019-02-19 11:16:15 UTC
2 2019-01-19 06:02:29 UTC 2019-01-29 11:02:23 UTC
What is the best way to accomplish this?
Answer 0 (score: 1)
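You need to determine where each range starts. In your case, adjacent ranges appear to have exactly matching end and start values, so you can use lag() to determine where a group starts. A cumulative count of those starts then provides a group ID that can be used for aggregation: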
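select id, min(start) as start, max(`end`) as `end`
from (select t.*,
             -- cumulative count of group starts = group ID
             countif(prev_end is null or prev_end <> start) over (partition by id order by start) as grp
      from (select t.*,
                   -- END is a reserved keyword in BigQuery, so the column is quoted with backticks
                   lag(`end`) over (partition by id order by start) as prev_end
            from t
           ) t
     ) t
group by id, grp;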
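To make the lag() and cumulative-count logic easier to follow, here is a minimal Python sketch that traces the same steps on the question's sample rows. It is an illustration only, not BigQuery code; the names `ts`, `ROWS` and `merge_adjacent` are hypothetical helpers introduced for this example.

```python
from datetime import datetime
from itertools import groupby


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Sample rows from the question: (id, start, end)
ROWS = [
    (1, ts("2019-01-18 17:34:58"), ts("2019-02-18 12:14:59")),
    (1, ts("2019-02-18 06:04:39"), ts("2019-02-18 08:05:05")),
    (1, ts("2019-02-18 08:05:05"), ts("2019-02-18 10:06:05")),
    (1, ts("2019-02-18 10:06:05"), ts("2019-02-19 11:16:15")),
    (2, ts("2019-01-19 06:02:29"), ts("2019-01-29 11:02:23")),
]


def merge_adjacent(rows):
    """Merge rows whose start equals the previous row's end, per id."""
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # order by id, start
    for row_id, group in groupby(ordered, key=lambda r: r[0]):  # partition by id
        prev_end = None  # plays the role of lag(end); None on the first row
        for _, start, end in group:
            if prev_end is None or prev_end != start:
                # A group starts here, like incrementing the countif() total.
                merged.append([row_id, start, end])
            else:
                # Same group: keep min(start), extend to max(end).
                merged[-1][2] = max(merged[-1][2], end)
            prev_end = end
    return [tuple(row) for row in merged]


for row in merge_adjacent(ROWS):
    print(*row)
```

On the sample data this yields the three rows of the desired table: the first row for ID 1 stays separate because its end does not equal the next row's start.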
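If the ranges can overlap, a cumulative maximum of the earlier end values usually does the trick. A row then starts a new group only when its start is later than every earlier end, i.e. prev_end < start:
select id, min(start) as start, max(`end`) as `end`
from (select t.*,
             countif(prev_end is null or prev_end < start) over (partition by id order by start) as grp
      from (select t.*,
                   -- latest end among all earlier rows for this id
                   max(`end`) over (partition by id order by start
                                    rows between unbounded preceding and 1 preceding) as prev_end
            from t
           ) t
     ) t
group by id, grp;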
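The cumulative-maximum idea for overlapping ranges can be sketched in Python as follows. This is an illustration under one assumption: a row starts a new group only when its start is strictly later than every earlier end for the same id, so touching and overlapping ranges are both merged. The names `ts` and `merge_overlapping` are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def merge_overlapping(rows):
    """Merge rows that touch or overlap any earlier row of the same id."""
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # order by id, start
    for row_id, group in groupby(ordered, key=lambda r: r[0]):  # partition by id
        max_prev_end = None  # running max(end) over all preceding rows
        for _, start, end in group:
            if max_prev_end is None or max_prev_end < start:
                # Assumption: a gap before this row means a new group.
                merged.append([row_id, start, end])
            else:
                # Overlaps or touches an earlier row: extend the group.
                merged[-1][2] = max(merged[-1][2], end)
            max_prev_end = end if max_prev_end is None else max(max_prev_end, end)
    return [tuple(row) for row in merged]


# Sample rows from the question: (id, start, end)
rows = [
    (1, ts("2019-01-18 17:34:58"), ts("2019-02-18 12:14:59")),
    (1, ts("2019-02-18 06:04:39"), ts("2019-02-18 08:05:05")),
    (1, ts("2019-02-18 08:05:05"), ts("2019-02-18 10:06:05")),
    (1, ts("2019-02-18 10:06:05"), ts("2019-02-19 11:16:15")),
    (2, ts("2019-01-19 06:02:29"), ts("2019-01-29 11:02:23")),
]

for row in merge_overlapping(rows):
    print(*row)
```

Note that in the question's sample data the first row for ID 1 ends after the next three rows begin, so this variant collapses all four ID 1 rows into one range. It fits only when overlapping ranges should be combined; for the exact output requested in the question, the lag() version is the one that matches.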