If I have a table like this:
start_date|end_date
1/1/2018|1/5/2018
1/4/2018|1/10/2018
1/9/2018|1/22/2018
2/1/2018|2/1/2018
1/31/2018|2/5/2018
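I want to get all the date ranges covered by these rows, with overlapping rows merged. So I would expect the following to be returned: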
1/1/2018|1/22/2018
1/31/2018|2/5/2018
Is there a function in BigQuery that can handle this?
Answer 0 (score: 1)
There is no such function, but you can try the following (BigQuery Standard SQL):
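#standardSQL
-- Sample data standing in for the real table.
WITH `project.dataset.table` AS (
  SELECT '1/1/2018' start_date, '1/5/2018' end_date UNION ALL
  SELECT '1/4/2018', '1/10/2018' UNION ALL
  SELECT '1/9/2018', '1/22/2018' UNION ALL
  SELECT '2/1/2018', '2/1/2018' UNION ALL
  SELECT '1/31/2018', '2/5/2018'
), parsed_as_dates AS (
  -- Convert the m/d/yyyy strings into DATE values.
  SELECT PARSE_DATE('%m/%d/%Y', start_date) start_date, PARSE_DATE('%m/%d/%Y', end_date) end_date
  FROM `project.dataset.table`
), days AS (
  -- One row per calendar day between the earliest start and the latest end.
  SELECT day FROM
    (SELECT MIN(start_date) min_date, MAX(end_date) max_date FROM parsed_as_dates),
    UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
), temp AS (
  -- flag = 1 if at least one range covers the day, otherwise 0.
  SELECT day, SIGN(COUNTIF(day BETWEEN start_date AND end_date)) flag
  FROM days CROSS JOIN parsed_as_dates GROUP BY day
)
-- Collapse each run of consecutive covered days into a single range.
SELECT MIN(day) start_date, MAX(day) end_date
FROM (
  -- Running total of change points: days in the same run share a grp value.
  SELECT day, flag, SUM(start) OVER(ORDER BY day) grp
  FROM (
    -- start = 1 on any day whose flag differs from the previous day's flag
    -- (a missing previous day counts as 0).
    SELECT day, flag, ABS(flag - IFNULL(LAG(flag) OVER(ORDER BY day), 0)) start
    FROM temp
  )
)
WHERE flag = 1
GROUP BY grp
-- ORDER BY start_date  -- uncomment for a guaranteed row order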
The result is below:
Row start_date end_date
1 2018-01-01 2018-01-22
2 2018-01-31 2018-02-05
This is just a "quick" idea. You might want to refactor it a bit, because to me it looks somewhat over-engineered :o), but at least it works.
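As one possible refactoring, here is a sketch that skips the per-day expansion and compares each row with the latest end date seen so far, using only window functions. It is not the query from the answer above. The CTE names `flagged` and `grouped` and the column `is_new_range` are made up for illustration, and the sample rows are written as DATE literals instead of being parsed from strings. To match the day-by-day behavior, it assumes that ranges touching on consecutive days (one ends on 1/5, the next starts on 1/6) should be merged as well.

```sql
#standardSQL
WITH parsed_as_dates AS (
  -- Same sample rows as above, already typed as DATE.
  SELECT DATE '2018-01-01' AS start_date, DATE '2018-01-05' AS end_date UNION ALL
  SELECT DATE '2018-01-04', DATE '2018-01-10' UNION ALL
  SELECT DATE '2018-01-09', DATE '2018-01-22' UNION ALL
  SELECT DATE '2018-02-01', DATE '2018-02-01' UNION ALL
  SELECT DATE '2018-01-31', DATE '2018-02-05'
), flagged AS (
  -- is_new_range = 1 when the row starts more than one day after the latest
  -- end date among all earlier rows. For the first row the window is empty,
  -- the comparison is NULL, and IF falls through to 1.
  SELECT
    start_date,
    end_date,
    IF(
      start_date <= DATE_ADD(
        MAX(end_date) OVER (
          ORDER BY start_date, end_date
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ),
        INTERVAL 1 DAY
      ),
      0,
      1
    ) AS is_new_range
  FROM parsed_as_dates
), grouped AS (
  -- Running sum of the markers gives every merged range its own group id.
  SELECT
    start_date,
    end_date,
    SUM(is_new_range) OVER (ORDER BY start_date, end_date) AS grp
  FROM flagged
)
SELECT MIN(start_date) AS start_date, MAX(end_date) AS end_date
FROM grouped
GROUP BY grp
ORDER BY start_date
```

On the sample data this should give the same two ranges as the query above. The main difference is cost: the per-day version builds one row for every day in the overall span and cross joins it with every input row, while this version only sorts the input rows, so it is probably the better fit when the dates span many years. If ranges that merely touch should stay separate, drop the `DATE_ADD` and compare `start_date` directly with the running maximum.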