Reduce a series of date ranges to their minimal representation in BigQuery

Time: 2018-10-16 20:15:48

Tags: google-bigquery standard-sql

If I have a table like this:

start_date|end_date
1/1/2018|1/5/2018
1/4/2018|1/10/2018
1/9/2018|1/22/2018
2/1/2018|2/1/2018
1/31/2018|2/5/2018

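I want to get the date ranges that these rows cover, with overlapping ranges merged. So I would expect the following to be returned: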

1/1/2018|1/22/2018
1/31/2018|2/5/2018

Is there a function in BigQuery that can handle this?

1 answer:

Answer 0: (score: 1)

There is no such function, but you can try the following (BigQuery Standard SQL):

#standardSQL
WITH `project.dataset.table` AS (
  -- Sample data from the question (dates stored as strings)
  SELECT '1/1/2018' start_date, '1/5/2018' end_date UNION ALL
  SELECT '1/4/2018', '1/10/2018' UNION ALL
  SELECT '1/9/2018', '1/22/2018' UNION ALL
  SELECT '2/1/2018', '2/1/2018' UNION ALL
  SELECT '1/31/2018', '2/5/2018'
), parsed_as_dates AS (
  SELECT PARSE_DATE('%m/%d/%Y', start_date) start_date, PARSE_DATE('%m/%d/%Y', end_date) end_date
  FROM `project.dataset.table`
), days AS (
  -- One row per calendar day between the earliest start and the latest end
  SELECT day FROM 
  (SELECT MIN(start_date) min_date, MAX(end_date) max_date FROM parsed_as_dates), 
  UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
), temp AS (
  -- flag = 1 if at least one range covers the day, otherwise 0
  SELECT day, SIGN(COUNTIF(day BETWEEN start_date AND end_date)) flag
  FROM days CROSS JOIN parsed_as_dates GROUP BY day
)
SELECT MIN(day) start_date, MAX(day) end_date
FROM (
  -- Running sum of flag changes: each run of consecutive equal flags shares one grp
  SELECT day, flag, SUM(start) OVER(ORDER BY day) grp
  FROM (
    -- start = 1 on every day where flag differs from the previous day's flag
    SELECT day, flag, ABS(flag - IFNULL(LAG(flag) OVER(ORDER BY day), 0)) start
    FROM temp
  )
)
WHERE flag = 1
GROUP BY grp
-- ORDER BY start_date

The result is:

Row start_date  end_date     
1   2018-01-01  2018-01-22   
2   2018-01-31  2018-02-05    

This is just a "quick" idea. You may want to refactor it a bit, because it looks somewhat over-engineered to me :o), but at least it works.
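
As one possible refactoring, here is a minimal sketch that avoids expanding every range into individual days. It is not part of the original answer. It assumes the dates are already of type DATE, and the CTE names `marked` and `grouped` and the column `prev_max_end` are made up for illustration. The idea: order the rows by start date, track the latest end date seen so far, and start a new group whenever a row begins more than one day after that latest end.

```sql
#standardSQL
WITH parsed_as_dates AS (
  -- Sample data from the question, written directly as DATE literals
  SELECT DATE '2018-01-01' AS start_date, DATE '2018-01-05' AS end_date UNION ALL
  SELECT DATE '2018-01-04', DATE '2018-01-10' UNION ALL
  SELECT DATE '2018-01-09', DATE '2018-01-22' UNION ALL
  SELECT DATE '2018-02-01', DATE '2018-02-01' UNION ALL
  SELECT DATE '2018-01-31', DATE '2018-02-05'
), marked AS (
  SELECT
    start_date,
    end_date,
    -- Latest end_date among all rows sorted before this one (NULL for the first row)
    MAX(end_date) OVER (
      ORDER BY start_date, end_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS prev_max_end
  FROM parsed_as_dates
), grouped AS (
  SELECT
    start_date,
    end_date,
    -- Running count of rows that open a new merged range.
    -- The one-day allowance also merges ranges that touch without overlapping
    -- (for example 1/1-1/5 and 1/6-1/10), as the day-by-day query does.
    COUNTIF(
      prev_max_end IS NULL
      OR start_date > DATE_ADD(prev_max_end, INTERVAL 1 DAY)
    ) OVER (ORDER BY start_date, end_date) AS grp
  FROM marked
)
SELECT MIN(start_date) AS start_date, MAX(end_date) AS end_date
FROM grouped
GROUP BY grp
ORDER BY start_date
```

On the sample data this should return the same two rows as above (2018-01-01 to 2018-01-22 and 2018-01-31 to 2018-02-05). If touching ranges should stay separate, compare against `prev_max_end` directly instead of adding a day.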