How to calculate the difference between dates in BigQuery

Time: 2019-02-07 11:27:19

Tags: google-bigquery

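I have a table named Employees with the columns PersonID, Name, and StartDate. I want to calculate 1) the difference in days between the newest and the oldest employee, and 2) the longest period (in days) without any new hire. I tried using DATEDIFF, but the dates are all in a single column and I am not sure which other approach to use. Any help would be greatly appreciated.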

2 answers:

Answer 0: (score: 1)

The following is for BigQuery Standard SQL:

   
#standardSQL
SELECT 
  SUM(days_before_next_hire) AS days_between_newest_and_oldest_employee,
  MAX(days_before_next_hire) - 1 AS longest_period_without_new_hire
FROM (
  SELECT 
    DATE_DIFF(
      StartDate, 
      LAG(StartDate) OVER(ORDER BY StartDate), 
      DAY
    ) days_before_next_hire
  FROM `project.dataset.your_table`
)   

You can test and play with the query above using dummy data, as in the example below:

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT DATE '2019-01-01' StartDate UNION ALL
  SELECT '2019-01-03' StartDate UNION ALL
  SELECT '2019-01-13' StartDate 
)
SELECT 
  SUM(days_before_next_hire) AS days_between_newest_and_oldest_employee,
  MAX(days_before_next_hire) - 1 AS longest_period_without_new_hire
FROM (
  SELECT 
    DATE_DIFF(
      StartDate, 
      LAG(StartDate) OVER(ORDER BY StartDate), 
      DAY
    ) days_before_next_hire
  FROM `project.dataset.your_table`
)   

with the result:

Row days_between_newest_and_oldest_employee longest_period_without_new_hire  
1   12                                      9       

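Note the use of -1 when calculating longest_period_without_new_hire. Whether to apply this adjustment is up to you and depends on how you prefer to count gaps: DATE_DIFF returns the distance between two consecutive hire dates, while subtracting 1 gives the number of full days strictly between them.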
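To make the effect of that adjustment concrete, here is a minimal sketch that returns both counts side by side. It reuses the dummy dates from the example above; the CTE name hires and the column names gap_in_days and full_days_without_hire are made up for illustration and are not part of the original answer.

```sql
#standardSQL
-- Dummy data: the same three hire dates as in the example above.
WITH hires AS (
  SELECT DATE '2019-01-01' AS StartDate UNION ALL
  SELECT DATE '2019-01-03' UNION ALL
  SELECT DATE '2019-01-13'
)
SELECT
  -- Distance between the two consecutive hire dates that are furthest apart.
  MAX(gap) AS gap_in_days,
  -- Days strictly between those two dates (the hire days themselves excluded).
  MAX(gap) - 1 AS full_days_without_hire
FROM (
  SELECT
    -- LAG returns NULL for the first row, so that row's gap is NULL
    -- and is ignored by MAX.
    DATE_DIFF(StartDate, LAG(StartDate) OVER (ORDER BY StartDate), DAY) AS gap
  FROM hires
)
```

For this data the longest gap lies between 2019-01-03 and 2019-01-13: DATE_DIFF gives 10, while only 9 calendar days (January 4 through January 12) pass without a hire.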

Answer 1: (score: 0)

1) Difference in days between the newest and the oldest record

WITH table AS (
  SELECT DATE(created_at) date, *
  FROM `githubarchive.day.201901*` 
  WHERE _table_suffix<'2'
  AND repo.name = 'google/bazel-common'
  AND type='ForkEvent'
)

SELECT DATE_DIFF(MAX(date), MIN(date),  DAY) max_minus_min
FROM table

2) Longest period (in days) without any new record

WITH table AS (
  SELECT DATE(created_at) date, *
  FROM `githubarchive.day.201901*` 
  WHERE _table_suffix<'2'
  AND repo.name = 'google/bazel-common'
  AND type='ForkEvent'
)

SELECT MAX(diff) max_diff
FROM (
  SELECT DATE_DIFF(date, LAG(date) OVER(ORDER BY date), DAY) diff
  FROM table
)
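Applied to the Employees table from the question, the two queries above could be combined into a single pass. This is only a sketch under assumptions: the table path `project.dataset.Employees` is hypothetical, StartDate is assumed to be of type DATE (wrap it in DATE(...) if it is a TIMESTAMP), and the output column names are chosen for illustration.

```sql
#standardSQL
SELECT
  -- 1) Days between the earliest and the latest StartDate.
  DATE_DIFF(MAX(StartDate), MIN(StartDate), DAY) AS days_between_newest_and_oldest,
  -- 2) Largest distance between two consecutive StartDate values.
  MAX(gap) AS longest_gap_in_days
FROM (
  SELECT
    StartDate,
    -- Gap to the previous hire date; NULL for the first row.
    DATE_DIFF(StartDate, LAG(StartDate) OVER (ORDER BY StartDate), DAY) AS gap
  FROM `project.dataset.Employees`  -- hypothetical table path
)
```

Both aggregates skip the NULL gap of the first row, so a table with a single employee would return 0 for the first column and NULL for the second.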