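From a change log table to state on given dates

Time: 2019-04-10 21:23:35

Tags: sql google-bigquery

I'm trying to convert a change log table into a historical status table using BigQuery Standard SQL.

The part that's hanging me up is how to select the most recent change log entry prior to the date being joined on.

I didn't come across window functions or indexing during my time at college, so if they are part of the ideal solution, I'd appreciate an explanation of how to use them.

Change_Logs table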

   Update                   Key      Tostring
1  2019-01-30 17:57:51.910  PS-5864  To Do
2  2019-02-11 20:59:08.582  PS-5864  In Progress
3  2019-02-12 19:52:18.733  PS-5864  Done
4  2019-01-31 16:52:12.832  PS-4672  To Do
5  2019-02-11 14:11:13.442  PS-4672  In Progress
6  2019-02-12 04:22:33.111  PS-4672  Done

Dates table

   Date
1  2019-02-10
2  2019-02-11
3  2019-02-12
4  2019-02-13

Desired result:

   Date                     Key      Status
1  2019-02-10 00:00:00.000  PS-5864  To Do
2  2019-02-10 00:00:00.000  PS-4672  To Do
3  2019-02-11 00:00:00.000  PS-5864  To Do
4  2019-02-11 00:00:00.000  PS-4672  To Do
5  2019-02-12 00:00:00.000  PS-5864  In Progress
6  2019-02-12 00:00:00.000  PS-4672  In Progress
7  2019-02-13 00:00:00.000  PS-5864  Done
8  2019-02-13 00:00:00.000  PS-4672  Done

2 answers:

Answer 0 (score: 1)

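The following is for BigQuery Standard SQL:

#standardSQL
SELECT d.date, key, 
  -- keep only the most recent status logged before each date
  ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)] status
FROM `project.dataset.dates` d
JOIN `project.dataset.change_logs` l
ON DATE_DIFF(d.date, DATE(l.update), DAY) > 0
GROUP BY d.date, key

You can test and play with the above using the sample data from your question, as in the example below: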

#standardSQL
WITH `project.dataset.change_logs` AS (
  SELECT DATETIME '2019-01-30 17:57:51.910' `update`, 'PS-5864' key, 'To Do' status UNION ALL
  SELECT '2019-02-11 20:59:08.582', 'PS-5864', 'In Progress' UNION ALL
  SELECT '2019-02-12 19:52:18.733', 'PS-5864', 'Done' UNION ALL
  SELECT '2019-01-31 16:52:12.832', 'PS-4672', 'To Do' UNION ALL
  SELECT '2019-02-11 14:11:13.442', 'PS-4672', 'In Progress' UNION ALL
  SELECT '2019-02-12 04:22:33.111', 'PS-4672', 'Done' 
), `project.dataset.dates` AS (
  SELECT DATE '2019-02-10' `date` UNION ALL
  SELECT '2019-02-11' UNION ALL
  SELECT '2019-02-12' UNION ALL
  SELECT '2019-02-13' 
)
SELECT d.date, key, 
  ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)] status
FROM `project.dataset.dates` d
JOIN `project.dataset.change_logs` l
ON DATE_DIFF(d.date, DATE(l.update), DAY) > 0
GROUP BY d.date, key
-- ORDER BY d.date, key   
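
Since the question asks about window functions, here is a hedged sketch of the same "latest change before each date" logic written with ROW_NUMBER() instead of ARRAY_AGG. It is an illustration, not part of the original answer. The CTE names change_logs and dates stand in for the real tables, and the aliases as_of_date and rn are made up for this example.

```sql
#standardSQL
WITH change_logs AS (
  SELECT DATETIME '2019-01-30 17:57:51.910' AS `update`, 'PS-5864' AS key, 'To Do' AS status UNION ALL
  SELECT DATETIME '2019-02-11 20:59:08.582', 'PS-5864', 'In Progress' UNION ALL
  SELECT DATETIME '2019-02-12 19:52:18.733', 'PS-5864', 'Done' UNION ALL
  SELECT DATETIME '2019-01-31 16:52:12.832', 'PS-4672', 'To Do' UNION ALL
  SELECT DATETIME '2019-02-11 14:11:13.442', 'PS-4672', 'In Progress' UNION ALL
  SELECT DATETIME '2019-02-12 04:22:33.111', 'PS-4672', 'Done'
), dates AS (
  SELECT DATE '2019-02-10' AS `date` UNION ALL
  SELECT DATE '2019-02-11' UNION ALL
  SELECT DATE '2019-02-12' UNION ALL
  SELECT DATE '2019-02-13'
)
SELECT as_of_date, key, status
FROM (
  SELECT
    d.date AS as_of_date,
    l.key,
    l.status,
    -- Within each (date, key) pair, number the earlier changes from
    -- newest to oldest, so rn = 1 is the latest change before that date.
    ROW_NUMBER() OVER (
      PARTITION BY d.date, l.key
      ORDER BY l.update DESC
    ) AS rn
  FROM dates d
  JOIN change_logs l
  ON DATE(l.update) < d.date   -- only changes made before the date
)
WHERE rn = 1
ORDER BY as_of_date, key
```

PARTITION BY splits the joined rows into one group per date and key, ORDER BY sorts each group, and ROW_NUMBER() numbers the rows inside it. Filtering on rn = 1 then plays the same role as LIMIT 1 in the ARRAY_AGG version.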

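Answer 1 (score: 0)

The key idea is to use a cross join to generate the rows. Then, what you really want is lag(. . . ignore nulls), but BigQuery doesn't support that.

Instead, you can do some array manipulation: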

select d.date, cl.key,
       array_agg(cl.status ignore nulls order by d.date desc limit 2)[ordinal(2)]
from dates d cross join
     (select distinct key from change_logs cl) k left join
     change_logs cl
     on date(cl.update) = d.date and cl.key = k.key;

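EDIT:

The above is not quite right, because we are missing the dates before the specified time period. I think the simplest approach is to add them in and then remove them: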

select *
from (select d.date, k.key,
             array_agg(cl.status ignore nulls order by d.date desc limit 2)[ordinal(2)] as status
      from (select d.date
            from dates d 
            union distinct
            select distinct date(cl.update)
            from change_logs cl
           ) d cross join
           (select distinct key from change_logs cl) k left join
           change_logs cl
           on date(cl.update) = d.date and cl.key = k.key
    )
where date in (select d.date from dates d);
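
One way to get the lag(. . . ignore nulls) effect described above is LAST_VALUE with IGNORE NULLS, which BigQuery accepts as a window function. The sketch below follows the same plan (add the change dates to the grid, carry the last known status forward, then drop the extra dates). It swaps in LAST_VALUE for the array expression, so treat it as an illustration rather than the answer's own query. It assumes at most one change per key per day. The CTE name grid and the alias as_of_date are made up for this example.

```sql
#standardSQL
WITH change_logs AS (
  SELECT DATETIME '2019-01-30 17:57:51.910' AS `update`, 'PS-5864' AS key, 'To Do' AS status UNION ALL
  SELECT DATETIME '2019-02-11 20:59:08.582', 'PS-5864', 'In Progress' UNION ALL
  SELECT DATETIME '2019-02-12 19:52:18.733', 'PS-5864', 'Done' UNION ALL
  SELECT DATETIME '2019-01-31 16:52:12.832', 'PS-4672', 'To Do' UNION ALL
  SELECT DATETIME '2019-02-11 14:11:13.442', 'PS-4672', 'In Progress' UNION ALL
  SELECT DATETIME '2019-02-12 04:22:33.111', 'PS-4672', 'Done'
), dates AS (
  SELECT DATE '2019-02-10' AS `date` UNION ALL
  SELECT DATE '2019-02-11' UNION ALL
  SELECT DATE '2019-02-12' UNION ALL
  SELECT DATE '2019-02-13'
), grid AS (
  -- Requested dates plus every date on which a change was logged
  SELECT `date` AS as_of_date FROM dates
  UNION DISTINCT
  SELECT DATE(`update`) FROM change_logs
)
SELECT as_of_date, key, status
FROM (
  SELECT
    g.as_of_date,
    k.key,
    -- Most recent non-null status on any strictly earlier grid row.
    -- Assumes at most one change per key per day.
    LAST_VALUE(cl.status IGNORE NULLS) OVER (
      PARTITION BY k.key
      ORDER BY g.as_of_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS status
  FROM grid g
  CROSS JOIN (SELECT DISTINCT key FROM change_logs) k
  LEFT JOIN change_logs cl
  ON DATE(cl.update) = g.as_of_date AND cl.key = k.key
)
-- Drop the helper dates that were only added to carry statuses forward
WHERE as_of_date IN (SELECT `date` FROM dates)
ORDER BY as_of_date, key
```

The frame ends at 1 PRECEDING so that a change logged on the date itself is not counted, which matches the desired result in the question, where 2019-02-11 still shows To Do.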