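I'm trying to convert a change log table into a historical state table using BigQuery Standard SQL.
The part that's hanging me up is how to select, for each joined date, the most recent change log entry before that date.
I didn't come across window functions or indexing during my time at university, so if they are part of the ideal solution, I'd appreciate an explanation of how to use them.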
Change_Logs table
Row  Update                   Key      Tostring
1    2019-01-30 17:57:51.910  PS-5864  To Do
2    2019-02-11 20:59:08.582  PS-5864  In Progress
3    2019-02-12 19:52:18.733  PS-5864  Done
4    2019-01-31 16:52:12.832  PS-4672  To Do
5    2019-02-11 14:11:13.442  PS-4672  In Progress
6    2019-02-12 04:22:33.111  PS-4672  Done
Dates table
Row  Date
1    2019-02-10
2    2019-02-11
3    2019-02-12
4    2019-02-13
Desired result:
Row  Date                     Key      Status
1    2019-02-10 00:00:00.000  PS-5864  To Do
2    2019-02-10 00:00:00.000  PS-4672  To Do
3    2019-02-11 00:00:00.000  PS-5864  To Do
4    2019-02-11 00:00:00.000  PS-4672  To Do
5    2019-02-12 00:00:00.000  PS-5864  In Progress
6    2019-02-12 00:00:00.000  PS-4672  In Progress
7    2019-02-13 00:00:00.000  PS-5864  Done
8    2019-02-13 00:00:00.000  PS-4672  Done
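Reading the desired result closely pins down the rule: for each date D and each key, the status comes from the latest change whose timestamp falls strictly before D (so a change at 2019-02-12 04:22 does not count for 2019-02-12 yet). The plain-Python sketch below is not BigQuery code; it only reproduces that "as-of" lookup on the sample data so the expected semantics are unambiguous. The names `CHANGE_LOGS`, `DATES` and `history` are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime

# Sample data copied from the question: (update, key, status).
CHANGE_LOGS = [
    ("2019-01-30 17:57:51.910", "PS-5864", "To Do"),
    ("2019-02-11 20:59:08.582", "PS-5864", "In Progress"),
    ("2019-02-12 19:52:18.733", "PS-5864", "Done"),
    ("2019-01-31 16:52:12.832", "PS-4672", "To Do"),
    ("2019-02-11 14:11:13.442", "PS-4672", "In Progress"),
    ("2019-02-12 04:22:33.111", "PS-4672", "Done"),
]
DATES = ["2019-02-10", "2019-02-11", "2019-02-12", "2019-02-13"]


def history(change_logs, dates):
    """For each date and key, return the status of the latest change strictly before that date."""
    # Group the changes by key; each list is sorted by timestamp.
    per_key = {}
    for ts, key, status in change_logs:
        per_key.setdefault(key, []).append(
            (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"), status))
    for changes in per_key.values():
        changes.sort()

    rows = []
    for d in dates:
        cutoff = datetime.strptime(d, "%Y-%m-%d")  # midnight at the start of d
        for key, changes in per_key.items():
            times = [t for t, _ in changes]
            i = bisect_left(times, cutoff)  # changes[:i] all happened before cutoff
            if i > 0:  # a key with no earlier change gets no row
                rows.append((d, key, changes[i - 1][1]))
    return rows


if __name__ == "__main__":
    for row in history(CHANGE_LOGS, DATES):
        print(*row)
```

Run on the sample data, this yields the same eight (date, key, status) rows as the desired result, in the same order.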
Answer 0 (score: 1)
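Below is for BigQuery Standard SQL:
#standardSQL
SELECT d.date, key,
ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)] status
FROM `project.dataset.dates` d
JOIN `project.dataset.change_logs` l
ON DATE_DIFF(d.date, DATE(l.update), DAY) > 0
GROUP BY d.date, key
You can test and play with the query above using the sample data from your question, as in the example below. It returns the desired result (uncomment the ORDER BY line to get the rows sorted):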
#standardSQL
WITH `project.dataset.change_logs` AS (
SELECT DATETIME '2019-01-30 17:57:51.910' `update`, 'PS-5864' key, 'To Do' status UNION ALL
SELECT '2019-02-11 20:59:08.582', 'PS-5864', 'In Progress' UNION ALL
SELECT '2019-02-12 19:52:18.733', 'PS-5864', 'Done' UNION ALL
SELECT '2019-01-31 16:52:12.832', 'PS-4672', 'To Do' UNION ALL
SELECT '2019-02-11 14:11:13.442', 'PS-4672', 'In Progress' UNION ALL
SELECT '2019-02-12 04:22:33.111', 'PS-4672', 'Done'
), `project.dataset.dates` AS (
SELECT DATE '2019-02-10' `date` UNION ALL
SELECT '2019-02-11' UNION ALL
SELECT '2019-02-12' UNION ALL
SELECT '2019-02-13'
)
SELECT d.date, key,
ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)] status
FROM `project.dataset.dates` d
JOIN `project.dataset.change_logs` l
ON DATE_DIFF(d.date, DATE(l.update), DAY) > 0
GROUP BY d.date, key
-- ORDER BY d.date, key
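The query works in two steps: the JOIN condition pairs every date with every change made on an earlier day, and ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)] keeps only the newest status within each (date, key) group, so no window function is needed. The plain-Python sketch below mirrors those two steps on the sample data; it is an illustration of the logic, not BigQuery code, and `join_then_latest` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, datetime

# Sample data from the question: (update, key, status).
CHANGE_LOGS = [
    (datetime(2019, 1, 30, 17, 57, 51, 910000), "PS-5864", "To Do"),
    (datetime(2019, 2, 11, 20, 59, 8, 582000), "PS-5864", "In Progress"),
    (datetime(2019, 2, 12, 19, 52, 18, 733000), "PS-5864", "Done"),
    (datetime(2019, 1, 31, 16, 52, 12, 832000), "PS-4672", "To Do"),
    (datetime(2019, 2, 11, 14, 11, 13, 442000), "PS-4672", "In Progress"),
    (datetime(2019, 2, 12, 4, 22, 33, 111000), "PS-4672", "Done"),
]
DATES = [date(2019, 2, 10), date(2019, 2, 11), date(2019, 2, 12), date(2019, 2, 13)]


def join_then_latest(change_logs, dates):
    # Step 1, the JOIN: ON DATE_DIFF(d.date, DATE(l.update), DAY) > 0,
    # i.e. keep every change made on a day strictly before d.
    groups = defaultdict(list)
    for d in dates:
        for ts, key, status in change_logs:
            if (d - ts.date()).days > 0:
                groups[(d, key)].append((ts, status))
    # Step 2, GROUP BY d.date, key with
    # ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)]:
    # keep only the status of the newest change in each group.
    return {group: max(rows, key=lambda r: r[0])[1] for group, rows in groups.items()}
```

Note that the join pairs each date with all earlier changes, so the intermediate result grows with (number of dates) × (number of changes) before the aggregation shrinks it back to one row per date and key.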
Answer 1 (score: 0)
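The key idea is to use a cross join to generate the rows. Then what you really want is lag(. . . ignore nulls), but BigQuery doesn't support that.
Instead, you can do some array manipulation: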
select d.date, cl.key,
array_agg(cl.status ignore nulls order by d.date desc limit 2)[ordinal(2)]
from dates d cross join
(select distinct key from change_logs cl) k left join
change_logs cl
on date(cl.update) = d.date and cl.key = k.key;
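Edit:
The above isn't quite right, because it is missing the dates before the specified period. I think the simplest approach is to add those dates first and then remove them again: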
select *
from (select d.date, cl.key,
array_agg(cl.status ignore nulls order by d.date desc limit 2)[ordinal(2)]
from (select d.date
from dates d
union distinct
select distinct date(cl.update)
from change_logs cl
) d cross join
(select distinct key from change_logs cl) k left join
change_logs cl
on date(cl.update) = d.date and cl.key = k.key
)
where date in (select d.date from dates d);
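The strategy in the Edit is: pad the calendar with the days on which changes happened, cross join it with every key, carry the last non-null status forward from earlier days (what lag(... ignore nulls) would compute), and finally keep only the requested dates. The plain-Python sketch below illustrates that forward-fill idea on the sample data; it is not BigQuery code, and `forward_fill_history` is a hypothetical name.

```python
from datetime import date, datetime

# Sample data from the question: (update, key, status).
CHANGE_LOGS = [
    (datetime(2019, 1, 30, 17, 57, 51, 910000), "PS-5864", "To Do"),
    (datetime(2019, 2, 11, 20, 59, 8, 582000), "PS-5864", "In Progress"),
    (datetime(2019, 2, 12, 19, 52, 18, 733000), "PS-5864", "Done"),
    (datetime(2019, 1, 31, 16, 52, 12, 832000), "PS-4672", "To Do"),
    (datetime(2019, 2, 11, 14, 11, 13, 442000), "PS-4672", "In Progress"),
    (datetime(2019, 2, 12, 4, 22, 33, 111000), "PS-4672", "Done"),
]
DATES = [date(2019, 2, 10), date(2019, 2, 11), date(2019, 2, 12), date(2019, 2, 13)]


def forward_fill_history(change_logs, dates):
    # Calendar = requested dates UNION DISTINCT the days on which changes happened,
    # so changes made before the first requested date are not lost.
    calendar = sorted(set(dates) | {ts.date() for ts, _, _ in change_logs})
    keys = {key for _, key, _ in change_logs}  # SELECT DISTINCT key
    # Status at the end of each (day, key); a later change on the same day wins.
    end_of_day = {}
    for ts, key, status in sorted(change_logs):
        end_of_day[(ts.date(), key)] = status
    wanted = set(dates)
    rows = []
    for key in keys:
        carried = None  # last non-null status from a strictly earlier day
        for day in calendar:
            if day in wanted and carried is not None:
                rows.append((day, key, carried))
            # Update only after emitting, so a change counts from the next day on.
            carried = end_of_day.get((day, key), carried)
    return sorted(rows)
```

On the sample data this returns the same eight rows as the desired result (sorted by date, then key), which is a handy cross-check against the join-based approach.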