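Suppose I have a BigQuery table "events" (in reality it is a slow subquery) that stores the daily count of events per event type. There are many event types, and most of them do not occur on most days, so a row exists only for day/event type combinations with a non-zero count.
I have a query that returns the count for each event type and day, together with the count for that event N days earlier, like this: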
WITH events AS (
SELECT DATE('2019-06-08') AS day, 'a' AS type, 1 AS count
UNION ALL SELECT '2019-06-09', 'a', 2
UNION ALL SELECT '2019-06-10', 'a', 3
UNION ALL SELECT '2019-06-07', 'b', 4
UNION ALL SELECT '2019-06-09', 'b', 5
)
SELECT e1.type, e1.day, e1.count, COALESCE(e2.count, 0) AS prev_count
FROM events e1
LEFT JOIN events e2 ON e1.type = e2.type AND e1.day = DATE_ADD(e2.day, INTERVAL 2 DAY) -- LEFT JOIN, because the event may not have occurred at all 2 days ago
ORDER BY 1, 2
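The query is slow. The BigQuery best practices recommend using window functions instead of self-joins. Is there a way to do that here? If there were a row for every day I could use the LAG function, but there isn't. Can I "pad" the data somehow? (There is no short list of possible event types. I could of course join against SELECT DISTINCT type FROM events, but that probably won't be any faster than the self-join.)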
Answer 0 (score: 2)
A brute-force approach is:
select e.*,
       -- the row from 2 days ago, if present, is either 1 or 2 rows back within the same type
       (case when lag(e.day) over (partition by e.type order by e.day) = date_add(e.day, interval -2 day)
             then lag(e.count) over (partition by e.type order by e.day)
             when lag(e.day, 2) over (partition by e.type order by e.day) = date_add(e.day, interval -2 day)
             then lag(e.count, 2) over (partition by e.type order by e.day)
        end) as prev_day2_count
from events e;
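This works fine for a lag of two days. For longer lags it gets more cumbersome.
Edit:
A more general form uses window frames. Unfortunately, a RANGE frame needs a numeric ordering key, so an extra step is required: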
select e.*,
       (case when min(day) over (partition by type order by diff range between 2 preceding and current row) = date_add(day, interval -2 day)
             then first_value(e.count) over (partition by type order by diff range between 2 preceding and current row)
        end) as prev_day2_count
from (select e.*,
             date_diff(day, max(day) over (partition by type), day) as diff  -- day is a bad name for a column because it is a date part
      from events e
     ) e;
And then, aha! The case expression is not needed:
select e.*,
       -- the frame holds only rows exactly 2 days earlier; it is empty (NULL result) when there are none
       first_value(e.count) over (partition by type order by diff range between 2 preceding and 2 preceding) as prev_day2_count
from (select e.*,
             date_diff(day, max(day) over (partition by type), day) as diff  -- day is a bad name for a column because it is a date part
      from events e
     ) e;
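For reference, here is a self-contained sketch of the same idea, run against the sample data from the question. It makes two assumptions that are not part of the answer above: it uses UNIX_DATE(day) as the numeric ordering key instead of the diff subquery, and it wraps the result in COALESCE so that a missing earlier row yields 0, as prev_count does in the original query.

```sql
WITH events AS (
  SELECT DATE('2019-06-08') AS day, 'a' AS type, 1 AS count
  UNION ALL SELECT '2019-06-09', 'a', 2
  UNION ALL SELECT '2019-06-10', 'a', 3
  UNION ALL SELECT '2019-06-07', 'b', 4
  UNION ALL SELECT '2019-06-09', 'b', 5
)
SELECT
  e.type,
  e.day,
  e.count,
  COALESCE(
    -- Assumption: UNIX_DATE (days since 1970-01-01) stands in for the diff column.
    -- The frame contains only rows whose day is exactly 2 days before the current row.
    FIRST_VALUE(e.count) OVER (
      PARTITION BY e.type
      ORDER BY UNIX_DATE(e.day)
      RANGE BETWEEN 2 PRECEDING AND 2 PRECEDING
    ),
    0  -- no row 2 days earlier: report 0, like the LEFT JOIN + COALESCE version
  ) AS prev_count
FROM events e
ORDER BY 1, 2;
```

With the sample rows, this should return a prev_count of 1 for ('a', 2019-06-10) and 4 for ('b', 2019-06-09), and 0 for the other three rows. For a lag of N days, replace both frame bounds with N PRECEDING.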
Answer 1 (score: 1)
Below is for BigQuery Standard SQL:
When applied to the sample data from your question, the result is: