I have three tables: UpEvent, DownEvent, and AnalysisWindow.
UpEvent:
up_event_id | event_date          | EventMetric
1           | 2015-01-01T06:00:00 | 54
2           | 2015-01-01T07:30:00 | 76
DownEvent:
down_event_id | event_date          | EventMetric
1             | 2015-01-01T06:46:00 | 22
2             | 2015-01-01T07:33:00 | 34
AnalysisWindow:
window_id | win_start           | win_end
1         | 2015-01-01T00:00:00 | 2015-01-01T04:00:00
2         | 2015-01-01T00:00:00 | 2015-01-01T08:00:00
.
.
I want to run an analysis over each AnalysisWindow that summarizes the UpEvents and DownEvents occurring within that window.
So for each AnalysisWindow record I would get one feature row:
WinStart            | WinEnd              | TotalUpEvents | TotalDownEvents
2015-01-01T00:00:00 | 2015-01-01T04:00:00 | 0             | 0
2015-01-01T00:00:00 | 2015-01-01T08:00:00 | 2             | 2
My first thought was to do something like this:
select win.win_start,
win.win_end,
count(ue.*),
sum(ue.EventMetric)
from AnalysisWindow win
left join UpEvent ue on (ue.event_date between win.win_start and win.win_end)
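which obviously doesn't work.
Am I approaching this the wrong way? I'd like to aggregate these tables over each of the windows I have configured and get one summary record per window.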
Answer 0 (score: 2)
The following is for BigQuery Standard SQL (and it really works!)
#standardSQL
WITH ue_win AS (
SELECT
window_id, COUNT(1) TotalUpEvents
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.UpEvent` ue
WHERE ue.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
), de_win AS (
SELECT
window_id, COUNT(1) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.DownEvent` de
WHERE de.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
)
SELECT
window_id, win_start, win_end,
IFNULL(TotalUpEvents, 0) TotalUpEvents,
IFNULL(TotalDownEvents, 0) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
LEFT JOIN ue_win USING(window_id)
LEFT JOIN de_win USING(window_id)
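One plausible reason for counting each event table in its own CTE before joining: if both event tables are range-joined to the windows in a single pass, every matching UpEvent is paired with every matching DownEvent, so the counts multiply. The sketch below illustrates this on the question's sample data. It is an assumption-laden stand-in, not BigQuery: it uses Python's built-in sqlite3 module with timestamps stored as ISO 8601 text, and the `naive` and `preaggregated` names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UpEvent (up_event_id INTEGER, event_date TEXT, EventMetric INTEGER);
CREATE TABLE DownEvent (down_event_id INTEGER, event_date TEXT, EventMetric INTEGER);
CREATE TABLE AnalysisWindow (window_id INTEGER, win_start TEXT, win_end TEXT);
INSERT INTO UpEvent VALUES (1, '2015-01-01T06:00:00', 54), (2, '2015-01-01T07:30:00', 76);
INSERT INTO DownEvent VALUES (1, '2015-01-01T06:46:00', 22), (2, '2015-01-01T07:33:00', 34);
INSERT INTO AnalysisWindow VALUES
  (1, '2015-01-01T00:00:00', '2015-01-01T04:00:00'),
  (2, '2015-01-01T00:00:00', '2015-01-01T08:00:00');
""")

# Joining both event tables at once: each up event pairs with each down event.
naive = conn.execute("""
SELECT win.window_id, COUNT(ue.up_event_id), COUNT(de.down_event_id)
FROM AnalysisWindow win
LEFT JOIN UpEvent ue ON ue.event_date BETWEEN win.win_start AND win.win_end
LEFT JOIN DownEvent de ON de.event_date BETWEEN win.win_start AND win.win_end
GROUP BY win.window_id
ORDER BY win.window_id
""").fetchall()

# Counting per table first, then joining the per-window totals back.
preaggregated = conn.execute("""
WITH ue_win AS (
  SELECT win.window_id, COUNT(*) AS TotalUpEvents
  FROM AnalysisWindow win
  JOIN UpEvent ue ON ue.event_date BETWEEN win.win_start AND win.win_end
  GROUP BY win.window_id
), de_win AS (
  SELECT win.window_id, COUNT(*) AS TotalDownEvents
  FROM AnalysisWindow win
  JOIN DownEvent de ON de.event_date BETWEEN win.win_start AND win.win_end
  GROUP BY win.window_id
)
SELECT win.window_id,
       IFNULL(u.TotalUpEvents, 0),
       IFNULL(d.TotalDownEvents, 0)
FROM AnalysisWindow win
LEFT JOIN ue_win u ON u.window_id = win.window_id
LEFT JOIN de_win d ON d.window_id = win.window_id
ORDER BY win.window_id
""").fetchall()

print(naive)          # window 2 reports 4 and 4 instead of 2 and 2
print(preaggregated)  # window 2 reports 2 and 2
```

With two up events and two down events in window 2, the single-pass join yields 2 x 2 = 4 rows for that window, which is why both counts come out as 4 there.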
Answer 1 (score: 0)
One method uses correlated subqueries:
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;
The above works, at least in the following formulation:
with UpEvent as (
select 1 as up_event_id, '2015-01-01T06:00:00' as event_date, 54 as EventMetric union all
select 2, '2015-01-01T07:30:00', 76
),
DownEvent as (
select 1 as down_event_id, '2015-01-01T06:46:00' as event_date, 22 as EventMetric union all
select 2, '2015-01-01T07:33:00', 34
),
AnalysisWindow as (
select 1 as window_id , '2015-01-01T00:00:00' as win_start, '2015-01-01T04:00:00' as win_end union all
select 2, '2015-01-01T00:00:00', '2015-01-01T08:00:00'
)
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;
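For anyone without BigQuery at hand, the same correlated-subquery shape can be tried locally. This is only a sketch under assumptions: it runs on Python's built-in sqlite3 rather than BigQuery, keeps the timestamps as ISO 8601 strings as in the CTEs above (so `between` compares them as text, which matches chronological order only because every value uses the same fixed-width format), and adds a hypothetical `up_metric` column to show that the `sum(ue.EventMetric)` from the question fits the same pattern.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UpEvent (up_event_id INTEGER, event_date TEXT, EventMetric INTEGER);
CREATE TABLE DownEvent (down_event_id INTEGER, event_date TEXT, EventMetric INTEGER);
CREATE TABLE AnalysisWindow (window_id INTEGER, win_start TEXT, win_end TEXT);
INSERT INTO UpEvent VALUES (1, '2015-01-01T06:00:00', 54), (2, '2015-01-01T07:30:00', 76);
INSERT INTO DownEvent VALUES (1, '2015-01-01T06:46:00', 22), (2, '2015-01-01T07:33:00', 34);
INSERT INTO AnalysisWindow VALUES
  (1, '2015-01-01T00:00:00', '2015-01-01T04:00:00'),
  (2, '2015-01-01T00:00:00', '2015-01-01T08:00:00');
""")

# One scalar subquery per summary column; each is evaluated once per window row.
rows = conn.execute("""
SELECT aw.win_start, aw.win_end,
       (SELECT COUNT(*) FROM UpEvent ue
         WHERE ue.event_date BETWEEN aw.win_start AND aw.win_end) AS ups,
       (SELECT COUNT(*) FROM DownEvent de
         WHERE de.event_date BETWEEN aw.win_start AND aw.win_end) AS downs,
       -- hypothetical extra column: total metric of the up events in the window
       (SELECT COALESCE(SUM(ue.EventMetric), 0) FROM UpEvent ue
         WHERE ue.event_date BETWEEN aw.win_start AND aw.win_end) AS up_metric
FROM AnalysisWindow aw
ORDER BY aw.window_id
""").fetchall()

for row in rows:
    print(row)
```

Because `count(*)` in a scalar subquery returns 0 when nothing matches, windows without events still produce a row, which is the behavior the question asks for. Note that `between` is inclusive on both ends, so an event that falls exactly on a boundary shared by two windows is counted in both.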
An alternative is to use union all:
with ud as (
select event_date, 1 as ups, 0 as downs from UpEvent
union all
select event_date, 0 as ups, 1 as downs from DownEvent
)
select aw.window_id, aw.win_start, aw.win_end, sum(ups) as ups, sum(downs) as downs
from AnalysisWindow aw join
ud
ON ud.event_date between aw.win_start and aw.win_end
group by aw.window_id, aw.win_start, aw.win_end
union all
select aw.window_id, aw.win_start, aw.win_end, 0, 0
from AnalysisWindow aw
where not exists (select 1 from ud where ud.event_date between aw.win_start and aw.win_end)
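The second branch of the union all query above exists only to bring back windows with no events. A possible variant, which is not the answerer's query, replaces the inner join with a left join and wraps the sums in `coalesce`, so empty windows fall out of a single select. The sketch below runs on Python's built-in sqlite3 as a stand-in; whether BigQuery accepts a left join whose only condition is a range comparison has not been checked here, so treat that as an open assumption.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UpEvent (up_event_id INTEGER, event_date TEXT, EventMetric INTEGER);
CREATE TABLE DownEvent (down_event_id INTEGER, event_date TEXT, EventMetric INTEGER);
CREATE TABLE AnalysisWindow (window_id INTEGER, win_start TEXT, win_end TEXT);
INSERT INTO UpEvent VALUES (1, '2015-01-01T06:00:00', 54), (2, '2015-01-01T07:30:00', 76);
INSERT INTO DownEvent VALUES (1, '2015-01-01T06:46:00', 22), (2, '2015-01-01T07:33:00', 34);
INSERT INTO AnalysisWindow VALUES
  (1, '2015-01-01T00:00:00', '2015-01-01T04:00:00'),
  (2, '2015-01-01T00:00:00', '2015-01-01T08:00:00');
""")

# Stack both event tables into one stream with 0/1 flags, then left join it to
# the windows; windows with no match get NULL sums, which COALESCE turns into 0.
summary = conn.execute("""
WITH ud AS (
  SELECT event_date, 1 AS ups, 0 AS downs FROM UpEvent
  UNION ALL
  SELECT event_date, 0 AS ups, 1 AS downs FROM DownEvent
)
SELECT aw.window_id, aw.win_start, aw.win_end,
       COALESCE(SUM(ud.ups), 0) AS ups,
       COALESCE(SUM(ud.downs), 0) AS downs
FROM AnalysisWindow aw
LEFT JOIN ud ON ud.event_date BETWEEN aw.win_start AND aw.win_end
GROUP BY aw.window_id, aw.win_start, aw.win_end
ORDER BY aw.window_id
""").fetchall()

for row in summary:
    print(row)
```

Since each event appears exactly once in `ud`, joining it to the windows does not multiply rows the way joining UpEvent and DownEvent separately in one pass would.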