我不确定如何在BigQuery中编写此SQL查询。我有一个包含名称和时间戳的事件表。让我们说我表中只有两个事件:A和B.我想要做的是查询表以获取事件A的所有实例,并获得下一个最接近的B事件并创建一个新列时差。 B将始终发生在A。
之后例如,如果我的表格如下:
A1 | 1:00 pm
B5 | 2:00 pm
A3 | 3:00 pm
B9 | 5:00 pm
我的结果表是:
A1 | 1 hour
A3 | 2 hours
我提出的查询如下:
SELECT
CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
FROM table sub
WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
这适用于获取上面我想要的表,但我还想在子查询中包含一个额外的列。看起来像:
A1 | 1 hour | B5Column
A3 | 2 hours | B9Column
我尝试使用以下查询:
SELECT
(SELECT
sub.SubQueryColumn
FROM table sub
WHERE sub.time > main.time
ORDER BY sub.time asc
LIMIT 1) SubColumn,
CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
FROM table sub
WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
但它不起作用。我得到的错误是
不支持引用其他表的相关子查询,除非它们可以解相关,例如将它们转换为有效的JOIN。
我可以帮忙解决这个问题吗?
答案 0 :(得分:0)
这是一种方法:
select m.*,
timestamp_diff(time, next_b_time, second) as duration
from (select m.*,
min(case when event like 'B%' then time end) over (order by time desc) as next_b_time
from main m
) m
where event like 'A%';
答案 1 :(得分:0)
以下是BigQuery Standard SQL
#standardSQL
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time,
LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
FROM (
SELECT *,
COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
-- ORDER BY time
您可以使用问题中的虚拟数据进行测试/播放,如下所示
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time,
LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
FROM (
SELECT *,
COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
结果为
Row event duration b_event
1 A1 3600 B5
2 A3 7200 B9
请注意:上述解决方案依赖于您问题中的陈述 - B will always happen after A
,如果您的序列如下
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
结果将是
Row event duration b_event
1 A1 null null
2 A2 1800 B5
3 A3 7200 B9
如果您需要解决此问题,请尝试以下
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time, type, grp,
FIRST_VALUE(event) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_event,
FIRST_VALUE(time) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_time
FROM (
SELECT event, time, SUBSTR(event, 1, 1) type,
COUNTIF(STARTS_WITH(event, 'B')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
此版本将返回
Row event duration b_event
1 A1 3600 B5
2 A2 1800 B5
3 A3 7200 B9