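In Google BigQuery, I'm trying to associate each event's start time with an end time, where the end time is defined as the latest time whose Event Type does not match the Event Type of the start row.
Here is an example to illustrate my problem:
Original dataset: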
Name  Event    Event Type  Datetime
****  *******  **********  ****************
Bob   Tennis   Start       2017-02-17 8:00
Bob   Tennis   Playing     2017-02-17 8:10
Bob   Tennis   Playing     2017-02-17 8:20
Bob   Tennis   Playing     2017-02-17 8:30
Bob   Tennis   Playing     2017-02-17 8:50
Bob   Tennis   Start       2017-02-17 10:00
Bob   Tennis   Playing     2017-02-17 10:30
Bob   Bowling  Start       2017-02-18 2:15
Bob   Bowling  Playing     2017-02-18 2:18
Desired table:
Name  Event    Start Datetime    End Datetime
****  *******  ****************  ****************
Bob   Tennis   2017-02-17 8:00   2017-02-17 8:50
Bob   Tennis   2017-02-17 10:00  2017-02-17 10:30
Bob   Bowling  2017-02-18 2:15   2017-02-18 2:18
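I know the solution must involve the partition and max functions, but I'm not sure how to find the maximum datetime among the rows whose event type does not match the start row's type.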
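Answer 0: (score: 3)
Try the query below; it should give you the idea. The inner query counts the 'Start' rows seen so far within each Name and Event, ordered by DateTime, so every row of one session gets the same grp value. The outer query then takes the MIN and MAX of DateTime per group.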
#standardSQL
SELECT Name, Event, MIN(DateTime) AS StartDateTime, MAX(DateTime) AS EndDateTime
FROM (
SELECT Name, Event, EventType, DateTime,
COUNTIF(EventType = 'Start') OVER(PARTITION BY Name, Event ORDER BY DateTime ) AS grp
FROM yourTable
)
GROUP BY Name, Event, grp
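You can test it with the dummy data below by placing this WITH clause immediately before the outer SELECT (after the #standardSQL line):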
WITH yourTable AS (
SELECT 'Bob' AS Name, 'Tennis' AS Event, 'Start' AS EventType, '2017-02-17 08:00' AS DateTime UNION ALL
SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 08:10' UNION ALL
SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 08:20' UNION ALL
SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 08:30' UNION ALL
SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 08:50' UNION ALL
SELECT 'Bob', 'Tennis', 'Start', '2017-02-17 10:00' UNION ALL
SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 10:30' UNION ALL
SELECT 'Bob', 'Bowling', 'Start', '2017-02-18 02:15' UNION ALL
SELECT 'Bob', 'Bowling', 'Playing', '2017-02-18 02:18'
)
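To see why grouping by grp works, it can help to run only the inner query and look at the running count. The following is a minimal sketch, assuming a shortened copy of the dummy data above (a few of the Playing rows are left out for brevity); the final ORDER BY is added only to make the output easier to read and is not part of the original answer.

```sql
#standardSQL
-- Sketch only: shortened dummy data, used to inspect the intermediate grp column.
WITH yourTable AS (
  SELECT 'Bob' AS Name, 'Tennis' AS Event, 'Start' AS EventType, '2017-02-17 08:00' AS DateTime UNION ALL
  SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 08:10' UNION ALL
  SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 08:50' UNION ALL
  SELECT 'Bob', 'Tennis', 'Start', '2017-02-17 10:00' UNION ALL
  SELECT 'Bob', 'Tennis', 'Playing', '2017-02-17 10:30' UNION ALL
  SELECT 'Bob', 'Bowling', 'Start', '2017-02-18 02:15' UNION ALL
  SELECT 'Bob', 'Bowling', 'Playing', '2017-02-18 02:18'
)
SELECT Name, Event, EventType, DateTime,
  -- Running count of 'Start' rows per (Name, Event), in time order:
  -- it increases by one at every 'Start', so it labels each session.
  COUNTIF(EventType = 'Start') OVER (PARTITION BY Name, Event ORDER BY DateTime) AS grp
FROM yourTable
ORDER BY Name, Event, DateTime
```

With this data, the three Tennis rows from 08:00 to 08:50 get grp = 1, the two Tennis rows from 10:00 to 10:30 get grp = 2, and both Bowling rows get grp = 1 in their own partition. Note that the dummy data stores DateTime as zero-padded strings, which is why MIN and MAX order them correctly. If your real column holds unpadded strings such as '2017-02-17 8:00', they would not sort chronologically as text, so you would probably want to convert them to a DATETIME or TIMESTAMP first.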