Finding the latest thing before a specific event

Time: 2018-12-20 19:36:14

Tags: sql google-bigquery

I'm working through some timestamp problems, but I'm stuck on some join logic.

I have a data table like this:

id, event_time, event_type, location
1001, 2018-06-04 18:23:48.526895 UTC, I, d
1001, 2018-06-04 19:26:44.359296 UTC, I, h
1001, 2018-06-05 06:07:03.658263 UTC, I, w
1001, 2018-06-07 00:47:44.651841 UTC, I, d
1001, 2018-06-07 00:48:17.857729 UTC, C, d
1001, 2018-06-08 00:04:53.086240 UTC, I, a
1001, 2018-06-12 21:23:03.071829 UTC, I, d
...

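For a user at a given location, whenever there is an event with event_type C, I'm trying to find the timestamp difference between that C event and the most recent event with event_type I.

Ultimately, the schema I'm after is:

id, location, timestamp_diff
1001, d, 33
1001, z, 21
1002, a, 55
...

I tried a query (not preserved in this copy) that works for a single id value but doesn't seem to work for multiple ids. I may be overcomplicating the problem, but I'm not sure. On one id it gives about 5 rows, which is correct. But when I open it up to two ids, I get upwards of 200 rows, when I should get something like 7 (5 rows for the first id and 2 for the second).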

2 Answers:

Answer 0 (score: 2)

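#standardSQL
-- For each 'C' event: seconds since the latest 'I' event that follows the previous 'C'
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
  -- Latest 'I' time among the earlier rows of the same group
  SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
  FROM (
    -- grp = number of 'C' events strictly before the current row, so each 'C' closes a group
    SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
    FROM `project.dataset.table`
    WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
  )
  WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
-- Keep only 'C' events that have an 'I' event earlier in their group
WHERE event_type = 'C'
AND NOT i_event_time IS NULL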

This version handles some edge cases, for example consecutive 'C' events with a "missing" 'I' event between them, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1001 id, TIMESTAMP '2018-06-04 18:23:48.526895 UTC' event_time, 'I' event_type, 'd' location UNION ALL
  SELECT 1001, '2018-06-04 19:26:44.359296 UTC', 'I', 'h' UNION ALL
  SELECT 1001, '2018-06-05 06:07:03.658263 UTC', 'I', 'w' UNION ALL
  SELECT 1001, '2018-06-07 00:47:44.651841 UTC', 'I', 'd' UNION ALL
  SELECT 1001, '2018-06-07 00:48:17.857729 UTC', 'C', 'd' UNION ALL
  SELECT 1001, '2018-06-08 00:04:53.086240 UTC', 'C', 'd' UNION ALL
  SELECT 1001, '2018-06-12 21:23:03.071829 UTC', 'I', 'd' 
)
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
  SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
  FROM (
    SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
    FROM `project.dataset.table`
    WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) 
  )
  WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) 
) 
WHERE event_type = 'C' 
AND NOT i_event_time IS NULL    

The result is

Row id      location    diff     
1   1001    d           33     

whereas without handling the edge case mentioned above, it would be

Row id      location    diff     
1   1001    d           33   
2   1001    d           83795    
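
To see why the second 'C' row drops out, the grouping logic can be traced in plain Python. This is only an illustrative sketch of what the query computes, not BigQuery code: the names parse, c_minus_latest_i and SAMPLE are made up here, and truncating the difference to whole seconds is an assumption about how TIMESTAMP_DIFF behaves for these values.

```python
from collections import defaultdict
from datetime import datetime

def parse(ts):
    # Parse timestamps such as '2018-06-04 18:23:48.526895 UTC' (UTC assumed throughout).
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f UTC")

def c_minus_latest_i(rows):
    """For each 'C' event, whole seconds since the latest 'I' event that
    follows the previous 'C' for the same (id, location)."""
    by_key = defaultdict(list)
    for id_, ts, event_type, location in rows:
        by_key[(id_, location)].append((parse(ts), event_type))

    result = []
    for (id_, location), events in sorted(by_key.items()):
        latest_i = None  # latest 'I' seen since the previous 'C' (plays the role of grp + win2)
        for event_time, event_type in sorted(events):
            if event_type == "I":
                latest_i = event_time
            elif event_type == "C":
                if latest_i is not None:  # same filter as NOT i_event_time IS NULL
                    diff = int((event_time - latest_i).total_seconds())
                    result.append((id_, location, diff))
                latest_i = None  # a 'C' closes the group, like grp increasing by one
    return result

# Same rows as the WITH clause in the answer's example.
SAMPLE = [
    (1001, "2018-06-04 18:23:48.526895 UTC", "I", "d"),
    (1001, "2018-06-04 19:26:44.359296 UTC", "I", "h"),
    (1001, "2018-06-05 06:07:03.658263 UTC", "I", "w"),
    (1001, "2018-06-07 00:47:44.651841 UTC", "I", "d"),
    (1001, "2018-06-07 00:48:17.857729 UTC", "C", "d"),
    (1001, "2018-06-08 00:04:53.086240 UTC", "C", "d"),
    (1001, "2018-06-12 21:23:03.071829 UTC", "I", "d"),
]

print(c_minus_latest_i(SAMPLE))  # [(1001, 'd', 33)]
```

The second 'C' has no 'I' event between it and the first 'C', so its group contains no 'I' time and it is filtered out.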

Answer 1 (score: 1)

You can use a cumulative max() function to get the most recent 'I' time before each event.

Then just filter on the 'C' events:

select id, location,
       timestamp_diff(event_time, i_event_time, second) as diff
from (select t.*,
             max(case when event_type = 'I' then event_time end) over (partition by id, location order by event_time) as i_event_time
      from t
     ) t
where event_type = 'C';
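
The running-max idea can also be sketched in plain Python, using the rows from the question. Again this is only an illustration under assumptions, not BigQuery code: parse, diff_to_running_max_i and QUESTION_ROWS are hypothetical names, and differences are truncated to whole seconds.

```python
from datetime import datetime

def parse(ts):
    # Parse timestamps such as '2018-06-04 18:23:48.526895 UTC' (UTC assumed throughout).
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f UTC")

def diff_to_running_max_i(rows):
    """For each 'C' event, whole seconds since the latest earlier 'I' event
    for the same (id, location); None when no such 'I' exists."""
    latest_i = {}  # (id, location) -> latest 'I' time seen so far (the running max)
    out = []
    for id_, ts, event_type, location in sorted(rows, key=lambda r: parse(r[1])):
        key = (id_, location)
        event_time = parse(ts)
        if event_type == "I":
            latest_i[key] = event_time  # rows arrive in time order, so this is the max
        elif event_type == "C":
            i_time = latest_i.get(key)
            diff = None if i_time is None else int((event_time - i_time).total_seconds())
            out.append((id_, location, diff))
    return out

# The rows shown in the question.
QUESTION_ROWS = [
    (1001, "2018-06-04 18:23:48.526895 UTC", "I", "d"),
    (1001, "2018-06-04 19:26:44.359296 UTC", "I", "h"),
    (1001, "2018-06-05 06:07:03.658263 UTC", "I", "w"),
    (1001, "2018-06-07 00:47:44.651841 UTC", "I", "d"),
    (1001, "2018-06-07 00:48:17.857729 UTC", "C", "d"),
    (1001, "2018-06-08 00:04:53.086240 UTC", "I", "a"),
    (1001, "2018-06-12 21:23:03.071829 UTC", "I", "d"),
]

print(diff_to_running_max_i(QUESTION_ROWS))  # [(1001, 'd', 33)]
```

Unlike the grouped version in the other answer, the running maximum is never reset by a 'C' event, so a second consecutive 'C' would be measured against the same earlier 'I'.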