I'm working through some timestamp problems, but I'm stuck on some join logic.
I have a data table like this:
id, event_time, event_type, location
1001, 2018-06-04 18:23:48.526895 UTC, I, d
1001, 2018-06-04 19:26:44.359296 UTC, I, h
1001, 2018-06-05 06:07:03.658263 UTC, I, w
1001, 2018-06-07 00:47:44.651841 UTC, I, d
1001, 2018-06-07 00:48:17.857729 UTC, C, d
1001, 2018-06-08 00:04:53.086240 UTC, I, a
1001, 2018-06-12 21:23:03.071829 UTC, I, d
...
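I'm trying to find the timestamp difference between the moment a user at a given location has an event_type of C and that user's most recent event_type of I at the same location.
Ultimately, the schema I'm after is:
id, location, timestamp_diff
1001, d, 33
1001, z, 21
1002, a, 55
...
I've tried the following approach, which works for a single id value but doesn't seem to work for multiple ids. I may be overcomplicating the problem, but I'm not sure. On one id it gives about 5 rows, which is correct. However, when I open it up to two ids, I get 200+ rows when I should get something like 7 (5 for the first id and 2 for the second).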
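As a plain-Python cross-check of the expected output, here is a minimal sketch under one reading of the requirement: each C is paired with the latest I at the same (id, location) since the previous C, and a C with no such I is skipped. The helper name `c_to_latest_i` is hypothetical, the rows are the seven sample rows above with the " UTC" suffix dropped, and differences are truncated to whole seconds in the spirit of `TIMESTAMP_DIFF(..., SECOND)`.

```python
from collections import defaultdict
from datetime import datetime

# The seven sample rows from the question, with the " UTC" suffix dropped.
ROWS = [
    (1001, "2018-06-04 18:23:48.526895", "I", "d"),
    (1001, "2018-06-04 19:26:44.359296", "I", "h"),
    (1001, "2018-06-05 06:07:03.658263", "I", "w"),
    (1001, "2018-06-07 00:47:44.651841", "I", "d"),
    (1001, "2018-06-07 00:48:17.857729", "C", "d"),
    (1001, "2018-06-08 00:04:53.086240", "I", "a"),
    (1001, "2018-06-12 21:23:03.071829", "I", "d"),
]


def c_to_latest_i(rows):
    """Hypothetical helper: (id, location, seconds) for each C that has an I since the previous C."""
    by_key = defaultdict(list)
    for id_, ts, event_type, location in rows:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
        by_key[(id_, location)].append((parsed, event_type))

    result = []
    # Each (id, location) is handled on its own, so extra ids never multiply rows.
    for (id_, location), events in sorted(by_key.items()):
        latest_i = None
        for ts, event_type in sorted(events):  # chronological within one (id, location)
            if event_type == "I":
                latest_i = ts
            elif event_type == "C":
                if latest_i is not None:
                    # Truncate to whole seconds, similar to TIMESTAMP_DIFF(..., SECOND)
                    result.append((id_, location, int((ts - latest_i).total_seconds())))
                latest_i = None  # a later C needs a fresh I
    return result


if __name__ == "__main__":
    print(c_to_latest_i(ROWS))
```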
Answer 0 (score: 2)
#standardSQL
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
  -- i_event_time: latest I strictly before this row within the same (id, location, grp)
  SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
  FROM (
    -- grp: number of C events strictly before this row for the same (id, location),
    -- so each C shares a grp with the I events since the previous C
    SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
    FROM `project.dataset.table`
    WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
  )
  WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
  AND NOT i_event_time IS NULL
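To try the same window-function logic without BigQuery, the query can be ported to SQLite through Python's built-in `sqlite3` module. This is a sketch, not the answer's own code: it assumes SQLite 3.25+ (for window functions), replaces `COUNTIF` with `COUNT(CASE ...)` (which also returns 0 on an empty frame), replaces `TIMESTAMP_DIFF` with a `julianday()` difference truncated to whole seconds, and loads the question's sample rows into a hypothetical `events` table with the " UTC" suffix dropped, because SQLite's date functions do not accept it.

```python
import sqlite3

# The question's sample rows; " UTC" dropped so julianday() can parse them.
ROWS = [
    (1001, "2018-06-04 18:23:48.526895", "I", "d"),
    (1001, "2018-06-04 19:26:44.359296", "I", "h"),
    (1001, "2018-06-05 06:07:03.658263", "I", "w"),
    (1001, "2018-06-07 00:47:44.651841", "I", "d"),
    (1001, "2018-06-07 00:48:17.857729", "C", "d"),
    (1001, "2018-06-08 00:04:53.086240", "I", "a"),
    (1001, "2018-06-12 21:23:03.071829", "I", "d"),
]

# Assumed SQLite port of the BigQuery query above (windows inlined instead of named).
QUERY = """
SELECT id, location,
       CAST((julianday(event_time) - julianday(i_event_time)) * 86400 AS INTEGER) AS diff
FROM (
  SELECT *,
         MAX(CASE WHEN event_type = 'I' THEN event_time END) OVER (
           PARTITION BY id, location, grp ORDER BY event_time
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS i_event_time
  FROM (
    SELECT *,
           COUNT(CASE WHEN event_type = 'C' THEN 1 END) OVER (
             PARTITION BY id, location ORDER BY event_time
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
    FROM events
  )
)
WHERE event_type = 'C' AND i_event_time IS NOT NULL
ORDER BY id, location, event_time
"""


def run(rows):
    """Load rows into an in-memory table and return (id, location, diff) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE events (id INTEGER, event_time TEXT, event_type TEXT, location TEXT)"
        )
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run(ROWS))
```

Because every window is partitioned by id and location and nothing is joined, adding rows for more ids only adds their own C rows to the output; rows cannot multiply the way a self-join can.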
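This version also handles some edge cases, for example two consecutive "C" events where the matching "I" event is "missing", as in the example below: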
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1001 id, TIMESTAMP '2018-06-04 18:23:48.526895 UTC' event_time, 'I' event_type, 'd' location UNION ALL
  SELECT 1001, '2018-06-04 19:26:44.359296 UTC', 'I', 'h' UNION ALL
  SELECT 1001, '2018-06-05 06:07:03.658263 UTC', 'I', 'w' UNION ALL
  SELECT 1001, '2018-06-07 00:47:44.651841 UTC', 'I', 'd' UNION ALL
  SELECT 1001, '2018-06-07 00:48:17.857729 UTC', 'C', 'd' UNION ALL
  SELECT 1001, '2018-06-08 00:04:53.086240 UTC', 'C', 'd' UNION ALL
  SELECT 1001, '2018-06-12 21:23:03.071829 UTC', 'I', 'd'
)
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
  SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
  FROM (
    SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
    FROM `project.dataset.table`
    WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
  )
  WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
  AND NOT i_event_time IS NULL
The result is:
Row id location diff
1 1001 d 33
whereas without handling the edge case mentioned above, it would be:
Row id location diff
1 1001 d 33
2 1001 d 83795
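The two intermediate columns can be traced by hand for the (1001, d) rows of the edge-case data. The sketch below mimics them in plain Python (the helper names `assign_groups` and `add_latest_i` are hypothetical): `grp` counts the C rows strictly before each row, so a C shares a group with the I rows since the previous C, while a second back-to-back C lands in a group with no earlier I. That is why the second C gets a NULL `i_event_time` and is filtered out.

```python
# (event_time, event_type) rows for id 1001, location 'd' in the edge-case data,
# already sorted by event_time.
EVENTS = [
    ("2018-06-04 18:23:48.526895", "I"),
    ("2018-06-07 00:47:44.651841", "I"),
    ("2018-06-07 00:48:17.857729", "C"),
    ("2018-06-08 00:04:53.086240", "C"),
    ("2018-06-12 21:23:03.071829", "I"),
]


def assign_groups(events):
    """Mimic grp: the number of C rows strictly before each row."""
    rows, c_before = [], 0
    for ts, event_type in events:
        rows.append((ts, event_type, c_before))
        if event_type == "C":
            c_before += 1
    return rows


def add_latest_i(grouped):
    """Mimic i_event_time: latest I strictly before each row within the same grp."""
    rows, latest = [], {}
    for ts, event_type, grp in grouped:
        rows.append((ts, event_type, grp, latest.get(grp)))
        if event_type == "I":
            latest[grp] = ts  # rows are chronological, so the last I seen is the max
    return rows


if __name__ == "__main__":
    for row in add_latest_i(assign_groups(EVENTS)):
        print(row)
```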
Answer 1 (score: 1)
You can use a cumulative max() to get the most recent I time before each event, then filter down to just the C events:
select id, location,
       timestamp_diff(event_time, i_event_time, second) as diff
from (select t.*,
             max(case when event_type = 'I' then event_time end)
               over (partition by id, location order by event_time) as i_event_time
      from t
     ) t
where event_type = 'C';
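For comparison, here is a sketch of this query ported to SQLite via Python's `sqlite3`, run on the edge-case rows used in the first answer. Assumptions: SQLite 3.25+ for window functions, a hypothetical `events` table standing in for `t`, " UTC" dropped from the timestamps, and `julianday()` truncated to whole seconds in place of `timestamp_diff`. With `order by` and no explicit frame, the window reaches back to the latest I regardless of any C rows in between, so the second, back-to-back C is paired with the same I and yields an extra row; a C with no earlier I at all would come back with a NULL diff, since there is no `IS NOT NULL` filter.

```python
import sqlite3

# Edge-case rows from the first answer's CTE; " UTC" dropped for SQLite.
EDGE_ROWS = [
    (1001, "2018-06-04 18:23:48.526895", "I", "d"),
    (1001, "2018-06-04 19:26:44.359296", "I", "h"),
    (1001, "2018-06-05 06:07:03.658263", "I", "w"),
    (1001, "2018-06-07 00:47:44.651841", "I", "d"),
    (1001, "2018-06-07 00:48:17.857729", "C", "d"),
    (1001, "2018-06-08 00:04:53.086240", "C", "d"),
    (1001, "2018-06-12 21:23:03.071829", "I", "d"),
]

# Assumed SQLite port of the cumulative-max query above.
QUERY = """
SELECT id, location,
       CAST((julianday(event_time) - julianday(i_event_time)) * 86400 AS INTEGER) AS diff
FROM (
  SELECT t.*,
         MAX(CASE WHEN event_type = 'I' THEN event_time END)
           OVER (PARTITION BY id, location ORDER BY event_time) AS i_event_time
  FROM events t
) t
WHERE event_type = 'C'
ORDER BY id, location, event_time
"""


def run(rows):
    """Load rows into an in-memory table and return (id, location, diff) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE events (id INTEGER, event_time TEXT, event_type TEXT, location TEXT)"
        )
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run(EDGE_ROWS))
```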