Suppose I have a table with the following schema:
name | type
----------------------
id | STRING
timestamp | TIMESTAMP
event_type | STRING
some_value | STRING
...
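I want to get all events of type 'x'. However, I also want to add an extra parameter to each returned row: a boolean that is TRUE if the most recent event WHERE event_type='y' has some_value='necessary value'.
For example, given the following rows sorted by ascending timestamp: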
event_type | some_value
------------------------
y | 'true value'
x | 'not relevant'
y | 'false value'
x | 'not relevant2'
y | 'true value'
y | 'false value'
x | 'not relevant3'
x | 'not relevant4'
I would expect the query to return the following rows:
event_type | some_value | previous_true
-------------------------------------
x | 'not relevant' | TRUE
x | 'not relevant2' | FALSE
x | 'not relevant3' | FALSE
x | 'not relevant4' | FALSE
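I thought a join might work, but I can't figure out how to write it. LAG also looked like a good idea at first, but then I realized that LAG takes the previous row regardless of what it is, so I'm not sure how to use it here.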
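To pin down the semantics being asked for, here is a minimal Python sketch of the expected behaviour on the sample rows. It is only a reference model, not a BigQuery solution: the integer `ts` column stands in for the real timestamp, the function name `flag_x_events` is made up for illustration, and 'true value' plays the role of 'necessary value' as in the example data.

```python
# Sample rows as (ts, event_type, some_value); ts is an assumed stand-in
# for the real TIMESTAMP column.
ROWS = [
    (1, "y", "true value"),
    (2, "x", "not relevant"),
    (3, "y", "false value"),
    (4, "x", "not relevant2"),
    (5, "y", "true value"),
    (6, "y", "false value"),
    (7, "x", "not relevant3"),
    (8, "x", "not relevant4"),
]


def flag_x_events(rows, necessary_value="true value"):
    """Yield each 'x' row with a flag describing the most recent 'y' row."""
    last_y_matches = None  # stays None until the first 'y' row is seen
    for _, event_type, some_value in sorted(rows, key=lambda r: r[0]):
        if event_type == "y":
            # Remember only the latest 'y'; earlier ones are overwritten.
            last_y_matches = some_value == necessary_value
        elif event_type == "x":
            yield event_type, some_value, last_y_matches


result = list(flag_x_events(ROWS))
for row in result:
    print(row)
```

This also shows why plain LAG is not enough: the flag has to come from the latest 'y' row, which may be several rows back, not from whatever row happens to come immediately before.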
Answer 0 (score: 2)
With BigQuery Standard SQL, try the following
(make sure to uncheck the Use Legacy SQL checkbox under Show Options)
WITH YourTable AS (
SELECT 1 AS ts, 'y' AS event_type, 'true value' AS some_value UNION ALL
SELECT 2 AS ts, 'x' AS event_type, 'not relevant' AS some_value UNION ALL
SELECT 3 AS ts, 'y' AS event_type, 'false value' AS some_value UNION ALL
SELECT 4 AS ts, 'x' AS event_type, 'not relevant2' AS some_value UNION ALL
SELECT 5 AS ts, 'y' AS event_type, 'true value' AS some_value UNION ALL
SELECT 6 AS ts, 'y' AS event_type, 'false value' AS some_value UNION ALL
SELECT 7 AS ts, 'x' AS event_type, 'not relevant3' AS some_value UNION ALL
SELECT 8 AS ts, 'x' AS event_type, 'not relevant4' AS some_value
)
SELECT
event_type,
some_value,
(SELECT some_value = 'true value' FROM YourTable
WHERE event_type = 'y' AND ts < a.ts
ORDER BY ts DESC LIMIT 1
) AS previous_true
FROM YourTable AS a
WHERE event_type = 'x'
ORDER BY ts
The result is:
event_type some_value previous_true
x not relevant true
x not relevant2 false
x not relevant3 false
x not relevant4 false
For BigQuery Legacy SQL, try
SELECT
event_type, some_value,
previous_true = 'true value' AS previous_true
FROM (
SELECT
ts, event_type, some_value,
FIRST_VALUE(some_value) OVER(PARTITION BY grp ORDER BY ts) AS previous_true
FROM (
SELECT
ts, event_type, some_value,
SUM(step) OVER(ORDER BY ts) AS grp
FROM (
SELECT
ts, event_type, some_value,
IF(event_type = 'x' , 0, 1) AS step
FROM
(SELECT 1 AS ts, 'y' AS event_type, 'true value' AS some_value),
(SELECT 2 AS ts, 'x' AS event_type, 'not relevant' AS some_value),
(SELECT 3 AS ts, 'y' AS event_type, 'false value' AS some_value),
(SELECT 4 AS ts, 'x' AS event_type, 'not relevant2' AS some_value),
(SELECT 5 AS ts, 'y' AS event_type, 'true value' AS some_value),
(SELECT 6 AS ts, 'y' AS event_type, 'false value' AS some_value),
(SELECT 7 AS ts, 'x' AS event_type, 'not relevant3' AS some_value),
(SELECT 8 AS ts, 'x' AS event_type, 'not relevant4' AS some_value)
)
)
)
WHERE event_type = 'x'
ORDER BY ts
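The Legacy SQL query may be easier to follow once its three nested steps are spelled out: mark every non-'x' row with step 1, take a running sum so that each 'y' row opens a new group shared by the 'x' rows that follow it, then read the first some_value in each group. The following Python sketch mirrors that logic on the sample rows; the names `ROWS`, `steps`, `groups` and `first_value` are illustrative only and do not appear in the answer.

```python
from itertools import accumulate

# Sample rows as (ts, event_type, some_value), same data as in the query.
ROWS = [
    (1, "y", "true value"),
    (2, "x", "not relevant"),
    (3, "y", "false value"),
    (4, "x", "not relevant2"),
    (5, "y", "true value"),
    (6, "y", "false value"),
    (7, "x", "not relevant3"),
    (8, "x", "not relevant4"),
]

rows = sorted(ROWS, key=lambda r: r[0])

# IF(event_type = 'x', 0, 1) AS step
steps = [0 if event_type == "x" else 1 for _, event_type, _ in rows]

# SUM(step) OVER(ORDER BY ts) AS grp: each 'y' row starts a new group.
groups = list(accumulate(steps))

# FIRST_VALUE(some_value) OVER(PARTITION BY grp ORDER BY ts)
first_value = {}
for grp, (_, _, some_value) in zip(groups, rows):
    first_value.setdefault(grp, some_value)  # keep the earliest row per group

# Outer query: keep 'x' rows and compare the group's first value.
legacy_result = [
    (event_type, some_value, first_value[grp] == "true value")
    for grp, (_, event_type, some_value) in zip(groups, rows)
    if event_type == "x"
]
print(groups)
print(legacy_result)
```

One difference worth noting, as far as this model goes: an 'x' row that precedes every 'y' row lands in group 0, whose first value is the 'x' row's own some_value, so the flag comes out false there, whereas the correlated subquery in the Standard SQL version would yield NULL.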
Answer 1 (score: 0)
Here is one method: you can get the id of the most recent "y" for each "x", then use a join for the calculation:
select t.*,
(case when ty.some_value = 'necessary value' then 1 else 0 end) as previous_true
from (select t.*,
max(case when event_type = 'y' then id end) over (order by timestamp) as yid
from t
) t join
t ty
on ty.id = t.yid
where t.event_type = 'x';
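I'm not sure about the exact roles of id and timestamp. This version assumes that id increases monotonically with timestamp. Alternatively, you could use timestamp, but it is not clear whether that would be sufficient for the join.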
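As a rough model of the running-max-then-join idea, here is a Python sketch on the sample rows. It relies on assumptions that are not in the question: ids are integers that increase with the timestamp (the real id column is a STRING, where max would compare lexicographically), `ts` is an integer stand-in for timestamp, and 'true value' is used in place of 'necessary value' to match the example data.

```python
# Sample rows as (id, ts, event_type, some_value); integer ids that grow
# with ts are an assumption made for this sketch.
ROWS = [
    (1, 1, "y", "true value"),
    (2, 2, "x", "not relevant"),
    (3, 3, "y", "false value"),
    (4, 4, "x", "not relevant2"),
    (5, 5, "y", "true value"),
    (6, 6, "y", "false value"),
    (7, 7, "x", "not relevant3"),
    (8, 8, "x", "not relevant4"),
]

# Lookup table playing the role of the joined copy of the table (alias ty).
some_value_by_id = {row_id: some_value for row_id, _, _, some_value in ROWS}

joined = []
latest_y_id = None  # running max of the ids of 'y' rows seen so far
for row_id, _, event_type, some_value in sorted(ROWS, key=lambda r: r[1]):
    if event_type == "y":
        latest_y_id = row_id if latest_y_id is None else max(latest_y_id, row_id)
    elif event_type == "x" and latest_y_id is not None:
        # Inner join: 'x' rows with no earlier 'y' find no match and are dropped.
        flag = 1 if some_value_by_id[latest_y_id] == "true value" else 0
        joined.append((event_type, some_value, flag))

print(joined)
```

If ids do not increase with the timestamp, the running max can point at a 'y' row that is not the most recent one, which is the caveat raised above.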