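I'm trying to find the previous value of a column, taken from the most recent earlier row that meets some condition. Consider the table: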
| user_id | session_id | time | referrer |
|---------|------------|------------|------------|
| 1 | 1 | 2018-01-01 | [NULL] |
| 1 | 2 | 2018-02-01 | google.com |
| 1 | 3 | 2018-03-01 | google.com |
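For each session, I want to find the most recent earlier `session_id` whose referrer is NULL. So for the second and third rows, `parent_session_id` should be 1.
However, using only `lag(session_id) over (partition by user_id order by time)`, I get `parent_session_id` = 2 for the third row.
I suspect this can be done with a combination of window functions, but I just can't figure it out.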
Answer 0: (score: 1)
You can even do this with a correlated subquery:
SELECT
    session_id,
    -- Largest earlier session_id whose referrer is NULL (NULL if there is none)
    (SELECT MAX(t2.session_id) FROM yourTable t2
     WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
    session_id;
Here is one possible approach using analytic functions:
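WITH cte AS (
    SELECT *,
        -- Running count of NULL-referrer rows so far; rows sharing a count form one group
        SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
            OVER (ORDER BY session_id) cnt
    FROM yourTable
)
SELECT
    session_id,
    -- cnt = 0 means no NULL-referrer row has been seen yet
    CASE WHEN cnt = 0
         THEN NULL
         -- The smallest session_id in a group is the NULL-referrer row that opened it
         ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
FROM cte
ORDER BY
    session_id;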
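Both queries in this answer order by `session_id` alone, which is enough for the single-user sample but neither separates users nor follows `time` as the question does. A per-user variant of the running-count idea might look like the sketch below. It assumes the column names from the question and that `time` is unique within a user; the `grp` alias is only an illustrative name. It uses `FIRST_VALUE` ordered by `time`, so it does not rely on `session_id` increasing over time, and it returns NULL for a NULL-referrer row itself, where the `MIN(session_id)` version returns that row's own `session_id`.

```sql
WITH cte AS (
    SELECT
        user_id,
        session_id,
        time,
        referrer,
        -- Running count of NULL-referrer sessions per user, in time order.
        -- Every row after a NULL-referrer session shares its count.
        SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
            OVER (PARTITION BY user_id ORDER BY time) AS grp
    FROM yourTable
)
SELECT
    user_id,
    session_id,
    CASE
        -- No NULL-referrer session yet, or this row is one itself.
        WHEN grp = 0 OR referrer IS NULL THEN NULL
        -- The earliest row of the group is the NULL-referrer session.
        ELSE FIRST_VALUE(session_id)
                 OVER (PARTITION BY user_id, grp ORDER BY time)
    END AS parent_session_id
FROM cte
ORDER BY user_id, time;
```

On the sample table from the question this gives NULL, 1 and 1 for sessions 1, 2 and 3.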
Answer 1: (score: 1)
I would use `last_value()` combined with `if()`:
WITH t AS (SELECT * FROM UNNEST([
struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
(1,2,date('2018-02-01'), 'google.com'),
(1,3,date('2018-03-01'), 'google.com')
]) )
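SELECT
    *,
    -- Keep session_id only on NULL-referrer rows, then take the latest
    -- non-NULL value among the strictly earlier rows of the same user
    last_value(IF(referrer is null, session_id, NULL) ignore nulls)
        over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t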