Querying with a conditional lag statement

Time: 2018-11-22 07:18:42

Tags: google-bigquery standard-sql

I am trying to find the previous value of a column, taken only from rows that meet some condition. Consider the table:

| user_id | session_id | time       | referrer   |  
|---------|------------|------------|------------|  
| 1       | 1          | 2018-01-01 | [NULL]     |  
| 1       | 2          | 2018-02-01 | google.com |  
| 1       | 3          | 2018-03-01 | google.com |

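For each session, I want to find the most recent earlier session_id whose referrer is NULL. So for the second and third rows, the value of parent_session_id should be 1.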

However, using just lag(session_id) over (partition by user_id order by time), I get parent_session_id = 2 for the third row.
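To make the problem concrete, here is a minimal sketch of that plain LAG query run against the three sample rows. The inline CTE named t is an assumption for illustration only; it stands in for the real table so the query is self-contained.

```sql
-- Sample rows from the table above, inlined so the query runs on its own.
WITH t AS (
  SELECT 1 AS user_id, 1 AS session_id, DATE '2018-01-01' AS time,
         CAST(NULL AS STRING) AS referrer
  UNION ALL SELECT 1, 2, DATE '2018-02-01', 'google.com'
  UNION ALL SELECT 1, 3, DATE '2018-03-01', 'google.com'
)
SELECT
  session_id,
  referrer,
  -- LAG returns the immediately preceding row's session_id,
  -- whether or not that row's referrer is NULL.
  LAG(session_id) OVER (PARTITION BY user_id ORDER BY time) AS parent_session_id
FROM t
ORDER BY time;
-- session_id 1 -> NULL, session_id 2 -> 1, session_id 3 -> 2 (wanted: 1)
```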

I suspect this can be done with a combination of window functions, but I just can't figure it out.

2 Answers:

Answer 0: (score: 1)

You can even do this with a correlated subquery:

SELECT
    session_id,
    -- Latest earlier session of the same user that has no referrer.
    (SELECT MAX(t2.session_id) FROM yourTable t2
     WHERE t2.referrer IS NULL
       AND t2.user_id = t1.user_id
       AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
    session_id;

Here is one possible approach that uses analytic functions:

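WITH cte AS (
    SELECT *,
        -- Running count of NULL-referrer sessions per user. Each NULL-referrer
        -- row starts a new group that also contains the sessions after it.
        SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
            OVER (PARTITION BY user_id ORDER BY session_id) cnt
    FROM yourTable
)

SELECT
    session_id,
    -- cnt = 0 means this user has had no NULL-referrer session yet.
    -- Otherwise the first session of the group is the NULL-referrer one
    -- (so that row is returned with its own session_id).
    CASE WHEN cnt = 0
         THEN NULL
         ELSE MIN(session_id) OVER (PARTITION BY user_id, cnt) END prev_session_id
FROM cte
ORDER BY
    session_id;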

Answer 1: (score: 1)

I would use last_value() combined with if():

WITH t AS (SELECT * FROM UNNEST([ 
    struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
    (1,2,date('2018-02-01'), 'google.com'),
    (1,3,date('2018-03-01'), 'google.com')
  ]) )

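SELECT
  *,
  -- IF() keeps session_id only on rows without a referrer; IGNORE NULLS then picks the
  -- most recent such row, and the frame ending at 1 PRECEDING excludes the current row.
  last_value(IF(referrer is null, session_id, NULL) ignore nulls)
    over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t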
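As a rough check of how the query above behaves beyond the three sample rows, the sketch below runs the same window expression on a slightly larger data set. The extra rows (a second NULL-referrer session for user 1, and a second user) are made up for illustration and are not part of the original question.

```sql
-- Hypothetical test data: user 1 has two NULL-referrer sessions,
-- user 2 starts with a referred session.
WITH t AS (
  SELECT 1 AS user_id, 1 AS session_id, DATE '2018-01-01' AS time,
         CAST(NULL AS STRING) AS referrer
  UNION ALL SELECT 1, 2, DATE '2018-02-01', 'google.com'
  UNION ALL SELECT 1, 3, DATE '2018-03-01', CAST(NULL AS STRING)
  UNION ALL SELECT 1, 4, DATE '2018-04-01', 'google.com'
  UNION ALL SELECT 2, 5, DATE '2018-01-15', 'google.com'
  UNION ALL SELECT 2, 6, DATE '2018-02-15', CAST(NULL AS STRING)
  UNION ALL SELECT 2, 7, DATE '2018-03-15', 'google.com'
)
SELECT
  user_id,
  session_id,
  referrer,
  LAST_VALUE(IF(referrer IS NULL, session_id, NULL) IGNORE NULLS)
    OVER (PARTITION BY user_id ORDER BY time
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS lastNullrefSession
FROM t
ORDER BY user_id, time;
-- user 1: session 1 -> NULL, 2 -> 1, 3 -> 1, 4 -> 3
-- user 2: session 5 -> NULL, 6 -> NULL, 7 -> 6
```

Because the frame stops at 1 PRECEDING, a NULL-referrer session never points at itself, and PARTITION BY user_id keeps one user's sessions from becoming the parent of another's.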