如何用上一行的值填充空列?

时间:2019-11-16 23:02:23

标签: sql google-bigquery data-warehouse

我正在尝试执行某些操作。我想让所有颜色填充值。但是,当我有空列时,我想用前一个非空列中的值填充它。

with cte as (
select '2019-11-12 16:01:55' as timestamp, null as owner_id, null as owner_assigneddate, null as lastmodifieddate union all
select '2019-11-12 19:03:18' as timestamp, 39530934 as owner_id, '2019-11-12 19:03:18' as owner_assigneddate, '2019-11-12 19:03:18' as lastmodifieddate union all
select '2019-11-12 19:03:19' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:19' as lastmodifieddate union all
select '2019-11-12 19:03:20' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:20' as lastmodifieddate union all
select '2019-11-12 19:03:31' as timestamp, 40320368 as owner_id, '2019-11-12 19:03:31' as owner_assigneddate, '2019-11-12 19:03:31' as lastmodifieddate union all
select '2019-11-12 19:03:33' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:33' as lastmodifieddate union all
select '2019-11-12 19:03:56' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:356' as lastmodifieddate)

select timestamp,
       owner_id,
       owner_assigneddate,
       lastmodifieddate,
       COALESCE(owner_id, LEAD(owner_id) OVER(ORDER BY timestamp DESC)) AS test_column
from cte order by timestamp asc 

在上一个查询中,我已经设法将值仅放在下一行。

我想要做的是让所有列都基于上一行填充值。 第4行的值应为39530934,第7行的值应为40320368。 我想我在这里错过了一些东西,但是我不知道是什么。

3 个答案:

答案 0 :(得分:1)

就顾虑而言,Big Query在窗口函数中不支持ignore null。这是一个依赖于窗口最大值的解决方案,用于查找保存最后一个非空owner_id的记录(假定时间戳的唯一性)。掌握了这些信息之后,您就可以通过联接引入相应的owner_id

select 
    c.timestamp,
    coalesce(c.owner_id, c_lag.owner_id) owner_id,
    c.owner_assigneddate,
    c.lastmodifieddate
from 
    (
        select
            cte.*,
            max(case when owner_id is not null then timestamp end) 
                over(order by timestamp rows unbounded preceding) target_timestamp
        from cte
    ) c
    left join cte c_lag 
        on c.owner_id is null 
        and c_lag.timestamp = c.target_timestamp

Demo on DB Fiddle

timestamp           | owner_id | owner_assigneddate  | lastmodifieddate    
:------------------ | -------: | :------------------ | :-------------------
2019-11-12 16:01:55 |     null | null                | null                
2019-11-12 19:03:18 | 39530934 | 2019-11-12 19:03:18 | 2019-11-12 19:03:18 
2019-11-12 19:03:19 | 39530934 | null                | 2019-11-12 19:03:19 
2019-11-12 19:03:20 | 39530934 | null                | 2019-11-12 19:03:20 
2019-11-12 19:03:31 | 40320368 | 2019-11-12 19:03:31 | 2019-11-12 19:03:31 
2019-11-12 19:03:33 | 40320368 | null                | 2019-11-12 19:03:33 
2019-11-12 19:03:56 | 40320368 | null                | 2019-11-12 19:03:356

注意:为了更好地理解逻辑(如果需要),您可以独立运行内部查询以查看其返回的内容(请参阅数据库提琴)。


修改

重新阅读此内容,我发现window max所提供的信息已经在您的原始数据中的owner_assigneddate列中了......这样简单得多:

select 
    c.timestamp,
    coalesce(c.owner_id, c_lag.owner_id) owner_id,
    c.owner_assigneddate,
    c.lastmodifieddate
from
    cte c
    left join cte c_lag 
        on c.owner_id is null 
        and c_lag.timestamp = c.owner_assigneddate

答案 1 :(得分:0)

这应符合您的cte定义:

...
select timestamp,
       owner_id,
       owner_assigneddate,
       lastmodifieddate,
       LAST_VALUE(owner_id IGNORE NULLS) 
            OVER(ORDER BY timestamp ASC ROWS BETWEEN 
                 UNBOUNDED PRECEDING AND CURRENT ROW) AS test_column
from cte order by timestamp asc 

答案 2 :(得分:0)

在BigQuery中,将LAST_VALUE()IGNORE NULLS选项 COALESCE()一起使用:

select timestamp,
       COALESCE(owner_id, last_value(owner_id ignore nulls) over (order by timestamp)) as owner_id,
       COALESCE(owner_assigneddate, LAST_VALUE(owner_assigneddate IGNORE NULLS) OVER (ORDER BY TIMESTAMP)) as owner_assigneddate,
       COALESCE(lastmodifieddate, LAST_VALUE(lastmodifieddate IGNORE NULLS) OVER (ORDER BY TIMESTAMP)) as lastmodifieddate
from cte order by timestamp asc