Using FIRST_VALUE to fill NULL values with the preceding non-NULL value

Posted: 2019-03-18 23:32:09

Tags: sql google-bigquery

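I am joining two tables.

In the first table, I have items that start at a specific time. In the second table, I have a value and a timestamp for every minute between each item's start and end time.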

First table

UniqueID  Items start_time
123       one   10:00 AM
456       two   11:00 AM
789       three 11:30 AM

Second table

UniqueID Items time_hit  value
123      one   10:00 AM    x
123      one   10:05 AM    x
123      one   10:10 AM    x
123      one   10:30 AM    x
456      two   11:00 AM    x
456      two   11:15 AM    x
789      three 11:30 AM    x

So when I join the two tables, I get this:

UniqueID Items start_time  time_hit   value 
123      one   10:00 AM    10:00 AM   x
123      null  null        10:05 AM   x
123      null  null        10:10 AM   x
123      null  null        10:30 AM   x
456      two   11:00 AM    11:00 AM   x
456      null  null        11:15 AM   x
789      three 11:30 AM    11:30 AM   x

I want to replace these null values with the values from the preceding non-null row...

So the expected result is:

UniqueID Items start_time  time_hit   value 
123      one   10:00 AM    10:00 AM   x
123      one   10:00 AM    10:05 AM   x
123      one   10:00 AM    10:10 AM   x
123      one   10:00 AM    10:30 AM   x
456      two   11:00 AM    11:00 AM   x
456      two   11:00 AM    11:15 AM   x
789      three 11:30 AM    11:30 AM   x
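The expected result is a per-UniqueID forward fill. Walk each UniqueID's rows in time_hit order and carry the last non-NULL Items and start_time onto the NULL rows. The following is a minimal Python sketch of just that semantics. It assumes None stands for NULL and that times are rewritten as zero-padded 24-hour "HH:MM" strings so they sort chronologically; both are assumptions, not part of the question's schema.

```python
from typing import Dict, List, Optional, Tuple

# (UniqueID, Items, start_time, time_hit, value); None plays the role of SQL NULL.
Row = Tuple[int, Optional[str], Optional[str], str, str]


def forward_fill(rows: List[Row]) -> List[Row]:
    """Fill NULL Items/start_time from the last non-NULL row of the same UniqueID."""
    last_seen: Dict[int, Tuple[Optional[str], Optional[str]]] = {}
    filled: List[Row] = []
    # Sort by UniqueID, then by time_hit ("HH:MM" strings sort chronologically).
    for uid, item, start, hit, value in sorted(rows, key=lambda r: (r[0], r[3])):
        prev_item, prev_start = last_seen.get(uid, (None, None))
        item = item if item is not None else prev_item
        start = start if start is not None else prev_start
        last_seen[uid] = (item, start)
        filled.append((uid, item, start, hit, value))
    return filled


# The joined rows from the question, with 12-hour times rewritten as "HH:MM".
joined: List[Row] = [
    (123, "one", "10:00", "10:00", "x"),
    (123, None, None, "10:05", "x"),
    (123, None, None, "10:10", "x"),
    (123, None, None, "10:30", "x"),
    (456, "two", "11:00", "11:00", "x"),
    (456, None, None, "11:15", "x"),
    (789, "three", "11:30", "11:30", "x"),
]

for row in forward_fill(joined):
    print(row)
```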

I tried building the join with the following function, but it didn't work:

  FIRST_VALUE(Items IGNORE NULLS) OVER (
    PARTITION BY time_hit ORDER BY time_hit
    ROWS BETWEEN CURRENT ROW AND
    UNBOUNDED FOLLOWING) AS test
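The attempt above partitions by time_hit. That puts every timestamp in its own partition, so a NULL row has no earlier row to borrow from. Its frame (CURRENT ROW AND UNBOUNDED FOLLOWING) also looks forward rather than back. The fill needs to partition by UniqueID and order by time_hit. In BigQuery that can be written as LAST_VALUE(Items IGNORE NULLS) OVER (PARTITION BY UniqueID ORDER BY time_hit ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). SQLite has no IGNORE NULLS, so this sketch swaps in a different, commonly used technique. A running COUNT of non-NULL Items defines groups, and FIRST_VALUE within each group supplies the value. The table name joined and the "HH:MM" time strings are assumptions.

```python
import sqlite3

# Hypothetical in-memory copy of the joined result from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined (UniqueID INTEGER, Items TEXT, start_time TEXT, "
    "time_hit TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?, ?)",
    [
        (123, "one", "10:00", "10:00", "x"),
        (123, None, None, "10:05", "x"),
        (123, None, None, "10:10", "x"),
        (123, None, None, "10:30", "x"),
        (456, "two", "11:00", "11:00", "x"),
        (456, None, None, "11:15", "x"),
        (789, "three", "11:30", "11:30", "x"),
    ],
)

FILL_SQL = """
WITH grouped AS (
  SELECT UniqueID, Items, start_time, time_hit, value,
         -- COUNT ignores NULLs, so grp only increments on rows that carry an
         -- Items value; the NULL rows after it share that row's grp.
         COUNT(Items) OVER (PARTITION BY UniqueID ORDER BY time_hit) AS grp
  FROM joined
)
SELECT UniqueID,
       -- The first row of each (UniqueID, grp) group is the non-NULL one.
       FIRST_VALUE(Items)      OVER (PARTITION BY UniqueID, grp ORDER BY time_hit) AS Items,
       FIRST_VALUE(start_time) OVER (PARTITION BY UniqueID, grp ORDER BY time_hit) AS start_time,
       time_hit,
       value
FROM grouped
ORDER BY UniqueID, time_hit
"""

for row in conn.execute(FILL_SQL):
    print(row)
```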

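My question was slightly off. It turned out the UniqueID values were inconsistent, which is why my output contained these null values. Still, the accepted answer is a good option for filling null values when joining two tables where one table has more unique rows than the other.

3 answers:

Answer 0 (score: 1)

An alternative solution is to use a NOT EXISTS clause as a JOIN condition. A correlated subquery ensures that each table2 row is related only to the relevant table1 record: the latest start of the same item at or before time_hit.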

SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2 
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit  
    AND NOT EXISTS (
        SELECT 1 FROM table1 t10
        WHERE 
            t10.items = t2.items 
            AND t10.start_time <= t2.time_hit
            AND t10.start_time > t1.start_time
    )

Demo on DB Fiddle

| items | start_time | time_hit | value |
| ----- | ---------- | -------- | ----- |
| one   | 10:00:00   | 10:00:00 | x     |
| one   | 10:00:00   | 10:05:00 | x     |
| one   | 10:00:00   | 10:10:00 | x     |
| one   | 10:00:00   | 10:30:00 | x     |
| two   | 11:00:00   | 11:00:00 | x     |
| two   | 11:00:00   | 11:15:00 | x     |
| three | 11:30:00   | 11:30:00 | x     |
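To check the NOT EXISTS join locally, this sketch loads the question's sample rows into an in-memory SQLite database and runs the query. The only change is an added ORDER BY. The table and column names follow the query; the "HH:MM:SS" time strings are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (UniqueID INTEGER, items TEXT, start_time TEXT);
CREATE TABLE table2 (UniqueID INTEGER, items TEXT, time_hit TEXT, value TEXT);
INSERT INTO table1 VALUES
  (123, 'one', '10:00:00'), (456, 'two', '11:00:00'), (789, 'three', '11:30:00');
INSERT INTO table2 VALUES
  (123, 'one', '10:00:00', 'x'), (123, 'one', '10:05:00', 'x'),
  (123, 'one', '10:10:00', 'x'), (123, 'one', '10:30:00', 'x'),
  (456, 'two', '11:00:00', 'x'), (456, 'two', '11:15:00', 'x'),
  (789, 'three', '11:30:00', 'x');
""")

QUERY = """
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit
    -- keep t1 only if no later start of the same item precedes this hit
    AND NOT EXISTS (
        SELECT 1 FROM table1 t10
        WHERE t10.items = t2.items
          AND t10.start_time <= t2.time_hit
          AND t10.start_time > t1.start_time
    )
ORDER BY t2.time_hit
"""

for row in conn.execute(QUERY):
    print(row)
```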

An alternative that avoids EXISTS in the JOIN condition (which BigQuery does not allow): simply move the condition to the WHERE clause.

SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2 
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit  
WHERE NOT EXISTS (
    SELECT 1 FROM table1 t10
    WHERE 
        t10.items = t2.items 
        AND t10.start_time <= t2.time_hit
        AND t10.start_time > t1.start_time
)

DB Fiddle
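For an INNER JOIN, a predicate in ON and the same predicate in WHERE filter the same rows, so the two versions should agree. The sketch below checks that in SQLite. It also adds a hypothetical second run of item one starting at 12:00:00 (UniqueID 999, not in the question). That run shows what the NOT EXISTS predicate is for: each hit is paired only with the latest start of its item at or before time_hit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (UniqueID INTEGER, items TEXT, start_time TEXT);
CREATE TABLE table2 (UniqueID INTEGER, items TEXT, time_hit TEXT, value TEXT);
INSERT INTO table1 VALUES
  (123, 'one', '10:00:00'), (456, 'two', '11:00:00'), (789, 'three', '11:30:00'),
  (999, 'one', '12:00:00');  -- hypothetical second start of item 'one'
INSERT INTO table2 VALUES
  (123, 'one', '10:00:00', 'x'), (123, 'one', '10:05:00', 'x'),
  (123, 'one', '10:10:00', 'x'), (123, 'one', '10:30:00', 'x'),
  (456, 'two', '11:00:00', 'x'), (456, 'two', '11:15:00', 'x'),
  (789, 'three', '11:30:00', 'x'),
  (999, 'one', '12:00:00', 'y'), (999, 'one', '12:20:00', 'y');
""")

# NOT EXISTS inside the ON clause (the first query in this answer).
IN_ON = """
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit
    AND NOT EXISTS (
        SELECT 1 FROM table1 t10
        WHERE t10.items = t2.items
          AND t10.start_time <= t2.time_hit
          AND t10.start_time > t1.start_time
    )
ORDER BY t2.time_hit, t1.start_time
"""

# The same predicate moved to WHERE (the BigQuery-friendly form).
IN_WHERE = """
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit
WHERE NOT EXISTS (
    SELECT 1 FROM table1 t10
    WHERE t10.items = t2.items
      AND t10.start_time <= t2.time_hit
      AND t10.start_time > t1.start_time
)
ORDER BY t2.time_hit, t1.start_time
"""

in_on = conn.execute(IN_ON).fetchall()
in_where = conn.execute(IN_WHERE).fetchall()
print(in_on == in_where)
for row in in_where:
    print(row)
```

The 12:20:00 hit is paired with the 12:00:00 start only. Without the NOT EXISTS predicate, it would also match the 10:00:00 start.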

Answer 1 (score: 1)

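You can use first_value (although last_value would also work in this case). The important part is to specify rows between unbounded preceding and current row to set the bounds of the window frame.

Answer updated to reflect the updated question and the preference for first_value.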
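To see why the frame matters, and why last_value also works here, the following pure-Python sketch emulates FIRST_VALUE(x IGNORE NULLS) and LAST_VALUE(x IGNORE NULLS) over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for a single partition. It illustrates the semantics only, not how a database evaluates them. The second partition is hypothetical.

```python
from typing import List, Optional


def value_ignore_nulls(values: List[Optional[str]], last: bool) -> List[Optional[str]]:
    """For each row i, take the frame rows 0..i (UNBOUNDED PRECEDING to
    CURRENT ROW), drop NULLs, and return the first or last remaining value."""
    out: List[Optional[str]] = []
    for i in range(len(values)):
        frame = [v for v in values[: i + 1] if v is not None]
        if not frame:
            out.append(None)  # no non-NULL value yet: the result stays NULL
        else:
            out.append(frame[-1] if last else frame[0])
    return out


# Items for UniqueID 123 after the left join, ordered by time_hit.
partition_123 = ["one", None, None, None]
print(value_ignore_nulls(partition_123, last=False))  # ['one', 'one', 'one', 'one']
print(value_ignore_nulls(partition_123, last=True))   # ['one', 'one', 'one', 'one']

# Hypothetical partition with two non-NULL values: the two functions now differ.
mixed = ["one", None, "uno", None]
print(value_ignore_nulls(mixed, last=False))  # ['one', 'one', 'one', 'one']
print(value_ignore_nulls(mixed, last=True))   # ['one', 'one', 'uno', 'uno']
```

In this data, each UniqueID partition holds exactly one non-NULL value, in its first row, so both functions produce the same fill. With several non-NULL values per partition, LAST_VALUE is the one that carries the most recent value forward.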

select
first_value(t1.UniqueId ignore nulls) over (partition by t2.UniqueId
                                           order by t2.time_hit
                                           rows between unbounded preceding and current row) as UniqueId,
first_value(t1.items ignore nulls) over (partition by t2.UniqueId
                                        order by t2.time_hit
                                        rows between unbounded preceding and current row) as Items,
first_value(t1.start_time ignore nulls) over (partition by t2.UniqueId
                                        order by t2.time_hit
                                        rows between unbounded preceding and current row) as start_time,
t2.time_hit,
t2.item_value
from table2 t2
left join table1 t1 on t1.start_time = t2.time_hit
order by t2.time_hit;

Result

| UNIQUEID | ITEMS | START_TIME | TIME_HIT | ITEM_VALUE |
|----------|-------|------------|----------|------------|
|      123 |   one |   10:00:00 | 10:00:00 |          x |
|      123 |   one |   10:00:00 | 10:05:00 |          x |
|      123 |   one |   10:00:00 | 10:10:00 |          x |
|      123 |   one |   10:00:00 | 10:30:00 |          x |
|      456 |   two |   11:00:00 | 11:00:00 |          x |
|      456 |   two |   11:00:00 | 11:15:00 |          x |
|      789 | three |   11:30:00 | 11:30:00 |          x |

SQL Fiddle Example

Note: I had to use Oracle in SQL Fiddle (hence the changed data types and column names), but this should work in your database as well.
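SQLite does not accept IGNORE NULLS on window functions, so this sketch reproduces the query above by swapping each first_value(... ignore nulls) for MAX over the same frame. Aggregates skip NULLs, and in this data each t2.UniqueId partition has exactly one non-NULL t1 row, its first row. MAX therefore returns the same value here, but that substitution is an assumption that does not hold in general. Column names follow the answer (including item_value); the time strings are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (UniqueId INTEGER, items TEXT, start_time TEXT);
CREATE TABLE table2 (UniqueId INTEGER, items TEXT, time_hit TEXT, item_value TEXT);
INSERT INTO table1 VALUES
  (123, 'one', '10:00:00'), (456, 'two', '11:00:00'), (789, 'three', '11:30:00');
INSERT INTO table2 VALUES
  (123, 'one', '10:00:00', 'x'), (123, 'one', '10:05:00', 'x'),
  (123, 'one', '10:10:00', 'x'), (123, 'one', '10:30:00', 'x'),
  (456, 'two', '11:00:00', 'x'), (456, 'two', '11:15:00', 'x'),
  (789, 'three', '11:30:00', 'x');
""")

QUERY = """
SELECT
  -- MAX stands in for FIRST_VALUE(... IGNORE NULLS), which SQLite lacks.
  MAX(t1.UniqueId)   OVER (PARTITION BY t2.UniqueId ORDER BY t2.time_hit
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS UniqueId,
  MAX(t1.items)      OVER (PARTITION BY t2.UniqueId ORDER BY t2.time_hit
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Items,
  MAX(t1.start_time) OVER (PARTITION BY t2.UniqueId ORDER BY t2.time_hit
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS start_time,
  t2.time_hit,
  t2.item_value
FROM table2 t2
-- hits whose time_hit equals a start_time pick up the t1 row; the rest get NULLs
LEFT JOIN table1 t1 ON t1.start_time = t2.time_hit
ORDER BY t2.time_hit
"""

for row in conn.execute(QUERY):
    print(row)
```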

Answer 2 (score: 0)

I think you can get the expected output with a plain INNER JOIN. I'm not sure why you want to use FIRST_VALUE.

SELECT I.Items, I.Start_Time, ID.Time_hit, ID.Value
FROM Items I
INNER JOIN ItemDetails ID
    ON I.Items = ID.Items
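Here is a quick SQLite check of this INNER JOIN on the question's sample rows. It uses the answer's table names Items and ItemDetails, and the time strings are assumed. Because the join matches on Items alone, it relies on each item starting only once. If an item started twice, every hit would pair with both starts, which is the case the NOT EXISTS predicate in Answer 0 guards against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (UniqueID INTEGER, Items TEXT, Start_Time TEXT);
CREATE TABLE ItemDetails (UniqueID INTEGER, Items TEXT, Time_hit TEXT, Value TEXT);
INSERT INTO Items VALUES
  (123, 'one', '10:00:00'), (456, 'two', '11:00:00'), (789, 'three', '11:30:00');
INSERT INTO ItemDetails VALUES
  (123, 'one', '10:00:00', 'x'), (123, 'one', '10:05:00', 'x'),
  (123, 'one', '10:10:00', 'x'), (123, 'one', '10:30:00', 'x'),
  (456, 'two', '11:00:00', 'x'), (456, 'two', '11:15:00', 'x'),
  (789, 'three', '11:30:00', 'x');
""")

# Every detail row already carries its Items value, so a plain equi-join
# attaches the matching Start_Time to each hit.
QUERY = """
SELECT I.Items, I.Start_Time, ID.Time_hit, ID.Value
FROM Items I
INNER JOIN ItemDetails ID
    ON I.Items = ID.Items
ORDER BY ID.Time_hit
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```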

Please explain whether there is a specific reason you are looking to use that approach.