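I want to join two tables.
In the first table, I have items that each start at a specific time. In the second table, I have a value and a timestamp for every minute between each item's start and end time.
First table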
UniqueID Items start_time
123 one 10:00 AM
456 two 11:00 AM
789 three 11:30 AM
Second table
UniqueID Items time_hit value
123 one 10:00 AM x
123 one 10:05 AM x
123 one 10:10 AM x
123 one 10:30 AM x
456 two 11:00 AM x
456 two 11:15 AM x
789 three 11:30 AM x
So when I join the two tables, I get this:
UniqueID Items start_time time_hit value
123 one 10:00 AM 10:00 AM x
123 null null 10:05 AM x
123 null null 10:10 AM x
123 null null 10:30 AM x
456 two 11:00 AM 11:00 AM x
456 null null 11:15 AM x
789 three 11:30 AM 11:30 AM x
I want to replace these nulls with the values from the preceding non-null row.
So the expected result is:
UniqueID Items start_time time_hit value
123 one 10:00 AM 10:00 AM x
123 one 10:00 AM 10:05 AM x
123 one 10:00 AM 10:10 AM x
123 one 10:00 AM 10:30 AM x
456 two 11:00 AM 11:00 AM x
456 two 11:00 AM 11:15 AM x
789 three 11:30 AM 11:30 AM x
I tried to build the join using the following function, without success:
FIRST_VALUE(Items IGNORE NULLS) OVER (
PARTITION BY time_hit ORDER BY time_hit
ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING) AS test
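Update: my question was a bit off. I found out that the UniqueID values were not consistent between the tables, which is why I had these nulls in my output. So the accepted answer is a good option for filling null values when you join two tables and one of them has more unique rows than the other.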
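For readers who see the same symptom, a quick check along these lines may help confirm whether mismatched identifiers are the cause. This is only a sketch: `table1` and `table2` are placeholder names for the first and second tables above, and it assumes both expose a `UniqueID` column.

```sql
-- Sketch: list UniqueID values present in table2 that have no counterpart in table1.
-- table1 / table2 are placeholder names for the two tables shown in the question.
SELECT DISTINCT t2.UniqueID
FROM table2 t2
LEFT JOIN table1 t1
  ON t1.UniqueID = t2.UniqueID
WHERE t1.UniqueID IS NULL;
```

If this returns any rows, a join on `UniqueID` alone will leave the columns from the first table null for those rows.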
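Answer 0: (score: 1)
An alternative solution is to use a NOT EXISTS clause as a JOIN condition, with a correlated subquery that makes sure each row of the second table is matched only to the most recent start_time at or before its time_hit.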
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
ON t1.items = t2.items
AND t1.start_time <= t2.time_hit
AND NOT EXISTS (
SELECT 1 FROM table1 t10
WHERE
t10.items = t2.items
AND t10.start_time <= t2.time_hit
AND t10.start_time > t1.start_time
)
| items | start_time | time_hit | value |
| ----- | ---------- | -------- | ----- |
| one | 10:00:00 | 10:00:00 | x |
| one | 10:00:00 | 10:05:00 | x |
| one | 10:00:00 | 10:10:00 | x |
| one | 10:00:00 | 10:30:00 | x |
| two | 11:00:00 | 11:00:00 | x |
| two | 11:00:00 | 11:15:00 | x |
| three | 11:30:00 | 11:30:00 | x |
An alternative that avoids using EXISTS in a JOIN condition (which is not allowed in BigQuery): just move the condition to the WHERE clause.
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
ON t1.items = t2.items
AND t1.start_time <= t2.time_hit
WHERE NOT EXISTS (
SELECT 1 FROM table1 t10
WHERE
t10.items = t2.items
AND t10.start_time <= t2.time_hit
AND t10.start_time > t1.start_time
)
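Answer 1: (score: 1)
You can use first_value (although last_value would also work in this case). The important part is to specify rows between unbounded preceding and current row to set the bounds of the window.
Answer updated to reflect the updated question and the preference for first_value.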
select
first_value(t1.UniqueId ignore nulls) over (partition by t2.UniqueId
order by t2.time_hit
rows between unbounded preceding and current row) as UniqueId,
first_value(t1.items ignore nulls) over (partition by t2.UniqueId
order by t2.time_hit
rows between unbounded preceding and current row) as Items,
first_value(t1.start_time ignore nulls) over (partition by t2.UniqueId
order by t2.time_hit
rows between unbounded preceding and current row) as start_time,
t2.time_hit,
t2.item_value
from table2 t2
left join table1 t1 on t1.start_time = t2.time_hit
order by t2.time_hit;
Result
| UNIQUEID | ITEMS | START_TIME | TIME_HIT | ITEM_VALUE |
|----------|-------|------------|----------|------------|
| 123 | one | 10:00:00 | 10:00:00 | x |
| 123 | one | 10:00:00 | 10:05:00 | x |
| 123 | one | 10:00:00 | 10:10:00 | x |
| 123 | one | 10:00:00 | 10:30:00 | x |
| 456 | two | 11:00:00 | 11:00:00 | x |
| 456 | two | 11:00:00 | 11:15:00 | x |
| 789 | three | 11:30:00 | 11:30:00 | x |
Note: I had to use Oracle in SQL Fiddle (so I had to change the data types and a column name). But this should work on your database.
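The query above joins only on start_time = time_hit, which works for the sample data but could pair rows from different items if two items ever share a timestamp. A hedged variant, assuming both tables carry a reliable `UniqueID` and using the question's column names (`value` rather than `item_value`), might look like this in BigQuery-style SQL:

```sql
-- Sketch only: same fill-forward idea, but the join also matches on UniqueID.
-- Assumes table1 / table2 share consistent UniqueID values (not the case in the
-- asker's real data, per the update to the question).
SELECT
  t2.UniqueID,
  FIRST_VALUE(t1.Items IGNORE NULLS) OVER (
    PARTITION BY t2.UniqueID
    ORDER BY t2.time_hit
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS Items,
  FIRST_VALUE(t1.start_time IGNORE NULLS) OVER (
    PARTITION BY t2.UniqueID
    ORDER BY t2.time_hit
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS start_time,
  t2.time_hit,
  t2.value
FROM table2 t2
LEFT JOIN table1 t1
  ON t1.UniqueID = t2.UniqueID       -- restrict matches to the same item
 AND t1.start_time = t2.time_hit     -- only the starting row matches directly
ORDER BY t2.UniqueID, t2.time_hit;
```

Because the frame ends at the current row, each row picks up the first non-null value seen so far within its own UniqueID partition, which is the starting row's value.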
Answer 2: (score: 0)
I think you can get the expected output with an INNER JOIN. I am not sure why you want to use FIRST_VALUE.
SELECT I.Items, I.Start_Time, ID.Time_hit, ID.Value
FROM Items I
INNER JOIN ItemDetails ID
ON I.Items = ID.Items
Please explain whether there is a specific reason you are looking to use this approach.