Fill missing values with the first non-missing value in the group

Time: 2019-12-09 05:04:39

Tags: hiveql lead

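I have the table below in Hive and need to fill each null transid with the first non-null value that follows it. I tried the LEAD() function, but it only fixes key 1. Because the number of missing rows before the first non-missing value is not known in advance, LEAD() cannot be applied with a fixed offset. I could think of a window function that uses key as the partition, but within a single key there can be sub-groups that alternate between missing and non-missing values (for example, key 5).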

Note: the first non-missing value appears on the row where var = d1.

key date_time           var transid
===================================
1   2019-11-23 15:02:55 p1  null (populated with 82 using LEAD())
1   2019-11-23 15:04:06 d1  82
1   2019-11-23 15:04:29 e1  82
1   2019-11-23 15:05:32 ads 82
1   2019-11-23 15:05:35 ads 82
1   2019-11-23 15:05:55 tf  82

2   2019-11-23 13:23:31 p1  null (should be populated with 87)
2   2019-11-23 13:26:02 p1  null (should be populated with 87)
2   2019-11-23 13:29:54 d1  87
2   2019-11-23 13:32:06 e1  87
2   2019-11-23 13:33:21 ads 87
2   2019-11-23 13:33:24 ads 87
2   2019-11-23 13:33:40 ps  87

5   2019-11-24 18:42:13 p1  null (should be populated with 84)
5   2019-11-24 18:45:02 p1  null (should be populated with 84)
5   2019-11-24 18:45:32 p2  null (should be populated with 84)
5   2019-11-24 18:46:39 p2  null (should be populated with 84)
5   2019-11-24 18:47:34 d1  84
5   2019-11-24 18:47:58 d2  84
5   2019-11-24 18:48:56 p1  null (should be populated with 15)
5   2019-11-24 18:49:38 p1  null (should be populated with 15)
5   2019-11-24 18:50:33 d1  15
5   2019-11-24 18:50:53 ads 15
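
One possible approach, offered as a sketch rather than a confirmed answer: count the non-null transid values in reverse time order within each key, so that every run of nulls shares a group number with the non-null row that follows it, then take MAX(transid) within that group. The example below runs the query through Python's sqlite3 module (window functions need SQLite 3.25 or later) only so the logic can be checked locally; the table name `events` and the column names `grp` and `transid_filled` are assumptions, and the rows are copied from keys 2 and 5 above. In Hive itself, FIRST_VALUE(transid, true) over ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING may be a shorter alternative, since the second argument skips nulls.

```python
import sqlite3

# Sample rows copied from keys 2 and 5 in the question.
# The table name `events` is hypothetical.
rows = [
    (2, "2019-11-23 13:23:31", "p1", None),
    (2, "2019-11-23 13:26:02", "p1", None),
    (2, "2019-11-23 13:29:54", "d1", 87),
    (2, "2019-11-23 13:32:06", "e1", 87),
    (5, "2019-11-24 18:42:13", "p1", None),
    (5, "2019-11-24 18:45:02", "p1", None),
    (5, "2019-11-24 18:45:32", "p2", None),
    (5, "2019-11-24 18:46:39", "p2", None),
    (5, "2019-11-24 18:47:34", "d1", 84),
    (5, "2019-11-24 18:47:58", "d2", 84),
    (5, "2019-11-24 18:48:56", "p1", None),
    (5, "2019-11-24 18:49:38", "p1", None),
    (5, "2019-11-24 18:50:33", "d1", 15),
    (5, "2019-11-24 18:50:53", "ads", 15),
]

# Inner query: COUNT(transid) ignores nulls, so walking backwards in time
# the counter only advances on non-null rows. Each null row therefore gets
# the same grp as the nearest later non-null row.
# Outer query: every grp holds exactly one non-null row, so MAX picks it.
QUERY = """
SELECT key, date_time, var,
       MAX(transid) OVER (PARTITION BY key, grp) AS transid_filled
FROM (
    SELECT key, date_time, var, transid,
           COUNT(transid) OVER (PARTITION BY key
                                ORDER BY date_time DESC) AS grp
    FROM events
) t
ORDER BY key, date_time
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (key INTEGER, date_time TEXT, var TEXT, transid INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# Collect only the filled column, in (key, date_time) order.
filled = [row[3] for row in conn.execute(QUERY)]
print(filled)
```

Rows at the end of a key that have no later non-null transid would stay null under this sketch, because their group contains no value to pick.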

0 answers:

No answers yet