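I have two tables in Hadoop and want to left join table B to table A on a specific condition.
The join is on 'ID' (a.ID = b.ID), but I only want to bring in the two columns 'status_date' and 'flag_y' from table B when b.status_date >= a.date.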
Table A:
+------------+-----+--------+
| date       | ID  | Flag_x |
+------------+-----+--------+
| 01/03/2019 | 100 | x      |
| 01/03/2019 | 101 | x      |
| 02/03/2019 | 102 | x      |
| 02/03/2019 | 103 | x      |
+------------+-----+--------+
Table B:
+-------------+---------+--------+
| status_date | field_x | Flag_y |
+-------------+---------+--------+
| 15/03/2019  | 100     | y      |
| 10/01/2019  | 102     | y      |
+-------------+---------+--------+
Desired output:
+------------+-----+--------+-------------+--------+
| date       | ID  | Flag_x | status_date | Flag_y |
+------------+-----+--------+-------------+--------+
| 01/03/2019 | 100 | x      | 15/03/2019  | y      |
| 01/03/2019 | 101 | x      |             |        |
| 02/03/2019 | 102 | x      |             |        |
| 02/03/2019 | 103 | x      |             |        |
+------------+-----+--------+-------------+--------+
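One detail worth checking for the date condition: if `date` and `status_date` are stored as strings in the dd/MM/yyyy format shown (an assumption; the column types aren't stated), `>=` compares them character by character, not chronologically. A minimal Python sketch of what that does to the ID 102 row; `parse_dmy`, `a_date` and `b_status_date` are hypothetical names used only for illustration:

```python
from datetime import datetime


def parse_dmy(s: str) -> datetime:
    """Hypothetical helper: parse a dd/MM/yyyy string such as '10/01/2019'."""
    return datetime.strptime(s, "%d/%m/%Y")


# ID 102: 'date' from table A and 'status_date' from table B
a_date, b_status_date = "02/03/2019", "10/01/2019"

# String comparison: first characters are '1' vs '0', so this is True
string_cmp = b_status_date >= a_date

# Chronological comparison: 10 Jan 2019 is before 2 Mar 2019, so this is False
date_cmp = parse_dmy(b_status_date) >= parse_dmy(a_date)

print(string_cmp, date_cmp)  # True False
```

With string comparison, ID 102 would wrongly satisfy the condition. In Hive, one option is to compare `unix_timestamp(status_date, 'dd/MM/yyyy')` with `unix_timestamp(date, 'dd/MM/yyyy')`, or to store both columns as `DATE`.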
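Code: I tried the following, but it drops the row for ID 102. In that case I want to keep the row, just without the information from table B, because its 'status_date' is earlier than the 'date' in table A. I assume something needs to be added to the WHERE clause?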
Create Table Output As
Select
a.*
,b.status_date
,b.flag_y
From Table_A as a
Left join Table_B as b
On b.ID = a.ID
Where b.status_date is Null or b.status_date >= a.date
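A common way to get the desired output (a sketch, not a confirmed Hive answer) is to move the date test out of WHERE and into the ON clause. WHERE runs after the left join and removes the already-joined ID 102 row, while a condition in ON only decides which B rows match, so unmatched A rows survive with NULLs. The example below uses SQLite through Python's `sqlite3` as a local stand-in for Hive. It assumes that B's `field_x` column is the `ID` the question joins on, and `iso` is a hypothetical SQLite-only helper that rewrites dd/MM/yyyy as yyyy-MM-dd so that `>=` is chronological:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Sample data from the question; B's join column is assumed to be named ID
cur.execute("CREATE TABLE table_a (date TEXT, ID INTEGER, flag_x TEXT)")
cur.execute("CREATE TABLE table_b (status_date TEXT, ID INTEGER, flag_y TEXT)")
cur.executemany("INSERT INTO table_a VALUES (?, ?, ?)", [
    ("01/03/2019", 100, "x"), ("01/03/2019", 101, "x"),
    ("02/03/2019", 102, "x"), ("02/03/2019", 103, "x"),
])
cur.executemany("INSERT INTO table_b VALUES (?, ?, ?)", [
    ("15/03/2019", 100, "y"), ("10/01/2019", 102, "y"),
])


def iso(col: str) -> str:
    """Hypothetical helper: SQL expression turning dd/MM/yyyy into yyyy-MM-dd."""
    return (f"substr({col}, 7, 4) || '-' || substr({col}, 4, 2)"
            f" || '-' || substr({col}, 1, 2)")


# Date condition in ON: B columns are filled only when the date test passes
on_query = f"""
SELECT a.date, a.ID, a.flag_x, b.status_date, b.flag_y
FROM table_a AS a
LEFT JOIN table_b AS b
  ON b.ID = a.ID
 AND {iso('b.status_date')} >= {iso('a.date')}
ORDER BY a.ID
"""

# The question's approach: the same test in WHERE filters rows after the join
where_query = f"""
SELECT a.date, a.ID, a.flag_x, b.status_date, b.flag_y
FROM table_a AS a
LEFT JOIN table_b AS b
  ON b.ID = a.ID
WHERE b.status_date IS NULL OR {iso('b.status_date')} >= {iso('a.date')}
ORDER BY a.ID
"""

on_rows = cur.execute(on_query).fetchall()
where_rows = cur.execute(where_query).fetchall()

for row in on_rows:
    print(row)
print([r[1] for r in where_rows])  # [100, 101, 103]: ID 102 is missing
```

The ON version keeps all four IDs and fills `status_date`/`flag_y` only for ID 100, matching the desired output. Whether Hive accepts a non-equality condition inside ON depends on the Hive version, so that is worth checking against the Hive join documentation.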
I hope this makes sense. Can anyone help?