Hive - update records in a table with today's date if the record is not found in another table?

Time: 2019-03-13 18:44:28

Tags: hive jupyter-notebook hiveql

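I currently have a main results table (test1) that stores all of my problem records, and another table (test2) that is rebuilt about once a week. I am trying to find the records that no longer appear in the weekly run and update their date in the main results table, because that is the date on which they were corrected in the system.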

I am also trying to add the records from test2 to test1 if they are not already in that table.

This works:

insert into table test1 (id, name, code)
select * from test2 t2 where t2.id not in (select id from test1);
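
The same insert can also be written as an anti-join with an explicit column list. This is only a sketch, assuming id is the matching key in both tables and that test2 has exactly the columns id, name, code. One difference from NOT IN: if the subquery ever returns a NULL id, NOT IN matches no rows at all, while the anti-join is unaffected; on the other hand, a test2 row whose own id is NULL is skipped by NOT IN but inserted by the anti-join.

```sql
-- Sketch only: anti-join form of the NOT IN insert above.
-- Assumes id is the key in both tables.
insert into table test1 (id, name, code)
select t2.id, t2.name, t2.code
  from test2 t2
       left join test1 t1 on t1.id = t2.id
 where t1.id is null;   -- keep only test2 rows that have no match in test1
```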

I am also trying to update the 'Corrected_date' column of test1 so that it shows the current date for every record that is found in test1 but not in test2.

Sample data below:

Table 1

ID    NAME    CODE    CORRECTED_DATE
1     TEST    3    
29    TEST2   90 

Table 2

ID    NAME    CODE  
12    TEST5   20
1     TEST    3

Expected final result for Table 1

ID    NAME    CODE    CORRECTED_DATE
1     TEST    3       
29    TEST2   90       3/13/2019
12    TEST5   20

1 answer:

Answer 0: (score: 0)

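Use a FULL JOIN and overwrite the table. A FULL JOIN returns the rows that matched in both tables, plus the unmatched rows from the left table, plus the unmatched rows from the right table. You can implement your logic with CASE expressions, like this: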

insert OVERWRITE table test1

select 
      --select t1 if both or t1 only exist, t2 if only t2 exists
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      --if found in t1 but not in t2 then current_date else leave as is
      case when (t1.ID is not null) and (t2.ID is null) then current_date else t1.CORRECTED_DATE end as CORRECTED_DATE 
  from test1 t1 
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;
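
Note that when this statement is re-run every week, the CASE above assigns current_date to every record that is in test1 but not in test2, including records that were already stamped in an earlier run. If the first correction date should be kept instead, a variant along these lines may help. It is a sketch under two assumptions that the question does not confirm: that CORRECTED_DATE is a DATE column (or a type compatible with current_date), and that keeping the earliest date is the desired behavior.

```sql
-- Sketch only: same FULL OUTER JOIN, but an existing CORRECTED_DATE is preserved.
-- Assumes CORRECTED_DATE is a DATE column.
insert overwrite table test1
select
      --take t1 if the row exists in t1, otherwise t2
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      --found in t1 but not in t2: keep the old date if set, else stamp today
      case when (t1.ID is not null) and (t2.ID is null)
           then coalesce(t1.CORRECTED_DATE, current_date)
           else t1.CORRECTED_DATE
      end as CORRECTED_DATE
  from test1 t1
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;
```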

See also this similar question about incremental updates; your logic is different, but the approach is the same: https://stackoverflow.com/a/37744071/2700344

Test data:

with test1 as (
select stack (2,
1, 'TEST',    3,null,    
29,'TEST2',   90 , null
             ) as (ID,NAME,CODE,CORRECTED_DATE)
),

     test2 as (
select stack (2,
              12,'TEST5',20,
              1,'TEST',3
             ) as (ID, NAME, CODE)
)

select 
      --select t1 if both or t1 only exist, t2 if only t2 exists
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      --if found in test1 but not in test2 then current_date else leave as is
      case when (t1.ID is not null) and (t2.ID is null) then current_date else t1.CORRECTED_DATE end as CORRECTED_DATE 
  from test1 t1 
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;

Result:

OK
id      name    code    corrected_date
1       TEST    3       NULL
12      TEST5   20      NULL
29      TEST2   90      2019-03-14
Time taken: 41.727 seconds, Fetched: 3 row(s)

The result is as expected.