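How can I merge two tables, picking the column with the higher value, without being able to use a MERGE statement?

Asked: 2018-10-28 10:35:09

Tags: sql amazon-redshift

I have two simple tables:

source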

id    count  date
6       30  10-28
7       80  10-29
5       20  10-28
4       10  10-27

destination

id    count  date   
7       10  10-29
5       90  10-28
6       10  10-28

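What I want is to merge the contents of source into destination: where the id and date match, compare the two count values and keep the larger one. The query should also insert a row from source into destination if destination does not yet have a row for that id + date.

After running the query, destination should look like: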

id    count  date   
7       80  10-29
5       90  10-28
6       30  10-28
4       10  10-27
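
To reproduce the example, the two tables could be set up along these lines. This is only a sketch: the question shows values but no schema, so the column types (in particular storing date as a short string) are assumptions.

```sql
-- Column types are assumptions; the question only shows the values.
create table source      (id integer, count integer, date varchar(5));
create table destination (id integer, count integer, date varchar(5));

-- Sample rows copied from the source table above.
insert into source (id, count, date) values
    (6, 30, '10-28'),
    (7, 80, '10-29'),
    (5, 20, '10-28'),
    (4, 10, '10-27');

-- Sample rows copied from the destination table above.
insert into destination (id, count, date) values
    (7, 10, '10-29'),
    (5, 90, '10-28'),
    (6, 10, '10-28');
```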

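Here is the query I have come up with so far, but it does not actually update the destination table, and I cannot use MERGE. I am also not sure how efficient it is: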

select id, max(count), date from (
   select id, max(count) as count, date from source group by id, count, date
   union
   select id, max(count) as count, date from destination group by id, count, date
)
group by id, date;

I am running the query on Amazon Redshift.

Thanks!

3 Answers:

Answer 0: (score: 1)

greatest can be used together with a left join:

select s.id, greatest(s.count,d.count) as count, 
       s.date
  from source s
  left join destination d 
    on ( s.id = d.id and s.date = d.date );

P.S. NULL values in the argument list of greatest (or of least, for the minimum case) are ignored.
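
One way to see that NULL handling at work is to look only at the source rows that have no match in destination. The query below is a small sketch against the two tables from the question; the output column names are illustrative and not part of the original answer.

```sql
-- Source rows with no matching destination row: d.count is NULL here,
-- so greatest() should fall back to s.count.
select s.id,
       s.date,
       s.count                     as source_count,
       d.count                     as destination_count,
       greatest(s.count, d.count)  as merged_count
  from source s
  left join destination d
    on ( s.id = d.id and s.date = d.date )
 where d.id is null;
```

With the sample data this should return only the row for id 4, with merged_count equal to its source count.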

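If you do not just want to select but want to change the destination table itself, still without a merge statement, you can use a CTAS (create table as) statement, as in the following code block: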

create table destination2 as 
select s.id, greatest(s.count,d.count) as count, s.date
  from source s
  left join destination d 
    on ( s.id = d.id and s.date = d.date );

delete from destination;

insert into destination
select * from destination2;

drop table destination2;

select * from destination;
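
One thing worth checking before relying on this: because the join starts from source, any id + date that exists only in destination never reaches destination2 and would be gone after the delete. The sample data has no such rows, but a check along these lines (a sketch, not part of the original answer) would show whether real data does:

```sql
-- Destination rows with no counterpart in source.
-- If this returns anything, those rows would be lost by the
-- left-join-based rebuild above.
select d.id, d.count, d.date
  from destination d
  left join source s
    on ( s.id = d.id and s.date = d.date )
 where s.id is null;
```

If the check returns rows, a full join or a union all over both tables would keep them.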

Answer 1: (score: 1)

Of course, MERGE is just a (more efficient) replacement for the well-known two-step process:

-- first update existing id/date combinations,
-- but only where source holds the larger count
update destination
set count = src.count
from source src
where destination.id    = src.id
  and destination.date  = src.date
  and destination.count < src.count;

-- then insert id/date combinations that destination does not have yet
insert into destination (id, count, date)
select src.id, src.count, src.date
from source src
where not exists
 ( select * from destination dest
   where dest.id    = src.id
     and dest.date  = src.date
 );
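
Before running the two statements, it can help to preview which source rows each step would act on. The following is a hedged sketch using the table names from the question; the planned_step and old_count/new_count column names are made up for illustration.

```sql
-- Dry run: list the source rows that the update or the insert would pick up.
select src.id,
       src.date,
       dest.count as old_count,
       src.count  as new_count,
       case when dest.id is null then 'insert' else 'update' end as planned_step
from source src
left join destination dest
  on dest.id = src.id
 and dest.date = src.date
where dest.id is null            -- no row yet: would be inserted
   or dest.count < src.count;    -- smaller count: would be updated
```

With the sample data this should list ids 6 and 7 as updates and id 4 as an insert, while id 5 is left out because destination already holds the larger count.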

Answer 2: (score: 0)

You can generate the table using union all or full join:

select id, date, max(count) as count
from ((select id, date, count from source
      ) union all
      (select id, date, count from destination
      )
     ) t
group by id, date;
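
The full join variant is mentioned but not shown, so here is a minimal sketch of what it might look like. It assumes each table has at most one row per id + date; with duplicates, the join would multiply rows and the union all form with group by is the safer choice.

```sql
-- Full join variant: keeps rows that exist in only one of the two tables.
-- coalesce() picks the key from whichever side is present;
-- greatest() ignores the NULL count from the missing side.
select coalesce(s.id, d.id)        as id,
       coalesce(s.date, d.date)    as date,
       greatest(s.count, d.count)  as count
from source s
full join destination d
  on s.id = d.id
 and s.date = d.date;
```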

If you want to save these results in a table, I would tend to recommend creating a new table and replacing the old one:

create table new_destination as
    select id, date, max(count) as count
    from ((select id, date, count from source
          ) union all
          (select id, date, count from destination
          )
         ) t
    group by id, date;

truncate table destination;

insert into destination (id, date, count)
    select id, date, count
    from new_destination;
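
Another way to do the replacing step, sketched here as an alternative rather than as what the answer prescribes, is to swap the tables by renaming instead of truncating and reloading. The name destination_old is hypothetical. A table built with create table as does not necessarily carry over the original table's keys, constraints, or grants, so those are worth checking before dropping anything.

```sql
-- Swap the freshly built table into place by renaming.
alter table destination     rename to destination_old;
alter table new_destination rename to destination;

-- Only once the new table has been verified:
drop table destination_old;
```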