PostgreSQL batch INSERT / UPDATE with externally supplied IDs

Time: 2015-06-12 16:52:54

Tags: sql postgresql

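I have a table of entities (people, companies, etc.) that I need to upsert from a cron job. It is important to do this in batches, because both datasets are arbitrarily large (I found that processing one row at a time would take years). Both sides have an ExternalIdentifier (EXID) that I need to check against. In addition, every INSERT must include an internal ID (integer) supplied by my framework server. If necessary, I can guarantee that these IDs are consecutive integers.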

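The logic is as follows:

  • If the EXID exists, just do a direct update on it.
  • Otherwise:
    • If the EntityName exists, do an UPDATE.
    • Else, do an INSERT using the ID provided by the framework server.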
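
The branching above can be sketched as a single classification query. This is only an illustration: it assumes the incoming batch has been loaded into a staging table, and it borrows the names erm.entity and erm.temp_entity_import from the answer's SQL further down. The action labels are made up for readability.

```sql
-- Sketch only: erm.temp_entity_import is assumed to hold the incoming batch,
-- erm.entity is the target table. Nothing is modified by this query.
select t.externalidentifier,
       t.entityname,
       case
           -- Case 1: the EXID already exists in the target table
           when exists (select 1 from erm.entity e
                        where e.externalidentifier = t.externalidentifier)
               then 'update by EXID'
           -- Case 2: no EXID match, but the entity name exists
           when exists (select 1 from erm.entity e
                        where e.entityname = t.entityname)
               then 'update by EntityName'
           -- Case 3: new row, needs an ID from the framework server
           else 'insert with framework ID'
       end as action
from erm.temp_entity_import as t;
```

Counting the rows that fall into the third bucket would tell the caller how many framework IDs have to be reserved.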

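Simplified table columns (please forgive my lack of formatting knowledge):

column_name | data_type
entityname | character varying
externalidentifier | character varying

The table being updated also has the internal ID provided by the framework:

column_name | data_type
entityid | integer

So the question boils down to this: is there a fast way to do this in batches using the IDs supplied by the framework?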

Edit: This is done through SQL executed from Java. I would prefer to do it in SQL, but parts of it can be done in Java.

1 Answer:

Answer 0: (score: 0)

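I found that this is not hard to do with several statements. The solution works this way as long as the temp table is under your control and will not be modified during the transaction.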

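Short answer: Basically, do the updates using the intersection of the two tables, then remove the matched rows from the temp table before moving on. Once both update passes are done, only the inserts are left. We grab a block of consecutive IDs from the framework, restart a sequence at the first ID of that block, and use the sequence as the default for a new ID column on the temp table.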

-- Do this once to create sequence
create sequence erm.temporary_new_id;

SQL:

-- Update any that have matching externalidentifier
update erm.entity set
    entityname = up.entityname
from (select * from erm.temp_entity_import where externalidentifier in
        (select externalidentifier from erm.entity
         intersect
         select externalidentifier from erm.temp_entity_import)) as up
where erm.entity.externalidentifier = up.externalidentifier;

-- Now remove those from the temp table and return the # of affected rows
with del as (delete from erm.temp_entity_import where externalidentifier in
        (select externalidentifier from erm.entity
         intersect
         select externalidentifier from erm.temp_entity_import) returning true)
select count(*) as rowcount from del;

-- Update any with matching entityname
update erm.entity set
    externalidentifier = up.externalidentifier
from (select * from erm.temp_entity_import where entityname in
        (select entityname from erm.entity
         intersect
         select entityname from erm.temp_entity_import)) as up
where erm.entity.entityname = up.entityname;

-- Now remove those from the temp table and return the # of affected rows
with del as (delete from erm.temp_entity_import where entityname in
        (select entityname from erm.entity
         intersect
         select entityname from erm.temp_entity_import) returning true)
select count(*) as rowcount from del;

-- Grab the number of rows left as the number to be inserted so we know how many IDs to claim
select count(temp_entity_id) from erm.temp_entity_import;

-- Restart the sequence at the first ID of the block we allocated, then add the ID column
-- (STARTING_ID_NUM is a placeholder; substitute the actual first ID before running)
alter sequence erm.temporary_new_id restart with STARTING_ID_NUM;
alter table erm.temp_entity_import add column entityid integer default nextval('erm.temporary_new_id');

-- Insert them all
insert into erm.entity (entityid, entityname, externalidentifier)
select t.entityid, t.entityname, t.externalidentifier
from erm.temp_entity_import as t;

-- Delete from temp table (we can just wipe it)
delete from erm.temp_entity_import;

-- Remove the id column for next time
alter table erm.temp_entity_import drop column entityid;
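
As a variation on the last few steps, the IDs could also be assigned at insert time with row_number(), which avoids adding and dropping a column on the temp table for every run. This is a sketch and not part of the solution above. It assumes the framework has reserved a contiguous block of IDs at least as large as the remaining row count, that the block starts at the literal 1000 (an illustrative value only), and that temp_entity_id gives a stable ordering of the temp rows.

```sql
begin;

-- 1000 stands in for the first ID of the block reserved by the framework.
-- row_number() starts at 1, so subtract 1 to begin exactly at the first ID.
insert into erm.entity (entityid, entityname, externalidentifier)
select 1000 + row_number() over (order by t.temp_entity_id) - 1,
       t.entityname,
       t.externalidentifier
from erm.temp_entity_import as t;

-- The temp table can then be emptied as before.
delete from erm.temp_entity_import;

commit;
```

With this approach the sequence erm.temporary_new_id is not needed, but the count query shown earlier is still what tells the caller how many IDs to reserve.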