Question

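Is it possible in Postgres 9.6 to get COPY UPSERT behavior using an on duplicate key clause? I have a CSV file that I am importing into Postgres, but it may contain some duplicate key violations, so the COPY command raises an error and terminates as soon as it hits one.

The file is very large, so preprocessing it in application code (to handle the rows that could cause duplicate key violations) may not be feasible, because all of the keys may not fit in memory.

What is the best way to import a very large number of rows into Postgres when they may contain duplicate key violations?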

Answer 1

Sample:

t=# create table s90(i int primary key, t text);
CREATE TABLE
t=# insert into s90 select 1,'a';
INSERT 0 1
t=# copy s90 from stdin delimiter ',';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1,'b'
>> 2,'c'
>> \.
ERROR:  duplicate key value violates unique constraint "s90_pkey"
DETAIL:  Key (i)=(1) already exists.
CONTEXT:  COPY s90, line 1

Workaround for COPY:

t=# create table s91 as select * from s90 where false;
SELECT 0
t=# copy s91 from stdin delimiter ',';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1,'b'
>> 2,'c'
>> \.
COPY 2
t=# with p as (select s91.* from s91 left outer join s90 on s90.i=s91.i where s90.i is null)
insert into s90 select * from p;
INSERT 0 1
t=# select * from s90;
 i |  t
---+-----
 1 | a
 2 | 'c'
(2 rows)
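
A variation on the same staging-table idea, offered as a sketch rather than a tested recipe. The anti-join above skips keys that already exist in s90, but it would still fail if the incoming data contained the same key twice, because both rows would pass the join. insert ... on conflict do nothing (available since Postgres 9.5) covers both cases. The staging table name s90_staging and the sample rows are made up for illustration; the script assumes the s90 table from the sample above and is meant to be run through psql.

```sql
-- Hypothetical staging table with the same columns as s90.
-- "like" does not copy the primary key, so duplicate rows load without error.
create temporary table s90_staging (like s90);

-- Load the raw rows; key 1 appears twice here and also already exists in s90.
copy s90_staging from stdin delimiter ',';
1,b
1,bb
2,c
\.

-- Move the rows across, silently skipping any key that is already present
-- in s90 or that was inserted earlier by this same statement.
insert into s90
select * from s90_staging
on conflict (i) do nothing;

drop table s90_staging;
```

Which of several duplicate rows in the file wins is not controlled here; if that matters, the staging rows would need to be deduplicated explicitly (for example with distinct on) before the insert.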

Answer 2

With the file_fdw extension, you can open the file and query it like a table.

Read more in the documentation.

Example:

create extension if not exists file_fdw;

create server csv_server foreign data wrapper file_fdw;

create foreign table my_csv_file (
    id integer,
    should_be_unique_id integer,
    some_other_columns text
) server csv_server
options (filename '/data/my_large_file.csv', format 'csv');

insert into my_new_table
select distinct on (should_be_unique_id) *
from my_csv_file
order by should_be_unique_id, id desc;

Or, if my_new_table is not empty, you can use

insert into my_new_table
select * 
from my_csv_file
on conflict ... update ...
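
One possible way to spell out the elided clause, as a sketch under assumptions not stated in the answer: that my_new_table has the same three columns as my_csv_file, and that it has a unique constraint or unique index on should_be_unique_id. The source rows are deduplicated first because on conflict do update raises an error if a single statement tries to update the same target row twice.

```sql
-- Assumes my_new_table(id, should_be_unique_id, some_other_columns)
-- with a unique constraint or index on should_be_unique_id (not shown in the answer).
insert into my_new_table (id, should_be_unique_id, some_other_columns)
select distinct on (should_be_unique_id)
       id, should_be_unique_id, some_other_columns
from my_csv_file
-- keep the row with the highest id for each should_be_unique_id
order by should_be_unique_id, id desc
on conflict (should_be_unique_id) do update
set id = excluded.id,
    some_other_columns = excluded.some_other_columns;
```

Here excluded refers to the row that was proposed for insertion, so an existing row is overwritten with the values from the file. Using do nothing instead would keep the existing row untouched.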

Importing a large number of rows into Postgres with duplicate key conflicts
