Bulk loading Postgres with unique constraints

Time: 2014-04-11 09:08:54

Tags: postgresql

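I have a large amount of data (100 GB) that I want to load into Postgres. I've been reading the documentation, and it recommends dropping indexes and foreign keys before loading.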

http://www.postgresql.org/docs/current/interactive/populate.html

I'd like to keep some unique constraints on fields in the table (i.e. 3 columns that must be unique). How should I load the data?

I can see a few different options:

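A) Load it normally through Python or something similar (slow; probably not worth doing).

B) Drop the unique constraint, load the data, then re-apply the constraint (in that case, what happens when there are duplicates?).

C) Load the data into a staging table (without the unique constraint), do something clever in SQL to remove the duplicates, and copy the result into the main table.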
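Options B and C can be sketched in SQL. The Python helpers below only assemble the statements (they do not connect to a database), so they can be inspected before running them in psql. The names `target`, `staging`, the key columns `a, b, c` and the CSV path are hypothetical placeholders, and the target table is assumed to be empty. For option B, `ALTER TABLE ... ADD CONSTRAINT ... UNIQUE` fails with an error if any key is duplicated, so `duplicate_keys_sql` lists the offending keys first. For option C, `DISTINCT ON` keeps one row per key; because the `ORDER BY` contains only the key columns, which duplicate survives is arbitrary. Note that server-side `COPY ... FROM 'file'` reads a file on the database server and needs the corresponding privileges; psql's `\copy` is the client-side alternative.

```python
import re

# Hypothetical table/column names; assumed to be plain lowercase identifiers.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check(*names):
    """Reject anything that is not a simple identifier before formatting SQL."""
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")


def duplicate_keys_sql(table, key_cols):
    """Option B: list key combinations that would make ADD CONSTRAINT ... UNIQUE fail."""
    _check(table, *key_cols)
    keys = ", ".join(key_cols)
    return (f"SELECT {keys}, count(*) FROM {table} "
            f"GROUP BY {keys} HAVING count(*) > 1;")


def staging_load_sql(target, staging, key_cols, csv_path):
    """Option C: load into an unconstrained staging table, then keep one row per key."""
    _check(target, staging, *key_cols)
    keys = ", ".join(key_cols)
    path = csv_path.replace("'", "''")  # escape single quotes for the SQL literal
    return [
        # LIKE copies the column definitions (and NOT NULL), but no unique index.
        f"CREATE UNLOGGED TABLE {staging} (LIKE {target});",
        f"COPY {staging} FROM '{path}' WITH (FORMAT csv);",
        # DISTINCT ON keeps the first row of each key group in ORDER BY order.
        f"INSERT INTO {target} SELECT DISTINCT ON ({keys}) * FROM {staging} "
        f"ORDER BY {keys};",
        f"DROP TABLE {staging};",
    ]


for stmt in staging_load_sql("target", "staging", ["a", "b", "c"], "/tmp/data.csv"):
    print(stmt)
print(duplicate_keys_sql("target", ["a", "b", "c"]))
```

The staging table is unlogged because it is thrown away afterwards, so there is no point in writing its contents to WAL.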

1 Answer:

Answer 0 (score: 3)

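You can load it with pg_bulkload. pg_bulkload supports direct loading, which writes data without going through shared buffers, and it supports parallel loading. It is much faster than loading into an unlogged table. You can create the unique constraint first and then run pg_bulkload: it writes rejected rows to its bad files (PARSE_BADFILE and DUPLICATE_BADFILE, whose paths are recorded in the log file) and loads the valid rows. You can then deal with the rejected rows after the load. For example: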
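To make the duplicate handling concrete, here is a small in-memory model of what a unique key over three columns does during a load, with a `keep` switch modeled loosely on the `ON_DUPLICATE_KEEP = NEW` setting that appears in the log below. It illustrates the semantics only and is not pg_bulkload's implementation; the function name, the tuple-shaped rows and the `rejected` list (standing in for the duplicate bad file) are assumptions.

```python
from typing import Any, Iterable, List, Sequence, Tuple

Row = Tuple[Any, ...]


def load_with_unique_key(rows: Iterable[Row], key_idx: Sequence[int],
                         keep: str = "NEW") -> Tuple[List[Row], List[Row]]:
    """Model a load into a table with a unique key on the columns in key_idx.

    keep="NEW": a later row replaces the earlier row with the same key.
    keep="OLD": the first row wins and later duplicates are rejected.
    Returns (loaded_rows, rejected_rows); the rejected rows are what you
    would review after the load.
    """
    if keep not in ("NEW", "OLD"):
        raise ValueError("keep must be 'NEW' or 'OLD'")
    table = {}  # key tuple -> row; dicts preserve first-insertion order
    rejected: List[Row] = []
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        if key not in table:
            table[key] = row
        elif keep == "NEW":
            rejected.append(table[key])  # the displaced older row
            table[key] = row
        else:
            rejected.append(row)  # the incoming duplicate
    return list(table.values()), rejected


rows = [(1, "a", "x", 10), (1, "a", "x", 20), (2, "b", "y", 30)]
loaded, rejected = load_with_unique_key(rows, key_idx=(0, 1, 2), keep="NEW")
print(loaded)    # [(1, 'a', 'x', 20), (2, 'b', 'y', 30)]
print(rejected)  # [(1, 'a', 'x', 10)]
```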

wget http://pgfoundry.org/frs/download.php/3566/pg_bulkload-3.1.5.tar.gz
[root@db-172-16-3-150 ~]# export PATH=/home/pg93/pgsql9.3.3/bin:$PATH
[root@db-172-16-3-150 ~]# cd /opt/soft_bak/pg_bulkload-3.1.5
[root@db-172-16-3-150 pg_bulkload-3.1.5]# which pg_config
/home/pg93/pgsql9.3.3/bin/pg_config
[root@db-172-16-3-150 pg_bulkload-3.1.5]# make
[root@db-172-16-3-150 pg_bulkload-3.1.5]# make install

pg93@db-172-16-3-150-> psql
psql (9.3.3)
Type "help" for help.
digoal=# truncate test;
TRUNCATE TABLE
digoal=# create extension pg_bulkload;

pg_bulkload -i /ssd3/pg93/test.dmp -O test -l /ssd3/pg93/test.log -o "TYPE=CSV" -o "WRITER=PARALLEL" -h $PGDATA -p $PGPORT -d $PGDATABASE
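If you drive the load from a script, the same invocation can be assembled as an argument list. This is a sketch under assumptions: it reuses only the flags from the command above, and assumes `-o` accepts any of the `KEY = VALUE` parameters that pg_bulkload echoes in its log, as it does for TYPE and WRITER here. The `build_pg_bulkload_argv` helper and the host/port values in the example are hypothetical placeholders (the answer passes `$PGDATA`, `$PGPORT` and `$PGDATABASE` from its environment). The log below shows DUPLICATE_ERRORS = 0, so check the pg_bulkload documentation for how that threshold treats duplicate rows before relying on the load continuing past them.

```python
import shlex
from typing import Dict, List, Optional


def build_pg_bulkload_argv(input_path: str, table: str, log_path: str,
                           host: str, port: str, dbname: str,
                           options: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble the pg_bulkload command line used in the answer.

    Each entry in `options` becomes one extra -o "KEY=VALUE" pair, appended
    after the TYPE and WRITER settings from the original command.
    """
    argv = ["pg_bulkload", "-i", input_path, "-O", table, "-l", log_path]
    opts = {"TYPE": "CSV", "WRITER": "PARALLEL"}
    opts.update(options or {})
    for key, value in opts.items():
        argv += ["-o", f"{key}={value}"]
    argv += ["-h", host, "-p", port, "-d", dbname]
    return argv


# Placeholder connection values; substitute your own.
argv = build_pg_bulkload_argv("/ssd3/pg93/test.dmp", "test", "/ssd3/pg93/test.log",
                              host="/path/to/pgdata", port="5432", dbname="digoal",
                              options={"ON_DUPLICATE_KEEP": "NEW"})
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting issues entirely.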

[root@db-172-16-3-150 pg93]# cat test.log
pg_bulkload 3.1.5 on 2014-03-28 13:32:31.32559+08

INPUT = /ssd3/pg93/test.dmp
PARSE_BADFILE = /ssd4/pg93/pg_root/pg_bulkload/20140328133231_digoal_public_test.prs.dmp
LOGFILE = /ssd3/pg93/test.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
NULL = 
OUTPUT = public.test
MULTI_PROCESS = YES
VERBOSE = NO
WRITER = DIRECT
DUPLICATE_BADFILE = /ssd4/pg93/pg_root/pg_bulkload/20140328133231_digoal_public_test.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO


  0 Rows skipped.
  50000000 Rows successfully loaded.
  0 Rows not loaded due to parse errors.
  0 Rows not loaded due to duplicate errors.
  0 Rows replaced with new rows.

Run began on 2014-03-28 13:32:31.32559+08
Run ended on 2014-03-28 13:35:13.019018+08

CPU 1.55s/128.55u sec elapsed 161.69 sec
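The summary lines are enough to estimate throughput: 50,000,000 rows in 161.69 s of elapsed time is roughly 309,000 rows per second. A small parser for those two lines, assuming the log format shown above (the regular expressions match only these summary lines):

```python
import re

LOG_TAIL = """\
  0 Rows skipped.
  50000000 Rows successfully loaded.
  0 Rows not loaded due to parse errors.
  0 Rows not loaded due to duplicate errors.
  0 Rows replaced with new rows.

CPU 1.55s/128.55u sec elapsed 161.69 sec
"""


def load_rate(log_text: str) -> float:
    """Return rows loaded per elapsed second from a pg_bulkload log."""
    loaded = re.search(r"(\d+) Rows successfully loaded", log_text)
    elapsed = re.search(r"elapsed ([\d.]+) sec", log_text)
    if not loaded or not elapsed:
        raise ValueError("summary lines not found")
    return int(loaded.group(1)) / float(elapsed.group(1))


print(f"{load_rate(LOG_TAIL):,.0f} rows/s")  # 309,234 rows/s
```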