将平面文件加载到postgres表。我需要在读取文件时进行一些转换并加载它。
Like
-->Check for characters, if it is present, default some value. Reg_Exp can be used in oracle. How the functions can be called in below syntax?
-->TO_DATE function from text format
-->Check for Null and defaulting some value
-->Trim functions
-->Only few columns from source file should be loaded
-->Defaulting values, say for instance, source file has only 3 columns. But we need to load 4 columns. One column should be defaulted with some value
LOAD CSV
FROM 'filename'
INTO postgresql://role@host:port/database_name?tablename
TARGET COLUMNS
(
alphanm,alphnumnn,nmrc,dte
)
WITH truncate,
skip header = 0,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|',
batch rows = 100,
batch size = 1MB,
batch concurrency = 64
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
请帮助我,如何使用pgloader完成这项工作?
由于
答案 0 :(得分:3)
这是pgloader的一个独立测试用例,可以重现您的用例,我能够理解它:
/*
Sorry pgloader version "3.3.2" compiled with SBCL 1.2.8-1.el7 Doing kind
of POC, to implement in real time work. Sample data from file:
raj|B|0.5|20170101|ABCD Need to load only first,second,third and fourth
column; Table has three column, third column should be defaulted with some
value. Table structure: A B C-numeric D-date E-(Need to add default value)
*/
LOAD CSV
FROM inline
(
alphanm,
alphnumnn,
nmrc,
dte [date format 'YYYYMMDD'],
other
)
INTO postgresql:///pgloader?so.raja
(
alphanm,
alphnumnn,
nmrc,
dte,
col text using "constant value"
)
WITH truncate,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|'
SET work_mem to '12MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists so.raja; $$,
$$ create table so.raja (
alphanm text,
alphnumnn text,
nmrc numeric,
dte date,
col text
);
$$;
raj|B|0.5|20170101|ABCD
现在,这是运行pgloader命令的摘录:
$ pgloader 41287414.load
2017-08-15T12:35:10.258000+02:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2017-08-15T12:35:10.261000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-08-15T12:35:10.261000+02:00 LOG Parsing commands from file #P"/Users/dim/dev/temp/pgloader-issues/stackoverflow/41287414.load"
2017-08-15T12:35:10.422000+02:00 LOG report summary reset
table name read imported errors total time
----------------------- --------- --------- --------- --------------
fetch 0 0 0 0.007s
before load 2 2 0 0.016s
----------------------- --------- --------- --------- --------------
so.raja 1 1 0 0.019s
----------------------- --------- --------- --------- --------------
Files Processed 1 1 0 0.021s
COPY Threads Completion 2 2 0 0.038s
----------------------- --------- --------- --------- --------------
Total import time 1 1 0 0.426s
以下是命令完成时目标表的内容:
$ psql -q -d pgloader -c 'table so.raja'
alphanm │ alphnumnn │ nmrc │ dte │ col
═════════╪═══════════╪══════╪════════════╪════════════════
raj │ B │ 0.5 │ 2017-01-01 │ constant value
(1 row)