How to load an inconsistent CSV file using SQL*Loader?

Time: 2019-01-14 06:30:44

Tags: oracle sql-loader

I have the following sample CSV file:

,,,Test File,
,todays Date:,01/10/2018,Generation date,10/01/2019 11:20:58
Header 1,Header 2,Header 3,Header 4,Header 5
,My account no,100102GFC,,
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E

Below is my table structure, with the rows I expect to end up in it:

Todays Date,My account,Header 1,Header 2,Header 3,Header 4,Header 5
01/10/2018,100102GFC,A,B,C,D,E
01/10/2018,100102GFC,A,B,C,D,E
01/10/2018,100102GFC,A,B,C,D,E

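I'm having trouble getting today's date from the second line of the file and the account number from the fourth line. The first four lines will always have this layout. My actual data starts on line 5.

Is it possible to pick specific values from the second and fourth lines and load them together with the other values starting from the fifth line? How can we handle this in the control file?

2 Answers:

Answer 0: (score: 1)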

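You can conditionally load records into different tables. So you can get this effect by:

  • Creating three staging tables: one for the load date, one for the account and one for the data rows
  • Loading each record into the appropriate table
  • Cross joining the results to get the output

For example, create these staging tables: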

create table t (
  c1 varchar2(1),
  c2 varchar2(1),
  c3 varchar2(1),
  c4 varchar2(1),
  c5 varchar2(1)
);

create table dt (
  load_date date
);

create table act (
  acct# varchar2(20)
);

Then use the following control file to state which records are loaded into which table:

LOAD DATA
infile *
TRUNCATE 
INTO TABLE dt WHEN (2:13) = 'todays Date:'
FIELDS TERMINATED BY ","
DATE FORMAT "DD/MM/YYYY"
TRAILING NULLCOLS
(
c1 filler, c2 filler, load_date date, c4 filler, c5 filler
)
INTO TABLE act WHEN (2:14) = 'My account no'
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(
c1 filler position(1:1), c2 filler, acct#, c4 filler, c5 filler
)
INTO TABLE t WHEN (1:1) <> ','
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(
c1 position(1), c2, c3, c4, c5 
)
BEGINDATA
,,,Test File,
,todays Date:,01/10/2018,Generation date,10/01/2019 11:20:58
Header 1,Header 2,Header 3,Header 4,Header 5
,My account no,100102GFC,,
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E

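The when clause uses positional notation to check the characters at the given positions. Adjust as needed. The filler keyword in the column list means the field is read but ignored.

Note that the header line also matches the last when clause, because its first character is not a comma. In this example it appears to be kept out of t only because 'Header 1' does not fit in a varchar2(1) column, so the row is rejected. With wider columns you would need an extra condition to skip it.

Now load it: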

sqlldr userid=chris/chris@db control=sqlldr.ctl

SQL*Loader: Release 12.2.0.1.0 - Production on Mon Jan 14 10:56:40 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

Path used:      Conventional
Commit point reached - logical record count 7

Table DT:
  1 Row successfully loaded.

Table ACT:
  1 Row successfully loaded.

Table T:
  3 Rows successfully loaded.

Check the log file:
  sqlldr.log
for more information about the load.

All you need to do now is cross join the three tables to get the desired result:

select * from dt
cross  join act
cross  join t;

LOAD_DATE           ACCT#       C1   C2   C3   C4   C5   
01-OCT-2018 00:00   100102GFC   A    B    C    D    E    
01-OCT-2018 00:00   100102GFC   A    B    C    D    E    
01-OCT-2018 00:00   100102GFC   A    B    C    D    E 
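
To get from that query to the table in the question, the same cross join can feed an insert. This is only a sketch: the target table name target_tab and its column names are hypothetical, chosen to mirror the headings shown in the question.

```sql
-- Hypothetical target table; name, column names and widths are assumptions.
create table target_tab (
  todays_date date,
  my_account  varchar2(20),
  header_1    varchar2(1),
  header_2    varchar2(1),
  header_3    varchar2(1),
  header_4    varchar2(1),
  header_5    varchar2(1)
);

-- dt and act hold one row each, so the cross join repeats
-- the date and account on every data row from t.
insert into target_tab
  (todays_date, my_account, header_1, header_2, header_3, header_4, header_5)
select dt.load_date, act.acct#, t.c1, t.c2, t.c3, t.c4, t.c5
from   dt
cross  join act
cross  join t;

commit;
```

If dt or act ends up with more than one row (for example, two files loaded without truncating), the cross join multiplies the data rows, so it is worth checking the row counts first.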

This is a bit messy. If you're able to put the file on the database server, using an external table will be easier.
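
A rough sketch of the external table route, under several assumptions: a directory object named data_dir already exists and points at the folder holding the file, the file is called test.csv, and 100 characters is wide enough for every field. The table name ext_raw is made up for illustration.

```sql
-- Assumes: directory object DATA_DIR exists and the file is named test.csv.
create table ext_raw (
  c1 varchar2(100),
  c2 varchar2(100),
  c3 varchar2(100),
  c4 varchar2(100),
  c5 varchar2(100)
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
    missing field values are null
  )
  location ('test.csv')
)
reject limit unlimited;

-- Pick the date from the 'todays Date:' line and the account from the
-- 'My account no' line, then attach both to every real data row.
select to_date(d.c3, 'DD/MM/YYYY') as load_date,
       a.c3                        as acct#,
       t.c1, t.c2, t.c3, t.c4, t.c5
from   ext_raw t
cross  join (select c3 from ext_raw where c2 = 'todays Date:')  d
cross  join (select c3 from ext_raw where c2 = 'My account no') a
where  t.c1 is not null        -- skips the title, date and account lines
and    t.c1 <> 'Header 1';     -- skips the column header line
```

With this approach there are no staging tables to truncate and reload; the query reads the file each time it runs.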

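Answer 1: (score: 0)

Another common way to handle a load like this is to load all the rows into a single staging table. For example, you could create a staging table with five varchar2 columns (the largest number of columns in the data). Truncate it, then load every row into it as-is.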
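
A minimal sketch of that staging step. The table name X_TEST_STG and the columns col1 to col5 match the PL/SQL further down; the varchar2(100) width and the file name test.csv are assumptions.

```sql
-- Staging table: every field is kept as text, one row per line of the file.
create table x_test_stg (
  col1 varchar2(100),
  col2 varchar2(100),
  col3 varchar2(100),
  col4 varchar2(100),
  col5 varchar2(100)
);
```

A control file along these lines would then load every line unchanged:

```text
LOAD DATA
INFILE 'test.csv'
TRUNCATE
INTO TABLE x_test_stg
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(col1, col2, col3, col4, col5)
```

TRUNCATE empties the staging table before each load, and TRAILING NULLCOLS lets lines with fewer populated fields load with nulls in the missing columns.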


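Then create a PL/SQL script to run as the next step. It loads the data from the staging table into the production table, validating and transforming it along the way. Make it a stored procedure.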

declare
  save_date date;
  save_acct_nbr varchar2(10);
begin

    -- Empty the production table before reloading it
    execute immediate 'truncate table x_test';

    -- Save the file date
    -- (adjust the mask to 'DD/MM/YYYY' if the file is day-first)
    select to_date(col3, 'MM/DD/YYYY')
    into save_date
    from X_TEST_STG
    where col2 = 'todays Date:';

    -- Save the account number
    select col3
    into save_acct_nbr
    from X_TEST_STG
    where col2 = 'My account no';

    -- Copy the data rows, skipping the title, date, account and header lines
    insert into x_test
    (select save_date, save_acct_nbr, col1, col2, col3, col4, col5
     from X_TEST_STG
     where col1 is not null
     and col1 != 'Header 1');

    commit;
end;
/
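
One possible shape for the stored procedure version, as a sketch only. The procedure name load_x_test is hypothetical, and it assumes x_test has seven columns in the same order as the insert above. Instead of two select into statements it uses scalar subqueries.

```sql
create or replace procedure load_x_test as
begin
  -- Empty the production table before reloading it
  execute immediate 'truncate table x_test';

  -- The two scalar subqueries fetch the file date and account number;
  -- each must match at most one staging row or the insert fails.
  insert into x_test
  select (select to_date(col3, 'MM/DD/YYYY')
          from   x_test_stg
          where  col2 = 'todays Date:'),
         (select col3
          from   x_test_stg
          where  col2 = 'My account no'),
         col1, col2, col3, col4, col5
  from   x_test_stg
  where  col1 is not null
  and    col1 != 'Header 1';

  commit;
end load_x_test;
/
```

One difference in behavior: if the date or account line is missing, a scalar subquery returns null rather than raising NO_DATA_FOUND as select into would, so add a check if a missing line should stop the load.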

Badda Bing, Badda Boom!
