I'm trying to load 73 local files into Redshift. The data doesn't use a common delimiter such as a comma or a tab; instead, the delimiter is 13 spaces. Is there a way to treat those spaces as the delimiter?
I'm using the same example as the AWS documentation. The actual data looks like this:
1             ToyotaPark             Bridgeview             IL
2             ColumbusCrewStadium             Columbus             OH
3             RFKStadium             Washington             DC
4             CommunityAmericaBallpark             KansasCity             KS
5             GilletteStadium             Foxborough             MA
6             NewYorkGiantsStadium             EastRutherford             NJ
7             BMOField             Toronto             ON
8             TheHomeDepotCenter             Carson             CA
9             Dick'sSportingGoodsPark             CommerceCity             CO
10             PizzaHutPark             Frisco             TX
Sample code:
create table venue_new(
venueid smallint not null,
venuename varchar(100) not null,
venuecity varchar(30),
venuestate char(2),
venueseats integer not null default '1000');
copy venue_new(venueid, venuename, venuecity, venuestate)
from 's3://mybucket/data/venue_noseats.txt'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ' ';
The actual data has about 80 columns of varying widths. The upside is that no data element contains a space. Rather than specifying a fixed width for each of the columns, is there a simpler way to split the data on the 13 spaces?
Answer (score: 1):
The COPY command only allows a single-character delimiter, so you can't load this data directly into the target table. Instead, you'll need to create a staging table:
create table stage_venue (venue_record varchar(200));
Run your COPY command (assuming there are no pipe, |, characters in your data):
copy stage_venue from 's3://mybucket/data/venue_noseats.txt' credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';
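(Not part of the original answer.) Before splitting, it can be worth sanity-checking the staged load; a minimal check against the stage_venue table created above:

-- Row count should match the number of input lines across all 73 files.
select count(*) as staged_rows from stage_venue;

-- Eyeball a few raw records to confirm the spacing survived the load.
select venue_record from stage_venue limit 5;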
Then populate the target table using the split_part function (note that in my sample I counted only 10 spaces rather than 13):
insert into venue_new (venueid, venuename, venuecity, venuestate)
select
  split_part(venue_record, '          ', 1),
  split_part(venue_record, '          ', 2),
  split_part(venue_record, '          ', 3),
  split_part(venue_record, '          ', 4)
from stage_venue;
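One alternative worth noting (a sketch, not from the original answer): if the run of spaces between fields isn't a consistent length, you can first collapse each run of one or more spaces into a single-character delimiter with regexp_replace, then split on that character. This relies on the question's guarantee that no data element contains a space; the pipe is an arbitrary placeholder:

insert into venue_new (venueid, venuename, venuecity, venuestate)
select
  split_part(collapsed, '|', 1)::smallint,  -- cast the id to match venueid's type
  split_part(collapsed, '|', 2),
  split_part(collapsed, '|', 3),
  split_part(collapsed, '|', 4)
from (
  -- ' +' matches any run of spaces, so the exact count (10 or 13) no longer matters
  select regexp_replace(venue_record, ' +', '|') as collapsed
  from stage_venue
) t;

This sidesteps counting spaces entirely, at the cost of assuming pipes never appear in the data (the same assumption the COPY step above already makes).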