I want to populate a Hive table from multiple CSV files. The problem is that not all of the files use the same delimiter. In the table definition I can only specify a single delimiter, for example ~:
create table status (type string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties ("separatorChar" = "~")
STORED AS TEXTFILE
Does Hive have any built-in functionality that allows multiple CSV delimiters? I know the files could be normalized by a Hadoop job before loading, or, based on https://stackoverflow.com/a/26356592/2207078, I could do it with Pig, but I'm looking for something built in. Ideally I'd like to create the status table without specifying a delimiter and tell Hive how to split the columns at LOAD time.
Answer 0 (score: 1)
Data files
comma.txt
|Now|,I've,heard,there,was
a,secret,chord;,That,David
played,||and||,it,,pleased
the,,,Lord;,
semicolon.txt
But;;you;don't;really
|care|;for;music;do;||||| you |||||?
pipeline.txt
,It,|,goes,|,like,|,this,|,the,
fourth|the|fifth|The|;minor
fall|the|;major|lift|The
baffled|king||composing|hallelujah
DDL
create external table mytable
(c1 string,c2 string,c3 string,c4 string,c5 string)
partitioned by (delim string)
;
alter table mytable set serdeproperties ('field.delim'=',');
alter table mytable add partition (delim='comma');
alter table mytable set serdeproperties ('field.delim'=';');
alter table mytable add partition (delim='semicolon');
alter table mytable set serdeproperties ('field.delim'='|');
alter table mytable add partition (delim='pipeline');
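The interleaving matters: a partition picks up the SERDE properties that are in effect on the table at the moment it is added, so each partition keeps its own field.delim even though the table-level value keeps changing. If you prefer to be explicit, the same result should be reachable by setting the delimiter on each partition directly; this is only a sketch and assumes the three partitions above already exist.
alter table mytable partition (delim='comma')     set serdeproperties ('field.delim'=',');
alter table mytable partition (delim='semicolon') set serdeproperties ('field.delim'=';');
alter table mytable partition (delim='pipeline')  set serdeproperties ('field.delim'='|');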
Put the files in the matching partition directories
mytable
├── delim=comma
│   └── comma.txt
├── delim=pipeline
│   └── pipeline.txt
└── delim=semicolon
    └── semicolon.txt
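How the files get into those directories is up to you; as a sketch, assuming the three files sit on the Hive client's local filesystem, LOAD DATA into the already-added partitions does the same placement:
load data local inpath 'comma.txt'     into table mytable partition (delim='comma');
load data local inpath 'semicolon.txt' into table mytable partition (delim='semicolon');
load data local inpath 'pipeline.txt'  into table mytable partition (delim='pipeline');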
select * from mytable
;
+---------+---------+--------+-----------+------------------+-----------+
| c1      | c2      | c3     | c4        | c5               | delim     |
+---------+---------+--------+-----------+------------------+-----------+
| |Now|   | I've    | heard  | there     | was              | comma     |
| a       | secret  | chord; | That      | David            | comma     |
| played  | ||and|| | it     |           | pleased          | comma     |
| the     |         |        | Lord;     |                  | comma     |
| But     |         | you    | don't     | really           | semicolon |
| |care|  | for     | music  | do        | ||||| you |||||? | semicolon |
| ,It,    | ,goes,  | ,like, | ,this,    | ,the,            | pipeline  |
| fourth  | the     | fifth  | The       | ;minor           | pipeline  |
| fall    | the     | ;major | lift      | The              | pipeline  |
| baffled | king    |        | composing | hallelujah       | pipeline  |
+---------+---------+--------+-----------+------------------+-----------+
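Since the delimiter is a partition column, one format can also be read on its own; filtering on delim prunes the scan to that single directory (the semicolon partition here is just an example):
select * from mytable where delim='semicolon';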