I am loading data into a Hive table, and the data itself contains commas.
Input file: emp.csv
101,deepak,kumar,das
102,sumita,kumari,das
103,rajesh kumar das
Expected output:
id name
101 deepak kumar das
102 sumita kumari das
103 rajesh kumar das
When I create the Hive table below and load the data, the result is not correct:
create external table hive_test(
id int, name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/hive_demo';
load data local inpath '/home/cloudera/hadoop/hive_demo/emp.csv' overwrite into table hive_test;
hive> select * from hive_test;
101 deepak
102 sumita
103 rajesh kumar das
So I created the table below instead, but it fails with an error.
create external table hive_test1(
id int,
name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES(
"separatorChar" = ",",
"quoteChar" = "'",
"escapeChar" = "\,")
STORED AS TEXTFILE
LOCATION '/hive_demo';
load data local inpath '/home/cloudera/hadoop/hive_demo/emp.csv' overwrite into table hive_test1;
select * from hive_test1;
Failed with exception
java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException:
java.lang.UnsupportedOperationException: The separator, quote, and escape characters must be different!
How can I load this data into a Hive table?
Answer 0 (score: 0)
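The solution below assumes that any ',' character in the name column should be replaced with a space.
First, load the file with a regex SerDe so that everything after the first comma ends up in name: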
-- 2 regex groups as per assumption: id, then everything after the first comma
create external table hive_test(
id int, name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "^(\\d+),(.*)$"
)
STORED AS TEXTFILE
LOCATION '/path/to/table';
LOAD data local inpath '/path/to/local/csv' overwrite into table hive_test;
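To see what the two capture groups pick up, the pattern can be tried outside Hive. The following is only a Python sketch of the matching step, not Hive code. The names LINE_PATTERN and split_line are made up for illustration, and returning None for a non-matching line is an assumption of this sketch, not a statement about how the SerDe reports bad rows.

```python
import re

# Same pattern as the "input.regex" property: digits, the first comma, then the rest of the line.
LINE_PATTERN = re.compile(r"^(\d+),(.*)$")


def split_line(line):
    """Return (id, name) when the whole line matches the pattern, otherwise None."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    sample_lines = [
        "101,deepak,kumar,das",
        "102,sumita,kumari,das",
        "103,rajesh kumar das",
    ]
    for line in sample_lines:
        print(split_line(line))
```

Because the first group only accepts digits, the split always happens at the first comma, and any later commas stay inside the second group.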
Next, replace each ',' in the name column with a space:
create table hive_test1 as
select id, regexp_replace(name, ',', ' ') as name
from hive_test;
Then select * from hive_test1 returns the following:
101 deepak kumar das
102 sumita kumari das
103 rajesh kumar das
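The replacement step can also be checked outside Hive. This is a small Python sketch that mimics regexp_replace(name, ',', ' ') on the sample rows. The function replace_commas and the staged list are hypothetical names introduced here, and the staged values assume the rows were already split at the first comma.

```python
import re


def replace_commas(rows):
    """Replace every ',' in the name field with a space, leaving the id untouched."""
    return [(row_id, re.sub(",", " ", name)) for row_id, name in rows]


# Sample (id, name) pairs as they would look after splitting at the first comma.
staged = [
    (101, "deepak,kumar,das"),
    (102, "sumita,kumari,das"),
    (103, "rajesh kumar das"),
]

if __name__ == "__main__":
    for row_id, name in replace_commas(staged):
        print(row_id, name)
```

A name that contains no comma, such as the one in row 103, passes through unchanged.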