Here is my situation:
Input rows:
"vijay" <\t> "a-b-c","a-c-d","a-d-c"
"kumar" <\t> "a-b-c","b-c-d"
I created the table like this:
hive> create table user_infos(name string, path ARRAY<string>) -- path must be an array
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS
TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
Output received:
hive> select * from user_infos;
"vijay" ["\"a-b-c\"","\"a-c-d\"","\"a-d-c\""]
"kumar" ["\"a-b-c\"","\"b-c-d\""]
The problem: I don't want the double quotes, i.e. the \" around each value.
Required output:
vijay ["a-b-c","a-c-d","a-d-c"]
kumar ["a-b-c","b-c-d"]
How can I achieve this without using a custom SerDe? Is there something like ENCLOSED BY in MySQL?
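Answer 0 (score: 1)
I ran into the same problem: my fields were enclosed in double quotes and separated by semicolons (;). My table is named employee1.
I searched around and found a solution that worked well for me.
@ramisetty.vijay: Yes, we have to use a SerDe. Download the SerDe jar from this link: https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar
Then run the following steps at the hive prompt: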
add jar path/to/csv-serde.jar;
create table employee1(id string, name string, addr string)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
;
Then load the data from your path with the following query:
load data local inpath 'path/xyz.csv' into table employee1;
Then run:
select * from employee1;
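If the column has to end up as an ARRAY, as in the question, one option that avoids a SerDe altogether is to read the second field as a plain string and strip the quotes at query time. This is only a minimal sketch: the names user_infos_raw and user_infos_clean are hypothetical, and it assumes the values never contain a comma or a double quote of their own. It relies on the built-in regexp_replace and split functions.

```sql
-- Hypothetical staging table: keep the whole second field as one string,
-- so Hive does not split it into quoted array items.
CREATE TABLE user_infos_raw (name STRING, path STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n' STORED AS TEXTFILE;

-- Hypothetical view: remove every double quote, then split on commas
-- so that path is exposed as ARRAY<STRING>.
CREATE VIEW user_infos_clean AS
SELECT regexp_replace(name, '"', '')             AS name,
       split(regexp_replace(path, '"', ''), ',') AS path
FROM user_infos_raw;

SELECT * FROM user_infos_clean;
```

After loading the same input file into user_infos_raw, the view should return unquoted names and array items. The trade-off is that the quote stripping runs on every query instead of once at read time.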
Thanks.