I have a CSV file containing data in the following format:
"SomeName1",25,"SomeString1"
"SomeName2",26,"SomeString2"
"SomeName3",27,"SomeString3"
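I am loading this CSV into a Hive table. In the table, columns 1 and 3 are inserted together with the quotation marks, which I don't want. I want column 1 to be SomeName1 and column 3 to be SomeString1.
I tried: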
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "\""
)
But it doesn't work and the "" are kept.
What should the approach be here?
Table creation statement:
CREATE TABLE `abcdefgh`(
`name` string COMMENT 'from deserializer',
`age` string COMMENT 'from deserializer',
`value` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://a-b-c-d-e:9000/user/hive/warehouse/abcdefgh'
TBLPROPERTIES (
'numFiles'='1',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='3134916',
'transient_lastDdlTime'='1490713221')
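Answer 0 (score: 4)
The file is comma-delimited, but the table declares a tab separator. Your separator should be a comma: "separatorChar" = ','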
create external table mytable
(
col1 string
,col2 int
,col3 string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties
(
"separatorChar" = ','
,"quoteChar" = '"'
)
stored as textfile
;
select * from mytable
;
+--------------+--------------+--------------+
| mytable.col1 | mytable.col2 | mytable.col3 |
+--------------+--------------+--------------+
| SomeName1 | 25 | SomeString1 |
| SomeName2 | 26 | SomeString2 |
| SomeName3 | 27 | SomeString3 |
+--------------+--------------+--------------+
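If the table already exists, as with abcdefgh in the question, it may be enough to change the SerDe properties in place instead of recreating the table. The following is a minimal sketch that has not been run against the asker's cluster. It assumes the table still uses OpenCSVSerde and that the underlying file is comma-separated as shown in the question. OpenCSVSerde also exposes every column as string regardless of the declared type, so the CAST in the query is one way to get age back as an integer.

```sql
-- Assumption: abcdefgh is the table from the question and still uses OpenCSVSerde.
-- Switch the separator from tab to comma and keep the double quote as quoteChar.
ALTER TABLE abcdefgh SET SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
);

-- OpenCSVSerde reads all columns as string; cast where a numeric type is needed.
SELECT `name`,
       CAST(`age` AS INT) AS age,
       `value`
FROM abcdefgh;
```

The properties are applied when the data is read, so the file itself does not need to be reloaded. If the quotes still appear after this change, it is worth checking that the table's SerDe really is OpenCSVSerde, for example with SHOW CREATE TABLE abcdefgh.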