Hive CSV line delimiter configuration

Date: 2019-03-11 14:19:11

Tags: csv hive

When creating an external table over CSV files in Hive, you can use Hive's built-in delimited-text SerDe:

...
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '...'
TBLPROPERTIES('serialization.null.format'='')
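To make the effect of these clauses concrete, here is a rough Python model (not Hive code) of how one record is split on the field separator and how serialization.null.format='' makes empty fields read as NULL. The function name parse_delimited_line is hypothetical, and the sketch deliberately ignores escaping, collection/map delimiters and type conversion:

```python
from typing import List, Optional


def parse_delimited_line(
    line: str, field_sep: str = ",", null_format: str = ""
) -> List[Optional[str]]:
    """Split one record on the field separator and map the null marker to None.

    Simplified model of ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    combined with TBLPROPERTIES('serialization.null.format'='').
    """
    return [None if field == null_format else field for field in line.split(field_sep)]


print(parse_delimited_line("foo,,baz"))  # the empty middle field becomes None
```

With the property left at its default, an empty field would stay an empty string; setting it to '' is what makes the middle value come back as NULL.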

or the OpenCSV SerDe:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = " ", "quoteChar" = '"', "escapeChar" = "\\" )
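A comparable sketch for the OpenCSV properties, using Python's csv module as a stand-in for OpenCSV (it is a different library, so edge cases may differ). The hypothetical helper parse_opencsv_like maps separatorChar, quoteChar and escapeChar onto csv.reader's delimiter, quotechar and escapechar, and uses ',' as the separator to match the sample data in the question:

```python
import csv
import io
from typing import List


def parse_opencsv_like(
    text: str, separator: str = ",", quote: str = '"', escape: str = "\\"
) -> List[List[str]]:
    # delimiter/quotechar/escapechar play the roles of the
    # separatorChar/quoteChar/escapeChar SerDe properties.
    reader = csv.reader(
        io.StringIO(text), delimiter=separator, quotechar=quote, escapechar=escape
    )
    return [row for row in reader]


# A quoted field may contain the separator; an escaped separator is kept literally.
print(parse_opencsv_like('foo,"a,b",c\n'))
print(parse_opencsv_like('foo,a\\,b\n'))
```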

My question is: if I have a CSV file like this:

foo,bar,hello\rworld\rbaz,1\n
foo,bar,bye\rworld\rbaz,2\n
foo,bar,hi\rworld\rbaz,3\n
foo,bar,goodbye\rworld\rbaz,4\n

How can I configure the line terminator to be \n only and ignore \r, keeping it as part of the field?
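The parse the question is after can be modeled in a few lines of Python (an illustration of the desired result, not of Hive internals): split records on '\n' only, then split each record on ','. For contrast, str.splitlines() treats a bare '\r' as a line break too, which is the failure mode the question wants to avoid. Whether Hive's underlying text input format behaves like the first or the second function is not settled by this thread. The helper name split_records_lf_only is hypothetical:

```python
from typing import List


def split_records_lf_only(data: str, field_sep: str = ",") -> List[List[str]]:
    """Split on '\n' only, so a bare '\r' stays inside its field."""
    rows = []
    for record in data.split("\n"):
        if record == "":  # skip empty records, e.g. the tail after the final '\n'
            continue
        rows.append(record.split(field_sep))
    return rows


sample = "foo,bar,hello\rworld\rbaz,1\nfoo,bar,bye\rworld\rbaz,2\n"
print(split_records_lf_only(sample))  # 2 records, 4 fields each, '\r' kept in field 3
print(sample.splitlines())            # '\r' also breaks lines, fragmenting each record
```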


Edit:

-> Trying LINES TERMINATED BY '\r\n' fails with the following error:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException 3:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\r\n''

1 answer:

Answer 0 (score: 0)

You can specify LINES TERMINATED BY in the CREATE TABLE statement, like this:

...
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '...'
TBLPROPERTIES('serialization.null.format'='')