Create a HIVE table with a multi-character delimiter

Time: 2013-09-21 10:15:55

Tags: hadoop hive

I want to create a HIVE table that uses a multi-character string as its field delimiter. For example, starting from:

CREATE EXTERNAL TABLE tableex(id INT, name STRING)
ROW FORMAT delimited fields terminated by ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/myusername';

I want the delimiter to be a multi-character string such as "~*" instead of ','.

2 answers:

Answer 0 (score: 10)

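FIELDS TERMINATED BY does not support multi-character delimiters. The simplest workaround is RegexSerDe, with the delimiter written into the regex itself ('*' must be escaped because it is a regex metacharacter, and the backslash is doubled inside the Hive string literal):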

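-- Uses the core RegexSerDe; the contrib class only accepts STRING columns.
-- Regex: (\\d+) captures id, ~\\* matches the literal "~*", (.*) captures name.
CREATE EXTERNAL TABLE tableex(id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(\\d+)~\\*(.*)$"
)
STORED AS TEXTFILE
LOCATION '/user/myusername';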
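To show how the regex approach extends to more columns, here is a sketch for a hypothetical three-column file. The table name "tableex3", the "city" column, and the location are made up for illustration. Each "~*" becomes "~\\*" in the pattern, and the reluctant "(.*?)" stops the middle group at the first "~*" instead of running to the end of the line.

```sql
-- Hypothetical layout, one record per line, e.g.: 1~*alice~*paris
-- Assumes the core org.apache.hadoop.hive.serde2.RegexSerDe, which can fill INT columns.
-- The pattern has one capture group per column, in column order.
CREATE EXTERNAL TABLE tableex3(id INT, name STRING, city STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(\\d+)~\\*(.*?)~\\*(.*)$"
)
STORED AS TEXTFILE
LOCATION '/user/myusername/tableex3';
```

For the sample line "1~*alice~*paris", the three groups capture "1", "alice", and "paris".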

Answer 1 (score: 6)

Use MultiDelimitSerDe (part of hive-contrib since Hive 0.14), which takes the multi-character delimiter directly:

CREATE EXTERNAL TABLE tableex(id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ("field.delim"="~*")
STORED AS TEXTFILE
LOCATION '/user/myusername';
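Because MultiDelimitSerDe lives in the hive-contrib jar, some installations need that jar added to the session before the table can be created or queried. A minimal sketch; the jar path below is a placeholder, not a real location:

```sql
-- Placeholder path: point this at the hive-contrib jar of your installation.
ADD JAR /path/to/hive-contrib.jar;

-- Read back a few rows from files whose lines look like: 1~*alice
SELECT id, name FROM tableex LIMIT 10;
```

For a line such as "1~*alice", the query should return id = 1 and name = alice.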