Hive table properties to treat consecutive delimiters as a single delimiter

Date: 2017-02-22 09:21:44

Tags: hive hiveql regexserde

jan 18 "value1 is null"
feb  4 "value1 is null"

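In the data set above, the second row has consecutive delimiters (two spaces) between column 1 and column 2. How can consecutive delimiters be treated as a single delimiter?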

1 answer:

Answer 0 (score: 0)

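-- Each column is either a double-quoted string or a run of other characters.
-- The \\s+ between columns matches one or more whitespace characters,
-- so consecutive delimiters are consumed as a single separator.
create external table mydata 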
(
    c1 string
   ,c2 string
   ,c3 string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '(".*?"|.*?)\\s+(".*?"|.*?)\\s+(".*?"|.*?)')
location '/user/hive/warehouse/mydata'
;
select * from mydata;
+-----------+-----------+------------------+
| mydata.c1 | mydata.c2 |    mydata.c3     |
+-----------+-----------+------------------+
| jan       |        18 | "value1 is null" |
| feb       |         4 | "value1 is null" |
+-----------+-----------+------------------+
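
As a rough way to check the pattern outside Hive, the same regular expression can be tried against the two sample rows in Python. This is only a sketch: it assumes that Python's `re.fullmatch` behaves like the whole-line match that RegexSerDe performs with Java's regex engine for this particular pattern, and the helper name `parse_line` is made up for illustration.

```python
import re

# Same pattern as 'input.regex' in the table definition, written as a raw
# string so that \s+ reaches the regex engine unchanged.
INPUT_REGEX = r'(".*?"|.*?)\s+(".*?"|.*?)\s+(".*?"|.*?)'


def parse_line(line):
    """Return the three captured columns, or None if the line does not match.

    Hypothetical helper, not part of Hive.
    """
    match = re.fullmatch(INPUT_REGEX, line)
    return match.groups() if match else None


sample_lines = [
    'jan 18 "value1 is null"',
    'feb  4 "value1 is null"',  # two spaces between column 1 and column 2
]

rows = [parse_line(line) for line in sample_lines]
for row in rows:
    print(row)
# ('jan', '18', '"value1 is null"')
# ('feb', '4', '"value1 is null"')
```

Both rows split into three columns, and the quoted third column keeps its quotes, which matches the query output above.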