Reading a CSV file in Hive 0.13 when the data contains commas and has no quotes

Time: 2017-02-13 23:07:30

Tags: sql regex csv hive cloudera

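How can I read a comma-separated file in Hive 0.13 when the data itself contains commas and the fields are not enclosed in quote characters? For example, the column names are fname, lname, country, city, addr, dob: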

tom, kate, USA,CA,los angeles,34 brad street 5thfloor, Jun/23/1975
russel,smith,USA, Tx, 763, grass street, 5th floor, dallas, Jan/31/1999 

The first row has no commas inside any column value. In the second row, the address field contains commas: 763, grass street, 5th floor, dallas

How can I read this file in Hive 0.13?

Thanks, MX

1 answer:

Answer 0: (score: 2)

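Assuming addr is the only field that can contain commas, a RegexSerDe pattern can match the first four fields lazily and let the addr group absorb everything up to the last comma: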

create external table mydata
(
    fname       string
   ,lname       string
   ,country     string
   ,city        string
   ,addr        string
   ,dob         string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ("input.regex" = "(.*?),(.*?),(.*?),(.*?),(.*),(.*)")
location '/user/hive/warehouse/mydata'
;
select * from mydata;
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
| mydata.fname | mydata.lname | mydata.country | mydata.city | mydata.addr                          | mydata.dob  |
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
| tom          | kate         | USA            | CA          | los angeles,34 brad street 5thfloor  | Jun/23/1975 |
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
| russel       | smith        | USA            | Tx          | 763, grass street, 5th floor, dallas | Jan/31/1999 |
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
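To see why the pattern splits the rows this way, it can be tried outside Hive. The following is only a sketch in Python, not part of the original answer: it assumes Python's `re.fullmatch` treats this particular pattern the same way as the whole-line match RegexSerDe performs with Java regular expressions, and the names `ROW_PATTERN`, `COLUMNS` and `split_row` are made up for illustration. The four lazy groups `(.*?)` each stop at the first comma they can, the greedy group `(.*)` for addr then takes as much as possible, and the final group keeps only the text after the last comma.

```python
import re

# Same pattern as the "input.regex" serde property in the answer above.
ROW_PATTERN = re.compile(r"(.*?),(.*?),(.*?),(.*?),(.*),(.*)")

# Column names taken from the question; the list itself is illustrative.
COLUMNS = ["fname", "lname", "country", "city", "addr", "dob"]


def split_row(line):
    """Split one raw line into the six columns, or return None if it does not match.

    Whitespace around each value is stripped here for readability; this
    stripping is an addition of the sketch, not something the pattern does.
    """
    match = ROW_PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, (group.strip() for group in match.groups())))


if __name__ == "__main__":
    rows = [
        "tom, kate, USA,CA,los angeles,34 brad street 5thfloor, Jun/23/1975",
        "russel,smith,USA, Tx, 763, grass street, 5th floor, dallas, Jan/31/1999 ",
    ]
    for raw in rows:
        print(split_row(raw))
```

Note that the captured groups themselves keep any spaces that follow a comma in the file (for example " kate"), so if those matter in Hive, wrapping the columns in `trim()` in the query is one option. A line with fewer than five commas does not match the pattern at all, which the sketch reports as `None`.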