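How can I read a comma-separated file in Hive 0.13 when the data itself contains commas and the fields are not enclosed in quote characters? For example, the column names are fname,lname,country,city,addr,dob and the data looks like this: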
tom, kate, USA,CA,los angeles,34 brad street 5thfloor, Jun/23/1975
russel,smith,USA, Tx, 763, grass street, 5th floor, dallas, Jan/31/1999
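The first row has no column with a comma in its data. The second row has commas in the data for the address field: 763, grass street, 5th floor, dallas.
How can I read this in Hive 0.13?
Thanks, MX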
Answer 0 (score: 2)
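Assuming addr is the only field that may contain commas, a RegexSerDe table can split each line. The first four groups are lazy, so each one stops at the next comma; the addr group is greedy, so it absorbs any extra commas; and the last group takes whatever follows the final comma.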
create external table mydata
(
fname string
,lname string
,country string
,city string
,addr string
,dob string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ("input.regex" = "(.*?),(.*?),(.*?),(.*?),(.*),(.*)")
location '/user/hive/warehouse/mydata'
;
select * from mydata;
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
| mydata.fname | mydata.lname | mydata.country | mydata.city | mydata.addr | mydata.dob |
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
| tom | kate | USA | CA | los angeles,34 brad street 5thfloor | Jun/23/1975 |
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
| russel | smith | USA | Tx | 763, grass street, 5th floor, dallas | Jan/31/1999 |
+--------------+--------------+----------------+-------------+--------------------------------------+-------------+
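One detail worth checking: the sample rows have a space after some of the commas, and the capture groups keep that whitespace (for example, lname in the first row would come back as " kate" with a leading space). Below is a minimal sketch of one way to clean this up, assuming a view on top of the table is acceptable. The view name mydata_trimmed is hypothetical and not part of the original answer.

```sql
-- Hypothetical view (name chosen for illustration) over the mydata table above.
-- trim() removes the leading/trailing spaces that the regex captures keep.
create view mydata_trimmed
as
select
    trim(fname)   as fname
   ,trim(lname)   as lname
   ,trim(country) as country
   ,trim(city)    as city
   ,trim(addr)    as addr
   ,trim(dob)     as dob
from mydata
;

select * from mydata_trimmed;
```

Commas and spaces inside addr are left untouched; only the two ends of each value are trimmed.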