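I'm trying to load some test data into a simple Hive table. The data is comma-separated, but the individual elements are not enclosed in double quotes, and because of this I'm getting errors. How can I tell Hive not to expect varchar fields to be enclosed in quotes? Manually adding quotes to the varchar fields is not an option, because the input file I want to use has thousands of records. Sample queries and data are below.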
create table mydatabase.flights(FlightDate varchar(10),Airline int,FlightNum int,Origin varchar(4),Destination varchar(4),Departure varchar(4),DepDelay double,Arrival varchar(4),ArrivalDelay double,Airtime double,Distance double) row format delimited;
insert into mydatabase.flights(FlightDate,Airline,FlightNum,Origin,Destination,Departure,DepDelay,Arrival,ArrivalDelay,Airtime,Distance)
values(2014-04-01,19805,1,JFK,LAX,0854,-6.00,1217,2.00,355.00,2475.00);
The insert query above gives me an error message. If I enclose the varchar fields in quotes, it works fine.
Error while compiling statement: FAILED: ParseException line 11:11 mismatched input '-' expecting ) near '2014' in value row constructor
I'm loading the data with the following query:
load data inpath '/user/alpsusa/hive/flights.csv' overwrite into table mydatabase.flights;
After loading, I see that only the first field is populated. All the other fields are NULL.
Sample data:
2014-04-01,19805,1,JFK,LAX,0854,-6.00,1217,2.00,355.00,2475.00
2014-04-01,19805,2,LAX,JFK,0944,14.00,1736,-29.00,269.00,2475.00
2014-04-01,19805,3,JFK,LAX,1224,-6.00,1614,39.00,371.00,2475.00
2014-04-01,19805,4,LAX,JFK,1240,25.00,2028,-27.00,264.00,2475.00
2014-04-01,19805,5,DFW,HNL,1300,-5.00,1650,15.00,510.00,3784.00
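
For reference, a minimal sketch of what is probably going on, not a verified fix for this cluster. The table above uses `row format delimited` without naming a field delimiter, so Hive falls back to its default separator (Ctrl-A, `\001`) and does not split on commas. Each line then lands whole in FlightDate, where varchar(10) cuts it down to the date, and the remaining columns come back NULL. The sketch assumes the table can be recreated and that the CSV is still available at the original path (a `load data inpath` moves the file, so it may need to be copied back first).

```sql
-- Assumption: it is acceptable to drop and recreate the table.
drop table if exists mydatabase.flights;

-- Same columns as in the question; the only change is the explicit comma delimiter.
create table mydatabase.flights(
  FlightDate varchar(10), Airline int, FlightNum int,
  Origin varchar(4), Destination varchar(4),
  Departure varchar(4), DepDelay double,
  Arrival varchar(4), ArrivalDelay double,
  Airtime double, Distance double)
row format delimited
fields terminated by ','
stored as textfile;

-- Values in the file need no quotes; the delimiter alone decides the split.
load data inpath '/user/alpsusa/hive/flights.csv'
overwrite into table mydatabase.flights;

-- In an INSERT ... VALUES statement, string literals still need quotes,
-- because this is SQL syntax rather than file parsing.
insert into mydatabase.flights
values('2014-04-01',19805,1,'JFK','LAX','0854',-6.00,'1217',2.00,355.00,2475.00);
```

The quoting requirement applies only to literals written inside a query. The file loaded with `load data` can stay unquoted, so the thousands of records do not need to be edited.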