The JSON data looks like this:
{"id":"U101", "name":"Rakesh", "place":{"city":"MUMBAI","state":"MAHARASHTRA"}, "age":20, "occupation":"STUDENT"}
{"id":"","name":"Rakesh", "place":{"city":"MUMBAI","state":"MAHARASHTRA"}, "age":20, "occupation":"STUDENT"}
{"id":"U103", "name":"Rakesh", "place":{"city":"","state":""}, "age":20, "occupation":"STUDENT"}
When I try to select data from the table, I get the following error:
hive (ecom)> select * from users_info_raw;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException:
org.codehaus.jackson.JsonParseException: Unexpected character ('2'
(code 50)): was expecting comma to separate OBJECT entries at
[Source: java.io.StringReader@15b0734; line: 1, column: 222]
Time taken: 0.144 seconds
DDL used to create the table:
CREATE TABLE users_info_raw(
id string,
name string,
place struct<city:string,state:string>,
age INT,
occupation string
)
ROW FORMAT SERDE
'com.cloudera.hive.serde.JSONSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
Answer 0 (score: 2)
I used the Hive HCatalog SerDe (org.apache.hive.hcatalog.data.JsonSerDe) instead, and it handles your input data without errors:
CREATE TABLE info_raw(
id string,
name string,
place struct<city:string,state:string>,
age INT,
occupation string
)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
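A minimal sketch of loading the sample data into this table and querying the nested place struct. It assumes two hypothetical paths: the location of the hive-hcatalog-core jar (which contains the JsonSerDe class and may need to be added to the session if the class is not already on Hive's classpath) and a local file holding the three sample lines. Adjust both to your installation.

```sql
-- Hypothetical jar path: only needed if org.apache.hive.hcatalog.data.JsonSerDe
-- is not already on the classpath; the class ships in hive-hcatalog-core.
ADD JAR /path/to/hive-hcatalog-core.jar;

-- Hypothetical local file containing the sample records, one JSON object per line.
LOAD DATA LOCAL INPATH '/tmp/users_info.json' INTO TABLE info_raw;

-- Fields of a struct column are accessed with dot notation.
SELECT id, name, place.city, place.state, age, occupation
FROM info_raw;
```

Because the table uses TextInputFormat, each record is one line of the file, so every JSON object must sit on a single line for the SerDe to parse it.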