Reading JSON data from a Hive table

Time: 2016-11-24 12:47:28

Tags: json hadoop hive bigdata

I can create a Hive table with the JSON SerDe org.openx.data.jsonserde.JsonSerDe, but when I query the table I cannot read the data back:

hive> create table emp (EmpId int , EmpFirstName string , EmpLastName string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
OK
Time taken: 2.148 seconds

hive> LOAD DATA INPATH '/user/cloudera/EmpData/emp.json' INTO table emp;
Loading data to table employee.emp
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee.db/emp/emp.json': User does not belong to supergroup
Table employee.emp stats: [numFiles=1, totalSize=4163]
OK
Time taken: 1.141 seconds

hive> select * from emp;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1]
Time taken: 0.504 seconds

1 answer:

Answer 0 (score: 1)

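The error "Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1]" means the SerDe could not parse the first line of the file as a complete JSON object. The SerDe reads each line of the file as one record, so every line must be a full JSON object on its own. Failing at character 3 of line 1 suggests the file is pretty-printed, with a single object spread over several lines.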

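Check whether the JSON in /user/cloudera/EmpData/emp.json is valid. Note that LOAD DATA INPATH moves the file, so after loading it is at /user/hive/warehouse/employee.db/emp/emp.json.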
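One way to check is to parse every line on its own, the way a line-oriented SerDe sees the file. Below is a minimal Python sketch, assuming you have copied the file to your local machine first (for example with hdfs dfs -get). find_bad_lines is a hypothetical helper, not part of Hive or the SerDe, and the optional key check lowercases names on the assumption that column matching is case-insensitive.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_lines(lines: Iterable[str],
                   expected_keys: Iterable[str] = ()) -> List[Tuple[int, str]]:
    """Report every line that a one-record-per-line JSON reader would reject.

    Hypothetical helper: each line is parsed independently. Key names are
    compared in lowercase, assuming case-insensitive column matching.
    """
    wanted = {key.lower() for key in expected_keys}
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # e.g. a lone "{" from a pretty-printed file ends up here
            problems.append((lineno, f"not a complete JSON value: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "valid JSON, but not an object"))
            continue
        missing = wanted - {key.lower() for key in record}
        if missing:
            # these columns would get no value from this line
            problems.append((lineno, f"no key for column(s): {sorted(missing)}"))
    return problems
```

Usage: with open("emp.json") as f: print(find_bad_lines(f, ["EmpId", "EmpFirstName", "EmpLastName"])). An empty list means every line is a standalone object that carries all three columns.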

If you only want Hive to skip malformed rows instead of failing the whole query, set:
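ALTER TABLE emp SET SERDEPROPERTIES ("ignore.malformed.json" = "true");

This only hides the bad lines; it does not repair them, so with the current file you would still get no usable employee rows.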

See https://github.com/rcongiu/Hive-JSON-Serde for details.

Edit: the JSON you posted is not in a form this table can read:

{ "cols": [ "EmpId", "EmpFirstName", "EmpLastName" ], "data": [ [ 1, "Hannah", "Walton" ], [ 2, "Barrett", "Mendoza" ], [ 3, "Camden", "Kidd" ], [ 4, "Illiana", "Collier" ] ] }

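The JSON you provided is a single object with two keys:

key: cols, value: [ "EmpId", "EmpFirstName", "EmpLastName" ]

key: data, value: [ [ 1, "Hannah", "Walton" ], [ 2, "Barrett", "Mendoza" ], [ 3, "Camden", "Kidd" ], [ 4, "Illiana", "Collier" ] ]

The SerDe maps JSON keys to table columns by name, so neither cols nor data matches EmpId, EmpFirstName or EmpLastName.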

Instead, the file should contain one complete JSON object per line, with keys that match the column names:

{"EmpId":1,"EmpFirstName":"Hannah","EmpLastName":"Walton"}
{"EmpId":2,"EmpFirstName":"Barrett","EmpLastName":"Mendoza"}
{"EmpId":3,"EmpFirstName":"Camden","EmpLastName":"Kidd"}
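If your data arrives in the cols/data shape shown above, a small script can rewrite it into this one-object-per-line form. This is a minimal Python sketch, assuming the whole file is a single JSON document (pretty-printed or not) with a cols list and a data list of rows of the same length; to_json_lines and convert_file are hypothetical names, not part of any Hive tooling.

```python
import json
from typing import Any, Dict, List


def to_json_lines(doc: Dict[str, Any]) -> str:
    """Rewrite {"cols": [...], "data": [[...], ...]} as one object per line."""
    cols: List[str] = doc["cols"]
    lines: List[str] = []
    for row in doc["data"]:
        if len(row) != len(cols):
            raise ValueError(f"row {row!r} has {len(row)} values, expected {len(cols)}")
        # Pair each column name with the value in the same position; compact
        # separators give lines like {"EmpId":1,"EmpFirstName":"Hannah",...}
        lines.append(json.dumps(dict(zip(cols, row)), separators=(",", ":")))
    return "\n".join(lines) + "\n"


def convert_file(src_path: str, dst_path: str) -> None:
    """Read one (possibly multi-line) JSON document and write JSON Lines."""
    with open(src_path, encoding="utf-8") as src:
        doc = json.load(src)  # json.load accepts JSON spread over many lines
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_json_lines(doc))
```

After converting, you could upload the new file with hdfs dfs -put and reload it with LOAD DATA INPATH '/user/cloudera/EmpData/emp_lines.json' OVERWRITE INTO TABLE emp; (the file name here is hypothetical), so the old emp.json in the table directory is replaced.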