I am trying to build a table in Hive for the following JSON:
{
"business_id": "vcNAWiLM4dR7D2nwwJ7nCA",
"hours": {
"Tuesday": {
"close": "17:00",
"open": "08:00"
},
"Friday": {
"close": "17:00",
"open": "08:00"
}
},
"open": true,
"categories": [
"Doctors",
"Health & Medical"
],
"review_count": 9,
"name": "Eric Goldberg, MD",
"neighborhoods": [],
"attributes": {
"By Appointment Only": true,
"Accepts Credit Cards": true,
"Good For Groups": 1
},
"type": "business"
}
I can create a table with the following DDL, but querying the table throws an exception.
CREATE TABLE IF NOT EXISTS business (
business_id string,
hours map<string,string>,
open boolean,
categories array<string>,
review_count int,
name string,
neighborhoods array<string>,
attributes map<string,string>,
type string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
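The exception when retrieving data is "ClassCast: cannot cast jsonarray to json object". What is the correct schema for this JSON? Is there any tool that can help me generate the correct schema from a given JSON document, for use with the JsonSerde?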
Answer 0 (score: 4)
I think the problem is hours, which you defined as hours map<string,string>, but it should be map<string,map<string,string>>.
There is a tool that generates Hive table definitions automatically from JSON data: https://github.com/quux00/hive-json-schema
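But you may want to adjust its output, because when it encounters a JSON object (anything between {}), the tool has no way of knowing whether to translate it to a Hive map or a struct.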
On your data, the tool gives me this:
CREATE TABLE x (
attributes struct<accepts credit cards:boolean,
by appointment only:boolean, good for groups:int>,
business_id string,
categories array<string>,
hours map<string:struct<close:string, open:string>
name string,
neighborhoods array<string>,
open boolean,
review_count int,
type string
)
But it looks like you want something like this:
CREATE TABLE x (
attributes map<string,string>,
business_id string,
categories array<string>,
hours map<string,struct<close:string, open:string>>,
name string,
neighborhoods array<string>,
open boolean,
review_count int,
type string
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
hive> load data local inpath 'json.data' overwrite into table x;
hive> Table default.x stats: [numFiles=1, numRows=0, totalSize=416,rawDataSize=0]
OK
hive> select * from x;
OK
{"accepts credit cards":"true","by appointment only":"true",
"good for groups":"1"}
vcNAWiLM4dR7D2nwwJ7nCA
["Doctors","Health & Medical"]
{"tuesday":{"close":"17:00","open":"08:00"},
"friday":{"close":"17:00","open":"08:00"}}
Eric Goldberg, MD ["HELLO"] true 9 business
Time taken: 0.335 seconds, Fetched: 1 row(s)
hive>
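Once the table loads, individual nested values can be read with Hive's map-key, struct-field, and array-index syntax. The query below is a minimal sketch against table x as defined above. It assumes the map keys come back lowercased, as the select * output suggests; if your SerDe version preserves case, use 'Tuesday' instead. The column aliases are illustrative.

```sql
-- Sketch only: assumes table x from the DDL above and lowercased map keys.
SELECT
  name,
  hours['tuesday'].open              AS tuesday_open,   -- map lookup, then struct field
  hours['tuesday'].close             AS tuesday_close,
  attributes['accepts credit cards'] AS accepts_cards,  -- map<string,string> lookup
  categories[0]                      AS first_category  -- array index
FROM x;
```

Because attributes is declared as map<string,string>, its values come back as strings ("true", "1") rather than booleans or integers, so cast them if you need typed comparisons.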
A few notes, though:
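attributes could be a struct, but then you would need to map the names that contain spaces, such as accepts credit cards. My SerDe allows mapping JSON attributes to different Hive column names. This is also needed when the JSON uses attributes that are Hive keywords, like 'timestamp' or 'create'.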
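As a rough illustration of the keyword case, the OpenX JsonSerDe accepts SERDEPROPERTIES of the form "mapping.<hive_column>" = "<json_key>". The table name events and its columns below are hypothetical and are not part of the question's data.

```sql
-- Sketch only: table `events` and its columns are made up for illustration.
-- The JSON key "timestamp" is exposed as the Hive column `ts`.
CREATE TABLE events (
  id string,
  ts string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.ts" = "timestamp")
STORED AS TEXTFILE;
```

Whether the same property also reaches fields nested inside a struct, such as accepts credit cards, is worth checking against the documentation of the SerDe version you use; keeping attributes as map<string,string> avoids the question entirely.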