I can't load nested JSON data into a Hive table. Can anyone help? Here is what I tried.
Sample input:
{"DocId":"ABC","User1":{"Id":1234,"Username":"sam1234","Name":"Sam","ShippingAddress":{"Address1":"123 Main St.","Address2":null,"City":"Durham","State":"NC"},"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},{"ItemId":4352,"OrderDate":"12/12/2012"}]}}
On Hive (CDH3):
ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
CREATE TABLE json_tab(
DocId string,
user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
STORED AS TEXTFILE;
hive> select * from json_tab;
OK
NULL null
I get NULL here.
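Two things are worth ruling out before blaming the SerDe: the data file must hold one complete JSON object per line, and the top-level keys should match the declared columns. Neither is confirmed as the cause in this question; they are assumptions. The short Python check below tests both. `EXPECTED_COLUMNS` is a hypothetical list copied from the CREATE TABLE above. Because Hive stores column names in lower case, the comparison is case-insensitive.

```python
import json

# Hypothetical column list taken from the CREATE TABLE above (lower-cased,
# because Hive stores column names in lower case).
EXPECTED_COLUMNS = ["docid", "user1"]


def check_json_lines(lines, expected=EXPECTED_COLUMNS):
    """Return (line_number, problem) tuples for a one-JSON-object-per-line file."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"not a complete JSON object: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "top-level value is not a JSON object"))
            continue
        keys = {k.lower() for k in record}
        missing = [c for c in expected if c not in keys]
        if missing:
            problems.append((lineno, f"missing columns: {', '.join(missing)}"))
    return problems


if __name__ == "__main__":
    print(check_json_lines(['{"DocId":"ABC","User1":{"Id":1234}}']))  # → []
    # A record split across two lines is reported on both lines:
    print(check_json_lines(['{"DocId":', '"ABC"}']))
```

To check a real file, pass it in directly: `check_json_lines(open("nested.json"))`. The file name here is a placeholder.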
I also tried the HCatalog jar:
ADD JAR /home/training/Desktop/hcatalog-core-0.11.0.jar;
CREATE TABLE json_tab(
DocId string,
user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
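However, my CREATE TABLE statement fails with the following error:
FAILED: Error in metadata: Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Can anyone help me? Thanks in advance.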
Answer 0 (score: 3)
You can use the org.openx.data.jsonserde.JsonSerDe class to read the JSON data.
Download the jar file from http://www.congiu.net/hive-json-serde/1.3.6-SNAPSHOT/cdh4/
and follow these steps:
add jar /path/to/jar/json-serde-1.3.6-jar-with-dependencies.jar;
CREATE TABLE json_tab(
DocId string,
user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
LOAD DATA LOCAL INPATH '/path/to/data/nested.json' INTO TABLE json_tab;
SELECT DocId, User1.Id, User1.ShippingAddress.City as city,
User1.Orders[0].ItemId as order0id,
User1.Orders[1].ItemId as order1id from json_tab;
Result:
ABC 1234 Durham 6789 4352
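To make the query's path expressions concrete, the Python sketch below pulls the same five fields out of the sample record. Struct fields become dictionary lookups, and `Orders[0]` / `Orders[1]` become list indexes; Hive array indexes are 0-based as well. One difference matters: Hive returns NULL for an out-of-range array index, while a Python list would raise, so the sketch returns `None` in that case. The helper names are illustrative only.

```python
import json

# The sample record from the question, kept on a single line.
SAMPLE = (
    '{"DocId":"ABC","User1":{"Id":1234,"Username":"sam1234","Name":"Sam",'
    '"ShippingAddress":{"Address1":"123 Main St.","Address2":null,'
    '"City":"Durham","State":"NC"},'
    '"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},'
    '{"ItemId":4352,"OrderDate":"12/12/2012"}]}}'
)


def item_id(orders, i):
    """Orders[i].ItemId; None (like Hive's NULL) if the index is out of range."""
    return orders[i]["ItemId"] if i < len(orders) else None


def project(line):
    """Mirror: SELECT DocId, User1.Id, User1.ShippingAddress.City,
    User1.Orders[0].ItemId, User1.Orders[1].ItemId"""
    rec = json.loads(line)
    user = rec["User1"]
    return (
        rec["DocId"],
        user["Id"],
        user["ShippingAddress"]["City"],
        item_id(user["Orders"], 0),
        item_id(user["Orders"], 1),
    )


print(*project(SAMPLE), sep="\t")  # prints ABC 1234 Durham 6789 4352, tab-separated
```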
Answer 1 (score: 0)
I was getting the same exception.
I added the following jars, and that fixed it for me.
ADD JAR /home/cloudera/Data/json-serde-1.3.7.3.jar;
ADD JAR /home/cloudera/Data/hive-hcatalog-core-0.13.0.jar;
Answer 2 (score: 0)
To query JSON files with HiveQL, you need a JSON SerDe: either org.openx.data.jsonserde.JsonSerDe
or org.apache.hive.hcatalog.data.JsonSerDe.
org.apache.hive.hcatalog.data.JsonSerDe
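This is the default JSON SerDe from Apache. It is commonly used for JSON data such as events, which are represented as blocks of JSON-encoded text separated by newlines. The Hive JSON SerDe does not allow duplicate keys in map or struct key names.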
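Because the SerDe expects newline-separated JSON records, a pretty-printed file that spreads one object over several lines does not match that layout. The sketch below assumes the input is a JSON array of documents, which is a hypothetical starting point. It rewrites the array as one compact object per line. `json.dumps` escapes any newline inside a string value as `\n`, so each record stays on a single physical line.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line (one row per line)."""
    # separators=(",", ":") drops optional whitespace; json.dumps escapes
    # embedded newlines, so every record occupies exactly one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


if __name__ == "__main__":
    # Hypothetical input: a pretty-printed JSON array of documents.
    pretty = """[
      {"DocId": "ABC", "User1": {"Id": 1234}},
      {"DocId": "DEF", "User1": {"Id": 5678}}
    ]"""
    print(to_json_lines(json.loads(pretty)), end="")
    # {"DocId":"ABC","User1":{"Id":1234}}
    # {"DocId":"DEF","User1":{"Id":5678}}
```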
org.openx.data.jsonserde.JsonSerDe
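The OpenX JSON SerDe is similar to the native Apache one, but it offers several optional properties, such as "ignore.malformed.json" and "case.insensitive". In my experience, it generally handles nested JSON files better.
Here is a working example: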
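The following is a rough Python analogue of what those two properties are meant to do. It is not the SerDe's actual code, and the exact semantics in a given SerDe version should be confirmed against its documentation. With case-insensitive matching, a JSON key such as `DocId` matches Hive's lower-cased column `docid`. With malformed input ignored, a bad line becomes a row of NULLs (`None` here) instead of failing the query. The function and parameter names are hypothetical.

```python
import json


def lower_keys(value):
    """Recursively lower-case object keys so "DocId" matches Hive's "docid"."""
    if isinstance(value, dict):
        return {k.lower(): lower_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [lower_keys(v) for v in value]
    return value


def read_row(line, columns, ignore_malformed=True, case_insensitive=True):
    """Map one JSON line to a tuple of column values (None stands in for NULL)."""
    try:
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("top-level JSON value is not an object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        if ignore_malformed:
            return tuple(None for _ in columns)  # all-NULL row instead of an error
        raise
    if case_insensitive:
        record = lower_keys(record)
        columns = [c.lower() for c in columns]
    return tuple(record.get(c) for c in columns)


if __name__ == "__main__":
    cols = ["DocId", "User1"]
    print(read_row('{"DocId":"ABC","User1":{"Id":1234}}', cols))  # → ('ABC', {'id': 1234})
    print(read_row('{"DocId":', cols))                             # → (None, None)
```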
CREATE EXTERNAL TABLE IF NOT EXISTS `dbname`.`tablename` (
`DocId` STRING,
`User1` STRUCT<
`Id`:INT,
`Username`:STRING,
`Name`:STRING,
`ShippingAddress`:STRUCT<
`Address1`:STRING,
`Address2`:STRING,
`City`:STRING,
`State`:STRING>,
`Orders`:ARRAY<STRUCT<
`ItemId`:INT,
`OrderDate`:STRING>>>)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
LOCATION
's3://awsexamplebucket1-logs/AWSLogs/'
The CREATE TABLE statement was generated with https://www.hivetablegenerator.com/
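Any generator that infers a schema from one sample record has to handle two awkward cases in this data. A JSON null, such as `Address2`, carries no type information. A JSON array, such as `Orders`, must map to `ARRAY<...>` rather than a plain `STRUCT`. The minimal Python sketch below shows the idea and is not the site's actual implementation. It assumes that nulls default to `STRING`, that arrays are homogeneous so the first element decides the element type, and that JSON integers fit in `INT`; use `BIGINT` otherwise. Generated DDL should still be checked by hand.

```python
import json


def hive_type(value, null_type="STRING"):
    """Infer a Hive column type string from one sample JSON value."""
    if value is None:
        return null_type  # JSON null has no type; fall back to a default
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"  # assumption: values fit in 32 bits, else use BIGINT
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        # Assumes a homogeneous array: the first element decides the type.
        inner = hive_type(value[0], null_type) if value else null_type
        return f"ARRAY<{inner}>"
    if isinstance(value, dict):
        fields = ",".join(f"`{k}`:{hive_type(v, null_type)}" for k, v in value.items())
        return f"STRUCT<{fields}>"
    raise TypeError(f"unsupported JSON value: {value!r}")


def create_table(name, sample_line):
    """Build a CREATE TABLE statement from one JSON line (sketch only)."""
    record = json.loads(sample_line)
    cols = ",\n  ".join(f"`{k}` {hive_type(v)}" for k, v in record.items())
    return (f"CREATE TABLE {name} (\n  {cols}\n)\n"
            "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';")
```

Running `print(create_table("json_tab", line))` on the question's sample line produces `` `Address2`:STRING `` and `` `Orders`:ARRAY<STRUCT<`ItemId`:INT,`OrderDate`:STRING>> `` inside the `User1` struct.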