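Create a Hive table from JSON

Time: 2018-03-09 02:04:06

Tags: json hive

I want to create a Hive table from a JSON array, but I am running into a problem with the top-level array. Can anyone suggest a solution? My JSON looks like this: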

  [{"user_id": "a", "previous_user_id": "b"}, {"user_id": "c", "previous_user_id": "d"}, {"user_id": "e", "previous_user_id": "f"}]

The Hive command I used to create the table:

create external table array_tmp (User array<struct<user_id: String, previous_user_id:String>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

Running select user.user_id from array_tmp throws this exception:

  Row is not a valid JSON Object.

I added the jar with ADD JAR json-serde-1.3.8-jar-with-dependencies.jar; Any suggestions?

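1 Answer:

Answer 0: (score: 1)

You may need to make a few changes. The SerDe expects each row to be a JSON object rather than a top-level array, so wrap the array in an object under a key such as users. Here is an example.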

myjson/data.json

{"users":[{"user_id": "a", "previous_user_id": "b"},{"user_id": "c", "previous_user_id": "d"},{"user_id": "e", "previous_user_id": "f"}]}
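
If the source file already exists as a top-level array, it can be rewritten into this shape before loading. The following is only a minimal sketch: the function name wrap_top_level_array and the default key users are hypothetical choices for illustration, not part of Hive or the SerDe. It also trims whitespace around key names, on the assumption that the SerDe matches JSON keys to column names exactly.

```python
import json


def wrap_top_level_array(text, key="users"):
    """Wrap a top-level JSON array in a single-line JSON object.

    `key` is a hypothetical default; it should match the Hive column name.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Trim stray whitespace around key names so they can match column names.
    cleaned = [{k.strip(): v for k, v in rec.items()} for rec in records]
    # json.dumps emits no newlines by default, so the object stays on one line.
    return json.dumps({key: cleaned})


if __name__ == "__main__":
    raw = '[{"user_id": "a", "previous_user_id": "b"}, {"user_id": "c", "previous_user_id": "d"}]'
    print(wrap_top_level_array(raw))
```

The output is kept on one line because the table reads the file as text, one row per line.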

Now create a Hive table:

ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

CREATE EXTERNAL TABLE tbl( users array<struct<user_id:string,previous_user_id:string>>) 
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe" 
location '/user/cloudera/myjson';

Run a select:

select users.user_id from tbl;

+----------------+--+
|    user_id     |
+----------------+--+
| ["a","c","e"]  |
+----------------+--+
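
The query returns a single row holding an array because the file contains one JSON object, and selecting a field through an array of structs collects that field from every element. As a rough illustration in plain Python (not Hive code; project_field is a hypothetical helper name), the projection amounts to:

```python
import json


def project_field(line, array_column, field):
    """Collect one field from every struct in an array column of a JSON row."""
    row = json.loads(line)
    return [element[field] for element in row[array_column]]


line = (
    '{"users": [{"user_id": "a", "previous_user_id": "b"}, '
    '{"user_id": "c", "previous_user_id": "d"}, '
    '{"user_id": "e", "previous_user_id": "f"}]}'
)
print(project_field(line, "users", "user_id"))  # prints ['a', 'c', 'e']
```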