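I want to create a Hive table from a JSON array, but I am running into a problem with the top-level array. Can anyone suggest a solution? My JSON data looks like this: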
[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"},{"user_id": "e"," previous_user_id": "f"}]
The Hive commands I used to create and query the table:
create external table array_tmp (User array<struct<user_id: String, previous_user_id:String>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
select user.user_id from array_tmp;
I get this exception:
Row is not a valid JSON Object
I added the jar with ADD JAR json-serde-1.3.8-jar-with-dependencies.jar; Any suggestions?
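Answer 0 (score: 1)
You may need to make a few changes. The JSON SerDe expects each row to be a JSON object rather than a bare array, so wrap the array in an object under a key such as users. Here is an example.
myjson/data.json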
{"users":[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"},{"user_id": "e"," previous_user_id": "f"}]}
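If the existing file holds only the bare top-level array, it has to be rewritten into the wrapped form before Hive can read it. Below is a minimal Python sketch of that step, not part of the original answer. The function name wrap_top_level_array and the default key "users" are hypothetical. It writes the object on a single line, since Hive's text-based JSON SerDes treat each line as one record. It also strips whitespace from keys, on the assumption that the leading space in " previous_user_id" is a typo and the key should match the column name exactly.

```python
import json


def wrap_top_level_array(text, key="users"):
    """Return one line of JSON of the form {"<key>": [...]} from a top-level array."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Assumption: stray spaces in keys (e.g. " previous_user_id") are typos,
    # so strip them to make the keys match the Hive column names.
    cleaned = [{k.strip(): v for k, v in rec.items()} for rec in records]
    # Compact separators and no indent keep the whole object on a single line.
    return json.dumps({key: cleaned}, separators=(",", ":"))


if __name__ == "__main__":
    sample = '[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"}]'
    print(wrap_top_level_array(sample))
```

The printed line can be saved as data.json and uploaded to the directory the table's location points to.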
Now create a Hive table:
ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
CREATE EXTERNAL TABLE tbl( users array<struct<user_id:string,previous_user_id:string>>)
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
location '/user/cloudera/myjson';
Run a select:
select users.user_id from tbl;
+----------------+--+
|    user_id     |
+----------------+--+
| ["a","c","e"]  |
+----------------+--+
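Note that this query returns a single row holding an array of all user_id values. If one row per user is wanted instead, one alternative (not part of the answer above) is to rewrite the file so that each array element sits on its own line. A table declared with two plain string columns (user_id, previous_user_id) and the same SerDe should then read each line as one row. A minimal Python sketch follows. The function name array_to_json_lines is hypothetical, and the key stripping again assumes the leading space in " previous_user_id" is a typo.

```python
import json


def array_to_json_lines(text):
    """Turn a top-level JSON array of objects into one JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    lines = []
    for rec in records:
        # Assumption: stray spaces in keys are typos; strip them so the keys
        # match the Hive column names.
        cleaned = {k.strip(): v for k, v in rec.items()}
        # One compact object per line, so each line becomes one table row.
        lines.append(json.dumps(cleaned, separators=(",", ":")))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = '[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"}]'
    print(array_to_json_lines(sample), end="")
```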