I want to insert JSON data from one table into other tables, based on a key field in the data.
My data looks like this:
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP":{"XVY":"123.dfer","FAH":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX":{"VOG":"123.dfer","FAH":"3.0","FAX":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
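Here, depending on the Version key, which is "BOX", "GAP" or "ABC", I want to populate the fields of that particular JSON row into another table.
For example: if the Version is "GAP", the row should go into one table; if it is "BOX", it should go into another table... I mean all of the BOX rows...
How can I achieve this using Hive? Please help.
Note: my JSON data is stored in a table, in a column of type string.
Answer 0 (score: 2)
Demo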
create table src (myjson string);
insert into src values
('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
,('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP":{"XVY":"123.dfer","FAH":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
,('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX":{"VOG":"123.dfer","FAH":"3.0","FAX":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
;
create table trg_abc (myjson string);
create table trg_gap (myjson string);
create table trg_box (myjson string);
from src
insert into trg_abc select myjson where get_json_object(myjson,'$.Rtype.MOD.Version[0].ABC') is not null
insert into trg_gap select myjson where get_json_object(myjson,'$.Rtype.MOD.Version[0].GAP') is not null
insert into trg_box select myjson where get_json_object(myjson,'$.Rtype.MOD.Version[0].BOX') is not null
;
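The demo above routes the whole JSON string into each target table. If individual fields are wanted instead, the same get_json_object calls can pull them out. The following is only a minimal sketch: the table trg_gap_fields and its column list are assumptions made for illustration and are not part of the original answer. Note that the JSON path is case-sensitive, so the keys are written exactly as they appear in the data.

```sql
-- Hypothetical target table; its name and columns are assumptions for illustration.
create table trg_gap_fields (vehicle string, xvy string, fah string, ght string, ts string);

-- Extract selected fields from the GAP rows only.
insert into trg_gap_fields
select get_json_object(myjson,'$.Rtype.vehicle')
      ,get_json_object(myjson,'$.Rtype.MOD.Version[0].GAP.XVY')
      ,get_json_object(myjson,'$.Rtype.MOD.Version[0].GAP.FAH')
      ,get_json_object(myjson,'$.Rtype.MOD.Version[0].GAP.GHT')
      ,get_json_object(myjson,'$.Rtype.MOD.Version[0].GAP.ts')
from   src
where  get_json_object(myjson,'$.Rtype.MOD.Version[0].GAP') is not null
;
```

The same pattern would apply to ABC and BOX, each with its own field list, and the three inserts could be combined into a single "from src insert ... insert ..." statement so that src is scanned only once.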
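Answer 1 (score: -1)
First, you need the data to be stored as JSON behind a Hive table:
I assume your Hive table is external (it usually is; check with SHOW CREATE TABLE your_table).
If so, the whole dataset sits in some HDFS/S3 path, e.g. s3a://your_bucket/your_jsons_location/
Download json-udf-1.3.7-jar-with-dependencies.jar and run ADD JARS s3a://your_bucket/lib/json-udf-1.3.7-jar-with-dependencies.jar;
Then you have to create a dedicated JSON table for each JSON schema: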
CREATE EXTERNAL TABLE boxes
(Rtype struct<ver:string,os:string,type:string,vehicle:string,MOD:struct<Version:array<struct<BOX:struct<VOG:string,FAH:string,FAX:string,fashion:string,cdc:string,dof:string,ts:string>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location 's3a://your_bucket/your_jsons_location/';
CREATE EXTERNAL TABLE gaps
(Rtype struct<ver:string,os:string,type:string,vehicle:string,MOD:struct<Version:array<struct<GAP:struct<XVY:string,FAH:string,GHT:string,fashion:string,cdc:string,dof:string,ts:string>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location 's3a://your_bucket/your_jsons_location/';
CREATE EXTERNAL TABLE abcs
(Rtype struct<ver:string,os:string,type:string,vehicle:string,MOD:struct<Version:array<struct<ABC:struct<XYZ:string,founder:string,GHT:string,fashion:string,cdc:string,dof:string,ts:string>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location 's3a://your_bucket/your_jsons_location/';
Now, if you run:
SELECT * FROM boxes;
SELECT * FROM gaps;
SELECT * FROM abcs;
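You will see that each table correctly parses only the matching JSONs (according to the schema specified in its CREATE statement). The non-matching ones show up as NULL in each table.
To filter out the irrelevant records: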
SELECT * FROM abcs WHERE Rtype.mod.version[0].abc IS NOT NULL;
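Note: this whole explanation assumes your JSONs are stored outside of Hive, as external table data (I used S3 specifically, but it could just as well be HDFS).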
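To actually load the filtered rows into a separate table, as the question asks, the filter can be combined with an INSERT. This is a rough sketch under assumptions: the table trg_box_fields and its columns are hypothetical names chosen for illustration, and the SerDe jar is assumed to have been added in the current session.

```sql
-- Hypothetical target table; its name and columns are assumptions for illustration.
CREATE TABLE trg_box_fields (vehicle string, vog string, fah string, fax string, ts string);

-- Copy selected fields of the BOX records only; non-matching rows are NULL and get filtered out.
INSERT INTO TABLE trg_box_fields
SELECT Rtype.vehicle,
       Rtype.mod.version[0].box.vog,
       Rtype.mod.version[0].box.fah,
       Rtype.mod.version[0].box.fax,
       Rtype.mod.version[0].box.ts
FROM   boxes
WHERE  Rtype.mod.version[0].box IS NOT NULL;
```

The gaps and abcs tables could be handled the same way, each with its own field list.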