I have a problem in my Hive code. I want to extract JSON data using Hive. Below is a sample of the JSON format:
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"versionModified":{"machine":"123.dfer","founder":"3.0","state":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
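I want to get the following fields
The problem is that founder and state sit inside the array "Version". Can anyone help me work around this? Sometimes a different key appears in place of versionModified.
For example, sometimes my data looks like this: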
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"anotherCriteria":{"engine":"123.dfer","developer":"3.0","state":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
Adding some more sample data below:
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP":{"XVY":"123.dfer","FAH":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX":{"VOG":"123.dfer","FAH":"3.0","FAX":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
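I need to load this data into different tables depending on the key inside Version: if it is "BOX", it goes into one table; if it is "GAP", into another; and so on.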
Answer 0 (score: 1)
You can use the JSON SerDe to get all the fields.
Follow the steps below:
1. Download the JSON SerDe from http://www.congiu.net/hive-json-serde/1.3/
2. Add the JSON SerDe jar
hive> ADD jar /root/json-serde-1.3-jar-with-dependencies.jar;
Added [/root/json-serde-1.3-jar-with-dependencies.jar] to class path
Added resources: [/root/json-serde-1.3-jar-with-dependencies.jar]
3. Create the table
CREATE TABLE json_serde_table ( Rtype struct<ver:int, os:string,type:string,vehicle:string,MOD: struct<Version:Array<struct<versionModified:struct<machine:string,founder:string,state:string,fashion:string,cdc:string,dof:string,ts:string>>>>> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
4. Load the JSON file into the table
hive> load data local inpath '/root/json.txt' INTO TABLE json_serde_table;
Loading data to table default.json_serde_table
Table default.json_serde_table stats: [numFiles=1, totalSize=234]
OK
Time taken: 0.877 seconds
5. Run the query below to get the results
hive> select Rtype.ver ver ,Rtype.type type ,Rtype.vehicle vehicle ,Rtype.MOD.version[0].versionModified.ts ts,Rtype.MOD.version[0].versionModified.founder founder,Rtype.MOD.version[0].versionModified.state state from json_serde_table;
Query ID = root_20170412170606_a674d31b-31d7-477b-b9ff-3ebd76636cf8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1491484583384_0018, Tracking URL = http://mac127:8088/proxy/application_1491484583384_0018/
Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job -kill job_1491484583384_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-04-12 17:06:44,990 Stage-1 map = 0%, reduce = 0%
2017-04-12 17:06:53,361 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.8 sec
MapReduce Total cumulative CPU time: 1 seconds 800 msec
Ended Job = job_1491484583384_0018
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.8 sec HDFS Read: 4891 HDFS Write: 50 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 800 msec
OK
1 ns Mh-3412 2000-04-01T00:00:00.171Z 3.0 Florida
Time taken: 19.745 seconds, Fetched: 1 row(s)
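
The table definition above hard-codes the key name versionModified, so rows that carry a different key (anotherCriteria, BOX, GAP, ...) would not match that struct. One possible way to handle the changing key and route rows into separate tables is sketched below. It has not been run against a cluster and rests on several assumptions: json_raw, box_table, gap_table and their columns are hypothetical names; each row has exactly one object in the Version array; and get_json_object returns that object as a JSON string that starts with its single key. It relies only on the built-in functions get_json_object and regexp_extract plus Hive's multi-table insert, rather than on the SerDe.

```sql
-- Hypothetical staging table: one raw JSON document per row.
CREATE TABLE json_raw (line STRING);
LOAD DATA LOCAL INPATH '/root/json.txt' INTO TABLE json_raw;

-- Hypothetical target tables, one per key found inside Version[0].
CREATE TABLE box_table (ver STRING, vehicle STRING, ts STRING, payload STRING);
CREATE TABLE gap_table (ver STRING, vehicle STRING, ts STRING, payload STRING);

-- Scan the raw rows once and send each row to the table for its key.
FROM (
  SELECT
    get_json_object(line, '$.Rtype.ver')            AS ver,
    get_json_object(line, '$.Rtype.vehicle')        AS vehicle,
    get_json_object(line, '$.Rtype.MOD.Version[0]') AS first_version,
    -- Assumption: the extracted object is serialized with its only key first,
    -- so the first quoted token is the key name (BOX, GAP, ...).
    regexp_extract(
      get_json_object(line, '$.Rtype.MOD.Version[0]'),
      '^[{] *"([^"]+)"', 1)                         AS variant
  FROM json_raw
) r
INSERT INTO TABLE box_table
  SELECT r.ver, r.vehicle,
         get_json_object(r.first_version, '$.BOX.ts'),
         get_json_object(r.first_version, '$.BOX')
  WHERE r.variant = 'BOX'
INSERT INTO TABLE gap_table
  SELECT r.ver, r.vehicle,
         get_json_object(r.first_version, '$.GAP.ts'),
         get_json_object(r.first_version, '$.GAP')
  WHERE r.variant = 'GAP';
```

Each additional key would need its own INSERT branch with a matching constant path. The payload column keeps the inner object as a JSON string, because the field names inside it (XYZ, XVY, VOG, ...) also differ from row to row.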