The JSON data I have is:
{"time": "2015-06-30T23:00:00Z",
"type": "analysis",
"revision": "0.8",
"hostname": "iem6.local",
"data": [
{"gid": 1, "tmpc": 28.00, "wawa": [""], "ptype": 10, "dwpc": 17.40, "smps": 6.2, "drct": 99, "vsby": 16.093, "roadtmpc": 39.10,"srad": 77.61, "snwd": 0.00, "pcpn": 0.00},
{"gid": 213840, "tmpc": 22.00, "wawa": [""], "ptype": 10, "dwpc": 13.70, "smps": 5.7, "drct": 350, "vsby": 16.093, "roadtmpc": 32.70,"srad": 249.50, "snwd": 0.00, "pcpn": 0.00}]}
I am trying to load this data with Apache Pig's JsonLoader:
data_raw = LOAD '205006.json' using JsonLoader('time:chararray,type:chararray,revision:chararray,hostname:chararray,data:(gid:int,tmpc:float,wawa:{(a:chararray)},ptype:int,dwpc:float)');
However, when I dump the relation, the output is wrong:
(2015-06-30T23:00:00Z,,,,)
(,,,,)
(,,,,)
(,,,,)
(,,,,)
(1,28.00,[,],)
(2,28.00,[,],)
These are the warnings that are logged:
2016-10-24 15:43:55,852 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {"time": "2015-06-30T23:00:00Z",
2016-10-24 15:43:55,871 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record "type": "analysis",
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record "revision": "0.8",
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record "hostname": "iem6.local",
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record "data": [
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad tuple field, could not find start of object, field 4
2016-10-24 15:43:55,873 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find end of record {"gid": 1, "tmpc": 28.00, "wawa": [""], "ptype": 10, "dwpc": 17.40, "smps": 6.2, "drct": 99, "vsby": 16.093, "roadtmpc": 39.10,"srad": 77.61, "snwd": 0.00, "pcpn": 0.00},
2016-10-24 15:43:55,873 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad tuple field, could not find start of object, field 4
I am not able to use Elephant Bird.
Answer:
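First, put your JSON on a single line. JsonLoader expects exactly one JSON object per line and treats every line as a separate record, which is why each line of the pretty-printed file shows up as a "Bad record" in the warnings above.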
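A minimal sketch of that first step in Python, assuming the input file holds a single JSON document (as in the question). The function name `to_json_lines` and the output file name are hypothetical. Key order is preserved because `json.load` keeps keys in document order and `json.dumps` writes them back in that order, which matters for the schema below.

```python
import json

def to_json_lines(src_path, dst_path):
    """Rewrite a file holding one (possibly pretty-printed) JSON document
    so that the whole document sits on a single line."""
    with open(src_path, encoding="utf-8") as src:
        doc = json.load(src)  # parses the entire multi-line document
    with open(dst_path, "w", encoding="utf-8") as dst:
        # json.dumps emits no newlines by default, so the object fits on one line;
        # keys are written in the order they were read
        dst.write(json.dumps(doc) + "\n")
```

For example, `to_json_lines("205006.json", "205006_oneline.json")`, then point the `LOAD` statement at the new file. Note that numbers are re-serialized (`28.00` becomes `28.0`), which makes no difference to Pig's `float` fields.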
Second, use the following Pig statement:
data_raw = LOAD '205006.json' USING JsonLoader('time:chararray,type:chararray,revision:chararray,hostname:chararray,data:{(gid:int,tmpc:float,wawa:{(chararray)},ptype:int,dwpc:float,smps:float,drct:int,vsby:float,roadtmpc:float,srad:float,snwd:float,pcpn:float)}');
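The schema must describe every field of the JSON object, in the same order in which the fields appear in the data. Note also that `data` is a JSON array of objects, so it is declared as a bag of tuples `{(...)}` rather than as a single tuple `(...)` as in the question.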
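With twelve fields inside `data`, it is easy to get the order wrong. A minimal sketch of a pre-flight check in Python, assuming the file is already in one-object-per-line form. The function `schema_mismatches` and the two field lists are hypothetical, transcribed by hand from the schema string above. It only compares key names and order; it does not check types.

```python
import json

# Field order transcribed from the JsonLoader schema string (hypothetical constants).
TOP_LEVEL = ["time", "type", "revision", "hostname", "data"]
DATA_FIELDS = ["gid", "tmpc", "wawa", "ptype", "dwpc", "smps", "drct",
               "vsby", "roadtmpc", "srad", "snwd", "pcpn"]

def schema_mismatches(line):
    """Return a list of places where a one-line JSON record's keys differ
    from the schema order; an empty list means the record matches."""
    record = json.loads(line)
    problems = []
    if list(record) != TOP_LEVEL:
        problems.append(f"top level: {list(record)}")
    # each element of the "data" array must follow DATA_FIELDS exactly
    for i, item in enumerate(record.get("data", [])):
        if list(item) != DATA_FIELDS:
            problems.append(f"data[{i}]: {list(item)}")
    return problems
```

Running it over each line of the converted file before the Pig job makes a field-order mistake show up as a readable message instead of empty or shifted columns in the `DUMP` output.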