Pig JsonLoader problem - custom JSON is not parsed correctly

Date: 2014-03-06 18:31:38

Tags: mapreduce apache-pig

I am new to Pig, and I am trying to parse JSON with the following structure:

{"id1":197,"id2":[ 
    {"id3":"109.11.11.0","id4":"","id5":1391233948301},
    {"id3":"10.10.15.81","id4":"","id5":1313393100648},
    ...
]}

The file above is named jsonfile.txt. This is the load statement I am using:

alias = load 'jsonfile.txt' using JsonLoader('id1:int,id2:[id3:chararray,id4:chararray,id5:chararray]');

This is the error I get:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'id3' expecting RIGHT_BRACKET

Any idea what I might be doing wrong?

1 answer:

Answer 0: (score: 1)

The schema string you pass to JsonLoader is not well-formed.

Complex data types are written in the following format:

Tuple: enclosed by (), items separated by ","
    Non-empty tuple: (item1,item2,item3)
    Empty tuple is valid: ()
Bag: enclosed by {}, tuples separated by ","
    Non-empty bag: {(tuple1),(tuple2),(tuple3)}
    Empty bag is valid: {}
Map: enclosed by [], items separated by ",", key and value separated by "#"
    Non-empty map: [key1#value1,key2#value2]
    Empty map is valid: []

Source: http://pig.apache.org/docs/r0.10.0/func.html#jsonloadstore

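In other words, [] does not denote an array. It denotes an associative table (a map), in which "#" is the character that separates a key from its value. That is why the parser rejects 'id3' inside the brackets. Try using a tuple (parentheses) instead: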

'id1:int,id2:(id3:chararray,id4:chararray,id5:chararray)'

OR

'id1:int,id2:{(id3:chararray,id4:chararray,id5:chararray)}'
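For context, here is a rough sketch of how the second (bag) form might look in a complete script. It rests on assumptions that are not in the question: that each JSON record sits on a single line of jsonfile.txt (as far as I know, JsonLoader reads its input line by line), and the relation names records and flat are placeholders. Since id2 is a JSON array of objects, a bag of tuples is probably the closer match of the two forms.

```pig
-- Assumption: one complete JSON record per line in jsonfile.txt.
-- 'records' and 'flat' are placeholder relation names.
records = LOAD 'jsonfile.txt'
    USING JsonLoader('id1:int,id2:{(id3:chararray,id4:chararray,id5:chararray)}');

-- Print the schema Pig derived from the schema string.
DESCRIBE records;

-- Unnest the bag: one output row per (id3, id4, id5) tuple, repeated for each id1.
flat = FOREACH records GENERATE id1, FLATTEN(id2);
DUMP flat;
```

If the load succeeds, DESCRIBE should report id2 as a bag of three-field tuples, which is a quick way to check the schema string before processing any data.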

I cannot test this and have never used Pig myself, but according to the documentation it should work.

(Based on the following example from the documentation)

a = load 'a.json' using JsonLoader('a0:int,a1:{(a10:int,a11:chararray)},a2:(a20:double,a21:bytearray),a3:[chararray]');
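As an untested illustration of how the three complex types in that schema differ, the fields could be read back roughly like this. The relation name b and the map key 'k' are hypothetical, and a.json is assumed to contain records that match the schema.

```pig
-- Assumption: a.json holds one matching JSON record per line.
a = LOAD 'a.json' USING JsonLoader('a0:int,a1:{(a10:int,a11:chararray)},a2:(a20:double,a21:bytearray),a3:[chararray]');

b = FOREACH a GENERATE
        a0,                          -- simple field
        FLATTEN(a1) AS (a10, a11),   -- bag {}: unnest into one row per tuple
        a2.a20 AS a20,               -- tuple (): dereference a field with "."
        a3#'k' AS k;                 -- map []: look up a key with "#" ('k' is a made-up key)
DUMP b;
```

The bag is unnested with FLATTEN, the tuple is dereferenced with a dot, and the map is looked up with "#", which mirrors the {} / () / [] distinction in the schema string.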