Loading JSON data containing arrays in Pig

Time: 2017-09-11 12:45:10

Tags: arrays json apache-pig

I have a JSON file in the following format:

{"id": "59b6808364fdb09cde10ad3b","balance": "$1,972.02","age": 35,"eyeColor": "green","tags": ["aute","nostrud","pariatur","adipisicing","irure"]}
{"id": "59b6808334cd60be95e5c166","balance": "$3,697.85","age": 32,"eyeColor": "blue","tags": ["tempor","non","ad","adipisicing","ut"]}
{"id": "59b680834544a828191abc88","balance": "$1,102.43","age": 38,"eyeColor": "brown","tags": ["quis","non","ut","veniam","ipsum"]}

I need to load this data into Pig. I am using:

raw_data = LOAD '/path/to/file' USING JsonLoader('id:chararray, balance:chararray, age:int, eyeColor:chararray, tags:chararray');

Running dump raw_data; on this does not give the correct result.

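What is the correct data type for loading an array in Apache Pig? There is another question that describes how to expand an array, but in my case the tags element can contain a variable number of elements.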

Even converting the array to a string and then loading it would be fine.

1 answer:

Answer 0: (score: 1)

Enclose the field inside tags in {}, which is Pig's bag notation:
raw_data = LOAD '/path/to/file' USING JsonLoader('id:chararray, balance:chararray, age:int, eyeColor:chararray, tags:{items:chararray}');
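
Since the question says that turning the array into a string would also be acceptable, here is a rough sketch of that fallback. It is a different technique from the schema change above, and it is not Pig code: it is a small Python preprocessing step that joins each tags array into one delimited string before the file is handed to Pig, so that the tags:chararray schema from the question has a plain string to read. The function name flatten_tags and the "|" separator are my own choices for illustration, not anything from Pig.

```python
import json


def flatten_tags(line, sep="|"):
    """Return the JSON record with its "tags" array joined into one string.

    Hypothetical helper: the name and the default separator are assumptions.
    """
    record = json.loads(line)
    tags = record.get("tags")
    if isinstance(tags, list):
        # Works for any number of elements, including an empty list.
        record["tags"] = sep.join(str(tag) for tag in tags)
    # Key order is preserved, so the fields stay in the order of the schema.
    return json.dumps(record)


# First record from the question, used as sample input.
sample = (
    '{"id": "59b6808364fdb09cde10ad3b","balance": "$1,972.02","age": 35,'
    '"eyeColor": "green","tags": ["aute","nostrud","pariatur","adipisicing","irure"]}'
)
converted = flatten_tags(sample)
print(converted)
```

To convert a whole file, apply flatten_tags to each line and write the results out one per line. Pick a separator that cannot occur inside a tag, otherwise the original elements cannot be recovered by splitting later.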