Loading JSON data into Pig

Time: 2018-02-21 17:43:22

Tags: apache-pig

I need to load JSON data into Pig, but something seems to be wrong and I can't get the data to load. Here is a sample of the data structure:

[{
  "id": 1,
  "first_name": "Lakshmi",
  "last_name": "P",
  "email": "xxx@yyy.com",
  "gender": "Female",
  "ip_address": "26.58.193.2"
}, {
  "id": 2,
  "first_name": "Syam",
  "last_name": "Prasad",
  "email": "sp@yyy.com",
  "gender": "Male",
  "ip_address": "229.179.4.212"
}, {
  "id": 3,
  "first_name": "ABC",
  "last_name": "CDE",
  "email": "abc@cde.com",
  "gender": "Female",
  "ip_address": "180.66.162.255"
}, {
  "id": 4,
  "first_name": "FGS",
  "last_name": "IJK",
  "email": "lmn@opq.com",
  "gender": "Male",
  "ip_address": "67.76.188.26"
}]

I tried loading the data with JsonLoader, using the code below:

--inidata1 = load 'inputData1.json' using JsonStorage('\n');
--REGISTER 'piggybank-0.15.0.jar';
inidata = load 'inputData1.json' using JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:$

madata = foreach inidata generate group, FLATTEN(inidata);

dump madata;

--filterdata = foreach inidata generate id,first_name,last_name,email,gender,ip_address;

--dump filterdata;
--filterdata = foreach inidata generate id,gender,first_name,last_name;

--selecteddata = filter inidata by (gender=='Male') OR (last_name=='Prasad');

--dump selecteddata;
--store selecteddata into 'JSON-DATA_input';

Can anyone share a fix for this?

1 Answer

Answer 0 (score: 0)

JsonLoader expects each line of input to be a single JSON object, with objects separated by newlines, so your data should look like this:

{  "id": 1,  "first_name": "Lakshmi",  "last_name": "P",  "email": "xxx@yyy.com",  "gender": "Female",  "ip_address": "26.58.193.2"}
{  "id": 2,  "first_name": "Syam",  "last_name": "Prasad",  "email": "sp@yyy.com",  "gender": "Male",  "ip_address": "229.179.4.212"}
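A minimal sketch of the whole script, assuming `inputData1.json` has been rewritten to one object per line as shown above. It also drops the question's `generate group, FLATTEN(inidata)` statement: in Pig the `group` field only exists in a relation produced by `GROUP` or `COGROUP`, so that line cannot work on the output of a plain `LOAD`. The filter is the one the question already had commented out.

```pig
-- Assumes inputData1.json holds one JSON object per line (no enclosing [ ... ] array).
inidata = LOAD 'inputData1.json'
    USING JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:chararray');

-- Keep male records, plus anyone whose last name is Prasad.
selecteddata = FILTER inidata BY (gender == 'Male') OR (last_name == 'Prasad');

DUMP selecteddata;
```

The schema string is the same one given at the end of this answer; only the surrounding statements differ from the question's script.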

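Instead, you wrapped the objects in an array. Based on my testing, I don't think Pig's JsonLoader can load that at all unless you give the array a key and wrap it in a JSON object.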
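If the data has to stay in array form, here is a sketch of the wrap-it-in-an-object workaround described above. It assumes the whole array is placed on a single line under a hypothetical key `records` in a hypothetical file `inputData1_wrapped.json`, e.g. `{"records": [{"id": 1, "first_name": "Lakshmi", ...}, ...]}`, because JsonLoader still reads one object per line. In a JsonLoader schema a bag of tuples is read from a JSON array of objects, so `FLATTEN` turns it back into one row per record. Treat this as untested across Pig versions.

```pig
-- Hypothetical single-line input: {"records": [{"id": 1, ...}, {"id": 2, ...}]}
-- The key name "records" and the file name are assumptions for illustration.
wrapped = LOAD 'inputData1_wrapped.json'
    USING JsonLoader('records:{(id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:chararray)}');

-- Unnest the bag so each array element becomes its own row.
inidata = FOREACH wrapped GENERATE FLATTEN(records);

DUMP inidata;
```

After `FLATTEN`, Pig may prefix the field names (for example `records::gender`), so later statements may need to refer to them that way.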

This example worked for me: http://joshualande.com/read-write-json-apache-pig

Also, if the code you posted was copied correctly, this line is malformed:

inidata = load 'inputData1.json' using JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:$

It is not terminated properly; it should look more like this:

inidata = load 'inputData1.json' using JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:chararray');