I need to load JSON data into Pig, but something is going wrong and the data does not load. Here is a sample of the data structure:
[{
"id": 1,
"first_name": "Lakshmi",
"last_name": "P",
"email": "xxx@yyy.com",
"gender": "Female",
"ip_address": "26.58.193.2"
}, {
"id": 2,
"first_name": "Syam",
"last_name": "Prasad",
"email": "sp@yyy.com",
"gender": "Male",
"ip_address": "229.179.4.212"
}, {
"id": 3,
"first_name": "ABC",
"last_name": "CDE",
"email": "abc@cde.com",
"gender": "Female",
"ip_address": "180.66.162.255"
}, {
"id": 4,
"first_name": "FGS",
"last_name": "IJK",
"email": "lmn@opq.com",
"gender": "Male",
"ip_address": "67.76.188.26"
}]
I tried to load the data with JsonLoader, using the code below:
--inidata1 = load 'inputData1.json' using JsonStorage('\n');
--REGISTER 'piggybank-0.15.0.jar';
inidata = load 'inputData1.json' using JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:$
madata = foreach inidata generate group, FLATTEN(inidata);
dump madata;
--filterdata = foreach inidata generate id,first_name,last_name,email,gender,ip_address;
--dump filterdata;
--filterdata = foreach inidata generate id,gender,first_name,last_name;
--selecteddata = filter inidata by (gender=='Male') OR (last_name=='Prasad');
--dump selecteddata;
--store selecteddata into 'JSON-DATA_input';
If there is a fix for this, could someone share it?
Answer 0 (score: 0)
JsonLoader expects each record to be a JSON object on its own line, with records separated by newlines, so your data should look like this:
{ "id": 1, "first_name": "Lakshmi", "last_name": "P", "email": "xxx@yyy.com", "gender": "Female", "ip_address": "26.58.193.2"}
{ "id": 2, "first_name": "Syam", "last_name": "Prasad", "email": "sp@yyy.com", "gender": "Male", "ip_address": "229.179.4.212"}
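Instead, you have wrapped the records in an array. Based on my testing, I don't think Pig's JsonLoader can load that at all unless you give the array a key and wrap it in a JSON object.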
This example worked for me: http://joshualande.com/read-write-json-apache-pig
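If the file is produced as a single JSON array and you cannot change how it is generated, one option is to rewrite it as one object per line before loading it into Pig. Below is a minimal sketch in Python, not something I have run against your actual file; the function names and the output file name `inputData1_lines.json` are made up for illustration, and it assumes the whole array fits in memory.

```python
import json


def array_to_json_lines(array_text):
    """Turn the text of one JSON array into newline-delimited JSON objects."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps writes each record on a single line and keeps the key order
    # of the input, so every output line is one complete JSON object.
    return "".join(json.dumps(record) + "\n" for record in records)


def convert_file(src_path, dst_path):
    """Read a JSON array from src_path and write one object per line to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        lines = array_to_json_lines(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(lines)
```

Calling `convert_file('inputData1.json', 'inputData1_lines.json')` and then pointing the `load` statement at the new file should give JsonLoader the one-object-per-line layout shown above.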
Also, if the code you posted was copied correctly, this statement is malformed:
inidata = load 'inputData1.json' using JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:$
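It is not terminated properly (the end of the schema string and the closing parenthesis are cut off) and should look more like: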
inidata = load 'inputData1.json' using JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:chararray');