我有以下格式的CSV文件
customerid, period, credit, debit
100, jan-2017, 500, 300
100, jan-2017, 300,0
100, feb-2017, 200,100
100, mar-2017, 200,10
200, jan-2017, 100, 200
200, feb-2017,100,200
现在我的要求是首先按客户ID进行分组,然后逐个分组并合并事务并使用Apache Pig脚本创建如下所示的分层JSON。
{
{
"customerid": 100,
"periods": [{
"period": "jan-2017",
"transactions": [{"credit": 500,"debit": 300},....]
}, {
"period": "feb-2017",
"transactions": [...]
}, {
"period": "mar-2017",
"transactions": [....]
}]
}, {
"customerid": 200,
"periods": [{
"period": "jan-2017",
"transactions": [.....]
}, {
"period": "feb-2017",
"transactions": [.....]
}]
}
}
我对Pig很新,但设法编写了以下脚本
Data = LOAD 'data.csv' USING PigStorage(',') AS (
company_id:chararray,
period:chararray,
debit:chararray,
credit:chararray)
CompanyBag = GROUP Data BY (company_id);
final_trsnactionjson = FOREACH CompanyBag {
ByCompanyId = FOREACH Data {
PeriodBag = GROUP Data BY (period);
IdPeriodItemRoot = FOREACH PeriodBag{
ItemRecords = FOREACH Source GENERATE debit as debit, credit as credit
GENERATE group as period, TOTUPLE(ItemRecords) as transactions;
}
}
GENERATE group as customerid, TOTUPLE(PeriodBag) AS periods;
};
但这给了我以下错误
mismatched input '{' expecting GENERATE
我搜索了很多关于如何使用Pig生成嵌套Json,但找不到任何好的指针。我哪里错了?在此先感谢您的帮助
答案 0 :(得分:0)
https://pig.apache.org/docs/r0.11.1/func.html#jsonloadstore 您可以在" AS"
中提供嵌套模式