I have an input.txt that looks like this:
{"item":[{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107836,"quantity":1}],"table_id":34}
Here is how I load input.txt:
raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader() AS (json:map[]);
item_data = FOREACH raw_data GENERATE json#'item' AS (item:{(sdb_id:int, quantity:int)});
The output of DUMP item_data looks like this:
([{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}])
([{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}])
([{"sdb_id":107836,"quantity":1}])
My question is how to make the output look like the following (only the sdb_id and quantity values):
(107817, 1)
(101733, 1)
(107795, 1)
(107785, 1)
(107836, 1)
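Thank you very much for your help. I really appreciate it.
Answer 0 (score: 0)
Try the script below. I changed the JsonLoader call to use nested loading (the '-nestedLoad' option), so the elements of the item array are loaded as maps instead of raw JSON text: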
raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
b = foreach raw_data generate flatten(json#'item') as (k:MAP[]);
c = foreach b generate k#'sdb_id', k#'quantity';
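For reference, the three statements above can be put together into one script along these lines. This is only a sketch: the REGISTER paths are hypothetical placeholders, the exact set of jars needed depends on your elephant-bird version, and the field names sdb_id and quantity given with AS are added here for readability.

```pig
-- Hypothetical jar locations (assumption); point these at your own installation.
REGISTER /path/to/elephant-bird-core.jar;
REGISTER /path/to/elephant-bird-pig.jar;
REGISTER /path/to/elephant-bird-hadoop-compat.jar;
REGISTER /path/to/json-simple.jar;

-- Load each JSON line as a map; '-nestedLoad' also parses nested objects and arrays.
raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

-- Flatten the 'item' bag so that each array element becomes its own row (one map per row).
b = FOREACH raw_data GENERATE FLATTEN(json#'item') AS (k:map[]);

-- Look up the two keys in each item map and name the resulting fields.
c = FOREACH b GENERATE k#'sdb_id' AS sdb_id, k#'quantity' AS quantity;

DUMP c;
```

With the three sample lines from the question, DUMP c should print one (sdb_id, quantity) tuple per item, five in total.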
Hope this helps.