How do I flatten complex data types in Pig?

Time: 2014-10-16 01:28:08

Tags: apache-pig

I have an input.txt that looks like this:

{"item":[{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107836,"quantity":1}],"table_id":34}

This is how I load input.txt:

raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader() AS (json:map[]);

item_data = FOREACH raw_data GENERATE json#'item' AS (item:{(sdb_id:int, quantity:int)});

DUMP item_data prints:

([{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}])
([{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}])
([{"sdb_id":107836,"quantity":1}])

My question is how to make the output look like the following (only the "sdb_id" and "quantity" values):

(107817, 1)
(101733, 1)
(107795, 1)
(107785, 1)
(107836, 1)

Thank you very much for your help. I really appreciate it.

1 answer:

Answer 0: (score: 0)

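Try the script below. I changed the JsonLoader call so that it performs a nested load, which lets each element of the "item" array be read as a map: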

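-- '-nestedLoad' makes JsonLoader parse nested JSON values instead of leaving them as strings
raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
-- FLATTEN un-nests the bag under 'item': one output row per element, each element a map
b = foreach raw_data generate flatten(json#'item') as (k:MAP[]);
-- project the two wanted keys out of each element map
c = foreach b generate k#'sdb_id', k#'quantity';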
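
To see what the flatten step does to the sample data without a Pig installation, the same transformation can be modelled in plain Python. This is only a rough sketch of the semantics, not Pig or Elephant Bird code; the function name flatten_items is made up for illustration, and the three input lines are copied from the question.

```python
import json

# The three sample records from input.txt in the question.
lines = [
    '{"item":[{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}],"table_id":62}',
    '{"item":[{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}],"table_id":62}',
    '{"item":[{"sdb_id":107836,"quantity":1}],"table_id":34}',
]


def flatten_items(records):
    """Hypothetical helper: model flatten(json#'item') plus the key projection."""
    rows = []
    for line in records:
        record = json.loads(line)  # one map per input line
        # Flattening: emit one output row per element of the 'item' list.
        for item in record.get("item", []):
            # Projection: keep only the two wanted keys of each element.
            rows.append((item["sdb_id"], item["quantity"]))
    return rows


rows = flatten_items(lines)
for row in rows:
    print(row)
```

Run on the sample input, this prints the five (sdb_id, quantity) pairs listed in the question, one per line.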

Hope this helps.