I have an input.txt that looks like this:
{"charId":1111,"encounters":[{"alias":"A","guid":192,"data1":0,"data2":0,"temporary":1},{"alias":"B","guid":952,"data1":0,"data2":0,"temporary":1}]}
{"charId":2222,"encounters":[{"alias":"C","guid":544,"data1":0,"data2":0,"temporary":1}]}
{"charId":3333,"encounters":[]}
My question is: how can I make the output look like this?
(1111, A, 192, 0, 0, 1)
(1111, B, 952, 0, 0, 1)
(2222, C, 544, 0, 0, 1)
(3333, , , , , )
P.S. Here is my script, but it only outputs the first three rows:
raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
a = FOREACH raw_data GENERATE json#'charId' AS (charId:chararray), FLATTEN(json#'encounters') AS (encounters:map[]);
b = FOREACH a GENERATE charId, encounters#'alias' AS alias, encounters#'guid' AS guid, encounters#'data1' AS data1, encounters#'data2' AS data2, encounters#'temporary' AS temporary;
Thank you very much for your help. I really appreciate it.
Answer 0 (score: 0)
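The reason is that the FLATTEN operator always discards an empty collection, so a record whose encounters list is empty (charId 3333 here) produces no rows and is missing from the final output. One option is to work around this with the approach below. I wouldn't say it is the best solution, but at least it will solve your problem.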
PigScript:
raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
-- a keeps encounters unflattened so that the empty ones can be detected
a = FOREACH raw_data GENERATE json#'charId' AS (charId:chararray), json#'encounters' AS (encounters:map[]);
-- b flattens encounters; records with no encounters are dropped here
b = FOREACH raw_data GENERATE json#'charId' AS (charId:chararray), FLATTEN(json#'encounters') AS (encounters:map[]);
-- c and d rebuild the dropped records as rows padded with nulls
c = FILTER a BY IsEmpty(encounters);
d = FOREACH c GENERATE charId, null AS alias, null AS guid, null AS data1, null AS data2, null AS temporary;
e = FOREACH b GENERATE charId, encounters#'alias' AS alias, encounters#'guid' AS guid, encounters#'data1' AS data1, encounters#'data2' AS data2, encounters#'temporary' AS temporary;
-- combine the flattened rows with the null-padded rows
f = UNION e, d;
DUMP f;
Output:
(1111,A,192,0,0,1)
(1111,B,952,0,0,1)
(2222,C,544,0,0,1)
(3333,,,,,)
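
For readers who want to check the logic without a Pig installation, the same idea can be modeled in plain Python. This is only a rough sketch of the semantics, not Pig code: the helper names flatten_rows and null_rows are made up for illustration, the sample lines are copied from the input.txt in the question, and None stands in for Pig's null.

```python
import json

# Sample records copied from input.txt in the question.
LINES = [
    '{"charId":1111,"encounters":[{"alias":"A","guid":192,"data1":0,"data2":0,"temporary":1},'
    '{"alias":"B","guid":952,"data1":0,"data2":0,"temporary":1}]}',
    '{"charId":2222,"encounters":[{"alias":"C","guid":544,"data1":0,"data2":0,"temporary":1}]}',
    '{"charId":3333,"encounters":[]}',
]
FIELDS = ("alias", "guid", "data1", "data2", "temporary")


def flatten_rows(records):
    """Model of the FLATTEN step: one row per encounter.

    A record with an empty encounters list contributes no rows at all,
    which is why charId 3333 disappears from the original script's output.
    """
    return [
        (rec["charId"],) + tuple(enc[f] for f in FIELDS)
        for rec in records
        for enc in rec["encounters"]
    ]


def null_rows(records):
    """Model of the FILTER + null-padding step (hypothetical helper)."""
    return [
        (rec["charId"],) + (None,) * len(FIELDS)
        for rec in records
        if not rec["encounters"]
    ]


records = [json.loads(line) for line in LINES]
rows = flatten_rows(records) + null_rows(records)  # plays the role of UNION
for row in rows:
    print(row)
```

Note that in Pig, UNION does not guarantee the order of the resulting tuples, so if the row order matters an explicit ORDER BY on charId would be needed after the union; the Python list concatenation above happens to keep the order only because lists are ordered.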