Pig script to parse a string and calculate a cumulative value

Posted: 2015-07-15 21:29:24

Tags: parsing apache-pig

Input file with two columns:

Visit   ProductString
101      ;Cross Trainers;1;69.95,;Athletic Socks;10;29.99
102      ;Amplifier;1;120.90,;Headphone;2;59.99;leather wallet;1;99.99;

I am looking for a Pig script that parses the ProductString value in each row and computes the cumulative revenue.

i.e., output:
69.95+29.99+120.90+59.99+99.99=380.82
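As a quick local check of the target figure (a Python sketch, not part of the Pig solution), the five prices can be summed with `Decimal`, which avoids binary floating-point rounding in the printed total. Quantities are ignored here, matching the expected output above.

```python
from decimal import Decimal

# Unit prices taken from the two sample rows; quantities are deliberately ignored,
# as in the expected output in the question.
prices = ["69.95", "29.99", "120.90", "59.99", "99.99"]

# Decimal keeps exact two-decimal arithmetic, so the total prints cleanly.
total = sum(Decimal(p) for p in prices)
print(total)
```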

1 answer:

Answer 0 (score: 2)

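I think there should be a ',' after 59.99 (so that it reads 59.99,;leather wallet), and there should not be a ';' after 99.99. If so, you need to FLATTEN the result of TOKENIZE on ',' to extract the individual products, and then split each product on ';' to get the item name, quantity, and price.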
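To see why the data fix matters, here is a rough Python sketch of the two parsing steps applied to row 102. It assumes that TOKENIZE with a custom delimiter drops empty tokens and that STRSPLIT follows Java's String.split semantics (a leading empty field is kept, trailing empty fields are dropped). The helper names `tokenize`, `strsplit`, and `parse_products` are hypothetical stand-ins for the Pig built-ins, not Pig APIs.

```python
from decimal import Decimal


def tokenize(s: str, delim: str) -> list[str]:
    # Stand-in for TOKENIZE(s, delim): split on the delimiter, drop empty tokens (assumed behaviour).
    return [t for t in s.split(delim) if t]


def strsplit(s: str, sep: str) -> list[str]:
    # Stand-in for STRSPLIT(s, sep): like Java String.split with limit 0,
    # a leading empty field is kept but trailing empty fields are removed.
    parts = s.split(sep)
    while parts and parts[-1] == "":
        parts.pop()
    return parts


def parse_products(product_string: str) -> list[tuple[str, int, Decimal]]:
    rows = []
    for token in tokenize(product_string, ","):
        prod = strsplit(token, ";")  # prod[0] is "" because each product starts with ';'
        rows.append((prod[1], int(prod[2]), Decimal(prod[3])))
    return rows


as_posted = ";Amplifier;1;120.90,;Headphone;2;59.99;leather wallet;1;99.99;"
corrected = ";Amplifier;1;120.90,;Headphone;2;59.99,;leather wallet;1;99.99"

print(parse_products(as_posted))  # leather wallet is swallowed into the Headphone token
print(parse_products(corrected))  # three products, as intended
```

With the row as posted, the wallet's fields trail the Headphone token and only positions 1 to 3 are read, so its 99.99 never reaches the sum.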

Query

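-- Assumes a tab-delimited file; 'db.table' is a placeholder (an HCatalog table would use HCatLoader instead)
data = LOAD 'db.table' USING PigStorage('\t') AS (visit:int, product_string:chararray);
-- One record per product, e.g. ';Cross Trainers;1;69.95'
A = FOREACH data GENERATE visit, FLATTEN(TOKENIZE(product_string, ',')) AS tmp_col;
-- Split on ';': the leading ';' leaves $0 empty, so item/qty/price are $1/$2/$3
B = FOREACH A GENERATE visit, STRSPLIT(tmp_col, ';') AS prod;
C = FOREACH B GENERATE visit, (chararray)prod.$1 AS item,
    (int)prod.$2 AS qty, (double)prod.$3 AS revenue;
-- Put every record in a single group, then total the prices
grpd = GROUP C ALL;
D = FOREACH grpd GENERATE SUM(C.revenue);
DUMP D;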

Output

(380.82)