PIG: Sum and divide, creating an object

Date: 2014-11-18 04:58:11

Tags: sum apache-pig divide

I am writing a Pig program that loads a file whose entries are separated by tabs,

e.g.: name TAB year TAB count TAB ...
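Here is a rough illustration of what the LOAD statement below does with such a file. This minimal Python sketch (an analogy, not Pig code) splits each tab-separated line and casts the two count columns to float, matching the declared schema (type: chararray, year: chararray, match_count: float, volume_count: float). The Record and load_tsv names are hypothetical and the sample lines are made up. The sketch assumes every line has exactly four fields and does not try to reproduce how Pig treats missing or malformed fields.

```python
# Analogy for: LOAD ... USING PigStorage('\t') AS (type, year, match_count, volume_count)
# Assumption: every line has exactly four tab-separated fields.
from typing import List, NamedTuple


class Record(NamedTuple):
    type: str
    year: str
    match_count: float
    volume_count: float


def load_tsv(lines: List[str]) -> List[Record]:
    """Split each line on tabs and cast the two count columns to float."""
    records = []
    for line in lines:
        t, year, m, v = line.rstrip("\n").split("\t")
        records.append(Record(t, year, float(m), float(v)))
    return records


sample = ["A\t2001\t10\t2", "A\t2002\t20\t3"]
rows = load_tsv(sample)
print(rows[0])  # Record(type='A', year='2001', match_count=10.0, volume_count=2.0)
```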

file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);

-- Group by type
grouped = GROUP file BY type;

-- Flatten
by_type = FOREACH grouped GENERATE FLATTEN(group) AS (type, year, match_count, volume_count);

group_operat = FOREACH by_type GENERATE  
        SUM(match_count) AS sum_m,
        SUM(volume_count) AS sum_v,
       (float)sum_m/sm_v;

DUMP group_operat;

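The problem is the group_operat object I am trying to create. I want to sum all the match counts, sum all the volume counts, and divide the match-count sum by the volume-count sum.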

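What am I doing wrong in the arithmetic / object creation? The error I get is: <line 7, column 11> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatible schema: left is "type:NULL,year:NULL,match_count:NULL,volume_count:NULL", right is "group:chararray"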

Thanks.
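The error points at the FLATTEN step. After GROUP file BY type, each output tuple has a field called group (the type value, a chararray) and a bag called file holding the original rows. FLATTEN(group) therefore yields a single chararray and cannot be renamed to the four-field schema (type, year, match_count, volume_count). The counts are only reachable through the bag, e.g. file.match_count. The Python sketch below models that (group, bag) shape. group_by is a hypothetical helper, not a Pig API.

```python
# Analogy for the shape produced by Pig's GROUP: one (group, bag) pair per key.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float, float]  # (type, year, match_count, volume_count)


def group_by(rows: List[Row], key_index: int) -> List[Tuple[str, List[Row]]]:
    """Return (group, bag) pairs; each bag keeps every original field."""
    bags: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        bags[row[key_index]].append(row)
    return sorted(bags.items())


rows = [("A", "2001", 10.0, 2.0), ("A", "2002", 20.0, 3.0), ("B", "2003", 30.0, 4.0)]
grouped = group_by(rows, 0)
for group, bag in grouped:
    # 'group' is only the key (like group:chararray); year and the counts
    # live inside the bag, which Pig exposes as file.year, file.match_count, ...
    print(group, bag)
```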

2 Answers:

Answer 0 (score: 2)

Try it like this; it returns each type together with the ratio of its summed counts.

Updated with working code.

input.txt:

A       2001     10      2
A       2002     20      3
B       2003     30      4
B       2004     40      1
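With this input, you can work out the expected per-type result by hand. The script sums each count column over a type's rows and only then divides the two sums. For comparison, averaging the per-row ratios for A gives a different number, so the order of operations matters.

```latex
\begin{aligned}
\text{A}&: \frac{10 + 20}{2 + 3} = \frac{30}{5} = 6.0 \\
\text{B}&: \frac{30 + 40}{4 + 1} = \frac{70}{5} = 14.0 \\
\text{A, mean of row ratios (not what the script computes)}&: \tfrac{1}{2}\left(\tfrac{10}{2} + \tfrac{20}{3}\right) \approx 5.83
\end{aligned}
```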

PigScript:

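-- PigStorage() with no argument splits fields on tabs
file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
match_count: float, volume_count: float);
-- Each grouped tuple is (group, file): the type value plus a bag of that type's rows
grouped = GROUP file BY type;
group_operat = FOREACH grouped {
    -- Sum each count column over the rows in this group's bag
    sum_m = SUM(file.match_count);
    sum_v = SUM(file.volume_count);
    -- Divide the two sums (not the per-row values); group holds the type
    GENERATE group, (float)(sum_m/sum_v) AS sum_mv;
}
DUMP group_operat;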

Output:

(A,6.0)
(B,14.0)
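To make the nested FOREACH concrete, here is a Python sketch that accumulates sum_m and sum_v per type and divides only after summing. It is an analogy, not how Pig executes the script, and it reproduces the output above for the rows of input.txt. sum_ratio_by_type is a hypothetical name. The sketch would raise ZeroDivisionError if a type's volume sum were 0, and it does not model what Pig does in that case.

```python
# Analogy for the nested FOREACH: per type, SUM both count columns, then divide.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float, float]  # (type, year, match_count, volume_count)


def sum_ratio_by_type(rows: List[Row]) -> List[Tuple[str, float]]:
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for t, _year, match_count, volume_count in rows:
        sums[t][0] += match_count   # sum_m = SUM(file.match_count)
        sums[t][1] += volume_count  # sum_v = SUM(file.volume_count)
    # GENERATE group, sum_m / sum_v  (divide after summing)
    return [(t, m / v) for t, (m, v) in sorted(sums.items())]


input_rows = [
    ("A", "2001", 10.0, 2.0),
    ("A", "2002", 20.0, 3.0),
    ("B", "2003", 30.0, 4.0),
    ("B", "2004", 40.0, 1.0),
]
print(sum_ratio_by_type(input_rows))  # [('A', 6.0), ('B', 14.0)]
```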

Answer 1 (score: 1)

Try this:

file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);

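-- Group on the composite key (type, year)
grouped = GROUP file BY (type,year);

-- Sum both counts per (type, year) group, then divide the two sums
group_operat = FOREACH grouped GENERATE group,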
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
       (float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;
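With GROUP file BY (type,year), the key becomes a (type, year) pair, so group is a tuple rather than a chararray. The Python sketch below mirrors that as an analogy. sums_by_type_year is hypothetical. The rows reuse input.txt from the other answer, plus a made-up extra ("A", "2001", 5.0, 3.0) row so that two rows fall into the same (type, year) group.

```python
# Analogy for GROUP file BY (type, year): the group key is a (type, year) tuple.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float, float]  # (type, year, match_count, volume_count)
Key = Tuple[str, str]                # (type, year)


def sums_by_type_year(rows: List[Row]) -> List[Tuple[Key, float, float, float]]:
    sums: Dict[Key, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for t, year, match_count, volume_count in rows:
        acc = sums[(t, year)]
        acc[0] += match_count   # SUM(file.match_count)
        acc[1] += volume_count  # SUM(file.volume_count)
    # GENERATE group, sum_m, sum_v, sum_m / sum_v
    return [(key, m, v, m / v) for key, (m, v) in sorted(sums.items())]


rows = [
    ("A", "2001", 10.0, 2.0),
    ("A", "2002", 20.0, 3.0),
    ("B", "2003", 30.0, 4.0),
    ("B", "2004", 40.0, 1.0),
    ("A", "2001", 5.0, 3.0),  # hypothetical extra row, same (type, year) as the first
]
for record in sums_by_type_year(rows):
    print(record)
```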

The script above groups the results by type and year. If you only want to group by type, remove year from the grouping:

grouped = GROUP file BY type;

-- file.year projects the bag, so each output tuple carries a bag of all years for that type
group_operat = FOREACH grouped GENERATE group,file.year,
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
       (float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;
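In the type-only version, file.year projects the bag, so each output tuple carries a bag of every year seen for that type alongside the sums. A Python sketch of that shape follows, as an analogy using the input.txt rows from the other answer. summarize_by_type is hypothetical. The list of years keeps input order only for readability, since a Pig bag makes no ordering guarantee.

```python
# Analogy for: GROUP file BY type, then GENERATE group, file.year, SUM(...), SUM(...), ratio
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float, float]  # (type, year, match_count, volume_count)


def summarize_by_type(rows: List[Row]) -> List[Tuple[str, List[str], float, float, float]]:
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    out = []
    for t, bag in sorted(groups.items()):
        years = [r[1] for r in bag]       # like file.year: one entry per row in the bag
        sum_m = sum(r[2] for r in bag)    # SUM(file.match_count)
        sum_v = sum(r[3] for r in bag)    # SUM(file.volume_count)
        out.append((t, years, sum_m, sum_v, sum_m / sum_v))
    return out


rows = [
    ("A", "2001", 10.0, 2.0),
    ("A", "2002", 20.0, 3.0),
    ("B", "2003", 30.0, 4.0),
    ("B", "2004", 40.0, 1.0),
]
for record in summarize_by_type(rows):
    print(record)
```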