Join two relations (candidates) and show the difference between their votes?

Asked: 2014-04-07 00:52:47

Tags: apache-pig

I first split the relation into winners and losers.

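I am having trouble joining the candidates back together and building tuples that hold the last names of the two candidates (the elected one and the defeated one) plus the difference between their vote totals. I only want the tuples where that difference is less than 10.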

 --load the data 
raw = LOAD '.../data2.csv' USING PigStorage(',') AS  (
        date, type:chararray, parl:int, prov:chararray, riding:chararray, 
        lastname:chararray, firstname:chararray, gender:chararray,
        occupation:chararray, party:chararray, votes:int,
        percent:double, elected:int);

fltrd = FILTER raw by votes > 100 ;
spltrd = SPLIT fltrd INTO won IF elected > 0, lost IF elected == 0;
jnd = JOIN won BY lastname AS lastname_won, lost BY lastname AS lastname_lost;

To show the difference between the votes, this is what I came up with, but it does not work:

jnd = JOIN won BY lastname AS lastname_won, vote AS vote_won, lost BY lastname AS lastname_lost, vote AS vote_lost;

gen = foreach jnd generate lastname_won, lastname_lost,(vote_won - vote_lost) as diffVotes; 

1 Answer:

Answer 0 (score: 0)

What exactly is not working?

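At first glance, your join probably fails because of the AS clauses. You cannot rename columns inside a JOIN; you need a FOREACH for that. After the join, use :: to tell apart fields that exist in both relations: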

jnd2 = FOREACH jnd GENERATE won::lastname AS lastname_won, won::votes AS vote_won,
                            lost::lastname AS lastname_lost, lost::votes AS vote_lost;

If you join on multiple columns, the syntax changes as follows:

jnd = JOIN won BY (lastname, votes), lost BY (lastname, votes);
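
Putting the pieces together, here is a minimal sketch of the whole pipeline. It assumes the schema and the '.../data2.csv' path from the question, keeps the question's join key (lastname), and uses the field name votes as declared in the LOAD schema. The alias close_races is a name made up for this example. As far as I know, SPLIT is written as a standalone statement without an alias on the left-hand side, so the sketch drops the `spltrd =` assignment.

```pig
-- Load the data (path and schema copied from the question).
raw = LOAD '.../data2.csv' USING PigStorage(',') AS (
        date, type:chararray, parl:int, prov:chararray, riding:chararray,
        lastname:chararray, firstname:chararray, gender:chararray,
        occupation:chararray, party:chararray, votes:int,
        percent:double, elected:int);

fltrd = FILTER raw BY votes > 100;

-- SPLIT defines the relations won and lost directly.
SPLIT fltrd INTO won IF elected > 0, lost IF elected == 0;

-- Plain join, no AS inside the JOIN.
-- Join key kept from the question; swap in another field if a different key is intended.
jnd = JOIN won BY lastname, lost BY lastname;

-- Rename and compute the difference in a FOREACH, using :: to disambiguate.
gen = FOREACH jnd GENERATE
        won::lastname  AS lastname_won,
        lost::lastname AS lastname_lost,
        (won::votes - lost::votes) AS diffVotes;

-- Keep only the tuples whose difference is below 10 (close_races is a hypothetical alias).
close_races = FILTER gen BY diffVotes < 10;

DUMP close_races;
```

If the difference can come out negative for your data, filtering on ABS(diffVotes) < 10 may be closer to what you want.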