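I first split the relation into winners (elected) and losers.
I'm having trouble joining the candidates together to produce tuples containing the two candidates' last names (elected and defeated) and the difference between their vote totals, keeping only the tuples where that difference is less than 10.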
--load the data
raw = LOAD '.../data2.csv' USING PigStorage(',') AS (
date, type:chararray, parl:int, prov:chararray, riding:chararray,
lastname:chararray, firstname:chararray, gender:chararray,
occupation:chararray, party:chararray, votes:int,
percent:double, elected:int);
fltrd = FILTER raw by votes > 100 ;
SPLIT fltrd INTO won IF elected > 0, lost IF elected == 0;
jnd = JOIN won BY lastname AS lastname_won, lost BY lastname AS lastname_lost;
To show the difference between the vote totals, this was my idea, but it doesn't work:
jnd = JOIN won BY lastname AS lastname_won, vote AS vote_won, lost BY lastname AS lastname_lost, vote AS vote_lost;
gen = foreach jnd generate lastname_won, lastname_lost,(vote_won - vote_lost) as diffVotes;
Answer (score: 0)
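What exactly isn't working?
At first glance, your join probably fails because of the AS clauses. You can't rename columns inside a JOIN; you need a FOREACH for that. Use the :: operator to tell apart the fields that come from the two relations: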
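jnd = JOIN won BY lastname, lost BY lastname;
jnd2 = FOREACH jnd GENERATE won::lastname AS lastname_won, won::votes AS vote_won, lost::lastname AS lastname_lost, lost::votes AS vote_lost;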
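Putting the pieces together, a minimal sketch of the full goal from the question (vote difference, keeping only gaps under 10) might look like the script below. It assumes the `won` and `lost` relations come from the SPLIT above with the original schema field names (`lastname`, `votes`), and it keeps the question's choice of `lastname` as the join key. The aliases `pairs` and `closeRaces` are hypothetical names chosen for illustration. `ABS` is used because, with this join key, the loser of a pair can have more votes than the winner.

```pig
-- Sketch: assumes won/lost come from the SPLIT above and keep the
-- original field names (lastname, votes). Aliases are illustrative.
jnd = JOIN won BY lastname, lost BY lastname;

-- Rename fields in a FOREACH (not in the JOIN) and compute the vote gap,
-- using :: to say which input relation each field comes from.
pairs = FOREACH jnd GENERATE
          won::lastname  AS lastname_won,
          lost::lastname AS lastname_lost,
          (won::votes - lost::votes) AS diffVotes;

-- Keep only pairs whose vote totals differ by less than 10.
closeRaces = FILTER pairs BY ABS(diffVotes) < 10;

DUMP closeRaces;
```

The join key decides which winner is paired with which loser; if the intent is to compare candidates who ran in the same riding, joining on `riding` instead may be closer to what you want (an assumption about your data, not something stated in the question).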
If you join on multiple columns, the syntax changes as follows:
jnd = JOIN won BY (lastname, votes), lost BY (lastname, votes);