I have a list that looks like this, lista.csv:
client-id priority client-start assignment
12345 1 1250125125 13
1246 3 1250122156 27
12616 1 1250122351 3
...
I have another list that looks like a vector, listb.csv:
125125
124214
1246
125
...
What I want to do is filter the first list down to the clients whose ID can also be found in listb.
I tried something like this, but it doesn't work:
raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client-id: int, priority:
int, client-start: int, assignment: int);
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int);
s4id = FOREACH s4q {
dd = FILTER raw by (client-id == s4q);
GENERATE dd;
}
DUMP dd;
Any ideas on how to solve this?
Answer 0: (score: 0)
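A nested FOREACH can only work on the fields of the relation it iterates over, so it cannot FILTER a second relation such as raw, and an alias defined inside the nested block cannot be dumped from outside it. Instead, join the two relations so that only the matching records are kept. The join then acts as the filter.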
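-- Pig identifiers cannot contain hyphens, so the fields are named with underscores.
raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client_id: int, priority: int, client_start: int, assignment: int);
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int);
-- The inner join keeps only the rows of raw whose client_id appears in s4q.
s4id = JOIN raw BY client_id, s4q BY survs4id;
-- Project the columns that came from raw and drop the duplicated key from s4q.
dd = FOREACH s4id GENERATE raw::client_id AS client_id, raw::priority AS priority, raw::client_start AS client_start, raw::assignment AS assignment;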
DUMP dd;
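One caveat with using a join as a filter: if listb.csv contains the same ID more than once, each matching client row comes out once per duplicate. A possible variation, sketched below, deduplicates the ID list first and, assuming the ID list is small enough to fit in memory, asks for a replicated (map-side) join. The aliases ids and matched are names made up for this sketch, and the field names are written with underscores because Pig identifiers cannot contain hyphens.

```pig
-- Sketch only: file names and schema are taken from the question,
-- with field names rewritten to use underscores.
raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client_id: int, priority: int, client_start: int, assignment: int);
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int);

-- Remove repeated IDs so each matching client row is emitted only once.
ids = DISTINCT s4q;

-- 'replicated' requests a map-side join: the first relation (raw) is streamed,
-- and the later one (ids) is loaded into memory, so it has to be small.
matched = JOIN raw BY client_id, ids BY survs4id USING 'replicated';

-- Keep only the columns that came from raw.
dd = FOREACH matched GENERATE raw::client_id AS client_id, raw::priority AS priority, raw::client_start AS client_start, raw::assignment AS assignment;
DUMP dd;
```

If the ID list is too large for memory, dropping the USING 'replicated' clause falls back to the default join, and the DISTINCT step still prevents duplicated output rows.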