Pig script to extract rows where a column has a value in a list

Date: 2017-04-20 07:01:42

Tags: hadoop apache-pig

I have a Pig script like this:

a = load 'large_file' using PigStorage(',');
b = filter a by $16 == '12345678';
c = filter a by $16 == '456';
d = union b, c;
store d into 'output.csv';

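What if I want to filter a by a list of values? For example, I want to extract every row whose column 16 holds a value that appears in a large list of values.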

In pandas terms it would be:

df[df['col'].isin([one massive list])] 

I am using Pig version 0.8.

1 Answer:

Answer 0: (score: 1)

For Pig 0.8, use multiple OR conditions in the FILTER:

b = filter a by $16 == '12345678' OR $16 == '456' OR $16 == 'anotherval';

Starting with Pig 0.12.0, you can use the IN operator:

b = filter a by $16 IN ('12345678', '456', ... );
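
When the list is too large to write inline, another option that does not depend on the IN operator is to keep the values in a file and join against it. The following is only a minimal sketch, not something taken from the question: it assumes a hypothetical file 'values.txt' holding one wanted value per line, and the relation names vals and matched are illustrative.

```pig
a = load 'large_file' using PigStorage(',');

-- Hypothetical lookup file (an assumption): one wanted value per line.
-- Loaded without a schema so its single field has the same type as a.$16.
vals = load 'values.txt';

-- An inner join keeps only the rows of a whose $16 equals some listed value.
matched = join a by $16, vals by $0;

store matched into 'output' using PigStorage(',');
```

Each output row carries the matched value as one extra trailing field, and a value that appears more than once in 'values.txt' would duplicate the matching rows, so the file should contain distinct values. If the list is small enough to fit in memory, a fragment-replicate join may be faster, but check the syntax in the documentation for your Pig version.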