Why does Pig FILTER return nothing?

Time: 2014-09-26 16:45:51

Tags: filter count apache-pig

Hi, does anyone know why the FILTER statement returns nothing in the following code? Thank you!

data = LOAD 'sample1.txt'
        AS (campaign_id:chararray,
            date:chararray,
            time:chararray,
            keyword:chararray,
            display_site:chararray,
            placement:chararray,
            was_clicked:int,
            cpc:int);

count1 = FOREACH (GROUP data ALL) GENERATE COUNT(data);
DUMP count1;


clicked = FILTER data BY (was_clicked==1);

DUMP clicked;
count2 = FOREACH (GROUP clicked ALL) GENERATE COUNT(clicked);
DUMP count2;

I tried DUMP data and saw that some records have was_clicked == 1. DUMP count1 shows (100), which is expected.

DUMP clicked shows nothing. DUMP count2 shows nothing either.

I run the .pig file in local mode: $ pig -x local analysis1.pig

1 answer:

Answer 0: (score: 0)

I don't see any problem in the script; it works fine for me. Can you paste a sample of your input?

input.txt  
aaa,1234,5678,bbb,ccc,ddd,2,100  
zzz,1234,5678,bbb,ccc,ddd,1,100  
xxx,1234,5678,bbb,ccc,ddd,1,100  
yyy,1234,5678,bbb,ccc,ddd,2,100  
jjj,1234,5678,bbb,ccc,ddd,1,100  
kkk,1234,5678,bbb,ccc,ddd,4,100  

PigScript:

-- Split each line on commas; without a USING clause, PigStorage splits on tabs.
data = LOAD 'input.txt' USING PigStorage(',')
        AS (campaign_id:chararray,
            date:chararray,
            time:chararray,
            keyword:chararray,
            display_site:chararray,
            placement:chararray,
            was_clicked:int,
            cpc:int);
-- Total number of records.
count1 = FOREACH (GROUP data ALL) GENERATE COUNT(data);
DUMP count1;
-- Keep only the records that were clicked.
clicked = FILTER data BY (was_clicked == 1);
DUMP clicked;
-- Number of clicked records.
count2 = FOREACH (GROUP clicked ALL) GENERATE COUNT(clicked);
DUMP count2;

Output of count1:  
(6)  

Output of clicked:  
(zzz,1234,5678,bbb,ccc,ddd,1,100)  
(xxx,1234,5678,bbb,ccc,ddd,1,100)  
(jjj,1234,5678,bbb,ccc,ddd,1,100)  

Output of count2:  
(3)
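
One possible explanation for the difference, offered as an assumption because the question does not show the contents of sample1.txt: the only change in the script above is the USING PigStorage(',') clause. Without a USING clause, LOAD falls back to PigStorage with a tab delimiter. If sample1.txt is comma-separated, each whole line would land in campaign_id and the remaining fields, including was_clicked, would be null. A comparison against null is never true, so FILTER keeps no rows, while COUNT(data) can still report 100 because it only requires a non-null first field. A minimal sketch to check this, reusing the file name and schema from the question (null_rows and null_count are illustrative names):

```pig
-- Load exactly as in the question: no USING clause, so the delimiter is a tab.
data = LOAD 'sample1.txt'
        AS (campaign_id:chararray,
            date:chararray,
            time:chararray,
            keyword:chararray,
            display_site:chararray,
            placement:chararray,
            was_clicked:int,
            cpc:int);

-- Rows in which was_clicked could not be populated.
null_rows = FILTER data BY was_clicked IS NULL;

-- COUNT_STAR counts every tuple, including tuples that contain null fields.
null_count = FOREACH (GROUP null_rows ALL) GENERATE COUNT_STAR(null_rows);
DUMP null_count;
```

If this prints (100), the lines are not being split into fields, and loading with USING PigStorage(',') as in the script above is the likely fix. If it prints nothing, no row has a null was_clicked and the cause lies elsewhere.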