Dividing counts in a Pig script

Time: 2011-06-24 07:52:29

Tags: apache-pig

ch = LOAD 'ch.txt';
ch_all = GROUP ch ALL;
ch_count = FOREACH ch_all GENERATE COUNT(ch);

ca = LOAD 'ca.txt';
ca_all = GROUP ca ALL;
ca_count = FOREACH ca_all GENERATE COUNT(ca);

I have the Pig script above, which computes two counts. Now I want to divide ch_count by ca_count and store the result in a file. How can I do that?

1 answer:

Answer 0 (score: 2)

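There is no convenient way to do this in Pig, because each count lives in its own single-row relation. A JOIN on a constant key can bring the two rows together so they can be divided: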

Pig:

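-- Count the rows of ch.txt and tag the single result row with a constant key.
ch = LOAD 'ch.txt';
ch_all = GROUP ch ALL;
ch_count = FOREACH ch_all GENERATE 'same' AS key, (DOUBLE) COUNT(ch) AS ct;

-- Do the same for ca.txt, using the same constant key.
ca = LOAD 'ca.txt';
ca_all = GROUP ca ALL;
ca_count = FOREACH ca_all GENERATE 'same' AS key, (DOUBLE) COUNT(ca) AS ct;

-- Both relations have exactly one row with key 'same', so the join yields one row.
ca_ch = JOIN ch_count BY key, ca_count BY key;

-- The casts to DOUBLE above keep this from being integer division.
ca_ch_div = FOREACH ca_ch GENERATE ch_count::ct / ca_count::ct;

-- To write to a file instead, replace DUMP with STORE ca_ch_div INTO '<output path>';
DUMP ca_ch_div;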

Output:

(0.6666666666666666)

Input:

cat ch.txt 
1
2
cat ca.txt 
1
2
3
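
For anyone who wants to see why the constant key works, here is a rough sketch of the same steps in plain Python rather than Pig. It is only an illustration: the helper names `count_rows` and `join_by_key` are made up for this example, and the file contents are hard-coded from the input shown above instead of being loaded.

```python
def count_rows(lines):
    """Mimic GROUP ... ALL followed by COUNT: a single row of (constant key, count)."""
    return [("same", float(len(lines)))]


def join_by_key(left, right):
    """Mimic an inner JOIN on the first field of each row."""
    return [
        (l_key, l_ct, r_key, r_ct)
        for l_key, l_ct in left
        for r_key, r_ct in right
        if l_key == r_key
    ]


# Hard-coded stand-ins for the lines of ch.txt and ca.txt.
ch_count = count_rows(["1", "2"])
ca_count = count_rows(["1", "2", "3"])

# Each side has one row with key "same", so the join produces exactly one row.
ca_ch = join_by_key(ch_count, ca_count)

# Divide the two counts that now sit side by side in the joined row.
ca_ch_div = [ch_ct / ca_ct for _, ch_ct, _, ca_ct in ca_ch]
print(ca_ch_div)  # [0.6666666666666666]
```

Because both sides carry the same key and only one row each, the join can never multiply rows; it just places the two counts in the same tuple.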