Click-through rate calculation in Hadoop Pig

Time: 2016-03-21 20:06:26

Tags: hadoop apache-pig

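I need to compute the click-through rate (number of clicks divided by number of impressions) at the country level. Below are the table structures (an impression table and a click table) and my Hadoop Pig code. My question: is the following implementation the most efficient one, or is there a more efficient solution? Thanks.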

Impression table:

impressionID,timestamp,countryID

Click table:

impressionID,timestamp

-- left outer keeps impressions that have no matching click (needed for the null check below)
joined_feed = join impression by impressionID left outer, click by impressionID;
joined_feed = foreach joined_feed generate impression::countryID as countryID, (click::impressionID is null ? 0 : 1) as clicked;
ctr_result = foreach (group joined_feed by countryID) generate group as countryID, SUM(joined_feed.clicked)/COUNT(joined_feed);

1 answer:

Answer 0 (score: 1)

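Your approach to computing the CTR is quite efficient, though you should add type casts, or you will get nothing but zeros and ones: SUM over an int field and COUNT both return long values, and dividing one long by another truncates the result to an integer.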

ctr_result = foreach (group joined_feed by countryID) generate group as countryID, (double) SUM(joined_feed.clicked) / (double) COUNT(joined_feed) as ctr;
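
For completeness, here is a minimal end-to-end sketch of the same approach. The file paths, the comma delimiter, the field types, and the aliases `joined`, `flagged`, and `by_country` are assumptions made for illustration and do not come from the question. The timestamp column is named `ts` here only to keep the example unambiguous.

```pig
-- Hypothetical input locations and schemas; adjust them to your data.
impression = LOAD 'impressions.csv' USING PigStorage(',')
             AS (impressionID:chararray, ts:long, countryID:chararray);
click      = LOAD 'clicks.csv' USING PigStorage(',')
             AS (impressionID:chararray, ts:long);

-- LEFT OUTER keeps every impression; click fields are null when there was no click.
joined = JOIN impression BY impressionID LEFT OUTER, click BY impressionID;

-- Flag each joined row: 1 if a matching click exists, 0 otherwise.
flagged = FOREACH joined GENERATE
              impression::countryID AS countryID,
              (click::impressionID IS NULL ? 0 : 1) AS clicked;

by_country = GROUP flagged BY countryID;

-- Cast to double so the division is not truncated to 0 or 1.
ctr_result = FOREACH by_country GENERATE
                 group AS countryID,
                 (double) SUM(flagged.clicked) / (double) COUNT(flagged) AS ctr;

DUMP ctr_result;
```

Note that an impression with more than one click produces more than one joined row, so this sketch assumes at most one click per impression. Since `clicked` is always 0 or 1, `AVG(flagged.clicked)` should give the same ratio without explicit casts, because AVG returns a double.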