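I need to compute the click-through rate (number of clicks divided by number of impressions) at the country level. Below are the table schemas (an impression table and a click table) and my Hadoop Pig code. My question is whether the following implementation is the most efficient one, or whether there is a more efficient solution. Thanks.
Impression table:
impressionID,timestamp,countryID
Click table:
impressionID,timestamp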
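-- LEFT OUTER keeps impressions that were never clicked; an inner join would drop them,
-- and click::impressionID could then never be null
joined_feed = join impression by impressionID left outer, click by impressionID;
joined_feed = foreach joined_feed generate impression::countryID as countryID, (click::impressionID is null ? 0 : 1) as clicked;
ctr_result = foreach (group joined_feed by countryID) generate group as countryID, SUM(joined_feed.clicked)/COUNT(joined_feed);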
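Answer 0 (score: 1)
Your approach to computing the CTR is quite efficient, but you should add type casts. SUM over an int column and COUNT both return a long, so without the casts the division is integer division and you will get nothing but zeros and ones: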
ctr_result = foreach (group joined_feed by countryID) generate group as countryID, (double) SUM(joined_feed.clicked) / (double) COUNT(joined_feed) as ctr;
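To see why the casts matter, here is a minimal Python sketch of the same computation. It is not Pig code, only a model of the arithmetic: the sample rows are invented for illustration, the function name `ctr_by_country` is hypothetical, and it assumes each impression is clicked at most once.

```python
from collections import defaultdict

# Hypothetical sample rows, invented for illustration only.
impressions = [  # (impressionID, timestamp, countryID)
    (1, 100, "US"),
    (2, 101, "US"),
    (3, 102, "US"),
    (4, 103, "DE"),
    (5, 104, "DE"),
]
clicks = [  # (impressionID, timestamp)
    (1, 110),
    (4, 120),
]


def ctr_by_country(impressions, clicks):
    """Return {countryID: (truncated_ctr, real_ctr)}."""
    clicked_ids = {impression_id for impression_id, _ in clicks}
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for impression_id, _, country_id in impressions:
        shown[country_id] += 1
        # Plays the role of the `is null ? 0 : 1` flag after a left outer join.
        clicked[country_id] += 1 if impression_id in clicked_ids else 0
    return {
        country: (
            clicked[country] // shown[country],  # integer division, like long / long
            clicked[country] / shown[country],   # floating-point division, like the (double) casts
        )
        for country in shown
    }


result = ctr_by_country(impressions, clicks)
print(result)
```

With these sample rows, US has 1 click out of 3 impressions and DE has 1 out of 2. The integer-division value is 0 for both countries, while the floating-point value is roughly 0.33 and 0.5, which is the difference the casts make.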