Group by a combination of columns, then group the result again by fewer columns

Time: 2014-05-26 07:19:46

Tags: hadoop hive

Folks,

We have the following data and need the output described below.

 CUSTOMER_NAME, PRODUCT_NAME, PRICE, OCCURANCE_ID
 customer1,     product1,     20,    1
 customer1,     product2,     30,    2
 customer1,     product1,     25,    3
 customer1,     product1,     20,    1
 customer1,     product2,     20,    2
 customer1,     product2,     30,    2

First, we need to average the price per occurrence ID.

 customer1, product1, 20 (AVG is 20 for occurrence 1), 1
 customer1, product1, 25 (AVG is 25 for occurrence 3), 3

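Then we have to average those results again by customer name and product name (leaving the occurrence ID out of the grouping).

The final output should be customer1, product1, and the average of the per-occurrence average prices.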

customer1, product1, (20 + 25)/2 = 22.5

Basically, how do we compute this average in Hive? We have not been able to write a query for it.

1 Answer:

Answer 0 (score: 0)

You can do this with a nested query, as follows:

Step 1: Compute the initial average price per occurance_id

SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
FROM customer_info
GROUP BY customer_name,product_name,occurance_id ;
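
To make the intermediate result easier to picture, here is a minimal Python sketch of the same per-occurrence grouping, applied to the sample rows from the question. It is only an illustration of the arithmetic, not Hive code; the function name `average_per_occurrence` is made up for this example.

```python
from collections import defaultdict

# Sample rows from the question:
# (customer_name, product_name, price, occurance_id)
rows = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]


def average_per_occurrence(rows):
    """Mimic GROUP BY customer_name, product_name, occurance_id with AVG(price)."""
    groups = defaultdict(list)
    for customer, product, price, occurrence in rows:
        groups[(customer, product, occurrence)].append(price)
    # One average per (customer, product, occurrence) group.
    return {key: sum(prices) / len(prices) for key, prices in groups.items()}


step1 = average_per_occurrence(rows)
for key in sorted(step1):
    print(key, step1[key])
```

For product1 this gives one average for occurrence 1 and one for occurrence 3, which are the two values the second step then averages.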

Step 2: Average the per-occurrence averages returned by step 1

hive (default)>
              > SELECT customer_name, product_name,avg(avg_of_current_occurance) as final_avg
              > FROM(
              > SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
              > FROM customer_info
              > GROUP BY customer_name,product_name,occurance_id
              > ) W
              > GROUP BY customer_name,product_name;

Total MapReduce jobs = 1
Launching Job 1 out of 1

Execution completed successfully

customer_name   product_name    final_avg
customer1       product1        22.5
customer1       product2        26.666666666666668
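
If you want to check the logic without a Hive cluster, the sketch below runs the same nested query against an in-memory SQLite database from Python. SQLite standing in for Hive is an assumption made purely for this illustration, and the `run` helper is hypothetical. It also runs a plain single-level AVG for comparison, to show why the nesting matters when occurrences have different row counts.

```python
import sqlite3

# Sample rows from the question:
# (customer_name, product_name, price, occurance_id)
rows = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]

# Same shape as the nested query above: average per occurrence first,
# then average those averages per customer and product.
NESTED_QUERY = """
SELECT customer_name, product_name,
       AVG(avg_of_current_occurance) AS final_avg
FROM (
    SELECT customer_name, product_name, occurance_id,
           AVG(price) AS avg_of_current_occurance
    FROM customer_info
    GROUP BY customer_name, product_name, occurance_id
) w
GROUP BY customer_name, product_name
ORDER BY product_name
"""

# Single-level average, for comparison only.
FLAT_QUERY = """
SELECT customer_name, product_name, AVG(price) AS flat_avg
FROM customer_info
GROUP BY customer_name, product_name
ORDER BY product_name
"""


def run(query):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customer_info ("
            "customer_name TEXT, product_name TEXT, "
            "price REAL, occurance_id INTEGER)"
        )
        conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


nested = run(NESTED_QUERY)
flat = run(FLAT_QUERY)
print("nested:", nested)
print("flat:  ", flat)
```

For product1 the nested query weights occurrence 1 and occurrence 3 equally, giving (20 + 25)/2 = 22.5, while the flat average weights every row equally, giving (20 + 25 + 20)/3. For product2 the two agree because all of its rows share a single occurrence ID.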