Folks,
We have the following data and need the output described below.
CUSTOMER_NAME, PRODUCT_NAME, PRICE, OCCURANCE_ID
customer1, product1, 20, 1
customer1, product2, 30, 2
customer1, product1, 25, 3
customer1, product1, 20, 1
customer1, product2, 20, 2
customer1, product2, 30, 2
First, we need to average the price per occurrence.
customer1, product1, 20 (AVG is 20 for occurrence 1), 1
customer1, product1, 25 (AVG is 25 for occurrence 3), 3
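Next, we need to average those results again by customer name and product name (ignoring the occurrence in the grouping).
The final output for customer1, product1 is the average of the per-occurrence prices:
customer1, product1, (20 + 25) / 2 = 22.5
Basically, how do we compute this kind of average in Hive? We have not been able to write anything that works.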
Answer 0 (score: 0)
You can do this with a nested query, as shown below:
Step 1: compute the initial average price per occurance_id
SELECT customer_name, product_name, occurance_id, avg(price) AS avg_of_current_occurance
FROM customer_info
GROUP BY customer_name,product_name,occurance_id ;
Step 2: average the per-occurrence averages returned by step 1
hive (default)>
> SELECT customer_name, product_name,avg(avg_of_current_occurance) as final_avg
> FROM(
> SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
> FROM customer_info
> GROUP BY customer_name,product_name,occurance_id
> ) W
> GROUP BY customer_name,product_name;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Execution completed successfully
customer_name product_name final_avg
customer1 product1 22.5
customer1 product2 26.666666666666668
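
For anyone without a Hive cluster at hand, the nested query can be sanity-checked locally. The following is only a minimal sketch: it runs the same two-level GROUP BY against the sample rows from the question using Python's built-in sqlite3 module as a stand-in for Hive. The table definition, the column types, the added ORDER BY, and the helper name final_averages are assumptions made for illustration, and SQLite is not guaranteed to match Hive in every detail.

```python
import sqlite3

# Sample rows from the question:
# (customer_name, product_name, price, occurance_id)
ROWS = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]

# Same shape as the answer's query: the inner query averages per occurrence,
# the outer query averages those per-occurrence averages.
# ORDER BY is added here only to make the result order deterministic.
NESTED_AVG = """
SELECT customer_name, product_name,
       AVG(avg_of_current_occurance) AS final_avg
FROM (
    SELECT customer_name, product_name, occurance_id,
           AVG(price) AS avg_of_current_occurance
    FROM customer_info
    GROUP BY customer_name, product_name, occurance_id
) w
GROUP BY customer_name, product_name
ORDER BY product_name
"""


def final_averages(rows):
    """Load rows into an in-memory SQLite table and run the nested query."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are an assumption; the question does not give a schema.
        conn.execute(
            "CREATE TABLE customer_info ("
            "customer_name TEXT, product_name TEXT, "
            "price REAL, occurance_id INTEGER)"
        )
        conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?, ?)", rows)
        return conn.execute(NESTED_AVG).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in final_averages(ROWS):
        print(row)
```

Note that a flat avg(price) grouped only by customer_name and product_name would give (20 + 25 + 20) / 3, roughly 21.67, for product1, because occurrence 1 appears twice. The inner GROUP BY collapses each occurrence to a single value first, which is what yields 22.5.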