Hive SQL inner join - how to get sum and row_num in the same query

Date: 2018-10-20 17:27:35

Tags: hive hiveql

I have two tables, product and psales, with the following data:

select * from psales;
+-------------+---------------+--+
| psales.pid  | psales.sales  |
+-------------+---------------+--+
| 1           | 100           |
| 1           | 150           |
| 1           | 200           |
| 2           | 75            |
| 2           | 45            |
| 2           | 145           |
| 3           | 176           |
| 3           | 99            |
| 1           | 27            |
| 4           | 51            |
+-------------+---------------+--+

select * from product;
+--------------+----------------+--+
| product.pid  | product.pname  |
+--------------+----------------+--+
| 1            | p1             |
| 2            | p2             |
| 3            | p3             |
| 4            | p4             |
+--------------+----------------+--+

The goal is to get the product with the second-highest combined sales.
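Working the totals out by hand from the sample rows shows what a correct query should return. This only restates the data above; note that p1's fourth row (27) sits near the end of the psales table.

```latex
\begin{aligned}
\text{p1}&: 100 + 150 + 200 + 27 = 477\\
\text{p2}&: 75 + 45 + 145 = 265\\
\text{p3}&: 176 + 99 = 275\\
\text{p4}&: 51
\end{aligned}
```

The ordering is 477 > 275 > 265 > 51, so the product with the second-highest combined sales is p3 (275).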

This is the query I currently use to get the product with the highest combined sales (it works correctly):

select p1.pname, p1.total_sales
from (select p.pid as pid, p.pname as pname, s.sales as sales,  
      sum(s.sales) over (partition by p.pid order by p.pid) as total_sales 
      from product p
      inner join psales s on (p.pid = s.pid) 
      order by total_sales desc) p1 
limit 1;

How can I get the product with the second-highest total sales?

When I try to get row_num in the inner query, I get the following error:

select p1.pname as pname, p1.total_sales as total_sales, row_num() over (partition by pname order by pname) as rownum 
from (select p.pid as pid, p.pname as pname, s.sales as sales,  
      sum(s.sales) over (partition by p.pid order by p.pid) as total_sales, 
      row_num() over (partition by p.pid) as rownum 
      from product p 
      inner join psales s on (p.pid = s.pid) 
      order by total_sales desc) p1 
where rownum =2;
  

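Error: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: Invalid function row_num (state=42000,code=40000)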
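The error comes from the function name: Hive's ranking function is row_number(), so row_num() is reported as an invalid function. Even with the name fixed, the query would rank the wrong thing. The inner query still returns one row per sale (each carrying its product's total), and both ranking calls partition by product, which numbers rows within a product instead of ranking products against each other. A minimal sketch that keeps the windowed sum from the question but first collapses it to one row per product follows. The inline sample data and the aliases w, t and r are illustrative only, and it assumes a Hive version that accepts SELECT without FROM and UNION ALL inside a WITH clause; otherwise drop the WITH clause and run it against the real tables.

```sql
-- Sample rows from the question, inlined so the script runs on its own.
-- Remove this WITH clause to run against the real product and psales tables.
with psales as (
  select 1 as pid, 100 as sales union all
  select 1 as pid, 150 as sales union all
  select 1 as pid, 200 as sales union all
  select 2 as pid, 75  as sales union all
  select 2 as pid, 45  as sales union all
  select 2 as pid, 145 as sales union all
  select 3 as pid, 176 as sales union all
  select 3 as pid, 99  as sales union all
  select 1 as pid, 27  as sales union all
  select 4 as pid, 51  as sales
),
product as (
  select 1 as pid, 'p1' as pname union all
  select 2 as pid, 'p2' as pname union all
  select 3 as pid, 'p3' as pname union all
  select 4 as pid, 'p4' as pname
)
select pname, total_sales
from (
  -- Rank products against each other: no PARTITION BY in this window.
  select pname, total_sales,
         row_number() over (order by total_sales desc) as rn
  from (
    -- The windowed sum repeats a product's total on every one of its
    -- sales rows, so DISTINCT collapses them to one row per product.
    select distinct pid, pname, total_sales
    from (
      select p.pid, p.pname,
             sum(s.sales) over (partition by p.pid) as total_sales
      from product p
      inner join psales s on p.pid = s.pid
    ) w
  ) t
) r
where rn = 2;
```

With the sample data this returns p3 with 275. The GROUP BY used in both answers reaches the same one-row-per-product shape more directly, without the DISTINCT step.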

Thanks for your help.

2 answers:

Answer 0 (score: 0)

Use the row_number() function in the outer subquery. You don't seem to need an analytic sum(); a simple group by will do:

select s.pname, s.pid, s.total_sales
  from
(
select p1.pname, p1.pid, p1.total_sales, 
       row_number() over (order by total_sales  desc) rn
  from 
     (select p.pid, p.pname, sum(s.sales) as total_sales 
        from product p 
             inner join psales s on p.pid = s.pid
        group by p.pid, p.pname
     )p1
)s
where rn=2
;

If several products share the second-highest total and you want all of them, use dense_rank() instead of row_number().
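The difference only shows up with ties, which the question's data does not contain. A sketch with hypothetical per-product totals (invented for illustration, not taken from the question) in which p2 and p3 tie at 275 compares row_number() with rank() and dense_rank():

```sql
-- Hypothetical per-product totals, NOT the question's data: p2 and p3 tie at 275.
with totals as (
  select 'p1' as pname, 477 as total_sales union all
  select 'p2' as pname, 275 as total_sales union all
  select 'p3' as pname, 275 as total_sales union all
  select 'p4' as pname, 51  as total_sales
)
select pname, total_sales,
       -- always 1, 2, 3, ...; the order among tied rows is arbitrary
       row_number() over (order by total_sales desc) as rn,
       -- tied rows share a rank and the next rank is skipped
       rank()       over (order by total_sales desc) as rnk,
       -- tied rows share a rank and no rank is skipped
       dense_rank() over (order by total_sales desc) as drnk
from totals
order by total_sales desc, pname;
```

Filtering on dense_rank() = 2 returns both p2 and p3, while row_number() = 2 returns only one of them, and which one is not guaranteed. rank() gives p4 rank 4 and leaves no rank 3, whereas dense_rank() gives it 3; likewise, if two products tied for the top spot, rank() would produce no row ranked 2 at all, so dense_rank() is the function that matches "second-highest total".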

Answer 1 (score: 0)

You can use dense_rank directly in the grouped query to rank the sum for each pid, pname combination.

select p1.pname,p1.pid,p1.total_sales
from (select p.pid, p.pname,sum(s.sales) as total_sales,
      dense_rank() over(order by sum(s.sales) desc) as rnk 
      from product p 
      join psales s on p.pid = s.pid
      group by p.pid,p.pname
     ) p1
where rnk=2