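I'm trying to understand how to use rank() OVER (PARTITION BY ...) in Apache Hive, but I'm having trouble getting the result I want.
The dataset I'm using is at the bottom of this post.
What I want is a statement that gives each department a unique rank based on the total salary of its employees. Instead, all three departments come back with rank 1.
Hopefully someone can tell me where I went wrong. Thanks a lot! :)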
What I want
+-----------+--------+-----+
| dept_num  | _c1    | rk  |
+-----------+--------+-----+
| 1000      | 24900  | 3   |
| 1001      | 17400  | 1   |
| 1002      | 20500  | 2   |
+-----------+--------+-----+
What I get
+-----------+--------+-----+
| dept_num  | _c1    | rk  |
+-----------+--------+-----+
| 1000      | 24900  | 1   |
| 1001      | 17400  | 1   |
| 1002      | 20500  | 1   |
+-----------+--------+-----+
The HiveQL statement I'm using
SELECT dept_num,
       sum(salary),
       rank() OVER (PARTITION BY dept_num ORDER BY sum(salary)) AS rk
FROM employee_contract
GROUP BY dept_num;
My dataset
Michael|1000|100|5000|full|2014-01-29
Will|1000|101|4000|full|2013-10-02
Will|1000|101|4000|part|2014-10-02
Steven|1000|102|6400|part|2012-11-03
Lucy|1000|103|5500|full|2010-01-03
Lily|1001|104|5000|part|2014-11-29
Jess|1001|105|6000|part|2014-12-02
Mike|1001|106|6400|part|2013-11-03
Wei|1002|107|7000|part|2010-04-03
Yun|1002|108|5500|full|2014-01-29
Richard|1002|109|8000|full|2013-09-01
Answer 0 (score: 0)
Try the following. It is untested, so let us know what you get.
SELECT dept_num,
       TOTAL_SALARY,
       -- no PARTITION BY: rank across all departments
       rank() OVER (ORDER BY TOTAL_SALARY) AS rk
FROM (
    -- one row per department with its total salary
    SELECT dept_num,
           sum(salary) AS TOTAL_SALARY
    FROM employee_contract
    GROUP BY dept_num
) SUM_EMP;
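
As for why the original statement returns 1 everywhere: after GROUP BY dept_num there is exactly one row per department, and PARTITION BY dept_num then puts each of those rows in its own partition, so every row is first in its partition. Dropping the PARTITION BY ranks the departments against each other. A minimal sketch of the same fix without a subquery, assuming your Hive version accepts an aggregate inside the OVER clause (your original statement ran, which suggests it does); the alias total_salary is only illustrative:

```sql
-- Sketch only: same table and columns as in the question.
-- The window has no PARTITION BY, so all grouped rows are ranked together.
SELECT dept_num,
       sum(salary) AS total_salary,
       rank() OVER (ORDER BY sum(salary)) AS rk
FROM employee_contract
GROUP BY dept_num;
```

Note that rank() assigns the same rank to departments with equal totals. If the ranks must be strictly unique even with ties, row_number() can be used in place of rank().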