Computing the salary difference between two rows in Hive

Time: 2017-11-12 12:11:52

Tags: hive hiveql

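I have a table with the following columns:

last_name, first_name, department, salary

I want to get a list of employees whose salary is less than 100 below that of the next higher-paid employee in the same department. I followed the answer to "Compute differences between succesive records in Hadoop with Hive Queries" and tried it, but I think I am doing something wrong because I am not familiar with Hive.

Here is the query I am running: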

select last_name,first_name, salary from emp where 
100 = LEAD(salary,1) OVER(PARTITION BY department ORDER BY salary)-salary;

Please help me fix this.

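3 answers:

Answer 0: (score: 0)

Use a CASE expression. A window function such as LEAD cannot appear directly in a WHERE clause, so compute a flag in a subquery and filter on it in the outer query.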

 SELECT last_name,
       first_name,
       salary
FROM   (SELECT last_name,
               first_name,
               salary,
               CASE
                 WHEN 100 > LEAD(salary, 1)
                              OVER(
                                PARTITION BY department
                                ORDER BY salary) - salary THEN 1
                 ELSE 0
               END sal_flag
        FROM   emp)
WHERE  sal_flag = 1;  
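
Before filtering, it can help to look at the intermediate values. The following is a minimal sketch that lists each employee next to the following salary in the same department and the resulting gap. The aliases next_salary and salary_gap are illustrative names, not part of the original query.

```sql
-- Show the next higher salary and the gap to it for every employee.
-- next_salary and salary_gap are illustrative aliases.
SELECT last_name,
       first_name,
       department,
       salary,
       LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary) AS next_salary,
       LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary) - salary AS salary_gap
FROM   emp
ORDER  BY department, salary;
```

The highest-paid employee in each department has no following row, so LEAD returns NULL there. The comparison against 100 is then NULL as well, and the CASE expression falls through to ELSE 0, so that employee is not returned.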

Answer 1: (score: 0)

Hive requires an alias for every subquery in the FROM clause. I just added an alias to Kaushik's query. Try this; it should work.

SELECT last_name,
       first_name,
       salary
FROM   (SELECT last_name,
               first_name,
               salary,
               CASE
                 WHEN 100 > LEAD(salary, 1)
                              OVER(
                                PARTITION BY department
                                ORDER BY salary) - salary THEN 1
                 ELSE 0
               END sal_flag
        FROM   emp) v
WHERE  sal_flag = 1; 

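Personally, I prefer a WITH clause to a subquery, as shown below. WITH clauses make queries more readable, and in some cases they can also lead to better execution plans.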

WITH sal_view 
AS (SELECT last_name,
               first_name,
               salary,
               CASE
                 WHEN 100 > LEAD(salary, 1)
                              OVER(
                                PARTITION BY department
                                ORDER BY salary) - salary THEN 1
                 ELSE 0
               END sal_flag
        FROM   emp)
SELECT last_name,
       first_name,
       salary
FROM  sal_view
WHERE  sal_flag = 1;  
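
One edge case these queries do not address is ties. If two employees in a department have the same salary, LEAD returns the tied salary as the "next" one, the gap is 0, and the row is flagged. If "next higher salary" is meant to be strictly higher (an assumption about intent, not something the question states), one possible sketch is to apply LEAD to the distinct salaries per department and join the result back. The CTE names dept_salary and next_salary are made up for illustration.

```sql
-- Sketch: compare each employee with the next strictly higher salary
-- in the same department. dept_salary and next_salary are hypothetical names.
WITH dept_salary AS (
    -- one row per distinct salary in each department
    SELECT DISTINCT department, salary
    FROM   emp
),
next_salary AS (
    SELECT department,
           salary,
           LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary) AS next_salary
    FROM   dept_salary
)
SELECT e.last_name,
       e.first_name,
       e.salary
FROM   emp e
JOIN   next_salary n
  ON   e.department = n.department
 AND   e.salary = n.salary
WHERE  n.next_salary - e.salary < 100;
```

Rows with a NULL department would not match in the join, so this variant assumes department is always populated.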

Answer 2: (score: 0)

Try:

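WITH temp AS (
    SELECT last_name,
           first_name,
           department,
           salary,
           -- gap to the next higher salary in the same department
           LEAD(salary, 1)
             OVER (PARTITION BY department
                   ORDER BY salary) - salary AS diff
    FROM   emp
)
SELECT last_name,
       first_name,
       department,
       salary
FROM   temp
-- keep employees whose gap is under 100, as the question asks
WHERE  diff < 100;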
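
To try the approach on a small data set, here is a hedged sketch. The table emp_demo and its rows are hypothetical, chosen only for illustration, and salary is assumed to be an integer column. INSERT ... VALUES requires Hive 0.14 or later.

```sql
-- Hypothetical demo table; a separate name avoids touching a real emp table.
CREATE TABLE IF NOT EXISTS emp_demo (
    last_name  STRING,
    first_name STRING,
    department STRING,
    salary     INT
);

INSERT INTO TABLE emp_demo VALUES
    ('Adams', 'Ann',  'sales', 1000),
    ('Brown', 'Bob',  'sales', 1050),
    ('Clark', 'Cara', 'sales', 1300),
    ('Diaz',  'Dan',  'ops',   2000),
    ('Evans', 'Eve',  'ops',   2100);

-- Employees earning less than 100 below the next salary in their department.
SELECT last_name,
       first_name,
       salary
FROM   (SELECT last_name,
               first_name,
               salary,
               LEAD(salary, 1)
                 OVER (PARTITION BY department
                       ORDER BY salary) - salary AS diff
        FROM   emp_demo) v
WHERE  diff < 100;
-- Returns only Adams, Ann, 1000 (gap of 50 to Brown).
-- Brown (gap 250) and Diaz (gap exactly 100) are excluded,
-- as are Clark and Evans, whose gap is NULL.
```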