Hive - SQL max with multiple rows

Time: 2015-12-08 16:03:42

Tags: hive hiveql

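Given the raw data below, how can I get the maximum quantity for each customer_id across the entire row, with null for the rest of the row? I can get the maximum of the data, but not in the form shown under #Results.
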
#Raw data
customer_id  name       location  itemno_1  itemno_2  itemno_3  itemno_4  itemno_5
123          Ashley M   CA        10        null      10        null      null
123          Ashley M   CA        null      12        null      12        null
143          Donald P   FL        15        15        0         1         10
187          Alicia P   GA        15        9         null      null      null
1736         Mike H     CT        null      8         8         9         null
1736         Mike H     CT        null      null      null      null      null
1876         David M    CA        null      null      null      null      null
532          Matthew T  CA        null      9         10        10        null

#Results

customer_id  name       location  itemno_1  itemno_2  itemno_3  itemno_4  itemno_5
123          Ashley M   CA        null      12        null      null      null
143          Donald P   FL        15        null      null      null      null
187          Alicia P   GA        15        null      null      null      null
1736         Mike H     CT        null      null      null      null      null
1876         David M    CA        null      null      null      null      null
532          Matthew T  CA        null      null      null      10        null

1 answer:

Answer 0: (score: 1)

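The following query produces the expected result (I have tested it). I assumed that if two item numbers share the same maximum value, we keep the value of the lowest item number. For example, for customer_id = 123, itemno_2 and itemno_4 both have the value 12, so itemno_2 stays 12 and itemno_4 is set to null.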

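-- Inner query: collapse each customer's rows into one row of per-column maxima (i1..i5).
-- Outer query: keep only the first column that holds the row-wise maximum; null out the rest.
-- The table used here names the column location1; the question's data shows it as location.
select customer_id, name, location1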
      ,CASE WHEN (i1 >= i2 or i2 is null)
            AND  (i1 >= i3 or i3 is null)
            AND  (i1 >= i4 or i4 is null)
            AND  (i1 >= i5 or i5 is null)
            THEN i1
            ELSE null
       END as itemno_1
      ,CASE WHEN (i2 >= i1 or i1 is null)
            AND  (i2 >= i3 or i3 is null)
            AND  (i2 >= i4 or i4 is null)
            AND  (i2 >= i5 or i5 is null)
            AND  (i1 <> i2 or i1 is null)
            THEN i2
            ELSE null
       END as itemno_2
      ,CASE WHEN (i3 >= i1 or i1 is null)
            AND  (i3 >= i2 or i2 is null)
            AND  (i3 >= i4 or i4 is null)
            AND  (i3 >= i5 or i5 is null)
            AND  (i1 <> i3 or i1 is null)
            AND  (i2 <> i3 or i2 is null)
            THEN i3
            ELSE null
       END as itemno_3
      ,CASE WHEN (i4 >= i1 or i1 is null)
            AND  (i4 >= i2 or i2 is null)
            AND  (i4 >= i3 or i3 is null)
            AND  (i4 >= i5 or i5 is null)
            AND  (i1 <> i4 or i1 is null)
            AND  (i2 <> i4 or i2 is null)
            AND  (i3 <> i4 or i3 is null)
            THEN i4
            ELSE null
       END as itemno_4
      ,CASE WHEN (i5 >= i1 or i1 is null)
            AND  (i5 >= i2 or i2 is null)
            AND  (i5 >= i3 or i3 is null)
            AND  (i5 >= i4 or i4 is null)
            AND  (i1 <> i5 or i1 is null)
            AND  (i2 <> i5 or i2 is null)
            AND  (i3 <> i5 or i3 is null)
            AND  (i4 <> i5 or i4 is null)
            THEN i5
            ELSE null
       END as itemno_5

from (
select customer_id, name, location1
      ,max(itemno_1) as i1
      ,max(itemno_2) as i2
      ,max(itemno_3) as i3
      ,max(itemno_4) as i4
      ,max(itemno_5) as i5
from default.stack2
group by customer_id, name, location1) a
order by customer_id;

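The same result can also be achieved by writing a UDF, instead of the CASE expressions, that finds the maximum of the 5 columns and returns the values as expected.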
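
As a rough illustration of the UDF idea, here is a minimal sketch in plain Java of the core logic such a function might contain. The class name `KeepFirstMax` and the method `keepFirstMax` are hypothetical and do not come from the answer. Wrapping the method in Hive's UDF interface and registering it are left out. The sketch assumes the inputs are the per-customer maxima (i1..i5) and applies the same tie rule as the query: the lowest-numbered column wins.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original answer: it only sketches
// the row-wise logic a Hive UDF could delegate to.
final class KeepFirstMax {

    // Returns an array of the same length in which only the first
    // occurrence of the largest non-null value is kept; every other
    // position is null. If all inputs are null, all outputs are null.
    static Integer[] keepFirstMax(Integer... items) {
        Integer[] result = new Integer[items.length]; // slots start as null
        int maxIndex = -1;
        for (int i = 0; i < items.length; i++) {
            if (items[i] == null) {
                continue; // nulls never count as the maximum
            }
            // Strict '>' means an equal value later in the row does not
            // replace the earlier one, so the lowest column wins ties.
            if (maxIndex < 0 || items[i] > items[maxIndex]) {
                maxIndex = i;
            }
        }
        if (maxIndex >= 0) {
            result[maxIndex] = items[maxIndex];
        }
        return result;
    }

    public static void main(String[] args) {
        // Per-column maxima for customer_id = 123 in the sample data.
        Integer[] row = keepFirstMax(10, 12, 10, 12, null);
        System.out.println(Arrays.toString(row)); // [null, 12, null, null, null]
    }
}
```

A single pass with a strict comparison replaces the pairwise conditions of the five CASE expressions, and it works unchanged for any number of item columns.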