Hive: percentile() over() requires GROUP BY

Time: 2017-08-17 14:37:10

Tags: sql hadoop hive

I want to compute the median (50th percentile) of price over all rows, while computing the other metrics per room_id, like this:

    select
        distinct room_id,
        count(user_id) over (partition by room_id) as user_cnt,
        sum(price) over (partition by room_id) as price,
        percentile(cast(price as bigint),0.5) over () as price_median 
    from
        ods.ods_trade
    where day = '2017-08-08' and trade_status = 1 

The query above runs correctly in SparkSQL, but Hive rejects it with:

At least 1 group must only depend on input columns ... Expression not in GROUP BY key 'price'

percentile() over() also returns a single value, so why does this error occur, and how can I fix it? Any help would be appreciated.

For example, given this data:

room  user  price (amount paid)
  a    u1    1
  a    u1    5
  a    u2    3
  b    u1    2
  b    u3    4
  c    u4    6
  c    u4    7

Expected result:

  room_id  user_cnt   price  price_median
    a        2         9         4
    b        2         6         4
    c        1         13        4
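The expected values follow from the sample data: price_median is the median of all seven prices (not a per-room value), and user_cnt counts distinct users per room:

```latex
\begin{aligned}
&\text{all prices, sorted: } 1, 2, 3, 4, 5, 6, 7 \quad (n = 7) \\
&\operatorname{median} = x_{((n+1)/2)} = x_{(4)} = 4 \\
&\text{user\_cnt: } a = |\{u_1, u_2\}| = 2,\; b = |\{u_1, u_3\}| = 2,\; c = |\{u_4\}| = 1
\end{aligned}
```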

1 answer:

Answer 0 (score: 0)

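The error says that price is not a GROUP BY key. The original query has no explicit GROUP BY, but Hive most likely handles SELECT DISTINCT as a GROUP BY over the selected expressions, and window functions are evaluated after that grouping, so a column used inside sum(...) over (...) or percentile(...) over () has to be a grouping key. The following query groups explicitly and should work: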

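    -- Inner query: one row per (room_id, user_id, price) plus the global median; price is
    -- a GROUP BY key, so percentile(...) over () is allowed. Caveat: duplicate rows with the
    -- same (room_id, user_id, price) are collapsed, which affects sum(price) and the median.
    select room_id,
           count(distinct user_id) as user_cnt,
           sum(price) as price,
           price_median
    from (
        select room_id, user_id, price,
               percentile(cast(price as bigint), 0.5) over () as price_median
        from ods.ods_trade
        where day = '2017-08-08' and trade_status = 1
        group by room_id, user_id, price
    ) k1
    group by room_id, price_median  -- price_median is constant: still one row per room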

Note: the column names may need adjusting to match your table.
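Grouping by room, user and price in the inner query collapses repeated purchases of the same price by the same user in the same room, which would undercount sum(price) and skew the median. A sketch that avoids this computes the per-room aggregates and the global median in two separate subqueries and joins them with CROSS JOIN, using percentile as an ordinary aggregate rather than a window function. It assumes the table and columns from the question's query (ods.ods_trade with room_id, user_id, price, day, trade_status). Depending on the Hive version and strict-mode settings, the Cartesian product may have to be explicitly allowed.

```sql
-- Sketch: per-room aggregates joined to a one-row subquery that holds the global median.
-- Table and column names are taken from the question's query.
select r.room_id,
       r.user_cnt,
       r.price,
       m.price_median
from (
    -- one row per room; count(distinct ...) matches the expected user_cnt
    select room_id,
           count(distinct user_id) as user_cnt,
           sum(price) as price
    from ods.ods_trade
    where day = '2017-08-08' and trade_status = 1
    group by room_id
) r
cross join (
    -- exactly one row: median over all matching rows
    -- (percentile expects an integral input, hence the cast)
    select percentile(cast(price as bigint), 0.5) as price_median
    from ods.ods_trade
    where day = '2017-08-08' and trade_status = 1
) m;
```

On the sample data this should return one row per room, each with a price_median of 4 (percentile returns a DOUBLE, so it may print as 4.0).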