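I want to compute several aggregates per room_id, plus a percentile (the median) of price computed over all rows rather than per room, like this: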
select distinct
    room_id,
    count(user_id) over (partition by room_id) as user_cnt,
    sum(price) over (partition by room_id) as price,
    percentile(cast(price as bigint), 0.5) over () as price_median
from ods.ods_trade
where day = '2017-08-08' and trade_status = 1
The query above runs correctly in SparkSQL, but Hive fails with:
At least 1 group must only depend on input columns ... Expression not in GROUP BY key 'price'
percentile() over () also returns only a single value, so why does this error occur, and how can I fix it? Any help would be greatly appreciated.
For example, given this data:
room  user  price (consume)
a     u1    1
a     u1    5
a     u2    3
b     u1    2
b     u3    4
c     u4    6
c     u4    7
Expected result:
room_id  user_cnt  price  price_median
a        2         9      4
b        2         6      4
c        1         13     4
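As a cross-check of the expected result, here is a small sketch in plain Python (not Hive) that applies the intended semantics to the sample rows: per room, the number of distinct users and the sum of price, plus one median over all prices that is repeated on every room's row. The names `ROWS` and `expected_result` are made up for this illustration.

```python
from statistics import median

# Sample rows from the question: (room_id, user_id, price)
ROWS = [
    ("a", "u1", 1), ("a", "u1", 5), ("a", "u2", 3),
    ("b", "u1", 2), ("b", "u3", 4),
    ("c", "u4", 6), ("c", "u4", 7),
]


def expected_result(rows):
    """Per room: distinct-user count and price sum, plus one median over all rows."""
    global_median = median(price for _, _, price in rows)  # not partitioned by room
    users, totals = {}, {}
    for room, user, price in rows:
        users.setdefault(room, set()).add(user)  # distinct users per room
        totals[room] = totals.get(room, 0) + price
    return {
        room: (len(users[room]), totals[room], global_median)
        for room in sorted(totals)
    }


for room, (user_cnt, price, price_median) in expected_result(ROWS).items():
    print(room, user_cnt, price, price_median)
```

With these rows, room a sums to 1 + 5 + 3 = 9, and the median of the seven prices (1 through 7) is 4. Note that a user_cnt of 2 for room a corresponds to counting distinct users; `count(user_id)` as written in the question's query counts rows and would give 3 for room a.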
Answer 0 (score: 0)
The error means that price is not in the GROUP BY. The following query should work:
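select room_id,
       count(distinct user_id) as user_cnt,
       sum(price) as price,
       price_median
from (
    -- empty over (): one window over all rows, so every row gets the same global median
    -- no GROUP BY here: every trade row is kept, so repeated purchases at the same price
    -- still count toward sum() and the median
    select room_id, user_id, price,
           percentile(cast(price as bigint), 0.5) over () as price_median
    from ods.ods_trade
    where day = '2017-08-08' and trade_status = 1
) k1
group by room_id, price_median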
Note: the column names may differ slightly.
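The query in this answer works in two steps: the inner query attaches the same global median to every row through an empty `over ()` window, and the outer query groups by the room together with that constant column. The sketch below mirrors those two steps on the question's sample rows in plain Python standing in for Hive. `with_global_median` and `group_by_room` are hypothetical names. It shows why adding `price_median` to the outer GROUP BY does not split any room into extra groups.

```python
from statistics import median

# Sample rows from the question: (room_id, user_id, price)
ROWS = [
    ("a", "u1", 1), ("a", "u1", 5), ("a", "u2", 3),
    ("b", "u1", 2), ("b", "u3", 4),
    ("c", "u4", 6), ("c", "u4", 7),
]


def with_global_median(rows):
    """Inner step: like percentile(...) over (), every row gets the same median."""
    m = median(price for _, _, price in rows)
    return [(room, user, price, m) for room, user, price in rows]


def group_by_room(rows):
    """Outer step: group by (room_id, price_median), then aggregate each group."""
    groups = {}
    for room, user, price, m in rows:
        groups.setdefault((room, m), []).append((user, price))
    return [
        (room, len({u for u, _ in members}), sum(p for _, p in members), m)
        for (room, m), members in sorted(groups.items())
    ]


annotated = with_global_median(ROWS)
# price_median is identical on every row, so it never splits a room into two groups
assert len({row[3] for row in annotated}) == 1
for row in group_by_room(annotated):
    print(*row)
```

Because the median column is constant, grouping by it only serves to make it legal to select it next to the aggregates. The number of groups is still one per room.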