Hive query: select a column over a partition based on the median of a different column

Time: 2016-12-27 13:47:54

Tags: sql hadoop hive amazon-emr

I need help modeling a query, because I have not been able to work it out myself.

My data is:

id   name   school   height
1    A      S1       10
2    B      S1       12
3    C      S1       14
4    D      S2       15
5    E      S2       16
6    F      S2       17

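For each school, I want to select the name of the person who has the median height, and show that name on every row of the school.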

Expected output:

id   name  school  myval
1    A    S1    B
2    B    S1    B
3    C    S1    B
4    D    S2    E
5    E    S2    E
6    F    S2    E

Here, person B has the median height in school S1, and person E has it in S2.

I know we can get the median using percentile, but I cannot figure out how to select the matching value for each partition.

2 answers:

Answer 0: (score: 1)

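The following query works for the sample data. It matches each row's height against the per-school mean, which equals the median only when the heights are symmetric around the middle value, as they are here: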

select
  temp1.id,
  temp1.name,
  temp1.school,
  temp2.name as myval
from
  TABLE_NAME temp1
  left join
   (select
      school,
      name
    from
      (select
        name,
        school,
        height,
        -- per-school mean height (same value as SUM(height)/COUNT(height) over the partition)
        AVG(height) OVER (PARTITION BY school) as avg_height
      from
        TABLE_NAME) averg
    -- keep only the row whose height equals the school's mean
    where height = avg_height) temp2 on temp1.school = temp2.school;

Answer 1: (score: 0)

This gives the median column:

select
  a.id,
  a.name,
  a.school,
  a.height,
  b.median
from
  your_table a
  join
   (select
      school,
      CAST(percentile(CAST(height as BIGINT), 0.5) as INT) as median
    from
      your_table
    group by
      school) b on a.school = b.school;
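
The question asks for the name at the median rather than the median value itself. One possible way to get there is to rank the rows inside each school and pick the middle one. The following is only a sketch under a few assumptions that are not in the original thread: the table is called your_table, the Hive version supports windowing functions, ties on height are broken by id, and for a school with an even number of rows the lower of the two middle rows is taken.

```sql
-- Sketch: pick the name of the row sitting at the middle rank of each school.
-- Assumptions: table name your_table; ties broken by id; lower-middle row for even counts.
SELECT t.id,
       t.name,
       t.school,
       m.name AS myval
FROM your_table t
JOIN (
  SELECT school,
         name
  FROM (
    SELECT school,
           name,
           -- position of the row within its school, ordered by height
           ROW_NUMBER() OVER (PARTITION BY school ORDER BY height, id) AS rn,
           -- number of rows in the school
           COUNT(*) OVER (PARTITION BY school) AS cnt
    FROM your_table
  ) ranked
  -- (cnt + 1) / 2 is a double in Hive; the cast truncates it to the middle rank
  WHERE rn = CAST((cnt + 1) / 2 AS INT)
) m
ON t.school = m.school;
```

With the sample data each school has three rows, so the middle rank is 2, which is B for S1 and E for S2. Unlike matching on a computed mean or an interpolated percentile, this always finds exactly one row per school, at the cost of an arbitrary choice when the row count is even.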