HiveQL - Adding multi-level subtotals to an existing table

Time: 2017-09-08 10:51:10

Tags: sql hiveql

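My goal is to determine the size of each level of an organization. Suppose we have three organizations, 'A', 'B' and 'C'. Each organization consists of several departments, which are further subdivided into teams with members, as shown below: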

Org.    Dep.    Tm. Member
A       1       I   name1
A       1       I   name2
A       1       I   name3
A       1       II  name4
A       2       I   name5
A       2       I   name6
B       1       I   name7
B       1       II  name8
B       1       II  name9
B       1       II  name10
B       2       I   name11
B       2       I   name12
B       2       II  name13
B       2       II  name14
B       2       III name15
B       2       III name16
C       1       I   name17
C       1       I   name18
C       1       I   name19
C       1       I   name20
C       1       I   name21

Now I want to know, for each member, how large their respective Org., Dep. and Tm. are, like this:

Org.    Dep.    Tm. Member  org dep tm
A       1       I   name1   6   4   3
A       1       I   name2   6   4   3
A       1       I   name3   6   4   3
A       1       II  name4   6   4   1
A       2       I   name5   6   2   2
A       2       I   name6   6   2   2
B       1       I   name7   10  4   1
B       1       II  name8   10  4   3
B       1       II  name9   10  4   3
B       1       II  name10  10  4   3
B       2       I   name11  10  6   2
B       2       I   name12  10  6   2
B       2       II  name13  10  6   2
B       2       II  name14  10  6   2
B       2       III name15  10  6   2
B       2       III name16  10  6   2
C       1       I   name17  5   5   5
C       1       I   name18  5   5   5
C       1       I   name19  5   5   5
C       1       I   name20  5   5   5
C       1       I   name21  5   5   5

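My initial idea was to use multiple LEFT JOINs to aggregate the different levels, but that scales very poorly because every aggregation level needs an additional join. Is there a way to do this efficiently in a single statement?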
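For reference, the join-based approach described above might look like the following sketch. The table name `t` and the column names `org`, `dep`, `tm` and `memberid` are assumptions standing in for the four columns of the sample table. Each additional level would need one more aggregated subquery and one more join, which is the scaling problem.

```sql
-- Sketch only: one aggregated subquery plus one LEFT JOIN per level.
-- Assumed names: table t, columns org, dep, tm, memberid.
select t.org, t.dep, t.tm, t.memberid,
       o.org_cnt, d.dep_cnt, m.tm_cnt
from t
left join (select org, count(*) as org_cnt
           from t
           group by org) o
  on t.org = o.org
left join (select org, dep, count(*) as dep_cnt
           from t
           group by org, dep) d
  on t.org = d.org and t.dep = d.dep
left join (select org, dep, tm, count(*) as tm_cnt
           from t
           group by org, dep, tm) m
  on t.org = m.org and t.dep = m.dep and t.tm = m.tm;
```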

1 Answer:

Answer 0: (score: 2)

Use window functions:

select org, dep, tm,
       count(*) over (partition by org) as org_cnt,
       count(*) over (partition by org, dep) as dep_cnt,
       count(*) over (partition by org, dep, tm) as tm_cnt
from t;

The columns are hierarchical, so the partitions for dep and tm also need to include the higher levels of the hierarchy.
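To illustrate why the higher levels belong in the partition, the sketch below puts a partition on `dep` alone next to the hierarchical one. With the sample data, the `dep`-only count for department 1 would be 13 (4 from A, 4 from B and 5 from C), whereas the hierarchical count for department 1 of organization A is 4. The alias `dep_cnt_mixed` and the column name `memberid` are assumptions made for this example.

```sql
-- Sketch only: dep numbers repeat across organizations, so partitioning
-- by dep alone mixes rows from A, B and C.
select org, dep, tm, memberid,
       count(*) over (partition by dep)      as dep_cnt_mixed,
       count(*) over (partition by org, dep) as dep_cnt
from t;
```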

Edit:

If Hive does not support count(distinct) as a window function and you need distinct counts, you can do this instead:

-- The inner query numbers the rows of each member within each level, so
-- seqnum_* = 1 marks a member's first row at that level. Summing those
-- flags over the level's partition counts each member once.
select org, dep, tm,
       sum(case when seqnum_o = 1 then 1 else 0 end) over (partition by org) as org_cnt,
       sum(case when seqnum_od = 1 then 1 else 0 end) over (partition by org, dep) as dep_cnt,
       sum(case when seqnum_odt = 1 then 1 else 0 end) over (partition by org, dep, tm) as tm_cnt
from (select t.*,
             row_number() over (partition by org, memberid order by org) as seqnum_o,
             row_number() over (partition by org, dep, memberid order by org) as seqnum_od,
             row_number() over (partition by org, dep, tm, memberid order by org) as seqnum_odt
      from t
     ) t;
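For comparison, where the Hive version in use does accept distinct inside a windowed aggregate, the same distinct counts could be written directly. This is a sketch under that assumption, and it also assumes that `memberid` identifies a member as in the query above. Check the windowing documentation for your version before relying on it.

```sql
-- Sketch only: requires a Hive version that allows
-- count(distinct ...) with an over (partition by ...) clause.
select org, dep, tm,
       count(distinct memberid) over (partition by org)          as org_cnt,
       count(distinct memberid) over (partition by org, dep)     as dep_cnt,
       count(distinct memberid) over (partition by org, dep, tm) as tm_cnt
from t;
```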