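My goal is to determine the size of an organization at each level. Suppose we have three organizations, 'A', 'B' and 'C'. Each organization consists of several departments, which are further subdivided into teams with members, as shown below: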
Org. Dep. Tm. Member
A 1 I name1
A 1 I name2
A 1 I name3
A 1 II name4
A 2 I name5
A 2 I name6
B 1 I name7
B 1 II name8
B 1 II name9
B 1 II name10
B 2 I name11
B 2 I name12
B 2 II name13
B 2 II name14
B 2 III name15
B 2 III name16
C 1 I name17
C 1 I name18
C 1 I name19
C 1 I name20
C 1 I name21
Now, for each member, I want to know how large their organization (org), department (dep) and team (tm) are, like this:
Org. Dep. Tm. Member org dep tm
A 1 I name1 6 4 3
A 1 I name2 6 4 3
A 1 I name3 6 4 3
A 1 II name4 6 4 1
A 2 I name5 6 2 2
A 2 I name6 6 2 2
B 1 I name7 10 4 1
B 1 II name8 10 4 3
B 1 II name9 10 4 3
B 1 II name10 10 4 3
B 2 I name11 10 6 2
B 2 I name12 10 6 2
B 2 II name13 10 6 2
B 2 II name14 10 6 2
B 2 III name15 10 6 2
B 2 III name16 10 6 2
C 1 I name17 5 5 5
C 1 I name18 5 5 5
C 1 I name19 5 5 5
C 1 I name20 5 5 5
C 1 I name21 5 5 5
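My initial idea was to use multiple LEFT JOINs to aggregate the different levels, but this scales very poorly because every additional aggregation level needs another join. Is there a way to do this efficiently in a single statement?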
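For comparison, here is a minimal sketch of the join-based approach described above, run against SQLite from Python so it can be checked on the sample data. The table name `t` and the columns `org`, `dep`, `tm`, `memberid` are assumptions for this sketch, and `TEAMS`, `build_rows` and `join_sizes` are hypothetical helpers. Each level needs its own `GROUP BY` derived table plus one more `LEFT JOIN`, which is why the query keeps growing as levels are added.

```python
import sqlite3

# (org, dep, tm, team size) for each team in the question's sample table;
# members are numbered name1..name21 in the same order as the question.
TEAMS = [("A", 1, "I", 3), ("A", 1, "II", 1), ("A", 2, "I", 2),
         ("B", 1, "I", 1), ("B", 1, "II", 3), ("B", 2, "I", 2),
         ("B", 2, "II", 2), ("B", 2, "III", 2), ("C", 1, "I", 5)]


def build_rows():
    """Expand TEAMS into one (org, dep, tm, memberid) row per member."""
    rows, n = [], 0
    for org, dep, tm, size in TEAMS:
        for _ in range(size):
            n += 1
            rows.append((org, dep, tm, f"name{n}"))
    return rows


# One aggregated derived table and one LEFT JOIN per aggregation level.
JOIN_QUERY = """
select t.memberid, o.cnt, d.cnt, m.cnt
from t
left join (select org, count(*) as cnt from t group by org) o
       on o.org = t.org
left join (select org, dep, count(*) as cnt from t group by org, dep) d
       on d.org = t.org and d.dep = t.dep
left join (select org, dep, tm, count(*) as cnt from t group by org, dep, tm) m
       on m.org = t.org and m.dep = t.dep and m.tm = t.tm
"""


def join_sizes(rows):
    """Return {memberid: (org_cnt, dep_cnt, tm_cnt)} computed with joins."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (org text, dep integer, tm text, memberid text)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = {row[0]: row[1:] for row in con.execute(JOIN_QUERY)}
    con.close()
    return result
```

With the sample data, `join_sizes(build_rows())["name1"]` is `(6, 4, 3)`, matching the first row of the expected output.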
Answer (score: 2)
Use window functions:
select org, dep, tm,
count(*) over (partition by org) as org_cnt,
count(*) over (partition by org, dep) as dep_cnt,
count(*) over (partition by org, dep, tm) as tm_cnt
from t;
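A minimal sketch that runs the same window-function query on SQLite from Python and checks it against the expected table. It needs SQLite 3.25 or later, the release that added window functions. The table name `t`, the column name `memberid`, and the helpers `TEAMS`, `build_rows` and `window_sizes` are assumptions made for this sketch, not part of the answer.

```python
import sqlite3

# (org, dep, tm, team size) for each team in the question's sample table;
# members are numbered name1..name21 in the same order as the question.
TEAMS = [("A", 1, "I", 3), ("A", 1, "II", 1), ("A", 2, "I", 2),
         ("B", 1, "I", 1), ("B", 1, "II", 3), ("B", 2, "I", 2),
         ("B", 2, "II", 2), ("B", 2, "III", 2), ("C", 1, "I", 5)]


def build_rows():
    """Expand TEAMS into one (org, dep, tm, memberid) row per member."""
    rows, n = [], 0
    for org, dep, tm, size in TEAMS:
        for _ in range(size):
            n += 1
            rows.append((org, dep, tm, f"name{n}"))
    return rows


# One window per level over a single scan of t: no joins needed.
WINDOW_QUERY = """
select memberid,
       count(*) over (partition by org)          as org_cnt,
       count(*) over (partition by org, dep)     as dep_cnt,
       count(*) over (partition by org, dep, tm) as tm_cnt
from t
"""


def window_sizes(rows):
    """Return {memberid: (org_cnt, dep_cnt, tm_cnt)} using window functions."""
    con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
    con.execute("create table t (org text, dep integer, tm text, memberid text)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = {row[0]: row[1:] for row in con.execute(WINDOW_QUERY)}
    con.close()
    return result
```

With the sample data, `window_sizes(build_rows())["name1"]` is `(6, 4, 3)`. Adding a further level only means adding one more `count(*) over (...)` column rather than another join.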
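The columns are hierarchical, so the partitions for dep and tm must also include the higher levels of the hierarchy. For example, department 1 exists in A, B and C, so partitioning by dep alone would lump all three together.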
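The point about hierarchical partitions can be checked with a small sketch on SQLite (3.25 or later) from Python. It compares `partition by dep` with `partition by org, dep` on the sample data. The table name `t`, the column name `memberid`, and the helpers `TEAMS`, `build_rows` and `dep_sizes` are hypothetical names used only for this illustration.

```python
import sqlite3

# (org, dep, tm, team size) for each team in the question's sample table;
# members are numbered name1..name21 in the same order as the question.
TEAMS = [("A", 1, "I", 3), ("A", 1, "II", 1), ("A", 2, "I", 2),
         ("B", 1, "I", 1), ("B", 1, "II", 3), ("B", 2, "I", 2),
         ("B", 2, "II", 2), ("B", 2, "III", 2), ("C", 1, "I", 5)]


def build_rows():
    """Expand TEAMS into one (org, dep, tm, memberid) row per member."""
    rows, n = [], 0
    for org, dep, tm, size in TEAMS:
        for _ in range(size):
            n += 1
            rows.append((org, dep, tm, f"name{n}"))
    return rows


def dep_sizes(partition_by):
    """Department size per member, using the given PARTITION BY column list.

    partition_by is interpolated into the SQL, so it must be a trusted constant.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table t (org text, dep integer, tm text, memberid text)")
    con.executemany("insert into t values (?, ?, ?, ?)", build_rows())
    sql = f"select memberid, count(*) over (partition by {partition_by}) from t"
    result = dict(con.execute(sql))
    con.close()
    return result


wrong = dep_sizes("dep")       # department 1 of A, B and C counted together
right = dep_sizes("org, dep")  # each department scoped to its organization
```

Here `wrong["name1"]` is 13 (4 + 4 + 5 members in the three departments numbered 1), while `right["name1"]` is the expected 4.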
Edit:
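If Hive does not support count(distinct) as a window function and you need it (for example, because the same member can appear more than once in a group), then you can do this: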
select org, dep, tm,
sum(case when seqnum_o = 1 then 1 else 0 end) over (partition by org) as org_cnt,
sum(case when seqnum_od = 1 then 1 else 0 end) over (partition by org, dep) as dep_cnt,
sum(case when seqnum_odt = 1 then 1 else 0 end) over (partition by org, dep, tm) as tm_cnt
from (select t.*,
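-- seqnum_* = 1 marks the first row of each member within a group,
-- so summing these flags counts every member only once
row_number() over (partition by org, memberid order by org) as seqnum_o,
row_number() over (partition by org, dep, memberid order by org) as seqnum_od,
row_number() over (partition by org, dep, tm, memberid order by org) as seqnum_odt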
from t
) t;
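A minimal sketch comparing plain `count(*)` with the `row_number()` workaround on SQLite (3.25 or later) from Python. It adds a hypothetical duplicate row for name1; the question's data has no duplicates, so the extra row is invented purely to show the difference. The table name `t`, the column name `memberid`, and the helpers `TEAMS`, `build_rows` and `sizes` are assumptions for this sketch.

```python
import sqlite3

# (org, dep, tm, team size) for each team in the question's sample table;
# members are numbered name1..name21 in the same order as the question.
TEAMS = [("A", 1, "I", 3), ("A", 1, "II", 1), ("A", 2, "I", 2),
         ("B", 1, "I", 1), ("B", 1, "II", 3), ("B", 2, "I", 2),
         ("B", 2, "II", 2), ("B", 2, "III", 2), ("C", 1, "I", 5)]


def build_rows():
    """Expand TEAMS into one (org, dep, tm, memberid) row per member."""
    rows, n = [], 0
    for org, dep, tm, size in TEAMS:
        for _ in range(size):
            n += 1
            rows.append((org, dep, tm, f"name{n}"))
    return rows


# Counts rows, so a duplicated member is counted twice.
PLAIN_QUERY = """
select memberid,
       count(*) over (partition by org),
       count(*) over (partition by org, dep),
       count(*) over (partition by org, dep, tm)
from t
"""

# The row_number() workaround: flag each member's first row per group,
# then sum the flags, which counts distinct members.
DISTINCT_QUERY = """
select memberid,
       sum(case when seqnum_o = 1 then 1 else 0 end) over (partition by org),
       sum(case when seqnum_od = 1 then 1 else 0 end) over (partition by org, dep),
       sum(case when seqnum_odt = 1 then 1 else 0 end) over (partition by org, dep, tm)
from (select t.*,
             row_number() over (partition by org, memberid order by org) as seqnum_o,
             row_number() over (partition by org, dep, memberid order by org) as seqnum_od,
             row_number() over (partition by org, dep, tm, memberid order by org) as seqnum_odt
      from t) t
"""


def sizes(rows, query):
    """Return {memberid: (org_cnt, dep_cnt, tm_cnt)} for the given query."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (org text, dep integer, tm text, memberid text)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    # Duplicate rows of a member get identical counts, so one dict entry suffices.
    result = {row[0]: row[1:] for row in con.execute(query)}
    con.close()
    return result


# Hypothetical duplicate: name1 listed a second time in team A / 1 / I.
rows = build_rows() + [("A", 1, "I", "name1")]
plain = sizes(rows, PLAIN_QUERY)
distinct = sizes(rows, DISTINCT_QUERY)
```

With the duplicate, `plain["name1"]` becomes `(7, 5, 4)`, while `distinct["name1"]` stays at the expected `(6, 4, 3)`.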