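I'm coming to pandas from a SQL background, where I would use a window function with PARTITION BY to aggregate at two different levels.
Here is the dataframe: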
TeamName PlayerID PlayerLevel
A 1 Beginner
A 2 Beginner
A 3 Intermediate
A 4 Intermediate
A 5 Intermediate
A 6 Advanced
B 7 Beginner
B 8 Beginner
B 9 Advanced
B 10 Intermediate
B 11 Beginner
B 12 Advanced
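For anyone who wants to run the code in this thread, the sample data can be rebuilt as follows. This is a minimal sketch; the name `df` is assumed because all the snippets below use it.

```python
import pandas as pd

# Sample data from the question: one row per player.
df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})
print(df)
```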
I want to count the players who fall into each player level, which I can easily do with
df.groupby(['TeamName', 'PlayerLevel'], as_index=False) \
    .agg({'PlayerID': 'count'})
This gives me:
TeamName PlayerLevel PlayerID
A Beginner 2
A Intermediate 3
A Advanced 1
B Beginner 3
B Intermediate 1
B Advanced 2
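The count and the column rename can also be done in one step with named aggregation (available in pandas 0.25 and later). A minimal sketch, assuming `df` is the sample frame from the question. Note that `groupby` sorts the keys by default, so `PlayerLevel` comes out alphabetically (Advanced, Beginner, Intermediate) rather than in the order shown above.

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

# Named aggregation: new_column=(input_column, aggregation_function)
counts = df.groupby(['TeamName', 'PlayerLevel'], as_index=False).agg(
    Numerator=('PlayerID', 'count')
)
print(counts)
```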
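But what I also want is the "denominator", i.e. the total number of players on each team. Example dataframe (with renamed columns); in this example the denominator happens to be 6 for both teams: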
TeamName PlayerLevel Numerator Denominator
A Beginner 2 6
A Intermediate 3 6
A Advanced 1 6
B Beginner 3 6
B Intermediate 1 6
B Advanced 2 6
But I can't figure out how to get aggregations over multiple groupings to play nicely together.
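Since the question starts from SQL: the closest pandas analogue of a window aggregate such as `COUNT(*) OVER (PARTITION BY TeamName)` is `groupby(...).transform(...)`, which returns one value per original row instead of one per group. A sketch under that framing, assuming the sample `df` from the question, with `drop_duplicates` playing the role of `SELECT DISTINCT`:

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

# Window-style counts: transform broadcasts each group's count back to every row,
# like COUNT(*) OVER (PARTITION BY ...) in SQL.
out = df.assign(
    Numerator=df.groupby(['TeamName', 'PlayerLevel'])['PlayerID'].transform('count'),
    Denominator=df.groupby('TeamName')['PlayerID'].transform('count'),
)

# Collapse to one row per (TeamName, PlayerLevel), like SELECT DISTINCT.
result = (out[['TeamName', 'PlayerLevel', 'Numerator', 'Denominator']]
          .drop_duplicates()
          .reset_index(drop=True))
print(result)
```

Rows come out in first-appearance order rather than sorted; add `.sort_values(['TeamName', 'PlayerLevel'])` if a sorted result is wanted.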
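Answer 0 (score: 2)
Following @root's suggestion and motivated by @Jeff's comment. Even though it looks slightly different, this turns out to be equivalent to @MaxU's answer.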
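df1 = df.groupby(['TeamName', 'PlayerLevel']).size().to_frame('Numerator')
# Sum the per-level counts within each team and broadcast the total back to every row
df1['Denominator'] = df1.groupby(level='TeamName')['Numerator'].transform('sum')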
df1
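`df1` keeps `TeamName` and `PlayerLevel` in a MultiIndex. If the flat layout from the question is wanted, a `reset_index()` at the end should produce it. A self-contained sketch, assuming the sample `df` from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

df1 = df.groupby(['TeamName', 'PlayerLevel']).size().to_frame('Numerator')
df1['Denominator'] = df1.groupby(level='TeamName')['Numerator'].transform('sum')

# Move the MultiIndex levels back into ordinary columns
flat = df1.reset_index()
print(flat)
```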
numerator = df.groupby(['TeamName', 'PlayerLevel']).size().rename('numerator')
numerator
TeamName  PlayerLevel
A         Advanced        1
          Beginner        2
          Intermediate    3
B         Advanced        2
          Beginner        3
          Intermediate    1
Name: numerator, dtype: int64
denominator = df.groupby(['TeamName']).size().rename('denominator')
denominator
TeamName
A 6
B 6
Name: denominator, dtype: int64
numerator.to_frame().merge(denominator.to_frame(),
                           right_index=True, left_index=True)
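If the flat, SQL-style layout from the question is preferred, the same two Series can be turned into ordinary columns with `reset_index()` and joined on `TeamName` with a regular column merge. A sketch, again assuming the sample `df` from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

numerator = df.groupby(['TeamName', 'PlayerLevel']).size().rename('numerator')
denominator = df.groupby('TeamName').size().rename('denominator')

# Named Series -> DataFrame with the group keys as columns, then a plain column join
flat = numerator.reset_index().merge(denominator.reset_index(), on='TeamName', how='left')
print(flat)
```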
df.groupby(['TeamName', 'PlayerLevel']).size().unstack() \
    .div(df.groupby(['TeamName']).size(), axis=0)
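The last expression divides each team's per-level counts by that team's total, so it returns shares (numerator / denominator) in wide form rather than two separate columns. A sketch that runs it on the sample `df` from the question; `fill_value=0` is added as a precaution, since plain `unstack()` would leave NaN for a team with no players at some level:

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

# Rows: TeamName, columns: PlayerLevel, values: count / team total
shares = (df.groupby(['TeamName', 'PlayerLevel']).size()
            .unstack(fill_value=0)
            .div(df.groupby('TeamName').size(), axis=0))
print(shares.round(3))
```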
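Answer 1 (score: 1)
This isn't quite in the form the OP wants, but the problem feels like a crosstab. Perhaps someone can reshape it into the form the OP wants.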
dfx = pd.crosstab(df['TeamName'], df['PlayerLevel'], margins=True)
dfx = dfx.drop("All")
dfx
PlayerLevel Advanced Beginner Intermediate All
TeamName
A 1 2 3 6
B 2 3 1 6
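One way to reshape the crosstab into the layout the OP asked for: pull the `All` margin column out as the denominator, then `melt` the remaining level columns into long form. A sketch, assuming the sample `df` from the question; the column names `Numerator` and `Denominator` follow the question's example:

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

dfx = pd.crosstab(df['TeamName'], df['PlayerLevel'], margins=True).drop('All')
denominator = dfx.pop('All')  # per-team row totals produced by margins=True

# Wide (one column per level) -> long (one row per team and level)
long = (dfx.reset_index()
           .melt(id_vars='TeamName', var_name='PlayerLevel', value_name='Numerator')
           .sort_values(['TeamName', 'PlayerLevel'])
           .reset_index(drop=True))
long['Denominator'] = long['TeamName'].map(denominator)
print(long)
```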
Answer 2 (score: 1)
First create your numerator:
df = df.groupby(['TeamName','PlayerLevel'], as_index=False).count()
df = df.rename(columns={'PlayerID':'numerator'})
TeamName PlayerLevel numerator
0 A Advanced 1
1 A Beginner 2
2 A Intermediate 3
3 B Advanced 2
4 B Beginner 3
5 B Intermediate 1
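Then use transform to sum the counts at the 'TeamName' level. Use transform because it naturally broadcasts the aggregated result back to each group's index, which lets you assign the result directly as a new column:
df['denominator'] = df.groupby('TeamName')['numerator'].transform('sum')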
TeamName PlayerLevel numerator denominator
0 A Advanced 1 6
1 A Beginner 2 6
2 A Intermediate 3 6
3 B Advanced 2 6
4 B Beginner 3 6
5 B Intermediate 1 6
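The code above overwrites `df` with the aggregated result. If the original player-level frame is still needed, the same two steps can be chained without reassigning `df`. A sketch assuming the sample `df` from the question; `assign` takes a lambda so that the `transform` runs on the intermediate counts rather than on `df`:

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': list('AAAAAABBBBBB'),
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

result = (df.groupby(['TeamName', 'PlayerLevel'], as_index=False)['PlayerID']
            .count()
            .rename(columns={'PlayerID': 'numerator'})
            # d is the counts frame; sum its numerators per team and broadcast back
            .assign(denominator=lambda d: d.groupby('TeamName')['numerator'].transform('sum')))
print(result)
```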