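pandas - Applying different groupby functions to a DataFrame

Time: 2016-09-01 23:54:14

Tags: pandas

I'm coming to pandas from an SQL background, where I would use PARTITION BY to aggregate two columns at different levels.

Here is the DataFrame: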

 TeamName   PlayerID    PlayerLevel
 A          1           Beginner
 A          2           Beginner
 A          3           Intermediate
 A          4           Intermediate
 A          5           Intermediate
 A          6           Advanced
 B          7           Beginner
 B          8           Beginner
 B          9           Advanced
 B          10          Intermediate
 B          11          Beginner
 B          12          Advanced
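Since the question is framed in terms of SQL's PARTITION BY, here is a minimal sketch (assuming pandas is installed) that rebuilds the sample DataFrame and shows the closest pandas analogue of a window function: `groupby(...).transform(...)`, which returns one value per original row instead of one row per group. The column names `TeamTotal` and `LevelCount` are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'TeamName': ['A'] * 6 + ['B'] * 6,
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

# Rough pandas analogues of these SQL window expressions:
#   COUNT(PlayerID) OVER (PARTITION BY TeamName)
#   COUNT(PlayerID) OVER (PARTITION BY TeamName, PlayerLevel)
# transform() keeps one value per original row, like a window function,
# instead of collapsing each group to a single row as agg() does.
df['TeamTotal'] = df.groupby('TeamName')['PlayerID'].transform('count')
df['LevelCount'] = (
    df.groupby(['TeamName', 'PlayerLevel'])['PlayerID'].transform('count')
)
print(df)
```

Taking one row per (TeamName, PlayerLevel) pair from this frame gives the numerator/denominator layout the question is after.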

I want to count the players at each PlayerLevel within each team, which I can easily do with

df.groupby(['TeamName', 'PlayerLevel'], as_index=False) \
    .agg({'PlayerID': 'count'})

This gives me:

 TeamName   PlayerLevel     PlayerID
 A          Beginner        2
 A          Intermediate    3
 A          Advanced        1
 B          Beginner        3
 B          Intermediate    1
 B          Advanced        2

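But what I also want is the "denominator", i.e. the total number of players on each team. Here is the desired DataFrame (with columns renamed); in this example the denominator happens to be 6 for both teams.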

 TeamName   PlayerLevel     Numerator  Denominator
 A          Beginner        2          6
 A          Intermediate    3          6
 A          Advanced        1          6
 B          Beginner        3          6
 B          Intermediate    1          6
 B          Advanced        2          6

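But I can't figure out how to get multiple groupby aggregations to play nicely together.

4 Answers:

Answer 0 (score: 2)

Following @root's suggestion and motivated by @Jeff's comment. Even though it looks slightly different, this turns out to be equivalent to @MaxU's answer.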

df1 = df.groupby(['TeamName', 'PlayerLevel']).size().to_frame('Numerator')
# Sum the Numerator within each team and broadcast it back to every row
df1['Denominator'] = df1.groupby(level='TeamName')['Numerator'].transform('sum')

df1


Old answer

numerator = df.groupby(['TeamName', 'PlayerLevel']).size().rename('numerator')
numerator

TeamName  PlayerLevel 
A         Advanced        1
          Beginner        2
          Intermediate    3
B         Advanced        2
          Beginner        3
          Intermediate    1
Name: numerator, dtype: int64
denominator = df.groupby(['TeamName']).size().rename('denominator')
denominator

TeamName
A    6
B    6
Name: denominator, dtype: int64
numerator.to_frame().merge(denominator.to_frame(),
                           right_index=True, left_index=True)


df.groupby(['TeamName', 'PlayerLevel']).size().unstack() \
    .div(df.groupby(['TeamName']).size(), axis=0)

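The `unstack`/`div` line above gives per-team proportions in a wide table. If you would rather keep the long layout (one row per team and level), a minimal sketch is to divide the counts by the team sizes aligned on the `TeamName` index level with `Series.div(..., level=...)`. The names `counts`, `team_size` and `share` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': ['A'] * 6 + ['B'] * 6,
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

# Numerator: players per (TeamName, PlayerLevel), a MultiIndex Series
counts = df.groupby(['TeamName', 'PlayerLevel']).size()

# Denominator: players per team, a Series indexed by TeamName only
team_size = df.groupby('TeamName').size()

# level='TeamName' aligns team_size against that level of counts' MultiIndex,
# so each (team, level) count is divided by its own team's total
share = counts.div(team_size, level='TeamName')
print(share)
```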

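Answer 1 (score: 1)

This isn't quite the form the OP wants, but the problem feels like a crosstab. Maybe someone can reshape it into the form the OP wants.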

dfx = pd.crosstab(df['TeamName'], df['PlayerLevel'], margins=True)
dfx = dfx.drop("All")
dfx

PlayerLevel   Advanced  Beginner  Intermediate  All
TeamName                                          
A                   1         2             3    6
B                   2         3             1    6
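One possible way to do that reshape is sketched below (assuming pandas 1.0 or later, for `ignore_index` in `sort_values`). The `All` margin column already holds each team's total, so it is popped off and used as the denominator, and `melt` turns the per-level columns back into rows. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'TeamName': ['A'] * 6 + ['B'] * 6,
    'PlayerID': list(range(1, 13)),
    'PlayerLevel': ['Beginner', 'Beginner', 'Intermediate', 'Intermediate',
                    'Intermediate', 'Advanced',
                    'Beginner', 'Beginner', 'Advanced', 'Intermediate',
                    'Beginner', 'Advanced'],
})

# Wide table: one row per team, one column per level, plus the 'All' margins
ct = pd.crosstab(df['TeamName'], df['PlayerLevel'], margins=True).drop('All')

# The 'All' margin column holds each team's total, i.e. the denominator
denominator = ct.pop('All')

# Melt the per-level columns back into long (one row per team/level) form
result = (
    ct.reset_index()
      .melt(id_vars='TeamName', var_name='PlayerLevel', value_name='Numerator')
      .sort_values(['TeamName', 'PlayerLevel'], ignore_index=True)
)
result['Denominator'] = result['TeamName'].map(denominator)
print(result)
```

Note that `crosstab` also emits zero counts for team/level combinations that never occur, which the plain `groupby(...).size()` approach would simply omit.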

Answer 2 (score: 1)

First create your numerator:

df = df.groupby(['TeamName','PlayerLevel'], as_index=False).count()
df = df.rename(columns={'PlayerID':'numerator'})

  TeamName   PlayerLevel  numerator
0        A      Advanced          1
1        A      Beginner          2
2        A  Intermediate          3
3        B      Advanced          2
4        B      Beginner          3
5        B  Intermediate          1

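Then use transform to sum the counts at the 'TeamName' level. transform is the right tool here because it naturally broadcasts the aggregated result back to the index of each group, which lets you assign the result as a new column: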

df['denominator'] = df.groupby('TeamName')['numerator'].transform('sum')

  TeamName   PlayerLevel  numerator  denominator
0        A      Advanced          1            6
1        A      Beginner          2            6
2        A  Intermediate          3            6
3        B      Advanced          2            6
4        B      Beginner          3            6
5        B  Intermediate          1            6
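To make the point about transform concrete, here is a small sketch that starts from the numerator table above (values copied from that output) and compares a plain `sum()` on the groupby, which collapses to one row per team, with `transform('sum')`, which keeps the original row index and can therefore be assigned straight back as a column.

```python
import pandas as pd

# Numerator table as shown in the output above
counts = pd.DataFrame({
    'TeamName': ['A', 'A', 'A', 'B', 'B', 'B'],
    'PlayerLevel': ['Advanced', 'Beginner', 'Intermediate'] * 2,
    'numerator': [1, 2, 3, 2, 3, 1],
})

grouped = counts.groupby('TeamName')['numerator']

# sum() collapses each group to one value: a Series indexed by 'A', 'B'
per_team = grouped.sum()

# transform('sum') repeats each group's total on every row of that group,
# keeping the original index, so it lines up with counts row by row
broadcast = grouped.transform('sum')
counts['denominator'] = broadcast
print(per_team)
print(counts)
```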

Answer 3 (score: 1)

How about this?

def protect_against_forgery?
  false
end