Question

I want to compute the scoring rate for each zone using groupby in pandas, but I'm not sure how to do it.

Suppose df has two columns:

Shot_type   Shot_zone
   Goal     Penalty_area
   Saved    Penalty_area
   Goal     Goal Box
   Saved    Goal Box
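
For anyone who wants to reproduce this, the sample frame above can be built roughly as follows. This is only a sketch of the four rows shown in the table; the real data presumably has many more rows.

```python
import pandas as pd

# Sample data copied from the table in the question (four shots, two zones).
df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Goal", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Goal Box", "Goal Box"],
})
print(df)
```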

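Here I want to group by Shot_zone and compute the scoring rate as the number of 'Goal' values in Shot_type divided by the len() of each Shot_zone group. Each Shot_zone here has 1 goal and 1 save, so the result should look something like: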

Penalty_area   50%
Goal Box       50%

Is there a way to do this with pandas? Thank you very much!

Answer 1

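Use pd.crosstab, normalizing over columns so that each Shot_zone sums to 1 and the Goal row gives the scoring rate per zone:

pd.crosstab(df.Shot_type,df.Shot_zone,normalize='columns')
Out[662]: 
Shot_zone  Goal Box  Penalty_area
Shot_type                        
Goal            0.5           0.5
Saved           0.5           0.5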
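
With the symmetric sample data every normalization mode returns 0.5, which hides what is actually being computed. The sketch below uses a made-up, unbalanced sample (not the asker's data) to show that the per-zone scoring rate is the Goal row when each zone's column is normalized to sum to 1 (normalize='columns'). The variable names are illustrative.

```python
import pandas as pd

# Hypothetical data: Penalty_area scores 1 of 4 shots, Goal Box scores 1 of 2.
df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Saved", "Saved", "Goal", "Saved"],
    "Shot_zone": ["Penalty_area"] * 4 + ["Goal Box"] * 2,
})

# Each Shot_zone column sums to 1, so the "Goal" row is the share of
# shots taken in that zone that ended in a goal.
by_zone = pd.crosstab(df["Shot_type"], df["Shot_zone"], normalize="columns")
rate = by_zone.loc["Goal"]
print(rate)

# For contrast: normalize="index" makes each Shot_type row sum to 1,
# i.e. it answers "which zone did the goals come from?" instead.
by_type = pd.crosstab(df["Shot_type"], df["Shot_zone"], normalize="index")
print(by_type.loc["Goal"])
```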

Answer 2

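One way is to binarize your Shot_type column, i.e. set it to True where it equals 'Goal', and then use GroupBy + mean. The mean of a boolean column is the fraction of True values, which here is the scoring rate: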

res = df.assign(Shot_type=df['Shot_type']=='Goal')\
        .groupby('Shot_zone')['Shot_type'].mean()

print(res)

Shot_zone
Goal Box        0.5
Penalty_area    0.5
Name: Shot_type, dtype: float64
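
If the result should read as percentages, as in the question, the same boolean-mean idea can be followed by a formatting step. A minimal sketch using the question's four sample rows; the names is_goal, rate and pct are illustrative, and the string formatting is meant for display only.

```python
import pandas as pd

df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Goal", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Goal Box", "Goal Box"],
})

# True where the shot scored; the mean of booleans is the fraction of True.
is_goal = df["Shot_type"].eq("Goal")
rate = is_goal.groupby(df["Shot_zone"]).mean()

# Turn the numeric rate into percentage strings such as "50%".
pct = rate.map("{:.0%}".format)
print(pct)
```

Keeping rate numeric and formatting only at the end means the values can still be sorted or plotted before display.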

Answer 3

You can also use groupby and apply:

df.groupby('Shot_zone').Shot_type.apply(lambda s: '{}%'.format((s[s=='Goal']).size/(s.size) * 100))

Shot_zone
Goal Box        50.0%
Penalty_area    50.0%

Answer 4

You can do the following (the outer parentheses let the expression span two lines):

(data[data['Shot_type']=='Goal'].groupby(['Shot_zone'])['Shot_zone'].count()
 / data.groupby(['Shot_zone'])['Shot_zone'].count())
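
One caveat when dividing two grouped counts: a zone with no goals at all is missing from the numerator, so the division gives NaN for that zone rather than 0. The sketch below shows one possible way to handle this, assuming a made-up third zone "Outside_box" that contains only saved shots. That zone and the variable names are hypothetical and not part of the original data.

```python
import pandas as pd

data = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Goal", "Saved", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Goal Box", "Goal Box",
                  "Outside_box"],  # hypothetical zone with no goals
})

totals = data.groupby("Shot_zone")["Shot_type"].count()
goals = (data[data["Shot_type"] == "Goal"]
         .groupby("Shot_zone")["Shot_type"].count())

# Plain division aligns on the index and leaves NaN for Outside_box.
naive = goals / totals

# Reindex the goal counts to every zone, treating missing zones as 0 goals.
rate = goals.reindex(totals.index, fill_value=0) / totals
print(rate)
```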

Pandas: how to compute a per-group result from each group's length and the count of a value in another column

4 answers: