Pandas aggregation in Python

Time: 2021-02-04 05:58:18

Tags: python pandas

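I have the following dataset. I want to build a dataframe with one row per team that shows, for 2017 (Y = 17), the number of games played, the wins, losses, and draws, and the average point differential.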


Date        Y   HomeTeam    AwayTeam       HomePoints    AwayPoints     
2014-08-16  14  Arsenal     Crystal Palace 2             1                  
2014-08-16  14  Leicester   Everton        2             2          
2014-08-16  14  Man United  Swansea        1             2          
2014-08-16  14  QPR         Hull           0             1          
2014-08-16  14  Stoke       Aston Villa    0             1          

I wrote the following code:

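import numpy as np

# .copy() so the column assignments below act on an independent frame, not a view of df
df17 = df[df['Y'] == 17].copy()
df17['differential'] = abs(df17['HomePoints'] - df17['AwayPoints'])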
df17['home_wins'] = np.where(df17['HomePoints'] > df17['AwayPoints'], 1, 0)
df17['home_losses'] = np.where(df17['HomePoints'] < df17['AwayPoints'], 1, 0)
df17['home_ties'] = np.where(df17['HomePoints'] == df17['AwayPoints'], 1, 0)
df17['game_count'] = 1
df17.groupby("HomeTeam").agg({"differential": np.mean, "home_wins": np.sum, "home_losses": np.sum, "home_ties": np.sum, "game_count": np.sum}).sort_values(["differential"], ascending = False)

But I don't think this is right, because it only counts each team's home games. Does anyone have a clean way to do this?

1 answer:

Answer 0: (score: 0)

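Melt the dataframe. Each original row becomes two new rows, one for the HomeTeam and one for the AwayTeam, so every team ends up with a row for each match it played, whether home or away.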

The documentation for the melt method is here: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
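To make the reshape concrete, here is a minimal sketch that melts just two rows copied from the question's sample data. The names `sample` and `long_df` are illustrative and do not appear in the original post; `var_name` and `value_name` are used here only to label the two new columns directly.

```python
import pandas as pd

# Two rows copied from the question's sample data.
sample = pd.DataFrame({
    "Date": ["2014-08-16", "2014-08-16"],
    "Y": [14, 14],
    "HomeTeam": ["Arsenal", "Leicester"],
    "AwayTeam": ["Crystal Palace", "Everton"],
    "HomePoints": [2, 2],
    "AwayPoints": [1, 2],
})

# Each match row becomes two rows: one per team, tagged with the
# column the team name came from (HomeTeam or AwayTeam).
long_df = pd.melt(
    sample,
    id_vars=["Date", "Y", "HomePoints", "AwayPoints"],
    value_vars=["HomeTeam", "AwayTeam"],
    var_name="Home/Away",
    value_name="Team",
)
print(long_df[["Team", "Home/Away", "HomePoints", "AwayPoints"]])
```

The two matches produce four rows. The score columns are repeated on both rows of a match, which is why the points difference still has to be signed per side afterwards.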

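import pandas as pd

# Keep only the 2017 season (Y == 17), as the question asks
df = df[df['Y'] == 17]
# One row per team per match: 'Team' holds the name, 'Home/Away' records which side it played
df = pd.melt(df, id_vars=['Date', 'Y', 'HomePoints', 'AwayPoints'], value_vars=['HomeTeam', 'AwayTeam'])
df = df.rename({'value': 'Team', 'variable': 'Home/Away'}, axis=1)
# Sign the points difference from each team's point of view: positive means that team won
df['Differential'] = df['Home/Away'].map({'HomeTeam': 1, 'AwayTeam': -1}) * (df['HomePoints'] - df['AwayPoints'])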

def count_wins(x):
    return (x > 0).sum()

def count_losses(x):
    return (x < 0).sum()

def count_draws(x):
    return (x == 0).sum()

# 'mean' gives the average differential the question asks for
df = df.groupby('Team')['Differential'].agg(['count', count_wins, count_losses, count_draws, 'mean'])
df = df.rename({'count': 'Number of games', 'count_wins': 'Wins', 'count_losses': 'Losses', 'count_draws': 'Draws', 'mean': 'Average differential'}, axis=1)
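For reference, the whole approach can be put into one function, sketched below under a few assumptions. The function `team_summary`, its `year` parameter, the helper names, and the output column names are hypothetical and not part of the original answer. It uses named aggregation instead of a separate rename step and reports the mean differential the question asks for. The demo frame holds the five rows shown in the question, which are all from 2014, so it is called with `year=14`; for the question's case you would pass `17`.

```python
import pandas as pd


def count_wins(d: pd.Series) -> int:
    return int((d > 0).sum())


def count_losses(d: pd.Series) -> int:
    return int((d < 0).sum())


def count_draws(d: pd.Series) -> int:
    return int((d == 0).sum())


def team_summary(matches: pd.DataFrame, year: int) -> pd.DataFrame:
    """Per-team games, wins, losses, draws and mean differential for one season."""
    season = matches[matches["Y"] == year]
    # One row per team per match.
    long_df = season.melt(
        id_vars=["HomePoints", "AwayPoints"],
        value_vars=["HomeTeam", "AwayTeam"],
        var_name="Side",
        value_name="Team",
    )
    # +1 for the home side, -1 for the away side, so the differential
    # is always measured from the point of view of the team on that row.
    sign = long_df["Side"].map({"HomeTeam": 1, "AwayTeam": -1})
    long_df["Differential"] = sign * (long_df["HomePoints"] - long_df["AwayPoints"])
    return long_df.groupby("Team")["Differential"].agg(
        Games="count",
        Wins=count_wins,
        Losses=count_losses,
        Draws=count_draws,
        AvgDifferential="mean",
    )


# The five sample rows from the question (all from the 2014 season).
matches = pd.DataFrame({
    "Date": ["2014-08-16"] * 5,
    "Y": [14] * 5,
    "HomeTeam": ["Arsenal", "Leicester", "Man United", "QPR", "Stoke"],
    "AwayTeam": ["Crystal Palace", "Everton", "Swansea", "Hull", "Aston Villa"],
    "HomePoints": [2, 2, 1, 0, 0],
    "AwayPoints": [1, 2, 2, 1, 1],
})

summary = team_summary(matches, year=14)
print(summary.sort_values("AvgDifferential", ascending=False))
```

Unlike the `abs(...)` differential in the question, the signed differential here goes negative for losing teams, so its mean reflects how a team did overall and not just how lopsided its games were.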