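I have the following dataset. I want to create a data frame containing every team, with the number of games played, wins, losses, and draws, plus the average point differential, for 2017 (Y = 17).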
Date        Y   HomeTeam    AwayTeam        HomePoints  AwayPoints
2014-08-16  14  Arsenal     Crystal Palace  2           1
2014-08-16  14  Leicester   Everton         2           2
2014-08-16  14  Man United  Swansea         1           2
2014-08-16  14  QPR         Hull            0           1
2014-08-16  14  Stoke       Aston Villa     0           1
I wrote the following code:
import numpy as np
import pandas as pd

# Copy the filtered rows so the column assignments below do not act on a view of df.
df17 = df[df['Y'] == 17].copy()
df17['differential'] = abs(df17['HomePoints'] - df17['AwayPoints'])
df17['home_wins'] = np.where(df17['HomePoints'] > df17['AwayPoints'], 1, 0)
df17['home_losses'] = np.where(df17['HomePoints'] < df17['AwayPoints'], 1, 0)
df17['home_ties'] = np.where(df17['HomePoints'] == df17['AwayPoints'], 1, 0)
df17['game_count'] = 1
df17.groupby("HomeTeam").agg({"differential": "mean", "home_wins": "sum", "home_losses": "sum", "home_ties": "sum", "game_count": "sum"}).sort_values(["differential"], ascending=False)
But I don't think this is right, because I'm only looking at the home teams. Does anyone have a clean way to do this?
Answer 0 (score: 0)
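Melting the data frame turns each original row into two new rows, so every match gets one row for HomeTeam and one row for AwayTeam.
The documentation for the melt method is here: https://pandas.pydata.org/docs/reference/api/pandas.melt.html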
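To make the effect of melting concrete, here is a minimal sketch that uses only the five sample rows from the question. The names `sample` and `long_df` are illustrative and do not appear in the original post.

```python
import pandas as pd

# The five sample rows shown in the question.
sample = pd.DataFrame({
    'Date': ['2014-08-16'] * 5,
    'Y': [14] * 5,
    'HomeTeam': ['Arsenal', 'Leicester', 'Man United', 'QPR', 'Stoke'],
    'AwayTeam': ['Crystal Palace', 'Everton', 'Swansea', 'Hull', 'Aston Villa'],
    'HomePoints': [2, 2, 1, 0, 0],
    'AwayPoints': [1, 2, 2, 1, 1],
})

# One row per (match, team): all HomeTeam rows come first, then all AwayTeam rows.
long_df = pd.melt(
    sample,
    id_vars=['Date', 'Y', 'HomePoints', 'AwayPoints'],
    value_vars=['HomeTeam', 'AwayTeam'],
    var_name='Home/Away',
    value_name='Team',
)

print(len(sample), len(long_df))  # 5 10
# The Arsenal v Crystal Palace match now appears once under each team.
print(long_df[long_df['Team'].isin(['Arsenal', 'Crystal Palace'])])
```

Passing `var_name` and `value_name` to `melt` names the new columns directly, which should be equivalent to renaming `variable` and `value` afterwards.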
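import pandas as pd

# To restrict the result to 2017, filter first: df = df[df['Y'] == 17]
# Melt so each match yields two rows: one for the home team, one for the away team.
df = pd.melt(df, id_vars=['Date', 'Y', 'HomePoints', 'AwayPoints'], value_vars=['HomeTeam', 'AwayTeam'])
df = df.rename({'value': 'Team', 'variable': 'Home/Away'}, axis=1)
# Signed margin from each team's point of view (positive means that team won).
df['Differential'] = df['Home/Away'].map({'HomeTeam': 1, 'AwayTeam': -1}) * (df['HomePoints'] - df['AwayPoints'])

def count_wins(x):
    return (x > 0).sum()

def count_losses(x):
    return (x < 0).sum()

def count_draws(x):
    return (x == 0).sum()

# 'sum' gives the total differential; use 'mean' for the average the question asks for.
df = df.groupby('Team')['Differential'].agg(['count', count_wins, count_losses, count_draws, 'sum'])
df = df.rename({'count': 'Number of games', 'count_wins': 'Wins', 'count_losses': 'Losses', 'count_draws': 'Draws', 'sum': 'Differential'}, axis=1)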
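The question also asks for a single season and for the average differential rather than the total. The following is one possible sketch of the same melt-based approach with those two points included. The function name `team_summary`, the helper columns `Win`, `Loss`, and `Draw`, and the output column names are hypothetical choices, not part of the original answer. Note that the differential here is signed from each team's point of view, unlike the `abs(...)` version in the question.

```python
import pandas as pd

def team_summary(matches: pd.DataFrame, year: int) -> pd.DataFrame:
    """Per-team games, wins, losses, draws and mean differential for one season."""
    season = matches[matches['Y'] == year]
    long_df = season.melt(
        id_vars=['Date', 'Y', 'HomePoints', 'AwayPoints'],
        value_vars=['HomeTeam', 'AwayTeam'],
        var_name='Home/Away',
        value_name='Team',
    )
    # +1 for the home row, -1 for the away row, so the margin is from the team's side.
    sign = long_df['Home/Away'].map({'HomeTeam': 1, 'AwayTeam': -1})
    long_df['Differential'] = sign * (long_df['HomePoints'] - long_df['AwayPoints'])
    # Boolean flags sum to counts inside the groupby.
    long_df['Win'] = long_df['Differential'] > 0
    long_df['Loss'] = long_df['Differential'] < 0
    long_df['Draw'] = long_df['Differential'] == 0
    return long_df.groupby('Team').agg(
        Games=('Differential', 'count'),
        Wins=('Win', 'sum'),
        Losses=('Loss', 'sum'),
        Draws=('Draw', 'sum'),
        AvgDifferential=('Differential', 'mean'),
    )

# Demonstration on the question's sample rows, which are all from Y = 14.
sample = pd.DataFrame({
    'Date': ['2014-08-16'] * 5,
    'Y': [14] * 5,
    'HomeTeam': ['Arsenal', 'Leicester', 'Man United', 'QPR', 'Stoke'],
    'AwayTeam': ['Crystal Palace', 'Everton', 'Swansea', 'Hull', 'Aston Villa'],
    'HomePoints': [2, 2, 1, 0, 0],
    'AwayPoints': [1, 2, 2, 1, 1],
})
summary = team_summary(sample, year=14)
print(summary.sort_values('AvgDifferential', ascending=False))
```

On the full dataset, calling `team_summary(df, year=17)` should give the 2017 table the question describes.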