I have a large number of CSV files from different football matches.
The data looks like the example below.
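The Result column can contain 3 possible values:
H -> home team wins (the home team gets +3 points)
A -> away team wins (the away team gets +3 points)
D -> draw (both teams get +1 point)

   HomeTeam  AwayTeam    Result
0  FC_Fake   ABC_United  H
1  Team_123  FC_Berlin   A
2  FC_FAKE   TEAM_123    D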
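I want to update the files so that each row contains every team's points total as it stands at the start of the match (so not yet updated with the result of the match in that row itself).
I used the following code to update the DataFrame so that it contains a placeholder column points_[TEAM_NAME] for each team.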
# teams is a Python list I extracted earlier
for team in teams:
    df['points_' + team] = 0
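The goal is to transform the DataFrame so that the example above turns into the example below.
(Again, the points should reflect the situation at the start of the match. So even though FC_FAKE wins the match in the first row, the Points_FC_FAKE column is 0 on that row.)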
HomeTeam | AwayTeam   | Result | Points_FC_FAKE | Points_TEAM_123 | Points_FC_Berlin | etc
-------------------------------------------------------------------------------------------
FC_Fake  | ABC_United | H      | 0              | 0               | 0
Team_123 | FC_Berlin  | A      | 3              | 0               | 0
FC_FAKE  | Team_123   | D      | 3              | 0               | 3
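I wrote the following Python function. When it is applied to every row of the DataFrame, it should parse the result and award the right number of points to the right team.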
def point_updater(x):
    if x['Result'] == 'H':
        home = x['HomeTeam']
        x.shift(-1)['points_' + home] += 3
        return x
    elif x['Result'] == 'A':
        away = x['AwayTeam']
        x.shift(-1)['points_' + away] += 3
        return x
    elif x['Result'] == 'D':
        home = x['AwayTeam']
        away = x['AwayTeam']
        x.shift(-1)['points_' + home] += 1
        x.shift(-1)['points_' + away] += 1
        return x
The problem is that when I apply this function to the DataFrame, the points do not change (they all stay at 0):
df = df.apply(point_counter, axis=1)
df['points_FC_Fake'].value_counts()
----
0 2691
Does anyone know what I am doing wrong?
Answer 0 (score: 1)
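There may be more concise ways to do this, but this should be enough for now. You can use df.replace() to map the Result keys to their associated values, then use pd.concat() and pd.DataFrame.pivot() to get the desired result: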
import pandas as pd
df = pd.DataFrame({'HomeTeam': ['FC_Fake','Team_123','FC_Fake'], 'AwayTeam': ['ABC_United','FC_Berlin','Team_123'], 'Result': ['H','A','D']})
remap = df.replace({'H': 3, 'A': 3, 'D': 1})
new = pd.concat([remap.pivot(columns='HomeTeam', values='Result'), remap.pivot(columns='AwayTeam', values='Result')], axis=1).shift(1).fillna(0).astype(int).cumsum()
final = pd.concat([df, new], axis=1)
Yields:
HomeTeam AwayTeam Result FC_Fake Team_123 ABC_United FC_Berlin \
0 FC_Fake ABC_United H 0 0 0 0
1 Team_123 FC_Berlin A 3 0 3 0
2 FC_Fake Team_123 D 3 3 3 3
Team_123
0 0
1 0
2 0
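Note that in the output above ABC_United shows 3 points after losing the first match, and Team_123 appears in two separate columns, because the same mapped value is pivoted once by home team and once by away team. A minimal sketch of one way to keep the two roles apart is shown below. It is not part of the original answer: it swaps pivot() for pd.get_dummies(), the names home_pts, away_pts, per_match and standings are illustrative, and a loss is assumed to be worth 0 points.

```python
import pandas as pd

df = pd.DataFrame({
    'HomeTeam': ['FC_Fake', 'Team_123', 'FC_Fake'],
    'AwayTeam': ['ABC_United', 'FC_Berlin', 'Team_123'],
    'Result': ['H', 'A', 'D'],
})

# Points earned in each match, split by role (assumption: win 3, draw 1, loss 0).
home_pts = df['Result'].map({'H': 3, 'D': 1, 'A': 0})
away_pts = df['Result'].map({'H': 0, 'D': 1, 'A': 3})

# One indicator column per team, scaled by the points earned in that role.
home = pd.get_dummies(df['HomeTeam'], dtype=int).mul(home_pts, axis=0)
away = pd.get_dummies(df['AwayTeam'], dtype=int).mul(away_pts, axis=0)

# Adding the frames aligns on team name, so every team ends up with one column.
per_match = home.add(away, fill_value=0).astype(int)

# Running total, shifted down one row so each row shows start-of-match points.
standings = per_match.cumsum().shift(1, fill_value=0).add_prefix('points_')
final = pd.concat([df, standings], axis=1)
print(final)
```

With this sample data, points_FC_Fake should read 0, 3, 3 and points_ABC_United should stay at 0.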
Answer 1 (score: 1)
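This is one of the exceptional cases where we can use iterrows. I also made your code more fail-safe and general by doing some cleanup before the calculation starts: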
# Convert team names to uppercase so that e.g. FC_Fake and FC_FAKE match
df['HomeTeam'] = df['HomeTeam'].str.upper()
df['AwayTeam'] = df['AwayTeam'].str.upper()

# Get a list of all the teams in the competition
lst_teams = list(set(list(df.HomeTeam.unique()) + list(df.AwayTeam.unique())))

# Create a column for each team
for team in lst_teams:
    df[team] = 0

# Iterate over each row and assign the points earned in that match
for idx, r in df.iterrows():
    if r['Result'] == 'H':
        df.loc[[idx], [r['HomeTeam']]] = 3
    if r['Result'] == 'A':
        df.loc[[idx], [r['AwayTeam']]] = 3
    if r['Result'] == 'D':
        df.loc[[idx], [r['AwayTeam']]] = 1
        df.loc[[idx], [r['HomeTeam']]] = 1

# Accumulate, then shift the rows one down, since points are only available at the start of a match
df.iloc[:, 3:] = df.iloc[:, 3:].cumsum().shift(1).fillna(0).astype(int)
Output
print(df)
HomeTeam AwayTeam Result ABC_UNITED TEAM_123 FC_FAKE FC_BERLIN
0 FC_FAKE ABC_UNITED H 0 0 0 0
1 TEAM_123 FC_BERLIN A 0 0 3 0
2 FC_FAKE TEAM_123 D 0 0 3 3
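The final cumsum-and-shift step can also be avoided by keeping a running tally and recording it before each match is applied, which mirrors the "points at the start of the match" requirement directly. The following is only a sketch of that idea, not the answer's own code; AWARDS, totals and snapshots are hypothetical names, and the team names are assumed to be uppercased already.

```python
import pandas as pd

df = pd.DataFrame({
    'HomeTeam': ['FC_FAKE', 'TEAM_123', 'FC_FAKE'],
    'AwayTeam': ['ABC_UNITED', 'FC_BERLIN', 'TEAM_123'],
    'Result': ['H', 'A', 'D'],
})

# (home points, away points) for each result code, per the question's scoring.
AWARDS = {'H': (3, 0), 'A': (0, 3), 'D': (1, 1)}

teams = sorted(set(df['HomeTeam']) | set(df['AwayTeam']))
totals = dict.fromkeys(teams, 0)  # running tally per team
snapshots = []                    # standings as they were before each match

for home, away, result in zip(df['HomeTeam'], df['AwayTeam'], df['Result']):
    snapshots.append(dict(totals))  # record first, update afterwards
    home_pts, away_pts = AWARDS[result]
    totals[home] += home_pts
    totals[away] += away_pts

standings = pd.DataFrame(snapshots, index=df.index, columns=teams)
out = pd.concat([df, standings], axis=1)
print(out)
```

Because the snapshot is taken before the update, no shift is needed, and totals holds the standings after the last match.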
Answer 2 (score: 0)
Change your function to this:
def point_updater(x):
    if x['Result'] == 'H':
        home = x['HomeTeam']
        x['points_' + home] += 3
        return x
    elif x['Result'] == 'A':
        away = x['AwayTeam']
        x['points_' + away] += 3
        return x
    elif x['Result'] == 'D':
        home = x['HomeTeam']
        away = x['AwayTeam']
        x['points_' + home] += 1
        x['points_' + away] += 1
        return x
Then add this at the end of your code:
df = df.apply(point_updater, axis=1)
for team in teams:
    df["points_" + team] = df["points_" + team].cumsum()
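Two details may be worth spelling out. First, inside apply(axis=1) the argument x is a single row, and x.shift(-1) returns a new Series, so an in-place += on it never reaches x or the DataFrame. Second, a plain cumsum() includes the match in the current row, whereas the question asks for the totals at the start of the match, so a one-row shift is still needed. The sketch below illustrates both points under those assumptions; the compact point_updater is a condensed variant written for this example, not the exact function from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'HomeTeam': ['FC_Fake', 'Team_123', 'FC_Fake'],
    'AwayTeam': ['ABC_United', 'FC_Berlin', 'Team_123'],
    'Result': ['H', 'A', 'D'],
})
teams = ['FC_Fake', 'Team_123', 'ABC_United', 'FC_Berlin']
for team in teams:
    df['points_' + team] = 0

# shift() returns a new Series, so changing it leaves the original row alone.
row = df.iloc[0]
shifted = row.shift(-1)
shifted['points_FC_Fake'] = 3
row_unchanged = bool(row['points_FC_Fake'] == 0)

def point_updater(x):
    # Condensed variant for illustration; works on a copy of the row.
    x = x.copy()
    if x['Result'] in ('H', 'D'):
        x['points_' + x['HomeTeam']] += 3 if x['Result'] == 'H' else 1
    if x['Result'] in ('A', 'D'):
        x['points_' + x['AwayTeam']] += 3 if x['Result'] == 'A' else 1
    return x

df = df.apply(point_updater, axis=1)

# Accumulate, then shift one row down so each row shows start-of-match totals.
cols = ['points_' + team for team in teams]
df[cols] = df[cols].astype(int).cumsum().shift(1, fill_value=0)
print(df)
```

Without the shift, the first row would already show 3 points for FC_Fake, which is the state after that match rather than before it.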