I have a CSV that I read with pandas. The data looks like this:
home_team away_team home_score away_score
Scotland England 0 0
England Scotland 4 2
Scotland England 2 1
... ... ... ...
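I want to write a function that takes two arguments, the two teams, and outputs how many matches team 1 won, how many team 2 won, and how many ended in a draw.
I tried comparing the scores, but I'm not sure how to code this when the same team can appear in both the home and the away column.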
def who_won(team1, team2):
    home = data['home_team']
    away = data['away_team']
    home_score = data['home_score']
    away_score = data['away_score']
    counter_won = 0
    counter_lost = 0
    counter_draw = 0
    for item in range(len(data['home_team'])):
        if home_score > away_score:
            home.append(counter_won)
            counter_won = counter_won + 1
        elif home_score < away_score:
            home.append(counter_won)
            counter_lost = counter_lost + 1
        else:
            counter_draw = counter_draw + 1
But I'm not sure how to compare the matches and count each win, loss, or draw.
The desired output is
England won 1 time versus Scotland
Scotland won 1 time versus England
Scotland and England had one draw
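Answer 0 (score: 4)
You can preprocess the data a little and then use the pandas DataFrame groupby method to get the output you want.
1) Preprocessing
Add two columns: one, which I call match, holding the (home, away) team tuple, and another, result, holding the outcome of the match.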
df['match'] = list(zip(df.home_team, df.away_team))
To get the match result, you need a function:
def match_result(row):
    if row.home_score > row.away_score:
        return row.home_team + ' won'
    elif row.home_score < row.away_score:
        return row.away_team + ' won'
    else:
        return 'draw'

df['result'] = df.apply(match_result, axis=1)
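2) Group by
Then filter the dataset so that it includes only the matches between the two input teams, in either home/away order. Finally, group the data by result and count the occurrences of each possible outcome: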
df.loc[df.match.isin([(team1, team2), (team2, team1)]), 'result'].groupby(df.result).count()
Test
home_team away_team home_score away_score result \
0 Scotland England 0 0 draw
1 England Scotland 4 2 England won
2 Scotland England 2 1 Scotland won
match
0 (Scotland, England)
1 (England, Scotland)
2 (Scotland, England)
result
England won 1
Scotland won 1
draw 1
Name: result, dtype: int64
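The two steps above can be wrapped into a single function. The following is only a minimal sketch, not the answerer's code: the name count_results is hypothetical, the DataFrame is passed in explicitly, and the result labels are built with Series.where instead of apply. The sample data is the three rows from the question.

```python
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({
    "home_team": ["Scotland", "England", "Scotland"],
    "away_team": ["England", "Scotland", "England"],
    "home_score": [0, 4, 2],
    "away_score": [0, 2, 1],
})

def count_results(df, team1, team2):
    """Count outcomes of all matches between team1 and team2 (hypothetical helper)."""
    # Keep only matches between the two teams, in either home/away order.
    mask = (
        (df["home_team"].eq(team1) & df["away_team"].eq(team2))
        | (df["home_team"].eq(team2) & df["away_team"].eq(team1))
    )
    sub = df.loc[mask]
    # Home team where the home score is higher, otherwise the away team.
    winner = sub["home_team"].where(sub["home_score"] > sub["away_score"],
                                    sub["away_team"])
    # Label equal scores as a draw, everything else as "<team> won".
    result = (winner + " won").where(sub["home_score"] != sub["away_score"], "draw")
    return result.value_counts()

counts = count_results(df, "England", "Scotland")
print(counts)
```

Because the filter checks both orderings, calling count_results(df, "Scotland", "England") should give the same counts.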
Answer 1 (score: 0)
Counting the results per (home, away) pair is actually the easier case:
import numpy as np

# 1 = home team won, 0 = draw, -1 = home team lost
df['won'] = np.sign(df['home_score']-df['away_score'])
df.groupby(['home_team','away_team'])['won'].value_counts()
Output:
home_team away_team won
England Scotland 1 1
Scotland England 0 1
1 1
Name: won, dtype: int64
In your case, it is a bit trickier:
# home team won/lost/tied
df['won'] = np.sign(df['home_score']-df['away_score'])
# we don't care about home/away, so we sort the pair by name
# but we need to revert the result first:
df['won'] = np.where(df['home_team'].lt(df['away_team']),
                     df['won'], -df['won'])
# sort the pair home/away
df[['home_team','away_team']] = np.sort(df[['home_team','away_team']], axis=1)
# value counts:
df.groupby(['home_team','away_team'])['won'].value_counts()
Output:
home_team away_team won
England Scotland -1 1
0 1
1 1
Name: won, dtype: int64
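In the output above, 1 means the alphabetically first team of the pair won, -1 means the second team won, and 0 is a draw. The sketch below applies the same sign-and-sort idea without overwriting the home_team and away_team columns and maps the codes to labels. The names team_a, team_b and outcome are illustrative assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({
    "home_team": ["Scotland", "England", "Scotland"],
    "away_team": ["England", "Scotland", "England"],
    "home_score": [0, 4, 2],
    "away_score": [0, 2, 1],
})

# 1 / 0 / -1 from the home team's point of view.
sign = np.sign(df["home_score"] - df["away_score"])

# True where the home team is already the alphabetically first team.
home_first = df["home_team"] < df["away_team"]

pairs = pd.DataFrame({
    # team_a is always the alphabetically first team of the pair.
    "team_a": df["home_team"].where(home_first, df["away_team"]),
    "team_b": df["away_team"].where(home_first, df["home_team"]),
    # Flip the sign when the teams were swapped, then map codes to labels.
    "outcome": sign.where(home_first, -sign).map(
        {1: "team_a won", 0: "draw", -1: "team_b won"}
    ),
})

summary = pairs.groupby(["team_a", "team_b", "outcome"]).size()
print(summary)
```

Leaving df untouched means the original home/away information is still available afterwards.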
Answer 2 (score: 0)
My solution takes the following details into account:
To get the result, define a function as follows:
def who_won(team1, team2):
    # Matches with team1 at home: the columns already read (team1, team2, score1, score2)
    df1 = df.query('home_team == @team1 and away_team == @team2')\
        .set_axis(['tm1', 'tm2', 's1', 's2'], axis=1)
    # Matches with team2 at home: swap the labels so team1 always ends up in tm1/s1
    df2 = df.query('home_team == @team2 and away_team == @team1')\
        .set_axis(['tm2', 'tm1', 's2', 's1'], axis=1)
    df3 = pd.concat([df1, df2], sort=False).reset_index(drop=True)
    dif = df3.s1 - df3.s2  # goal difference from team1's point of view
    bins = pd.cut(dif, bins=[-100, -1, 0, 100], labels=['lost', 'draw', 'won'])
    return dif.groupby(bins, observed=False).count()
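Note the neat trick: for the matches where team2 is the home team (df2), I "swap" the home and away columns. I then concatenate df1 and df2, so that team1 is always in the tm1 column. As a result, df3.s1 - df3.s2 is the goal difference from team1's point of view (note that the other solutions do not recognize this difference).
Then the call to cut introduces proper category names (lost / draw / won), which gives intuitive access to the individual components of the final result.
To test this function, I used a larger DataFrame that includes other teams: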
home_team away_team home_score away_score
0 Scotland England 0 0
1 England Scotland 4 2
2 England Scotland 3 1
3 Scotland England 2 1
4 Scotland Wales 3 1
5 Wales Scotland 2 1
Then I called who_won('England', 'Scotland') and got this result:
lost 1
draw 1
won 2
dtype: int64
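The result is a Series with a CategoricalIndex (lost / draw / won).
If you want to reformat this result into your desired output and get each "component", that is easy.
For example, to get the number of matches England won against Scotland, run res['won'], where res is the Series returned by the call above.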
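As an illustration of that reformatting, here is a minimal sketch that turns such a result into sentences like the ones the question asks for. The helper name describe is hypothetical, and res is rebuilt by hand from the counts shown above, with a plain index rather than a CategoricalIndex; label lookups such as res['won'] work the same way on both.

```python
import pandas as pd

def describe(res, team1, team2):
    """Format lost/draw/won counts (from team1's point of view) as sentences."""
    def times(n):
        # "1 time" but "2 times"
        return f"{n} time" if n == 1 else f"{n} times"

    won, lost, draw = int(res["won"]), int(res["lost"]), int(res["draw"])
    return [
        f"{team1} won {times(won)} versus {team2}",
        # A loss for team1 is a win for team2.
        f"{team2} won {times(lost)} versus {team1}",
        f"{team1} and {team2} had {draw} draw" + ("" if draw == 1 else "s"),
    ]

# Counts copied from the result shown above for England vs Scotland.
res = pd.Series({"lost": 1, "draw": 1, "won": 2})
lines = describe(res, "England", "Scotland")
print("\n".join(lines))
```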