Question

I have a CSV that I read with pandas. The data looks like this:

home_team    away_team    home_score    away_score
Scotland     England      0             0
England      Scotland     4             2
Scotland     England      2             1
...          ...          ...           ...

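I want to write a function that takes two arguments, the two teams, and outputs the number of matches team 1 won, the number team 2 won, and the number of draws between them.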

I tried comparing the scores, but I'm not sure how to code it when the same team appears in both the home and away columns.

def who_won(team1, team2):

    home = data['home_team']
    away = data['away_team']
    home_score = data['home_score']
    away_score = data['away_score']
    counter_won = 0
    counter_lost = 0
    counter_draw = 0
    for item in range(len(data['home_team'])):

        if home_score > away_score:
            home.append(counter_won)
            counter_won = counter_won + 1
        elif home_score < away_score:
            home.append(counter_won)
            counter_lost = counter_lost + 1
        else:
            counter_draw = counter_draw + 1

But I'm not sure how to compare the matches and count the wins, losses, and draws.

The desired output is:

England won 1 time versus Scotland
Scotland won 1 time versus England
Scotland and England had one draw

Answer 1

You can preprocess the data a little and then use the groupby method of the pandas DataFrame to get the desired output.

1) Preprocessing

Add two columns: one, which I call match, holds the (home, away) tuple of teams, and the other, result, holds the outcome of the match.

df['match'] = list(zip(df.home_team, df.away_team))

To get the match result you need a function:

def match_result(row):
    if row.home_score > row.away_score:
        return row.home_team + ' won'
    elif row.home_score < row.away_score:
        return row.away_team + ' won'
    else:
        return 'draw'

df['result'] = df.apply(match_result, axis=1)

2) Group by

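Then filter the dataset so it only includes matches between the two input teams, in either home/away order. Finally, group the data by result and count each possible outcome: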

df.loc[df.match.isin([(team1, team2), (team2, team1)]), 'result'].groupby(df.result).count()

Test

  home_team away_team  home_score  away_score        result  \
0  Scotland   England           0           0          draw   
1   England  Scotland           4           2   England won   
2  Scotland   England           2           1  Scotland won   

                 match  
0  (Scotland, England)  
1  (England, Scotland)  
2  (Scotland, England)

result
England won     1
Scotland won    1
draw            1
Name: result, dtype: int64
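
As a rough sketch, the two steps above could be wrapped into a single function. The helper name head_to_head and the inline sample DataFrame are assumptions made for illustration, and this variant filters with boolean masks instead of the tuple column:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'Scotland'],
    'away_team': ['England', 'Scotland', 'England'],
    'home_score': [0, 4, 2],
    'away_score': [0, 2, 1],
})

def match_result(row):
    """Label one match with the winner's name, or 'draw'."""
    if row.home_score > row.away_score:
        return row.home_team + ' won'
    elif row.home_score < row.away_score:
        return row.away_team + ' won'
    return 'draw'

def head_to_head(df, team1, team2):  # hypothetical helper name
    """Count the results of matches between team1 and team2, whoever is at home."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out['result'] = out.apply(match_result, axis=1)
    # Keep only matches between the two teams, in either home/away order.
    mask = (
        ((out['home_team'] == team1) & (out['away_team'] == team2))
        | ((out['home_team'] == team2) & (out['away_team'] == team1))
    )
    return out.loc[mask, 'result'].value_counts()

counts = head_to_head(df, 'England', 'Scotland')
print(counts)
```

Individual counts can then be looked up by label, for example counts.get('draw', 0), which also covers the case where an outcome never occurred.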

Answer 2

Actually, if you keep the home/away distinction, this is easier to do:

import numpy as np

# +1: home team won, 0: draw, -1: away team won
df['won'] = np.sign(df['home_score']-df['away_score'])
df.groupby(['home_team','away_team'])['won'].value_counts()

Output:

home_team  away_team  won
England    Scotland   1      1
Scotland   England    0      1
                      1      1
Name: won, dtype: int64

In your case, it's a bit trickier:

# home team won/lost/tied
df['won'] = np.sign(df['home_score']-df['away_score'])

# we don't care about home/away, so we sort the pair by name
# but first we need to flip the sign of the result for rows that will be swapped:
df['won'] = np.where(df['home_team'].lt(df['away_team']),
                     df['won'], -df['won'])

# sort the pair home/away
df[['home_team','away_team']] = np.sort(df[['home_team','away_team']], axis=1)

# value counts:
df.groupby(['home_team','away_team'])['won'].value_counts()

Output:

home_team  away_team  won
England    Scotland   -1     1
                       0     1
                       1     1
Name: won, dtype: int64
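
If the result should be read from the point of view of a chosen team rather than the alphabetically first one, the same sign trick can be applied relative to team1. This is only a sketch: the helper name sign_counts and the inline sample data are assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'home_team': ['Scotland', 'England', 'Scotland'],
    'away_team': ['England', 'Scotland', 'England'],
    'home_score': [0, 4, 2],
    'away_score': [0, 2, 1],
})

def sign_counts(df, team1, team2):  # hypothetical helper name
    """Return (team1 wins, team2 wins, draws) for matches between the two teams."""
    home1 = (df['home_team'] == team1) & (df['away_team'] == team2)
    home2 = (df['home_team'] == team2) & (df['away_team'] == team1)
    games = df[home1 | home2]
    # +1: home team won, 0: draw, -1: away team won
    sign = np.sign(games['home_score'] - games['away_score'])
    # Flip the sign when team1 is the away side, so +1 always means "team1 won".
    sign = np.where(games['home_team'] == team1, sign, -sign)
    return int((sign == 1).sum()), int((sign == -1).sum()), int((sign == 0).sum())

print(sign_counts(df, 'England', 'Scotland'))
```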

Answer 3

My solution takes the following details into account:

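Either of the two teams (team1 and team2) can be home or away, but you want to know how many times team1 won / lost / drew against team2.
The source DataFrame also contains matches against other teams, and matches where both the home and away teams are "other" teams (different from the two teams we are interested in).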

To get the result, define a function as follows:

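import pandas as pd

def who_won(team1, team2):
    # Matches with team1 at home: the columns already read (team1, team2, score1, score2).
    # Note: set_axis no longer accepts inplace= in pandas 2.0 and later.
    df1 = df.query('home_team == @team1 and away_team == @team2')\
        .set_axis(['tm1', 'tm2', 's1', 's2'], axis=1)
    # Matches with team2 at home: swap the labels so team1 still ends up in tm1/s1.
    df2 = df.query('home_team == @team2 and away_team == @team1')\
        .set_axis(['tm2', 'tm1', 's2', 's1'], axis=1)
    df3 = pd.concat([df1, df2], sort=False).reset_index(drop=True)
    # Goal difference from team1's point of view.
    dif = df3.s1 - df3.s2
    bins = pd.cut(dif, bins=[-100, -1, 0, 100], labels=['lost', 'draw', 'won'])
    # observed=False keeps categories that have no matches (count 0).
    return dif.groupby(bins, observed=False).count()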

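Note the neat trick: I "swap" the home and away columns for the matches where team2 is the home team (df2). Then I concatenate df1 and df2, so that team1 is always in the tm1 column. As a result, df3.s1 - df3.s2 is the goal difference from team1's point of view (note that the other solutions do not make this distinction).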

Then the call to cut introduces proper category names (lost / draw / won), which gives intuitive access to the individual components of the final result.
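
To make the binning concrete, here is a small sketch of what cut does with a handful of goal differences. The values in dif are made up for illustration, and the bin edges assume integer scores with margins below 100:

```python
import pandas as pd

# Hypothetical goal differences from team1's point of view.
dif = pd.Series([-3, -1, 0, 1, 4])

# Bins are right-closed by default: (-100, -1], (-1, 0], (0, 100].
# With integer differences that means: negative -> lost, zero -> draw, positive -> won.
labels = pd.cut(dif, bins=[-100, -1, 0, 100], labels=['lost', 'draw', 'won'])
print(labels.tolist())  # ['lost', 'lost', 'draw', 'won', 'won']
```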

To test this function, I used a larger DataFrame that includes other teams:

  home_team away_team  home_score  away_score
0  Scotland   England           0           0
1   England  Scotland           4           2
2   England  Scotland           3           1
3  Scotland   England           2           1
4  Scotland     Wales           3           1
5     Wales  Scotland           2           1

Then I called who_won('England', 'Scotland') and got this result:

lost    1
draw    1
won     2
dtype: int64

The result is a Series with a CategoricalIndex (lost / draw / won).

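If you want to reformat this result into the desired output, or get at each "component", that is easy. For example, with the result stored in res, run res['won'] to get the number of matches England won against Scotland.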
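
One possible way to do that reformatting is sketched below. The helper name describe is an assumption, res is rebuilt here from the counts in the test output so the snippet runs on its own, and the draw count is printed as a numeral rather than spelled out:

```python
import pandas as pd

def describe(res, team1, team2):  # hypothetical helper name
    """Turn lost/draw/won counts (from team1's point of view) into sentences."""
    def plural(n, word):
        return f"{n} {word}" if n == 1 else f"{n} {word}s"
    return [
        f"{team1} won {plural(res['won'], 'time')} versus {team2}",
        # A match team1 lost is a match team2 won.
        f"{team2} won {plural(res['lost'], 'time')} versus {team1}",
        f"{team1} and {team2} had {plural(res['draw'], 'draw')}",
    ]

# Counts copied from the test output of who_won('England', 'Scotland').
res = pd.Series({'lost': 1, 'draw': 1, 'won': 2})
for line in describe(res, 'England', 'Scotland'):
    print(line)
```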
