How do I convert value_counts() output to a DataFrame?

Asked: 2017-06-29 07:43:24

Tags: python pandas dataframe

I have a DataFrame like this:

match       team1       team2       winner
1            MI           KKR        MI
2            DD           CSK        DD
3            RCB          DC         RCB.....

What I want to calculate is how many times each team has won against another team in the tournament. For example, MI vs KKR:

MI:10

KKR:5

So I wrote a function like this:

def comparator(team1):
    mt1=matches[((matches['team1']==team1)|(matches['team2']==team1))]
    teams=['MI','KKR','RCB','DC','CSK','RR','DD','GL','KXIP','SRH','RPS','KTK','PW']
    teams.remove(team1)
    opponents=teams.copy()
    for i in opponents:
        mt2=mt1[(((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))].winner.value_counts()
        print(mt2)
comparator('MI')

Inside the function, mt2 prints the correct win counts for team1 and for each opponent. The output looks like this:

MI     13
KKR     5
Name: winner, dtype: int64
MI     11
RCB     8
Name: winner, dtype: int64

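The values are correct, but the format is not what I need. I want to convert this output into a DataFrame.

I tried appending the values to a list, but that does not work, because the line Name: winner, dtype: int64 gets appended to the list as well.

How can I convert it to a DataFrame?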

2 answers:

Answer 0: (score: 1)

I think you need one of the following.

If you need the index as a column, add Series.reset_index:

mask = (((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))
mt2 = mt1.loc[mask, 'winner'].value_counts().reset_index()
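As a small illustration of the reset_index route, here is a minimal sketch on made-up data. The `winners` Series is a hypothetical stand-in for `mt1.loc[mask, 'winner']`, and the column names `team` and `wins` are my own choice. Naming them explicitly sidesteps the fact that the default column names produced by value_counts().reset_index() differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample of winners, standing in for mt1.loc[mask, 'winner']
winners = pd.Series(['MI', 'KKR', 'MI', 'MI'], name='winner')

counts = (
    winners.value_counts()
    .rename_axis('team')       # name the index so it becomes a 'team' column
    .reset_index(name='wins')  # move the index into a column; counts go to 'wins'
)
print(counts)
```

The result is an ordinary two-column DataFrame with a default integer index, so it can be appended to a list and concatenated later.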

If you need to convert the Series to a one-column DataFrame, add Series.to_frame:

mask = (((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))
mt2 = mt1.loc[mask, 'winner'].value_counts().to_frame()
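For comparison, a minimal sketch of the to_frame route on the same kind of made-up data. The `winners` Series and the column name `wins` are assumptions for illustration. Here the team names stay in the index instead of becoming a column.

```python
import pandas as pd

# Hypothetical sample of winners, standing in for mt1.loc[mask, 'winner']
winners = pd.Series(['MI', 'KKR', 'MI', 'MI'], name='winner')

# One-column DataFrame; the team names remain the index
wins_frame = winners.value_counts().to_frame(name='wins')
print(wins_frame)
```

Keeping the teams in the index makes label lookups such as `wins_frame.loc['MI', 'wins']` convenient.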

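It is also better to select with loc, passing the boolean mask and the column name together, rather than filtering the DataFrame first and then picking the column.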
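Putting these pieces together, one possible way to make the question's loop return a single DataFrame instead of printing each Series is sketched below. The sample `matches` data and the names `head_to_head`, `opponent` and `wins` are hypothetical. The sketch also assumes the team has played at least one match, since pd.concat raises an error on an empty list.

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question
matches = pd.DataFrame({
    'team1':  ['MI', 'KKR', 'MI', 'RCB'],
    'team2':  ['KKR', 'MI', 'RCB', 'MI'],
    'winner': ['MI', 'KKR', 'MI', 'MI'],
})

def head_to_head(matches, team):
    """Return one row per (opponent, winner) pair for matches involving `team`."""
    involves_team = (matches['team1'] == team) | (matches['team2'] == team)
    mt1 = matches.loc[involves_team]
    # Opponents are derived from the data rather than from a hard-coded list
    opponents = sorted((set(mt1['team1']) | set(mt1['team2'])) - {team})
    frames = []
    for opponent in opponents:
        mask = (mt1['team1'] == opponent) | (mt1['team2'] == opponent)
        counts = (
            mt1.loc[mask, 'winner']
            .value_counts()
            .rename_axis('winner')
            .reset_index(name='wins')
        )
        counts.insert(0, 'opponent', opponent)  # remember which pairing this is
        frames.append(counts)
    # Stack the per-opponent frames into one DataFrame
    return pd.concat(frames, ignore_index=True)

result = head_to_head(matches, 'MI')
print(result)
```

Each per-opponent result is a DataFrame, so nothing like the `Name: winner, dtype: int64` footer ends up in the collected data.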

Answer 1: (score: 1)

You can simplify the search a little, or at least make it more readable:

import pandas as pd

def my_comp(df, team):
    # Keep only the matches in which `team` played, on either side
    matches_with_team = df[(df[['team1', 'team2']] == team).any(axis=1)]
    # Union (|) of both columns, so an opponent that appears on both sides is kept
    other_teams = sorted((set(matches_with_team['team1']) | set(matches_with_team['team2'])) - {team})
    comparison_df = pd.DataFrame(index=other_teams, columns=['wins', 'losses'])
    comparison_df.index.name = 'opponent'
    for opponent in other_teams:
        matches_against_opponents = matches_with_team[(matches_with_team[['team1', 'team2']] == opponent).any(axis=1)]
        # reindex guarantees both labels exist; a side with no wins becomes NaN
        winners = matches_against_opponents['winner'].value_counts().reindex([team, opponent])
        comparison_df.loc[opponent] = [winners[team], winners[opponent]]
    # Replace the NaN entries with 0 and store the counts as integers
    return comparison_df.fillna(0).astype(int)
  

my_comp(df, 'MI')

    wins    losses
opponent        
KKR     1.0     0

Now you can build one big DataFrame that covers all the results:

all_teams = sorted(set(df['team1']) | set(df['team2']))
  

all_teams

['CSK', 'DC', 'DD', 'KKR', 'MI', 'RCB']

When run with this input:

    team1   team2   winner
match           
1   MI      KKR     MI
2   DD      CSK     DD
3   RCB     DC      RCB
4   RCB     CSK     RCB
  

pd.concat((my_comp(df, team) for team in all_teams), keys=all_teams).groupby(level=[0, 1]).sum()

                wins    losses
    opponent        
CSK     DD      0       1
        RCB     0       1
DC      RCB     0       1
DD      CSK     1       0
KKR     MI      0       1
MI      KKR     1       0
RCB     CSK     1       0
        DC      1       0
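As an aside, the win counts in the table above can also be obtained without looping over teams. This is a different technique from the one used in this answer: derive the loser of each match, then group by winner and loser. The sketch below uses the four-match input shown earlier. The `loser` and `wins` names are my own, and it assumes every match has a winner that is one of the two teams (no ties or abandoned games).

```python
import numpy as np
import pandas as pd

# The four-match input shown above
df = pd.DataFrame({
    'team1':  ['MI', 'DD', 'RCB', 'RCB'],
    'team2':  ['KKR', 'CSK', 'DC', 'CSK'],
    'winner': ['MI', 'DD', 'RCB', 'RCB'],
}, index=pd.Index([1, 2, 3, 4], name='match'))

# The loser is whichever side is not the winner (assumes no ties)
with_loser = df.assign(
    loser=np.where(df['winner'] == df['team1'], df['team2'], df['team1'])
)

# Count matches per (winner, loser) pair and return a regular DataFrame
wins = with_loser.groupby(['winner', 'loser']).size().reset_index(name='wins')
print(wins)
```

Pairs in which a team never won do not appear in this result, so it lists only non-zero win counts.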