I have a DataFrame like this:
match  team1  team2  winner
1      MI     KKR    MI
2      DD     CSK    DD
3      RCB    DC     RCB
...
What I want to calculate is how many times one team has beaten another in the tournament. For example, MI vs KKR:
MI:10
KKR:5
So I wrote a function like this:
def comparator(team1):
    mt1=matches[((matches['team1']==team1)|(matches['team2']==team1))]
    teams=['MI','KKR','RCB','DC','CSK','RR','DD','GL','KXIP','SRH','RPS','KTK','PW']
    teams.remove(team1)
    opponents=teams.copy()
    for i in opponents:
        mt2=mt1[(((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))].winner.value_counts()
        print(mt2)

comparator('MI')
Inside the function, mt2 holds the correct win counts for team1 and team2. The printed output looks like this:
MI 13
KKR 5
Name: winner, dtype: int64
MI 11
RCB 8
Name: winner, dtype: int64
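The output is correct, but it is not in a usable format. I want to convert it into a DataFrame.
I tried appending these values to a list, but that does not work, because the line "Name: winner, dtype: int64" gets appended to the list as well.
How can I convert this into a DataFrame?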
Answer 0 (score: 1)
I think you need the following.
If you want the index as a column, add Series.reset_index:
mask = (((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))
mt2 = mt1.loc[mask, 'winner'].value_counts().reset_index()
If you want to convert the Series to a one-column DataFrame, add Series.to_frame:
mask = (((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))
mt2 = mt1.loc[mask, 'winner'].value_counts().to_frame()
It is also better to use loc with a boolean mask and to name the column explicitly.
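To get a single DataFrame rather than one printed Series per opponent, the per-opponent counts can be collected and concatenated. The following is only a minimal sketch of that idea: the sample data and the function name `head_to_head` are made up for illustration, and the opponent is derived with `Series.where` instead of looping over a fixed team list.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's DataFrame.
matches = pd.DataFrame({
    "team1":  ["MI", "KKR", "MI", "RCB", "MI"],
    "team2":  ["KKR", "MI", "RCB", "MI", "KKR"],
    "winner": ["MI", "KKR", "MI", "RCB", "MI"],
})

def head_to_head(matches, team1):
    """Return one row per (opponent, team) pair with that team's win count."""
    involved = (matches["team1"] == team1) | (matches["team2"] == team1)
    mt1 = matches.loc[involved]
    # The opponent is whichever side of the match is not team1.
    opponent = mt1["team2"].where(mt1["team1"] == team1, mt1["team1"])
    opponent = opponent.rename("opponent")
    frames = []
    for opp, group in mt1.groupby(opponent):
        # rename_axis / reset_index(name=...) fix the column names explicitly,
        # so the result does not depend on the pandas version's defaults.
        counts = (group["winner"].value_counts()
                  .rename_axis("team")
                  .reset_index(name="wins"))
        counts.insert(0, "opponent", opp)
        frames.append(counts)
    return pd.concat(frames, ignore_index=True)

result = head_to_head(matches, "MI")
print(result)
```

Each row of `result` then reads as "against this opponent, this team won this many matches", and the "Name: winner, dtype: int64" footer never enters the data because it is only part of how a Series is printed.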
Answer 1 (score: 1)
You can simplify the search a little and make it more readable:
import pandas as pd

def my_comp(df, team):
    # Keep only the matches in which `team` played on either side.
    matches_with_team = df[(df[['team1', 'team2']] == team).any(axis=1)]
    # Union (|) of both columns, minus the team itself, gives every opponent.
    # A sorted list is used because recent pandas versions reject a set as an index.
    other_teams = sorted((set(matches_with_team['team1']) | set(matches_with_team['team2'])) - {team})
    comparison_df = pd.DataFrame(index=other_teams, columns=['wins', 'losses'])
    comparison_df.index.name = 'opponent'
    for opponent in other_teams:
        matches_against_opponents = matches_with_team[(matches_with_team[['team1', 'team2']] == opponent).any(axis=1)]
        # reindex guarantees both labels exist; a side with no wins becomes NaN.
        winners = matches_against_opponents['winner'].value_counts().reindex([team, opponent])
        comparison_df.loc[opponent] = [winners[team], winners[opponent]]
    return comparison_df.fillna(0).astype(int)

my_comp(df, 'MI')
          wins  losses
opponent
KKR          1       0
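One detail is worth checking when collecting team names from the two team columns: on Python sets, `^` is the symmetric difference, so a team that appears in both columns is dropped, whereas `|` (union) keeps it. With the small sample data no team appears on both sides, so the two operators give the same result there. A minimal illustration with made-up team sets:

```python
# Hypothetical contents of the team1 and team2 columns.
team1_col = {"MI", "KKR", "RCB"}
team2_col = {"KKR", "MI", "DC"}

# Symmetric difference: only teams seen in exactly one of the two columns.
sym_diff = sorted(team1_col ^ team2_col)
# Union: every team seen in either column.
union = sorted(team1_col | team2_col)

print(sym_diff)  # ['DC', 'RCB']
print(union)     # ['DC', 'KKR', 'MI', 'RCB']
```

For real tournament data, where most teams appear in both columns, the union is the operation that returns the full list of teams.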
Now you can build one big DataFrame that covers all the results:
all_teams = sorted(set(df['team1']) | set(df['team2']))
all_teams
['CSK', 'DC', 'DD', 'KKR', 'MI', 'RCB']
This is the result when run with the following input:
      team1 team2 winner
match
1        MI   KKR     MI
2        DD   CSK     DD
3       RCB    DC    RCB
4       RCB   CSK    RCB
pd.concat((my_comp(df, team) for team in all_teams), keys=all_teams).groupby(level=[0, 1]).sum()
              wins  losses
    opponent
CSK DD           0       1
    RCB          0       1
DC  RCB          0       1
DD  CSK          1       0
KKR MI           0       1
MI  KKR          1       0
RCB CSK          1       0
    DC           1       0
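As an alternative to looping over every team, the same head-to-head counts can be obtained in one step by deriving the loser of each match and cross-tabulating winners against losers. This is a sketch of a different technique from the answer above, not a replacement for it; it assumes every match has a winner (no ties or abandoned games), and the names `loser` and `wins` are introduced only for illustration.

```python
import pandas as pd

# The four-match sample input used above.
df = pd.DataFrame(
    {
        "team1": ["MI", "DD", "RCB", "RCB"],
        "team2": ["KKR", "CSK", "DC", "CSK"],
        "winner": ["MI", "DD", "RCB", "RCB"],
    },
    index=pd.Index([1, 2, 3, 4], name="match"),
)

# The loser is whichever of team1/team2 is not the winner:
# keep team1 where it did not win, otherwise take team2.
loser = df["team1"].where(df["team1"] != df["winner"], df["team2"])

# Rows are winners, columns are losers, cells are the number of wins.
wins = pd.crosstab(df["winner"], loser.rename("loser"))
print(wins)
```

Here `wins.loc["MI", "KKR"]` is the number of times MI beat KKR. A team that never won a match does not appear as a row, so a lookup such as `wins.loc["KKR", "MI"]` needs the table to be reindexed over all teams first.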