I have a DataFrame like this:
team1 team2 winner
KKR RCB KKR
CSK KXIP CSK
RR DD DD
MI KKR KKR
DC KKR KKR
KXIP RR RR
DC DD DD
MI KKR KKR
...
What I want is the number of times each team beat each other team in the tournament. For example, KKR beat MI 2 times, so the output should look like MI vs KKR = MI:0 KKR:2
I can do this by hand, two teams at a time, but that takes far too long. Can anyone help me with this?
Answer 0 (score: 0)
If the order of the teams is not consistent across the dataset, you need to define a match column:
df['match'] = df[['team1', 'team2']].apply(
    lambda row: tuple(sorted(row.values)),
    axis=1
)
A tuple is needed because it is hashable; a list is not, so it could not be used as a groupby key.
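As a quick illustration of the hashability point, here is a small pure-Python sketch (no pandas needed). The helper names `count_pairings` and `is_hashable` are hypothetical. It normalises each pairing to a sorted tuple and uses it as a dictionary key, which is the same property groupby relies on; a list cannot be used this way because `hash()` raises `TypeError` on it.

```python
from collections import Counter


def count_pairings(rows):
    """Count how often each unordered pair of teams met (hypothetical helper).

    Each (team1, team2) pair is normalised to a sorted tuple, so
    ('MI', 'KKR') and ('KKR', 'MI') end up under the same key.
    """
    return Counter(tuple(sorted(pair)) for pair in rows)


def is_hashable(value):
    """Return True if value can be used as a dict key (and hence a group key)."""
    try:
        hash(value)
        return True
    except TypeError:
        return False


pairs = [('MI', 'KKR'), ('DC', 'KKR'), ('KKR', 'MI')]
print(count_pairings(pairs))  # Counter({('KKR', 'MI'): 2, ('DC', 'KKR'): 1})
print(is_hashable(('KKR', 'MI')), is_hashable(['KKR', 'MI']))  # True False
```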
It is not entirely clear what output you want, but this should get you close:
df.groupby('match')['winner'].value_counts()
Output:
match winner
(CSK, KXIP) CSK 1
(DC, DD) DD 1
...
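To turn this into the exact "A vs B = A:x B:y" format asked for in the question, one option is to iterate over the groups. This is a minimal sketch, assuming the question's sample data; `head_to_head` is a hypothetical helper, and the `match` column is built with a list comprehension that should produce the same sorted tuples as the `apply` call above. Teams come out in alphabetical order, so the pairing prints as KKR vs MI rather than MI vs KKR.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'team1': ['KKR', 'CSK', 'RR', 'MI', 'DC', 'KXIP', 'DC', 'MI'],
    'team2': ['RCB', 'KXIP', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR'],
    'winner': ['KKR', 'CSK', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR'],
})

# Order-independent key per fixture: a sorted (hashable) tuple of the two teams.
df['match'] = [tuple(sorted(p)) for p in zip(df['team1'], df['team2'])]


def head_to_head(frame):
    """Return one 'A vs B = A:x B:y' string per pairing (hypothetical helper)."""
    lines = []
    for (a, b), group in frame.groupby('match'):
        wins = group['winner'].value_counts()
        # .get(team, 0) covers a team that never won this pairing.
        lines.append(f"{a} vs {b} = {a}:{wins.get(a, 0)} {b}:{wins.get(b, 0)}")
    return lines


for line in head_to_head(df):
    print(line)
```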
Answer 1 (score: 0)
Assuming the teams always appear in the same order:
df.groupby(['team1','team2']).apply(lambda x: str(sum(x.winner == x.team1))+':'+str(sum(x.winner == x.team2)))
Without that assumption, here is a solution, using a df created like this:
import pandas as pd
df = pd.DataFrame({'team1': ['KKR','CSK','RR','MI','DC','KXIP','DC','MI','KKR'],
                   'team2': ['RCB','KXIP','DD','KKR','KKR','RR','DD','KKR','MI'],
                   'winner': ['KKR','CSK','DD','KKR','KKR','RR','DD','KKR','MI']})
# Sort each (team1, team2) pair alphabetically so the order of the teams no longer matters
teamSort = [sorted(item) for item in df[['team1','team2']].to_numpy()]
df[['team1','team2']] = teamSort
df = df.groupby(['team1','team2']).apply(lambda x: str(sum(x.winner == x.team1))+':'+str(sum(x.winner == x.team2))).reset_index(name='score')
Output:
team1 team2 score
0 CSK KXIP 1:0
1 DC DD 0:1
2 DC KKR 0:1
3 DD RR 1:0
4 KKR MI 2:1
5 KKR RCB 1:0
6 KXIP RR 0:1
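A variant of the same idea can avoid both the row-wise `apply` and the Python-level list comprehension. This sketch assumes the same nine-row df as above: it sorts each pair with `np.sort(..., axis=1)` and counts wins by summing boolean columns (`win1` and `win2` are illustrative names). It should reproduce the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team1': ['KKR', 'CSK', 'RR', 'MI', 'DC', 'KXIP', 'DC', 'MI', 'KKR'],
                   'team2': ['RCB', 'KXIP', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR', 'MI'],
                   'winner': ['KKR', 'CSK', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR', 'MI']})

# Sort every row's two team names alphabetically in one vectorized call.
pair = np.sort(df[['team1', 'team2']].to_numpy(), axis=1)
df = df.assign(team1=pair[:, 0], team2=pair[:, 1])

# One boolean column per side; summing booleans per group counts the wins.
wins = (
    df.assign(win1=df['winner'].eq(df['team1']),
              win2=df['winner'].eq(df['team2']))
      .groupby(['team1', 'team2'], as_index=False)[['win1', 'win2']]
      .sum()
)
wins['score'] = wins['win1'].astype(str) + ':' + wins['win2'].astype(str)
print(wins[['team1', 'team2', 'score']])
```

Keeping the counts as integer columns (`win1`, `win2`) until the last step also makes it easy to filter or sort by wins before building the string.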