How do I summarize items in a DataFrame?

Time: 2017-05-09 10:09:11

Tags: python pandas dataframe

I have a DataFrame like this:

team1   team2   winner
KKR     RCB     KKR
CSK     KXIP    CSK
RR      DD      DD
MI      KKR     KKR
DC      KKR     KKR
KXIP    RR      RR
DC      DD      DD
MI      KKR     KKR.... 

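What I want to check is how many times each team won against a given opponent over the tournament. For example: MI won 2 times against KKR, so the output should look like MI vs KKR = MI:2 KKR:0

I can do this manually by taking two teams at a time, but that takes much longer. Can anyone help me with this?

2 answers:

Answer 0 (score: 0)

If the team order is not consistent across the dataset, you need to define a match column: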

df['match'] = df[['team1', 'team2']].apply(
    lambda row: tuple(sorted(row.values)), 
    axis=1
)

The tuple is required because it is hashable, so it can be used as a groupby key (a list cannot).
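As a small illustration of the hashability point, here is a minimal sketch in plain Python. The names match_key, wins and list_is_hashable are made up for this example and are not part of the answer.

```python
# Sorting the two team names makes the key independent of team1/team2 order.
match_key = tuple(sorted(['MI', 'KKR']))   # ('KKR', 'MI')

# A tuple is hashable, so it can serve as a dict key (or a grouping key).
wins = {match_key: 0}

# A list is not hashable: hash() raises TypeError.
try:
    hash(sorted(['MI', 'KKR']))
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(match_key, list_is_hashable)
```

Because both ('MI', 'KKR') and ('KKR', 'MI') sort to the same tuple, every game between the two teams ends up under one key.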

It is not clear exactly what output you want, but this should get you close to your result:

df.groupby('match')['winner'].value_counts()

Output:

match        winner
(CSK, KXIP)  CSK       1
(DC, DD)     DD        1
...
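If the goal is the exact "A vs B = A:x B:y" string from the question, one possible sketch is shown below. It uses a plain collections.Counter rather than the groupby above, and it assumes only the eight rows visible in the question (the original data is truncated). The helper head_to_head is a hypothetical name introduced here.

```python
from collections import Counter

import pandas as pd

# The eight rows visible in the question; the real dataset is longer.
df = pd.DataFrame({
    'team1':  ['KKR', 'CSK', 'RR', 'MI', 'DC', 'KXIP', 'DC', 'MI'],
    'team2':  ['RCB', 'KXIP', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR'],
    'winner': ['KKR', 'CSK', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR'],
})

# Count wins per (order-independent pairing, winner).
wins = Counter()
for t1, t2, w in zip(df['team1'], df['team2'], df['winner']):
    pair = tuple(sorted((t1, t2)))
    wins[(pair, w)] += 1


def head_to_head(a, b):
    """Format the record between teams a and b (hypothetical helper)."""
    pair = tuple(sorted((a, b)))
    # A Counter returns 0 for keys it has never seen.
    return f"{a} vs {b} = {a}:{wins[(pair, a)]} {b}:{wins[(pair, b)]}"


print(head_to_head('MI', 'KKR'))  # MI vs KKR = MI:0 KKR:2
```

For these eight rows KKR is listed as the winner of both MI games, hence MI:0 KKR:2; the full dataset may give different counts.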

Answer 1 (score: 0)

Assuming the order of the teams is always the same:

df.groupby(['team1','team2']).apply(lambda x: str(sum(x.winner == x.team1))+':'+str(sum(x.winner == x.team2)))

Without that assumption, this would be the solution, with df created as follows:

import pandas as pd

df = pd.DataFrame({'team1': ['KKR','CSK','RR','MI','DC','KXIP','DC','MI','KKR'],
                   'team2': ['RCB','KXIP','DD','KKR','KKR','RR','DD','KKR','MI'],
                   'winner': ['KKR','CSK','DD','KKR','KKR','RR','DD','KKR','MI']})


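# Sort each (team1, team2) pair alphabetically so the order is consistent.
# .values replaces as_matrix(), which was removed in pandas 1.0.
teamSort = [sorted(item) for item in df[['team1','team2']].values]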
df[['team1','team2']] = teamSort

df = df.groupby(['team1','team2']).apply(lambda x: str(sum(x.winner == x.team1))+':'+str(sum(x.winner == x.team2))).reset_index(name='score')

Output:

  team1 team2 score
0   CSK  KXIP   1:0
1    DC    DD   0:1
2    DC   KKR   0:1
3    DD    RR   1:0
4   KKR    MI   2:1
5   KKR   RCB   1:0
6  KXIP    RR   0:1
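The same table can also be produced without groupby.apply, which I believe recent pandas releases warn about when the function reads the grouping columns. The sketch below is an alternative, not the answer's method: it sorts each pairing with numpy and sums two indicator columns. The names pairs, summary, team1_wins and team2_wins are introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team1': ['KKR','CSK','RR','MI','DC','KXIP','DC','MI','KKR'],
                   'team2': ['RCB','KXIP','DD','KKR','KKR','RR','DD','KKR','MI'],
                   'winner': ['KKR','CSK','DD','KKR','KKR','RR','DD','KKR','MI']})

# Sort the two team names within each row so the pairing order is consistent.
pairs = np.sort(df[['team1', 'team2']].to_numpy(dtype=object), axis=1)
first, second = pairs[:, 0], pairs[:, 1]
winner = df['winner'].to_numpy(dtype=object)

# One indicator column per side, then sum per pairing.
summary = (
    pd.DataFrame({
        'team1': first,
        'team2': second,
        'team1_wins': (winner == first).astype(int),
        'team2_wins': (winner == second).astype(int),
    })
    .groupby(['team1', 'team2'], as_index=False)
    .sum()
)

# Rebuild the 'x:y' score string used in the output above.
summary['score'] = (summary['team1_wins'].astype(str) + ':'
                    + summary['team2_wins'].astype(str))
print(summary[['team1', 'team2', 'score']])
```

Keeping the win counts as integer columns, and only building the score string at the end, leaves the numbers available for further filtering or sorting.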