Say I have two columns like this:
home_team away_team
SWE DEN
NOR GER
SWE NOR
GER DEN
GER FRA
and I want to create two new columns that count the games each home_team and away_team has played so far, like this:
home_team away_team games_HomeTeam games_AwayTeam
SWE DEN 1 1
NOR GER 1 1
SWE NOR 2 2
GER DEN 2 2
GER FRA 3 1
Answer 0 (score: 2)
You can do the following:
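import pandas as pd
# Interleave the two columns row by row: home, away, home, away, ...
flatten = [e for p in zip(df.home_team, df.away_team) for e in p]
# Running count per team (1-based), reshaped back to one (home, away) pair per row
counts = pd.DataFrame((pd.Series(flatten).groupby(flatten).cumcount() + 1).values.reshape(-1, 2),
                      columns=['games_HomeTeam', 'games_AwayTeam'])
print(pd.concat([df, counts], axis=1))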
Output
home_team away_team games_HomeTeam games_AwayTeam
0 1 2 1 1
1 3 4 1 1
2 1 3 2 2
3 2 4 2 2
4 1 5 3 1
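First flatten the two columns row by row, then group by team and take the cumulative count, then reshape the counts back into two columns. Finally, concatenate the result with df.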
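For reference, here is a minimal self-contained sketch of the same flatten, cumulative-count, reshape, and concatenate steps. It assumes the team codes shown in the question's expected-output table. The helper name add_game_counts is hypothetical and not part of pandas. Passing index=frame.index is an added precaution so the concatenation still lines up when df does not have the default 0..n-1 index.

```python
import pandas as pd

# Sample data taken from the question's expected-output table (an assumption).
df = pd.DataFrame({
    "home_team": ["SWE", "NOR", "SWE", "GER", "GER"],
    "away_team": ["DEN", "GER", "NOR", "DEN", "FRA"],
})


def add_game_counts(frame):
    """Hypothetical helper: append running games-played counts for both teams."""
    # Step 1: flatten row by row -> home0, away0, home1, away1, ...
    flat = pd.Series(frame[["home_team", "away_team"]].to_numpy().ravel())
    # Step 2: 1-based running count of each team's appearances so far
    running = flat.groupby(flat).cumcount() + 1
    # Step 3: reshape back to one (home, away) pair per original row
    counts = pd.DataFrame(
        running.to_numpy().reshape(-1, 2),
        columns=["games_HomeTeam", "games_AwayTeam"],
        index=frame.index,  # keep the caller's index so concat aligns row by row
    )
    # Step 4: attach the new columns next to the original ones
    return pd.concat([frame, counts], axis=1)


result = add_game_counts(df)
print(result)
```

Because the values are interleaved row by row, a team's count for a given match includes every earlier row, regardless of whether it played at home or away.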