I'm trying to compute overlaps between some data in a DataFrame. Here is a simple example:
df=pd.DataFrame({
'player':['A', 'B', 'C', 'D', 'A', 'C', 'B'],
'game':['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})
df:
game player
0 gameA A
1 gameB B
2 gameC C
3 gameC D
4 gameB A
5 gameD C
6 gameA B
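What I want to do is count, for each pair of games, the number of players who played in both.
For example, the result should look like this: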
game1 game2 overlap
gameA gameB 2 # because 2 players (A and B) played in both gameA and gameB
gameA gameC 0
gameA gameD 0
gameB gameA 2
gameB gameC 0
gameB gameD 0
...
I can do this with a dictionary and a for-each loop, but is there a simpler way using pivot_table or crosstab?
Thanks a lot.
Answer 0 (score: 0)
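You can use pd.merge to create game_table. Merging df with itself on player pairs each game a player appeared in with every game that same player appeared in: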
game_table = pd.merge(df, df, how='left', on=['player'])
# game_x player game_y
# 0 gameA A gameA
# 1 gameA A gameB
# 2 gameB B gameB
# 3 gameB B gameA
# 4 gameC C gameC
# 5 gameC C gameD
# 6 gameC D gameC
# 7 gameB A gameA
# 8 gameB A gameB
# 9 gameD C gameC
# 10 gameD C gameD
# 11 gameA B gameB
# 12 gameA B gameA
Then apply pd.crosstab to game_table to count how often each (game_x, game_y) pair occurs:
freq = pd.crosstab(game_table['game_x'], game_table['game_y'])
# game_y gameA gameB gameC gameD
# game_x
# gameA 2 2 0 0
# gameB 2 2 0 0
# gameC 0 0 2 1
# gameD 0 0 1 1
stack followed by reset_index reshapes the DataFrame into the desired format:
result = freq.stack().reset_index()
Putting it all together, and dropping the rows where a game is paired with itself:
import pandas as pd
df = pd.DataFrame(
{'player':['A', 'B', 'C', 'D', 'A', 'C', 'B'],
'game':['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})
game_table = pd.merge(df, df, how='left', on=['player'])
freq = pd.crosstab(game_table['game_x'], game_table['game_y'])
result = freq.stack()
result.name = 'overlap'
result = result.reset_index()
mask = (result['game_x'] != result['game_y'])
result = result.loc[mask]
print(result)
which yields
game_x game_y overlap
1 gameA gameB 2 # Because both A and B played in gameA and gameB
2 gameA gameC 0
3 gameA gameD 0
4 gameB gameA 2
6 gameB gameC 0
7 gameB gameD 0
8 gameC gameA 0
9 gameC gameB 0
11 gameC gameD 1
12 gameD gameA 0
13 gameD gameB 0
14 gameD gameC 1
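Since the question asked about crosstab specifically, here is a minimal sketch of a different route to the same counts: build a player-by-game indicator table with pd.crosstab and multiply it by its own transpose. This is a swapped-in technique, not the merge approach above. The names played, game1 and game2 are my own choices for illustration, and the clip step is an assumption that a player listed twice in the same game should be counted only once.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
    'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

# Rows are players, columns are games. Clip to 0/1 so that a repeated
# (player, game) row is not counted twice (an assumption about the data).
played = pd.crosstab(df['player'], df['game']).clip(upper=1)

# Entry (g1, g2) of played.T . played sums, over players, the product of
# their indicators for g1 and g2, i.e. the number of players in both games.
overlap = played.T.dot(played)

# Give the two axes distinct names so reset_index does not collide.
overlap = overlap.rename_axis(index='game1', columns='game2')

# Long format, without the pairs of a game with itself.
result = overlap.stack().rename('overlap').reset_index()
result = result[result['game1'] != result['game2']]
print(result)
```

On the sample data this should give the same overlap values as the merge-based result. The diagonal of overlap (before filtering) is simply the number of players in each game.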