I have two datasets. The first has these columns:
df1 =
ID  As       Hs       Ts
A   A_1      A_6      A_7
B   B_1
C   C_1               C_10
D   D_1
E   E_1,E_2  E_5      E_4
F            F_1,F_4
The second holds pairwise scores:
df2 =
ID1 1 ID2 2 SCORE
A A_1 B B_1 1
A A_6 B B_1 0.5
A A_7 B B_1 0.3
A A_1 C C_1 1
A A_6 C C_1 0.4
A A_7 C C_1 0.3
A A_1 C C_10 0.3
A A_6 C C_10 0.5
A A_7 C C_10 0.3
A A_1 D D_1 1
A A_6 D D_1 0.2
A A_7 D D_1 0.3
A A_1 E E_1 1
A A_6 E E_1 0.5
A A_7 E E_1 0.4
A A_1 E E_2 0.8
A A_6 E E_2 0.2
A A_7 E E_2 0.5
A A_1 E E_5 0.3
A A_6 E E_5 0.3
A A_7 E E_5 0.6
A A_1 E E_4 0.1
A A_6 E E_4 0.4
A A_7 E E_4 0.6
A A_1 F F_1 0.3
A A_6 F F_1 0.3
A A_7 F F_1 0.6
A A_1 F F_4 0.1
A A_6 F F_4 0.4
A A_7 F F_4 0.6
B B_1 C C_1 0.6
B B_1 C C_10 0.1
B B_1 D D_1 0.4
B B_1 E E_1 0.6
B B_1 E E_2 0.2
B B_1 E E_5 0.3
B B_1 E E_4 0.6
B B_1 F F_1 0.4
B B_1 F F_4 0.9
C C_1 D D_1 0.8
C C_1 E E_1 0.6
C C_1 E E_2 0.4
C C_1 E E_4 0.3
C C_1 E E_5 0.2
C C_1 F F_1 0.3
C C_1 F F_4 0.4
C C_10 D D_1 0.2
C C_10 E E_1 0.3
C C_10 E E_2 0.4
C C_10 E E_5 0.3
C C_10 E E_4 0.4
C C_10 F F_1 0.3
C C_10 F F_4 0.2
D D_1 F F_4 1
D D_1 E E_2 0.5
D D_1 E E_5 0.3
D D_1 E E_4 0.2
D D_1 F F_1 0.5
D D_1 F F_4 0.2
E E_1 F F_1 0.9
E E_1 F F_4 0.2
E E_2 F F_1 0.3
E E_2 F F_4 0.2
E E_5 F F_1 0.5
E E_5 F F_4 0.3
E E_4 F F_1 0.6
E E_4 F F_4 0.3
The matrix output I want is:
As Hs Ts
A_1 B_1 C_1 D_1 E_1 E_2 A_6 E_5 F_1 F_4 A_7 C_10 E_4
As A_1 1 1 1 1 0.8 0.3 0.3 0.1 0.3 0.1
B_1 1 0.6 0.4 0.6 0.2 0.5 0.3 0.4 0.9 0.3 0.1 0.6
C_1 1 0.6 0.8 0.6 0.4 0.4 0.2 0.3 0.4 0.3 0.3
D_1 1 0.4 0.8 1 0.5 0.2 0.3 0.5 0.2 0.3 0.2 0.2
E_1 1 0.6 0.6 1 0.5 0.2 0.4 0.3
E_2 0.8 0.2 0.4 1 0.2 0.2 0.5 0.4
Hs A_6 0.5 0.4 0.2 0.5 0.2 0.3 0.3 0.4 0.5 0.4
E_5 0.3 0.3 0.2 0.3 0.3 0.6 0.3
F_1 0.3 0.4 0.3 0.5 0.9 0.3 0.3 0.6 0.3 0.6
F_4 0.1 0.9 0.4 0.2 0.2 0.2 0.4 0.6 0.2 0.3
Ts A_7 0.3 0.3 0.3 0.4 0.5 0.6 0.6 0.6 0.3 0.6
C_10 0.3 0.1 0.5 0.3 0.4
E_4 0.1 0.6 0.3 0.2 0.4 0.6 0.4
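Note that pairs without a score should be left empty in the output matrix.
Should I try pd.crosstab? df.pivot_table? A groupby followed by unstack?
How can I get the desired output? Any suggestions would be greatly appreciated. Thanks.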
Answer 0 (score: 0)
Here is an example solution. The difficult part is ordering the data the way you want. I used a different, smaller example:
import pandas as pd
import numpy as np
idx ="""
grp id
As A_1
As B_1
As C_1
As D_1
As E_1
As E_2
Hs A_6
Hs E_5
Hs F_1
Hs F_4
Ts A_7
Ts C_10
Ts E_4
"""
data="""
ID1 1 ID2 2 SCORE
A A_1 B B_1 1
A F_1 B B_1 1
A A_6 B E_2 0.5
A A_7 B B_1 0.3
A A_1 C C_1 1
A A_6 C C_1 0.4
A A_7 C E_5 0.3
A A_1 C C_10 0.3
A A_6 C C_10 0.5
A A_7 C C_10 0.3
A A_1 D D_1 1
A A_6 D D_1 0.2
A A_7 D D_1 0.3
A A_7 E E_4 0.6
A A_1 F E_1 0.3
A E_5 F F_1 0.3
A A_7 F F_1 0.6
A A_1 F F_4 0.1
A A_6 F F_4 0.4
"""
from io import StringIO  # pd.compat.StringIO no longer exists in current pandas

df = pd.read_csv(StringIO(data), sep=r'\s+')
ix = pd.read_csv(StringIO(idx), sep=r'\s+')
df.drop(['ID1', 'ID2'], axis=1, inplace=True)
df1 = df.copy(deep=True)
# Mirror every pair: the copy swaps columns '1' and '2', so each (x, y) row
# also appears as (y, x) and the cross table built below is symmetric
df1.columns = ['2', '1', 'SCORE']
df = pd.concat([df, df1], sort=False)  # DataFrame.append was removed in pandas 2.0
# Look up the group (As, Hs, Ts) of each id and fold it into the label
df = pd.merge(df, ix, left_on='1', right_on='id')
df['1'] = '(' + df['grp'].map(str) + ', ' + df['1'].map(str) + ')'
df.drop(['grp', 'id'], axis=1, inplace=True)
df = pd.merge(df, ix, left_on='2', right_on='id')
df['2'] = '(' + df['grp'].map(str) + ', ' + df['2'].map(str) + ')'
df.drop(['grp', 'id'], axis=1, inplace=True)
# Group by both labels and unstack to build the cross table
df = df.groupby(['1', '2']).SCORE.max().unstack().fillna(' ')
print(df)
Result:
2 (As, A_1) (As, B_1) (As, C_1) ... (Ts, A_7) (Ts, C_10) (Ts, E_4)
1 ...
(As, A_1) 1 1 ... 0.3
(As, B_1) 1 ... 0.3
(As, C_1) 1 ...
(As, D_1) 1 ... 0.3
(As, E_1) 0.3 ...
(As, E_2) ...
(Hs, A_6) 0.4 ... 0.5
(Hs, E_5) ... 0.3
(Hs, F_1) 1 ... 0.6
(Hs, F_4) 0.1 ...
(Ts, A_7) 0.3 ... 0.3 0.6
(Ts, C_10) 0.3 ... 0.3
(Ts, E_4) ... 0.6
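The step that makes the table symmetric is the mirroring: the score rows are copied with the two id columns swapped and then stacked under the originals. A minimal sketch of that step in isolation, using a hypothetical three-row score list (the `pairs` frame and its values are made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical score list: each pair appears once, in one direction only.
pairs = pd.DataFrame({
    "1": ["A_1", "A_1", "B_1"],
    "2": ["B_1", "C_1", "C_1"],
    "SCORE": [1.0, 0.3, 0.6],
})

# Swap the two label columns so every (x, y) row also exists as (y, x).
mirrored = pairs.rename(columns={"1": "2", "2": "1"})
both = pd.concat([pairs, mirrored], ignore_index=True, sort=False)

# Values of column "1" become rows, values of column "2" become columns.
# Pairs that never occur (here the diagonal) stay NaN.
matrix = both.groupby(["1", "2"])["SCORE"].max().unstack()
print(matrix)
```

Without the mirrored copy, only the cells above the diagonal would be filled. Taking `max()` only matters when the same pair is listed more than once.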
Another solution, using a MultiIndex for the column headers:
from io import StringIO  # pd.compat.StringIO no longer exists in current pandas

# Reuses the data and idx strings defined above
df = pd.read_csv(StringIO(data), sep=r'\s+')
ix = pd.read_csv(StringIO(idx), sep=r'\s+')
df.drop(['ID1', 'ID2'], axis=1, inplace=True)
df1 = df.copy(deep=True)
df1.columns = ['2', '1', 'SCORE']
As = ['A_1', 'B_1', 'C_1', 'D_1', 'E_1', 'E_2']
Hs = ['A_6', 'E_5', 'F_1', 'F_4']
Ts = ['A_7', 'C_10', 'E_4']
df = pd.concat([df, df1], sort=False)  # DataFrame.append was removed in pandas 2.0
df = pd.merge(df, ix, left_on='1', right_on='id')
df.drop(['id'], axis=1, inplace=True)
df = pd.merge(df, ix, left_on='2', right_on='id')
df.drop(['id'], axis=1, inplace=True)
# grp_x is the group of the row label '1'; grp_y (from the second merge) belongs to '2'
df = df.groupby(['grp_x', '1', '2']).SCORE.max().unstack().fillna(' ')
# Put the columns in the wanted order, then add the group name as a top header level
df = df[As + Hs + Ts]
header = ['As'] * len(As) + ['Hs'] * len(Hs) + ['Ts'] * len(Ts)
df.columns = pd.MultiIndex.from_tuples(list(zip(header, df.columns)))
print(df)
Result:
As Hs Ts
A_1 B_1 C_1 D_1 E_1 E_2 A_6 E_5 F_1 F_4 A_7 C_10 E_4
grp_x 1
As A_1 1 1 1 0.3 0.1 0.3
B_1 1 1 0.3
C_1 1 0.4
D_1 1 0.2 0.3
E_1 0.3
E_2 0.5
Hs A_6 0.4 0.2 0.5 0.4 0.5
E_5 0.3 0.3
F_1 1 0.3 0.6
F_4 0.1 0.4
Ts A_7 0.3 0.3 0.3 0.6 0.3 0.6
C_10 0.3 0.5 0.3
E_4 0.6
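The column order and the header list in the second solution are typed by hand. As a possible alternative, both could be taken from the lookup table, so that rows and columns follow its order and carry the group as an outer level. The sketch below assumes a shortened, hypothetical lookup and score list (`ix` and `scores` here are made-up stand-ins, not the data above) and uses `pivot_table` plus `reindex` in place of `groupby`/`unstack`:

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the lookup table and the score list.
ix = pd.DataFrame({
    "grp": ["As", "As", "Hs", "Ts"],
    "id": ["A_1", "B_1", "A_6", "A_7"],
})
scores = pd.DataFrame({
    "1": ["A_1", "A_6", "A_7"],
    "2": ["B_1", "B_1", "B_1"],
    "SCORE": [1.0, 0.5, 0.3],
})

# Make the pairs symmetric, then pivot on the bare ids.
mirrored = scores.rename(columns={"1": "2", "2": "1"})
both = pd.concat([scores, mirrored], ignore_index=True, sort=False)
wide = both.pivot_table(index="1", columns="2", values="SCORE", aggfunc="max")

# Order rows and columns by the lookup; ids without any score get an empty slot.
wide = wide.reindex(index=ix["id"], columns=ix["id"])

# Attach the group as the outer level on both axes.
labels = pd.MultiIndex.from_frame(ix[["grp", "id"]])
wide.index = labels
wide.columns = labels

# na_rep only affects the printed text; the frame itself stays numeric.
print(wide.to_string(na_rep=""))
```

Keeping the missing cells as NaN and blanking them only at print time leaves the matrix usable for further numeric work, which `fillna(' ')` does not.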
With your sample data, the second solution gives:
As Hs Ts
A_1 B_1 C_1 D_1 E_1 E_2 A_6 E_5 F_1 F_4 A_7 C_10 E_4
grp_x 1
As A_1 1 1 1 1 0.8 0.3 0.3 0.1 0.3 0.1
B_1 1 0.6 0.4 0.6 0.2 0.5 0.3 0.4 0.9 0.3 0.1 0.6
C_1 1 0.6 0.8 0.6 0.4 0.4 0.2 0.3 0.4 0.3 0.3
D_1 1 0.4 0.8 0.5 0.2 0.3 0.5 1 0.3 0.2 0.2
E_1 1 0.6 0.6 0.5 0.9 0.2 0.4 0.3
E_2 0.8 0.2 0.4 0.5 0.2 0.3 0.2 0.5 0.4
Hs A_6 0.5 0.4 0.2 0.5 0.2 0.3 0.3 0.4 0.5 0.4
E_5 0.3 0.3 0.2 0.3 0.3 0.5 0.3 0.6 0.3
F_1 0.3 0.4 0.3 0.5 0.9 0.3 0.3 0.5 0.6 0.3 0.6
F_4 0.1 0.9 0.4 1 0.2 0.2 0.4 0.3 0.6 0.2 0.3
Ts A_7 0.3 0.3 0.3 0.4 0.5 0.6 0.6 0.6 0.3 0.6
C_10 0.3 0.1 0.2 0.3 0.4 0.5 0.3 0.3 0.2 0.3 0.4
E_4 0.1 0.6 0.3 0.2 0.4 0.6 0.3 0.6 0.4
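The grp/id lookup used in both solutions was written out by hand. It could probably be derived from df1 instead. The sketch below is only an illustration: df1 is retyped manually, and the placement of the empty cells (C_10 under Ts, F_1 and F_4 under Hs) is an assumption inferred from the desired output, since the posted table does not show which cells are blank.

```python
import pandas as pd

# df1 from the question, retyped by hand; empty cells are None.
# Assumption: C_10 belongs to Ts and F_1,F_4 to Hs, as in the desired output.
df1 = pd.DataFrame({
    "ID": ["A", "B", "C", "D", "E", "F"],
    "As": ["A_1", "B_1", "C_1", "D_1", "E_1,E_2", None],
    "Hs": ["A_6", None, None, None, "E_5", "F_1,F_4"],
    "Ts": ["A_7", None, "C_10", None, "E_4", None],
})

ix = (
    df1.melt(id_vars="ID", var_name="grp", value_name="id")  # one row per cell
    .dropna(subset=["id"])                                    # drop empty cells
    .assign(id=lambda d: d["id"].str.split(","))              # "E_1,E_2" -> ["E_1", "E_2"]
    .explode("id")                                            # one row per single id
    .loc[:, ["grp", "id"]]
    .reset_index(drop=True)
)
print(ix)
```

Because `melt` stacks the As, Hs and Ts columns one after another, the resulting rows come out grouped by As, then Hs, then Ts, which is the order wanted for the matrix.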