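I have 5 dataframes holding the top 80 players from FIFA 13-17, each containing the player's name, rating and club. My end goal is to merge all of these datasets so that I get one rating per player per year, or a null value where a player has none. Obviously, some players are not in the top 80 every year, e.g. because they retired. Below are snippets from two of the dataframes.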
FIFA18
Name Overall Club
0 Cristiano Ronaldo 94 Real Madrid CF
1 L. Messi 93 FC Barcelona
2 Neymar 92 FC Barcelona
3 L. Suárez 92 FC Barcelona
4 M. Neuer 92 FC Bayern Munich
5 De Gea 90 Manchester United
6 R. Lewandowski 90 FC Bayern Munich
7 J. Boateng 90 FC Bayern Munich
8 G. Bale 90 Real Madrid CF
9 Z. Ibrahimović 90 Manchester United
10 T. Courtois 89 Chelsea
FIFA13
Name Overall Club
0 L. Messi 94 FC Barcelona
1 Cristiano Ronaldo 92 Real Madrid CF
2 F. Ribéry 90 FC Bayern Munich
3 Xavi 90 FC Barcelona
4 Iniesta 90 FC Barcelona
5 N. Vidić 89 Manchester United
6 W. Rooney 89 Manchester United
7 Casillas 89 Real Madrid CF
8 David Silva 88 Manchester City
9 Falcao 88 Atlético Madrid
10 Z. Ibrahimović 88 Paris Saint-Germain
An example of this would be N. Vidić, who has retired.
My target table is this:
Name FIFA17 FIFA13 Club
0 Cristiano Ronaldo 94 92 Real Madrid CF
1 L. Messi 93 94 FC Barcelona
2 Neymar 92 83 FC Barcelona
3 L. Suárez 92 86 FC Barcelona
4 M. Neuer 92 87 FC Bayern Munich
5 De Gea 90 82 Manchester United
6 R. Lewandowski 90 80 FC Bayern Munich
7 J. Boateng 90 84 FC Bayern Munich
8 G. Bale 90 86 Real Madrid CF
9 Z. Ibrahimović 90 88 Manchester United
10 T. Courtois 89 83 Chelsea
11 F. Ribéry 86 90 FC Bayern Munich
12 Xavi 0 90 FC Barcelona
13 Iniesta 88 90 FC Barcelona
14 N. Vidić 0 89 Manchester United
15 W. Rooney 0 89 Manchester United
16 Casillas 0 89 Real Madrid CF
17 David Silva 87 88 Manchester City
18 Falcao 0 88 Atlético Madrid
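I am new to Python and pandas. I have tried using join and merge, but they always seem to use each table's index rather than the unique names.
Any help is much appreciated!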
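Answer 0 (score: 3)
Here is one way via pd.concat and pivot_table. It assumes you are able to put your dataframes in a dictionary, which can be of arbitrary length.
The solution also handles players with multiple clubs by keeping only the most recent club.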
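import pandas as pd

# Map each year to its dataframe; the dictionary can hold any number of years.
dfs = {13: df13, 18: df18}
# Stack all frames into one long frame, tagging each row with its year.
df = pd.concat([dfs[k].assign(Year=k) for k in dfs])
# For each player, keep the club from the most recent year.
club_map = df.sort_values('Year', ascending=False)\
             .drop_duplicates('Name')\
             .set_index('Name')['Club']
df['Club'] = df['Name'].map(club_map)
# One row per player, one column per year; missing ratings become 0.
res = df.pivot_table(index=['Name', 'Club'], columns='Year',
                     values='Overall', aggfunc='sum', fill_value=0)\
        .reset_index().rename_axis(None, axis='columns')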
Result
Name Club 13 18
0 Casillas Real Madrid CF 89 0
1 Cristiano Ronaldo Real Madrid CF 92 94
2 David Silva Manchester City 88 0
3 De Gea Manchester United 0 90
4 F. Ribéry FC Bayern Munich 90 0
5 Falcao Atlético Madrid 88 0
6 G. Bale Real Madrid CF 0 90
7 Iniesta FC Barcelona 90 0
8 J. Boateng FC Bayern Munich 0 90
9 L. Messi FC Barcelona 94 93
10 L. Suárez FC Barcelona 0 92
11 M. Neuer FC Bayern Munich 0 92
12 N. Vidić Manchester United 89 0
13 Neymar FC Barcelona 0 92
14 R. Lewandowski FC Bayern Munich 0 90
15 T. Courtois Chelsea 0 89
16 W. Rooney Manchester United 89 0
17 Xavi FC Barcelona 90 0
18 Z. Ibrahimović Manchester United 88 90
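Since the question asks for null values rather than zeros when a player is missing from a year, the same idea can be sketched with pivot and no fill value. This is only a sketch under stated assumptions: the two small frames are cut down from the snippets in the question, and the names long_df, latest_club and ratings are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frames, cut down from the snippets in the question.
df13 = pd.DataFrame({
    "Name": ["L. Messi", "N. Vidić", "Z. Ibrahimović"],
    "Overall": [94, 89, 88],
    "Club": ["FC Barcelona", "Manchester United", "Paris Saint-Germain"],
})
df18 = pd.DataFrame({
    "Name": ["L. Messi", "Neymar", "Z. Ibrahimović"],
    "Overall": [93, 92, 90],
    "Club": ["FC Barcelona", "FC Barcelona", "Manchester United"],
})

dfs = {13: df13, 18: df18}
# Long format: one row per player per year.
long_df = pd.concat([frame.assign(Year=year) for year, frame in dfs.items()])

# Most recent club for each player.
latest_club = (long_df.sort_values("Year", ascending=False)
                      .drop_duplicates("Name")
                      .set_index("Name")["Club"])

# One column per year; a player absent from a year gets NaN instead of 0.
ratings = long_df.pivot(index="Name", columns="Year", values="Overall")
ratings.columns = [f"FIFA{year}" for year in ratings.columns]

res = ratings.join(latest_club).reset_index()
print(res)
```

Because NaN is a float value, the rating columns come out as floats here; filling with 0 and casting to int, as in the answer, is the alternative when integers matter more than explicit nulls.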
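Answer 1 (score: 2)
Use concat on Series created by set_index with a MultiIndex on the Name and Club columns, then replace NaN with fillna, cast to integer, and finally convert the MultiIndex back to columns with reset_index:
# df1 is the FIFA18 frame and df2 the FIFA13 frame, matching the keys below.
s1 = df1.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
s2 = df2.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
df = pd.concat([s2, s1], axis=1, keys=('FIFA13','FIFA18')).fillna(0).astype(int).reset_index()
print(df)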
Name Club FIFA13 FIFA18
0 Casillas Real Madrid CF 89 0
1 Cristiano Ronaldo Real Madrid CF 92 94
2 David Silva Manchester City 88 0
3 De Gea Manchester United 0 90
4 F. Ribéry FC Bayern Munich 90 0
5 Falcao Atlético Madrid 88 0
6 G. Bale Real Madrid CF 0 90
7 Iniesta FC Barcelona 90 0
8 J. Boateng FC Bayern Munich 0 90
9 L. Messi FC Barcelona 94 93
10 L. Suárez FC Barcelona 0 92
11 M. Neuer FC Bayern Munich 0 92
12 N. Vidić Manchester United 89 0
13 Neymar FC Barcelona 0 92
14 R. Lewandowski FC Bayern Munich 0 90
15 T. Courtois Chelsea 0 89
16 W. Rooney Manchester United 89 0
17 Xavi FC Barcelona 90 0
18 Z. Ibrahimović Manchester United 0 90
19 Z. Ibrahimović Paris Saint-Germain 88 0
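Z. Ibrahimović appears twice in this output because rows are keyed on the (Name, Club) pair and he changed clubs. If one row per player is wanted, and given that the question mentions merge seeming to match on the index, a possible alternative is to merge on the Name column explicitly with how='outer'. A minimal sketch, assuming small sample frames cut down from the question's snippets; Club13 is a hypothetical helper column:

```python
import pandas as pd

# Hypothetical sample frames, cut down from the snippets in the question.
df13 = pd.DataFrame({
    "Name": ["L. Messi", "N. Vidić", "Z. Ibrahimović"],
    "Overall": [94, 89, 88],
    "Club": ["FC Barcelona", "Manchester United", "Paris Saint-Germain"],
})
df18 = pd.DataFrame({
    "Name": ["L. Messi", "Neymar", "Z. Ibrahimović"],
    "Overall": [93, 92, 90],
    "Club": ["FC Barcelona", "FC Barcelona", "Manchester United"],
})

# Give each rating column a year-specific name before merging.
left = df18.rename(columns={"Overall": "FIFA18"})
right = df13.rename(columns={"Overall": "FIFA13", "Club": "Club13"})

# on="Name" matches rows by player name rather than by index;
# how="outer" keeps players that appear in only one of the frames.
merged = left.merge(right, on="Name", how="outer")

# Prefer the FIFA18 club and fall back to the FIFA13 club.
merged["Club"] = merged["Club"].fillna(merged["Club13"])
merged = merged.drop(columns="Club13")[["Name", "FIFA18", "FIFA13", "Club"]]
print(merged)
```

Players missing from one edition get NaN in that edition's column, which is the null behaviour the question asks for.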
If order is important, the solution is similar: get the unique pairs of Name and Club, join them together and remove duplicates, then apply reindex and reset_index:
s1 = df1.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
s2 = df2.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
df = pd.concat([s2, s1], axis=1, keys=('FIFA13','FIFA18')).fillna(0).astype(int)
# Unique (Name, Club) pairs, FIFA13 first, so that keep='last' retains the newest club.
idx = pd.concat([df2[['Name','Club']], df1[['Name','Club']]]).drop_duplicates()
# Build an explicit MultiIndex so reindex matches on (Name, Club) pairs.
df = df.reindex(pd.MultiIndex.from_frame(idx)).reset_index().drop_duplicates('Name', keep='last')
print(df)
Name Club FIFA13 FIFA18
0 L. Messi FC Barcelona 94 93
1 Cristiano Ronaldo Real Madrid CF 92 94
2 F. Ribéry FC Bayern Munich 90 0
3 Xavi FC Barcelona 90 0
4 Iniesta FC Barcelona 90 0
5 N. Vidić Manchester United 89 0
6 W. Rooney Manchester United 89 0
7 Casillas Real Madrid CF 89 0
8 David Silva Manchester City 88 0
9 Falcao Atlético Madrid 88 0
11 Neymar FC Barcelona 0 92
12 L. Suárez FC Barcelona 0 92
13 M. Neuer FC Bayern Munich 0 92
14 De Gea Manchester United 0 90
15 R. Lewandowski FC Bayern Munich 0 90
16 J. Boateng FC Bayern Munich 0 90
17 G. Bale Real Madrid CF 0 90
18 Z. Ibrahimović Manchester United 0 90
19 T. Courtois Chelsea 0 89
For a general solution, use a list comprehension:
dfs = [df2, df1]
names = ['FIFA13','FIFA18']
# One Series of ratings per frame, indexed by (Name, Club).
s = [x.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall'] for x in dfs]
df = pd.concat(s, axis=1, keys=names).fillna(0).astype(int)
# Unique (Name, Club) pairs in order of first appearance.
s1 = [x[['Name','Club']] for x in dfs]
idx = pd.concat(s1).drop_duplicates()
df = df.reindex(pd.MultiIndex.from_frame(idx)).reset_index().drop_duplicates('Name', keep='last')
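The same pattern can be wrapped in a small helper that accepts any number of editions. The sketch below is a variation rather than the answer's exact code: it keys on Name only, leaves missing ratings as NaN, and keeps each player's most recent club. The function combine_ratings and the sample frames are hypothetical.

```python
import pandas as pd


def combine_ratings(frames):
    """Combine per-edition frames into one row per player.

    `frames` maps a column label (e.g. "FIFA13") to a DataFrame with
    Name, Overall and Club columns, ordered from oldest to newest.
    """
    # One ratings Series per edition, indexed by Name; the dict keys
    # become the column names of the combined frame.
    ratings = pd.concat(
        {label: frame.drop_duplicates("Name").set_index("Name")["Overall"]
         for label, frame in frames.items()},
        axis=1,
    )
    # Frames are stacked oldest first, so keep="last" picks the newest club.
    clubs = (pd.concat(list(frames.values()))
               .drop_duplicates("Name", keep="last")
               .set_index("Name")["Club"])
    return ratings.join(clubs).rename_axis("Name").reset_index()


# Hypothetical sample frames, cut down from the snippets in the question.
df13 = pd.DataFrame({
    "Name": ["L. Messi", "N. Vidić", "Z. Ibrahimović"],
    "Overall": [94, 89, 88],
    "Club": ["FC Barcelona", "Manchester United", "Paris Saint-Germain"],
})
df18 = pd.DataFrame({
    "Name": ["L. Messi", "Neymar", "Z. Ibrahimović"],
    "Overall": [93, 92, 90],
    "Club": ["FC Barcelona", "FC Barcelona", "Manchester United"],
})

out = combine_ratings({"FIFA13": df13, "FIFA18": df18})
print(out)
```

Adding another edition then only requires another entry in the dictionary, provided the entries stay in chronological order.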