Combining Pandas data frames by number

Time: 2015-03-11 11:44:54

Tags: python python-2.7 pandas

I have 3 pandas data frames named df1, df2 and df3.

df1:
      match_up        result
0   1985_1116_1234      1
1   1985_1120_1345      1
2   1985_1207_1250      1
3   1985_1229_1425      1
4   1985_1242_1325      1
5   1986_1116_1430      0
6   1986_1250_1229      0
7   1986_1207_1437      1

df2:
  team_df2       win_df2  
  1207           0.700               
  1116           0.636               
  1120           0.621               
  1229           0.615                
  1242           0.679
  1116           0.742
  1207           0.567
  1250           0.342                 

df3:
    team_df3       win_df3  
     1234           0.667               
     1250           0.759               
     1325           0.774               
     1345           0.742               
     1425           0.667
     1229           0.845
     1430           0.434
     1437           0.123

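The column team_df2 in data frame df2 holds the value that follows year_ in the match_up column of data frame df1 (for example, 1116 follows 1985_). The column team_df3 holds the value that follows year_val1_ (for example, 1234 follows 1985_1116_).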

The first 5 rows of df2 and df3 represent the year 1985, and the last 3 rows of df2 and df3 represent the year 1986.

I need a new_data_frame that combines df1, df2 and df3 in the following format:

   match_up        result  team_df2  team_df3  win_df2  win_df3
0   1985_1116_1234      1      1116       1234    0.636     0.667
1   1985_1120_1345      1      1120       1345    0.621     0.742
2   1985_1207_1250      1      1207       1250    0.700     0.759 
3   1985_1229_1425      1      1229       1425    0.615     0.667
4   1985_1242_1325      1      1242       1325    0.679     0.774
5   1986_1116_1430      0      1116       1430    0.742     0.434
6   1986_1250_1229      0      1250       1229    0.342     0.845
7   1986_1207_1437      1      1207       1437    0.567     0.123

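I asked this question before and got a good answer. The problem I now face is that when the year value (in the match_up column of data frame df1) changes, the team values in the team_df2 and team_df3 columns repeat. So if I merge these three data frames on the team_df2 and team_df3 values alone, I do not get the desired output.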
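The ambiguity described above can be reproduced with a small sketch. It uses two rows from the question and assumes the team id has already been extracted from match_up into a team_df2 column on df1, which the question's df1 does not yet have.

```python
import pandas as pd

# Two match-ups from the question that share team 1116 in different years.
# The team_df2 column here is an assumption: it is not in the question's df1.
df1 = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1986_1116_1430"],
    "result": [1, 0],
    "team_df2": [1116, 1116],
})

# df2 lists team 1116 twice: once for 1985 and once for 1986
df2 = pd.DataFrame({
    "team_df2": [1116, 1116],
    "win_df2": [0.636, 0.742],
})

# Merging on the team id alone pairs every 1116 row with every 1116 row
ambiguous = df1.merge(df2, on="team_df2")
print(len(ambiguous))  # 4 rows instead of the 2 that are wanted
```

Because the team id is not unique across years, the merge key needs a second column (the year) to pick out a single row.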

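Kindly help me. The operation amounts to combining data frames 1, 2 and 3 in the image below, except that the values of the team_df2 column in the third data frame of the image change as follows: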

A_515_729



B_767_890



P_390_789

[Image]

1 answer:

Answer 0 (score: 2)

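Split your match_up column so that the year and the ids for the other two data frames become separate columns. (In the code below, df is the question's df1, while df1 and df2 are the question's df2 and df3.)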

In [23]:

# Split match_up once; the pieces are the year, the team_df2 id and the team_df3 id
parts = df['match_up'].str.split('_')
df['year'] = list(map(int, parts.str[0]))
df['team_df2'] = list(map(int, parts.str[1]))
df['team_df3'] = list(map(int, parts.str[2]))
# df1 and df2 are row-aligned with df, so they take the same year values by position
df1['year'] = df['year'].values
df2['year'] = df['year'].values
df
Out[23]:
         match_up  result  year  team_df2  team_df3
0  1985_1116_1234       1  1985      1116      1234
1  1985_1120_1345       1  1985      1120      1345
2  1985_1207_1250       1  1985      1207      1250
3  1985_1229_1425       1  1985      1229      1425
4  1985_1242_1325       1  1985      1242      1325
5  1986_1116_1430       0  1986      1116      1430
6  1986_1250_1229       0  1986      1250      1229
7  1986_1207_1437       1  1986      1207      1437
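As an alternative sketch of the same splitting step, str.split with expand=True returns all three pieces at once as columns. The two-row frame below is illustrative data taken from the question, and the variable name parts is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1986_1250_1229"],
    "result": [1, 0],
})

# expand=True returns one column per underscore-separated piece
parts = df["match_up"].str.split("_", expand=True).astype(int)
parts.columns = ["year", "team_df2", "team_df3"]

# Attach the three integer columns to the original frame
df = pd.concat([df, parts], axis=1)
print(df)
```

Converting to int matters here: the team columns in the other two frames are numeric, and a merge key should have the same type on both sides.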

Now we can merge on both the year and the team columns, which avoids the ambiguity:

In [24]:

merged = df.merge(df1, on=['year', 'team_df2'])
merged = merged.merge(df2, on=['year', 'team_df3'])
merged
Out[24]:
         match_up  result  year  team_df2  team_df3  win_df2  win_df3
0  1985_1116_1234       1  1985      1116      1234    0.636    0.667
1  1985_1120_1345       1  1985      1120      1345    0.621    0.742
2  1985_1207_1250       1  1985      1207      1250    0.700    0.759
3  1985_1229_1425       1  1985      1229      1425    0.615    0.667
4  1985_1242_1325       1  1985      1242      1325    0.679    0.774
5  1986_1116_1430       0  1986      1116      1430    0.742    0.434
6  1986_1250_1229       0  1986      1250      1229    0.342    0.845
7  1986_1207_1437       1  1986      1207      1437    0.567    0.123
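For readers who want to run the whole approach end to end, here is a self-contained sketch built from the question's data. The names matches, left_stats and right_stats are hypothetical stand-ins for the three frames, and the years are attached by row position, which relies on the question's statement that the first 5 rows are 1985 and the last 3 are 1986. The how="left" and validate arguments are optional safeguards, not part of the original answer.

```python
import pandas as pd

matches = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345", "1985_1207_1250",
                 "1985_1229_1425", "1985_1242_1325", "1986_1116_1430",
                 "1986_1250_1229", "1986_1207_1437"],
    "result": [1, 1, 1, 1, 1, 0, 0, 1],
})
left_stats = pd.DataFrame({
    "team_df2": [1207, 1116, 1120, 1229, 1242, 1116, 1207, 1250],
    "win_df2": [0.700, 0.636, 0.621, 0.615, 0.679, 0.742, 0.567, 0.342],
})
right_stats = pd.DataFrame({
    "team_df3": [1234, 1250, 1325, 1345, 1425, 1229, 1430, 1437],
    "win_df3": [0.667, 0.759, 0.774, 0.742, 0.667, 0.845, 0.434, 0.123],
})

# Split match_up into integer key columns
keys = matches["match_up"].str.split("_", expand=True).astype(int)
keys.columns = ["year", "team_df2", "team_df3"]
matches = pd.concat([matches, keys], axis=1)

# Assumption from the question: first 5 rows are 1985, last 3 are 1986
years = [1985] * 5 + [1986] * 3
left_stats["year"] = years
right_stats["year"] = years

# how="left" keeps every match-up; validate raises if a key is duplicated
merged = (
    matches
    .merge(left_stats, on=["year", "team_df2"], how="left", validate="one_to_one")
    .merge(right_stats, on=["year", "team_df3"], how="left", validate="one_to_one")
)
print(merged.drop(columns="year"))
```

If the stats frames were not in the same year order as the question describes, the positional year assignment would be wrong, so a real year column in those frames would be the safer source.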

Then you can drop the columns you are no longer interested in:

In [27]:

merged.drop('year',axis=1)
Out[27]:
         match_up  result  team_df2  team_df3  win_df2  win_df3
0  1985_1116_1234       1      1116      1234    0.636    0.667
1  1985_1120_1345       1      1120      1345    0.621    0.742
2  1985_1207_1250       1      1207      1250    0.700    0.759
3  1985_1229_1425       1      1229      1425    0.615    0.667
4  1985_1242_1325       1      1242      1325    0.679    0.774
5  1986_1116_1430       0      1116      1430    0.742    0.434
6  1986_1250_1229       0      1250      1229    0.342    0.845
7  1986_1207_1437       1      1207      1437    0.567    0.123
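One detail worth noting about the drop call: it returns a new data frame and leaves merged unchanged, so the result has to be assigned to keep it. A minimal sketch with a one-row illustrative frame (the name result_frame is an assumption):

```python
import pandas as pd

merged = pd.DataFrame({
    "match_up": ["1985_1116_1234"],
    "year": [1985],
    "win_df2": [0.636],
})

# drop returns a new DataFrame; merged itself still has the year column
result_frame = merged.drop("year", axis=1)
print(list(result_frame.columns))
print("year" in merged.columns)
```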