How to combine 3 complex DataFrames in pandas

Time: 2015-03-11 04:51:14

Tags: python python-2.7 pandas

I have 3 pandas DataFrames named df1, df2 and df3:
df1:
      match_up        result
0   1985_1116_1234      1
1   1985_1120_1345      1
2   1985_1207_1250      1
3   1985_1229_1425      1
4   1985_1242_1325      1

df2:
  team_df2       win_df2  
0  1207           0.700               
2  1116           0.636               
3  1120           0.621               
4  1229           0.615                
5  1242           0.679                

df3:
    team_df3       win_df3  
1   1234           0.667               
7   1250           0.759               
11  1325           0.774               
12  1345           0.742               
15  1425           0.667 

I need a new DataFrame, new_data_frame, that combines df1, df2 and df3 in the following format:

          match_up        result  team_df2  team_df3  win_df2  win_df3
    0   1985_1116_1234      1      1116       1234    0.636     0.667
    1   1985_1120_1345      1      1120       1345    0.621     0.742
    2   1985_1207_1250      1      1207       1250    0.700     0.759 
    3   1985_1229_1425      1      1229       1425    0.615     0.667
    4   1985_1242_1325      1      1242       1325    0.679     0.774

How can I do this in pandas?

2 answers:

Answer 0 (score: 2)

import pandas as pd

df1 = pd.DataFrame({'match_up':['1985_1116_1234','1985_1120_1345','1985_1207_1250','1985_1229_1425','1985_1242_1325'],
                    'results':[1,1,1,1,1]})

df2 = pd.DataFrame({'team_df2':[1207,1116,1120,1229,1242],
                    'win_df2':[0.700,0.636,0.621,0.615,0.679]})

df3 = pd.DataFrame({'team_df3':[1234,1250,1325,1345,1425],
                    'win_df3':[0.667,0.759,0.774,0.742,0.667]})


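# Parse the first team ID out of match_up and merge it against df2
final = pd.merge(df1,df2,
        left_on=df1['match_up'].apply(lambda x: int(x.split('_')[1])).values,
        right_on='team_df2',how='left')

# Parse the second team ID and merge it against df3; reusing df1's keys here
# works because the left merge above keeps one row per df1 row, in df1's order
final = pd.merge(final,df3,
        left_on=df1['match_up'].apply(lambda x: int(x.split('_')[2])).values,
        right_on='team_df3',how='left')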

Output:

In [23]: final
Out[23]: 
         match_up  results  team_df2  win_df2  team_df3  win_df3
0  1985_1116_1234        1      1116    0.636      1234    0.667
1  1985_1120_1345        1      1120    0.621      1345    0.742
2  1985_1207_1250        1      1207    0.700      1250    0.759
3  1985_1229_1425        1      1229    0.615      1425    0.667
4  1985_1242_1325        1      1242    0.679      1325    0.774
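The second merge above computes its keys from df1 rather than from final, so it is only correct while final still has exactly one row per df1 row, in df1's order. A left merge keeps the left row order, and it keeps the row count as long as each team ID appears at most once in df2. The sketch below (not part of the original answer) makes that assumption explicit with merge's validate argument and parses the second set of keys from final itself. It assumes a pandas version that supports validate (0.21 or later).

```python
import pandas as pd

# Sample data from the question (indices simplified to 0..4)
df1 = pd.DataFrame({'match_up': ['1985_1116_1234', '1985_1120_1345', '1985_1207_1250',
                                 '1985_1229_1425', '1985_1242_1325'],
                    'results': [1, 1, 1, 1, 1]})
df2 = pd.DataFrame({'team_df2': [1207, 1116, 1120, 1229, 1242],
                    'win_df2': [0.700, 0.636, 0.621, 0.615, 0.679]})
df3 = pd.DataFrame({'team_df3': [1234, 1250, 1325, 1345, 1425],
                    'win_df3': [0.667, 0.759, 0.774, 0.742, 0.667]})

# First team ID, parsed from df1's match_up strings
team2 = df1['match_up'].str.split('_').str[1].astype(int).values
# validate='many_to_one' raises a MergeError if a team_df2 value is duplicated
final = pd.merge(df1, df2, left_on=team2, right_on='team_df2', how='left',
                 validate='many_to_one')

# Second team ID, parsed from final itself so no row-order assumption is needed
team3 = final['match_up'].str.split('_').str[2].astype(int).values
final = pd.merge(final, df3, left_on=team3, right_on='team_df3', how='left',
                 validate='many_to_one')
print(final)
```

If df2 or df3 ever contained the same team twice, the merge would fail loudly instead of silently duplicating rows and misaligning the second set of keys.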

Answer 1 (score: 0)

You need to extract the team IDs from the match_up string and convert them to integers for the merge to work correctly:
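To illustrate the point (a small sketch, not the answerer's code): splitting match_up returns every piece as a string, so the team IDs must be converted before they can match the integer team_df2 and team_df3 columns. This sketch uses str.split with expand=True, which returns the pieces as separate columns, instead of the zip-based unpacking used in this answer.

```python
import pandas as pd

# Two rows of df1 from the question
df1 = pd.DataFrame({'match_up': ['1985_1116_1234', '1985_1120_1345'],
                    'result': [1, 1]})

# expand=True gives one column per '_'-separated piece: year, team A, team B
parts = df1['match_up'].str.split('_', expand=True)

# Every piece is still a string ('1116'), so convert the team IDs to int
df1['id2'] = parts[1].astype(int)
df1['id3'] = parts[2].astype(int)
print(df1)
```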

import pandas as pd

# Set up the result DataFrame (df1, df2 and df3 are the frames from the question)
df = df1.copy()
df['year'], df['id2'], df['id3'] = list(zip(*df['match_up'].str.split('_')))
df[['id2', 'id3']] = df[['id2', 'id3']].astype(int)

# Do merges
df = pd.merge(df, df2, left_on='id2', right_on='team_df2')
df = pd.merge(df, df3, left_on='id3', right_on='team_df3')

# Drop unneeded columns and print
df = df.drop(['id2', 'year', 'id3'], axis=1)
print(df)

Output:

         match_up  result  team_df2  win_df2  team_df3  win_df3
0  1985_1116_1234       1      1116    0.636      1234    0.667
1  1985_1120_1345       1      1120    0.621      1345    0.742
2  1985_1207_1250       1      1207    0.700      1250    0.759
3  1985_1229_1425       1      1229    0.615      1425    0.667
4  1985_1242_1325       1      1242    0.679      1325    0.774
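Both answers produce the right data, but with team_df2/win_df2 before team_df3/win_df3 instead of the column order shown in the question. One way to get the exact requested layout, sketched here under the assumption that the frames hold the question's sample data (indices simplified to 0..4), is to merge as in this answer and then select the columns by a list, which also drops the helper ID columns. It uses how='left' so that df1's row order is kept.

```python
import pandas as pd

df1 = pd.DataFrame({'match_up': ['1985_1116_1234', '1985_1120_1345', '1985_1207_1250',
                                 '1985_1229_1425', '1985_1242_1325'],
                    'result': [1, 1, 1, 1, 1]})
df2 = pd.DataFrame({'team_df2': [1207, 1116, 1120, 1229, 1242],
                    'win_df2': [0.700, 0.636, 0.621, 0.615, 0.679]})
df3 = pd.DataFrame({'team_df3': [1234, 1250, 1325, 1345, 1425],
                    'win_df3': [0.667, 0.759, 0.774, 0.742, 0.667]})

# Split "year_teamA_teamB" into columns 0, 1, 2 and merge on the integer team IDs
ids = df1['match_up'].str.split('_', expand=True)
merged = (df1.assign(id2=ids[1].astype(int), id3=ids[2].astype(int))
             .merge(df2, left_on='id2', right_on='team_df2', how='left')
             .merge(df3, left_on='id3', right_on='team_df3', how='left'))

# Select columns in the order requested in the question (drops id2/id3 too)
new_data_frame = merged[['match_up', 'result', 'team_df2', 'team_df3',
                         'win_df2', 'win_df3']]
print(new_data_frame)
```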