I have 3 pandas DataFrames named df1, df2 and df3.
df1:
match_up result
0 1985_1116_1234 1
1 1985_1120_1345 1
2 1985_1207_1250 1
3 1985_1229_1425 1
4 1985_1242_1325 1
df2:
team_df2 win_df2
0 1207 0.700
2 1116 0.636
3 1120 0.621
4 1229 0.615
5 1242 0.679
df3:
team_df3 win_df3
1 1234 0.667
7 1250 0.759
11 1325 0.774
12 1345 0.742
15 1425 0.667
I need a new_data_frame that combines df1, df2 and df3 in the following format:
match_up result team_df2 team_df3 win_df2 win_df3
0 1985_1116_1234 1 1116 1234 0.636 0.667
1 1985_1120_1345 1 1120 1345 0.621 0.742
2 1985_1207_1250 1 1207 1250 0.700 0.759
3 1985_1229_1425 1 1229 1425 0.615 0.667
4 1985_1242_1325 1 1242 1325 0.679 0.774
How can I do this in pandas?
Answer 0 (score: 2)
import pandas as pd

# Rebuild the sample frames from the question
df1 = pd.DataFrame({'match_up':['1985_1116_1234','1985_1120_1345','1985_1207_1250','1985_1229_1425','1985_1242_1325'],
                    'result':[1,1,1,1,1]})
df2 = pd.DataFrame({'team_df2':[1207,1116,1120,1229,1242],
                    'win_df2':[0.700,0.636,0.621,0.615,0.679]})
df3 = pd.DataFrame({'team_df3':[1234,1250,1325,1345,1425],
                    'win_df3':[0.667,0.759,0.774,0.742,0.667]})
# Join df2 on the second part of match_up, converted to int
final = pd.merge(df1,df2,
                 left_on=df1['match_up'].apply(lambda x: int(x.split('_')[1])).values,
                 right_on='team_df2',how='left')
# Join df3 on the third part of match_up; take the keys from final itself
# (a left merge keeps df1's row order, but this does not rely on it)
final = pd.merge(final,df3,
                 left_on=final['match_up'].apply(lambda x: int(x.split('_')[2])).values,
                 right_on='team_df3',how='left')
Output:
In [23]: final
Out[23]:
match_up result team_df2 win_df2 team_df3 win_df3
0 1985_1116_1234 1 1116 0.636 1234 0.667
1 1985_1120_1345 1 1120 0.621 1345 0.742
2 1985_1207_1250 1 1207 0.700 1250 0.759
3 1985_1229_1425 1 1229 0.615 1425 0.667
4 1985_1242_1325 1 1242 0.679 1325 0.774
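An alternative sketch that is not from either answer: instead of merging, split match_up once with `str.split(expand=True)`, cast the team IDs to int, and look up each win rate with `Series.map` against a Series indexed by team ID. It assumes the team IDs in df2 and df3 are unique (mapping through a Series needs a unique index), and it produces exactly the column order the question asked for. The odd indexes on df2/df3 just mirror the question's printout.

```python
import pandas as pd

# Sample data from the question (df1 uses the question's 'result' column)
df1 = pd.DataFrame({
    'match_up': ['1985_1116_1234', '1985_1120_1345', '1985_1207_1250',
                 '1985_1229_1425', '1985_1242_1325'],
    'result': [1, 1, 1, 1, 1],
})
df2 = pd.DataFrame({'team_df2': [1207, 1116, 1120, 1229, 1242],
                    'win_df2': [0.700, 0.636, 0.621, 0.615, 0.679]},
                   index=[0, 2, 3, 4, 5])
df3 = pd.DataFrame({'team_df3': [1234, 1250, 1325, 1345, 1425],
                    'win_df3': [0.667, 0.759, 0.774, 0.742, 0.667]},
                   index=[1, 7, 11, 12, 15])

# Split "year_team2_team3" into columns 0, 1, 2; the pieces are strings,
# so the team IDs are cast to int to match df2/df3
parts = df1['match_up'].str.split('_', expand=True)
new_df = df1.copy()
new_df['team_df2'] = parts[1].astype(int)
new_df['team_df3'] = parts[2].astype(int)

# Look up each win rate via a Series indexed by team ID (requires unique IDs)
new_df['win_df2'] = new_df['team_df2'].map(df2.set_index('team_df2')['win_df2'])
new_df['win_df3'] = new_df['team_df3'].map(df3.set_index('team_df3')['win_df3'])

print(new_df)
```

A team ID missing from df2 or df3 comes back as NaN, which matches the `how='left'` merges above.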
Answer 1 (score: 0)
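You need to extract the team IDs from the match_up string and convert them to integers so that the merge matches them correctly: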
...
# Set up result DataFrame
df = df1.copy()
df['year'], df['id2'], df['id3'] = list(zip(*df['match_up'].str.split('_')))
df[['id2', 'id3']] = df[['id2', 'id3']].astype(int)
# Do merges
df = pd.merge(df, df2, left_on='id2', right_on='team_df2')
df = pd.merge(df, df3, left_on='id3', right_on='team_df3')
# Drop unneeded columns and print
df = df.drop(['id2', 'year', 'id3'], axis=1)
print(df)
Output:
match_up result team_df2 win_df2 team_df3 win_df3
0 1985_1116_1234 1 1116 0.636 1234 0.667
1 1985_1120_1345 1 1120 0.621 1345 0.742
2 1985_1207_1250 1 1207 0.700 1250 0.759
3 1985_1229_1425 1 1229 0.615 1425 0.667
4 1985_1242_1325 1 1242 0.679 1325 0.774
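To make the point about integer conversion concrete, here is a small sketch trimmed to two rows of the question's data. The pieces produced by `str.split` are strings, while `team_df2` is int64; as far as I know, pandas generally refuses to merge a string key column with an int64 key column and raises a ValueError. The `validate='many_to_one'` argument is an addition of mine, not part of the answer: it makes pandas raise if df2 ever contains a duplicate team ID.

```python
import pandas as pd

df1 = pd.DataFrame({'match_up': ['1985_1116_1234', '1985_1120_1345'], 'result': [1, 1]})
df2 = pd.DataFrame({'team_df2': [1116, 1120], 'win_df2': [0.636, 0.621]})

# str.split yields strings, even when the pieces look like numbers
ids = df1['match_up'].str.split('_', expand=True)
print(type(ids.iloc[0, 1]))  # <class 'str'>

# Cast before merging so both key columns are integers
df1['id2'] = ids[1].astype(int)

# validate='many_to_one' (assumed addition) checks that team_df2 is unique
merged = pd.merge(df1, df2, left_on='id2', right_on='team_df2',
                  how='left', validate='many_to_one')
print(merged)
```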